0.52
2.08
Chat
Active

Qwen3.7 Plus

A multimodal reasoning model from Alibaba Qwen, optimized for coding workflows, tool usage, visual reasoning and agent-based tasks with text, image and video understanding.
Qwen3.7 PlusTechflow Logo - Techflow X Webflow Template

Qwen3.7 Plus

Qwen3.7 Plus is Alibaba's multimodal reasoning model with support for text, image, and video input. Built for agentic workflows, coding, visual reasoning, and complex multi-step tasks with tool use support.

Architecture: what makes it capable and efficient

Qwen3.7 Plus uses a sparse Mixture-of-Experts design. Rather than activating the full parameter space on every token, the model routes each token to a small set of specialized expert sub-networks. This keeps inference compute proportional to the active parameter count, not the total model size.

Sparse MoE (mixture of experts)Each token is routed to a specialized subset of the model's expert networks. Only a fraction of total parameters are active per forward pass, which keeps latency and cost well below what a comparable dense model would require.

Native multimodal fusionText, images, and video are processed in a single forward pass. There are no separate vision adapters or preprocessing pipelines — the model was trained from scratch on multimodal tokens, allowing natural cross-modal reasoning without stitching together separate models.

Configurable reasoning depthCallers can dial reasoning intensity up or down per request. At low settings the model behaves as a fast instruction-follower; at higher settings it performs multi-step chain-of-thought decomposition suited for coding, visual analysis, and agentic planning tasks.

Full tool use supportFunction calling is a production default, not an experimental add-on. Single tool calls, auto tool choice, JSON mode, and vision + tools are all supported out of the box.

Core capabilities

Text, image & video inputInclude screenshots, charts, short video clips, and documents alongside text in the same prompt. Inputs are natively understood — not converted or approximated by a separate model.

Tool use and function callingCall external APIs, retrieve structured data, and chain multi-step operations. OpenAI-compatible function calling format means existing tool integrations work without modification.

Thinking preservationReasoning traces persist across conversation turns, reducing redundant computation in iterative development workflows and multi-step planning tasks.

Coding and agentic workflowsThe model understands project-level context, generates well-structured code across languages, and participates in autonomous agent loops — planning, calling tools, and iterating without human intervention between steps.

201-language supportStrong multilingual coverage with consistent instruction-following quality across languages including English, Arabic, Japanese, Chinese, and Estonian.

Benchmark performance

Who should use Qwen3.7 Plus?

Teams building visual AI productsApplications that need to process images, screenshots, charts, or video alongside text — from document intelligence to UI-aware agents — without managing a separate vision model.

Agent developersEngineers building multi-step tool-use agents that require consistent structured output, long state histories, reliable multi-turn reasoning, and visual context awareness — without the cost of a tier-1 model.

Coding infrastructure teamsTeams running code generation, review, or refactoring pipelines that benefit from reasoning depth and multimodal inputs, such as processing screenshots of error states or UI mockups.

Production-scale appsApplications running millions of requests per month where per-token cost matters. At $0.52/M input, Qwen3.7 Plus delivers reasoning and vision capabilities at a fraction of frontier model pricing.

Research and analysis pipelinesTeams that combine text and visual inputs for competitive analysis, document review, and multi-source synthesis — where both modalities need to be processed in a single coherent pass.

Architecture: what makes it capable and efficient

Qwen3.7 Plus uses a sparse Mixture-of-Experts design. Rather than activating the full parameter space on every token, the model routes each token to a small set of specialized expert sub-networks. This keeps inference compute proportional to the active parameter count, not the total model size.

Sparse MoE (mixture of experts)Each token is routed to a specialized subset of the model's expert networks. Only a fraction of total parameters are active per forward pass, which keeps latency and cost well below what a comparable dense model would require.

Native multimodal fusionText, images, and video are processed in a single forward pass. There are no separate vision adapters or preprocessing pipelines — the model was trained from scratch on multimodal tokens, allowing natural cross-modal reasoning without stitching together separate models.

Configurable reasoning depthCallers can dial reasoning intensity up or down per request. At low settings the model behaves as a fast instruction-follower; at higher settings it performs multi-step chain-of-thought decomposition suited for coding, visual analysis, and agentic planning tasks.

Full tool use supportFunction calling is a production default, not an experimental add-on. Single tool calls, auto tool choice, JSON mode, and vision + tools are all supported out of the box.

Core capabilities

Text, image & video inputInclude screenshots, charts, short video clips, and documents alongside text in the same prompt. Inputs are natively understood — not converted or approximated by a separate model.

Tool use and function callingCall external APIs, retrieve structured data, and chain multi-step operations. OpenAI-compatible function calling format means existing tool integrations work without modification.

Thinking preservationReasoning traces persist across conversation turns, reducing redundant computation in iterative development workflows and multi-step planning tasks.

Coding and agentic workflowsThe model understands project-level context, generates well-structured code across languages, and participates in autonomous agent loops — planning, calling tools, and iterating without human intervention between steps.

201-language supportStrong multilingual coverage with consistent instruction-following quality across languages including English, Arabic, Japanese, Chinese, and Estonian.

Benchmark performance

Who should use Qwen3.7 Plus?

Teams building visual AI productsApplications that need to process images, screenshots, charts, or video alongside text — from document intelligence to UI-aware agents — without managing a separate vision model.

Agent developersEngineers building multi-step tool-use agents that require consistent structured output, long state histories, reliable multi-turn reasoning, and visual context awareness — without the cost of a tier-1 model.

Coding infrastructure teamsTeams running code generation, review, or refactoring pipelines that benefit from reasoning depth and multimodal inputs, such as processing screenshots of error states or UI mockups.

Production-scale appsApplications running millions of requests per month where per-token cost matters. At $0.52/M input, Qwen3.7 Plus delivers reasoning and vision capabilities at a fraction of frontier model pricing.

Research and analysis pipelinesTeams that combine text and visual inputs for competitive analysis, document review, and multi-source synthesis — where both modalities need to be processed in a single coherent pass.

Try it now

500+ AI Models

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

The Best Growth Choice
for Enterprise

Get API Key
Testimonials

Our Clients' Voices