How can I access the Qwen3 VL model from Alibaba Qwen?

Access is available via Alibaba Qwen's official API and platform services.

What is the pricing structure for this model?

Pricing information is listed on the official Alibaba Qwen website and varies by usage volume.

What are the primary use cases for this multimodal model?

It is optimized for vision-text reasoning, visual question answering, and analysis of images alongside long documents.

Qwen3 VL 235B A22B Thinking

Verified

Open-weight multimodal model for advanced text and image reasoning.

Alibaba Qwen · Multimodal · Open

Vision


Updated 2026-06-14

About Qwen3 VL 235B A22B Thinking

The model uses a large-scale multimodal transformer design that jointly processes visual and textual data. It has 235 billion total parameters, of which roughly 22 billion are active per token (the "A22B" in the name), and its weights are openly available. A context length of 131072 tokens lets it handle lengthy documents paired with images.

Strengths include native support for combined text-image inputs and the flexibility of open weights for customization. The architecture is suited to tasks that require integration of visual understanding with language reasoning at scale.

Typical usage covers visual question answering, document interpretation, and multimodal analysis in research or production environments. Developers can fine-tune it or deploy it directly for applications that need robust cross-modal comprehension.

Capabilities

Multimodal vision-text reasoning

Long-context document analysis

Visual question answering

Detailed image interpretation

Step-by-step chain-of-thought reasoning

Code and diagram understanding

How Qwen3 VL 235B A22B Thinking compares

Qwen3 VL 235B A22B Thinking vs other multimodal models on intelligence, speed and price.

Price

USD per 1M output tokens · Lower is better · Qwen3 VL 235B A22B Thinking ranks #49 of 102

Kimi K2.5 · $2.0
Qwen3.5-122B-A10B · $2.1
Qwen3.5 397B A17B · $2.3
Grok 4.20 · $2.5
Grok 4.3 · $2.5
Nova 2 Lite · $2.5
Qwen3 VL 235B A22B Thinking · $2.6
Gemini 3 Flash Preview · $3.0
Qwen3.6 27B · $3.2
Kimi K2.6 · $3.4
MoonshotAI Kimi Latest · $3.4
MoonshotAI Kimi Latest · $3.4
Kimi K2.7 Code · $3.5

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

Visual Question Answering on Complex Scenes

The model performs detailed image interpretation combined with text reasoning to answer questions about visual content accurately.

Long-Context Document Analysis with Visuals

It processes documents up to 131072 tokens while integrating images, charts, and diagrams for comprehensive understanding.

Step-by-Step Reasoning on Code and Diagrams

Users can leverage its chain-of-thought capabilities to interpret technical diagrams and code structures in multimodal inputs.

Strengths & limitations

Strengths

+ Strong fusion of visual and textual information
+ Effective handling of extended contexts
+ Robust reasoning on complex multimodal inputs

Limitations

– High inference compute requirements
– Limited to static images
– May hallucinate details in ambiguous visuals

Cost calculator

Estimate what Qwen3 VL 235B A22B Thinking would cost for your usage.

Input tokens / request · Output tokens / request · Requests / month

$0.00156 per request

$15.6 estimated / month

Based on Qwen3 VL 235B A22B Thinking's $0.26/1M input · $2.60/1M output. Estimate only — actual cost varies by provider and caching.
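
The arithmetic behind such an estimate can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the helper name estimateCost is hypothetical, and the example workload (1,000 input tokens, 500 output tokens, 10,000 requests per month) is an assumed set of values, chosen because it reproduces the figures shown above at the listed rates.

```javascript
// Listed rates from this page, in USD per 1M tokens.
const INPUT_USD_PER_MILLION = 0.26;
const OUTPUT_USD_PER_MILLION = 2.60;

// Hypothetical helper: estimates per-request and monthly cost in USD.
function estimateCost(inputTokens, outputTokens, requestsPerMonth) {
  const perRequest =
    (inputTokens * INPUT_USD_PER_MILLION +
      outputTokens * OUTPUT_USD_PER_MILLION) / 1_000_000;
  return { perRequest, perMonth: perRequest * requestsPerMonth };
}

// Assumed example workload (these inputs are not stated on the page).
const estimate = estimateCost(1000, 500, 10000);
console.log(estimate.perRequest.toFixed(5)); // 0.00156
console.log(estimate.perMonth.toFixed(2));   // 15.60
```

Output tokens dominate the total here because the output rate is ten times the input rate, so long responses from a reasoning model will move the estimate more than long prompts do.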

Quick start

OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.

JavaScript · openai

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "qwen/qwen3-vl-235b-a22b-thinking",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: qwen/qwen3-vl-235b-a22b-thinking
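
Because the model accepts images as well as text, a request will often carry both. The sketch below builds such a request body in the OpenAI-style chat format, where content is an array of text and image_url parts. The helper name buildVisionRequest and the example image URL are hypothetical, and whether a given provider accepts remote URLs, base64 data URLs, or both is an assumption to check against its documentation.

```javascript
const MODEL_SLUG = "qwen/qwen3-vl-235b-a22b-thinking";

// Hypothetical helper: returns a chat-completions request body that pairs
// a text question with one image. It makes no network call.
function buildVisionRequest(question, imageUrl) {
  return {
    model: MODEL_SLUG,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Placeholder URL for illustration only.
const request = buildVisionRequest(
  "What does this chart show?",
  "https://example.com/chart.png"
);
console.log(JSON.stringify(request, null, 2));
```

The resulting object can be passed to client.chat.completions.create(request), with the client configured as in the quick start.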

Editor's verdict

Our take on Qwen3 VL 235B A22B Thinking

Qwen3 VL 235B A22B Thinking is Alibaba Qwen's open-weight multimodal model with a 131K-token context window.

At $2.60 per 1M output tokens, it is mid-priced for its class.

Because it is open-weight, you can self-host it or call it through a hosted API.

Best suited to tasks that need strong fusion of visual and textual information and effective handling of extended contexts.


Frequently asked questions

What context window does the model support?

The model supports a context window of 131072 tokens for handling extended inputs.


Other Qwen models

Sibling versions in the Qwen family from Alibaba Qwen.

Qwen3.7 Max

Alibaba Qwen · Language Models

Verified

Qwen3.7 Max processes up to one million tokens in a single pass.

Open · II 56.6 · 1000K ctx · $3.75/1M out

Qwen3.7 Plus

Alibaba Qwen · Multimodal

Verified

Open-weight multimodal model for million-token text and image tasks.

Open · II 53.3 · 1000K ctx · $1.28/1M out

Qwen3.6 Max Preview

Alibaba Qwen · Language Models

Verified

Open-weight LLM optimized for long-context text reasoning and analysis.

Open · II 51.8 · 262K ctx · $6.24/1M out

Qwen3.6 27B

Alibaba Qwen · Multimodal

Verified

Multimodal model for long-context text, image, and video processing.

Open · II 45.8 · 262K ctx · $3.17/1M out

Qwen3.6 35B A3B

Alibaba Qwen · Multimodal

Verified

Multimodal model for long-context text, image, and video analysis.

Open · II 43.5 · 262K ctx · $1.00/1M out

Qwen Plus 0728

Alibaba Qwen · Language Models

Verified

Open-weight LLM with a 1M-token context for long text tasks.

Open · 1000K ctx · $0.78/1M out
