Qwen3 VL 235B A22B Instruct
Open-weight multimodal model for advanced vision-language reasoning.
About Qwen3 VL 235B A22B Instruct
The model combines vision and language processing in a single architecture designed for joint understanding of images and long text sequences. Its 262144-token context supports detailed analysis of documents, charts, or multi-image inputs without truncation.
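Here is a rough illustration of what the 262,144-token window means in practice. The sketch below checks whether a request still fits once room is reserved for the reply. It assumes that prompt and completion tokens share the same window and that you already have token counts for the text and the images (the tokenizer is not shown). `inputBudget` and `fitsInContext` are hypothetical helper names.

```javascript
// Context window size stated for Qwen3 VL 235B A22B Instruct.
const CONTEXT_WINDOW_TOKENS = 262144;

// Hypothetical helper: tokens left for input after reserving `maxOutputTokens`
// for the reply. Assumes input and output share one context window.
function inputBudget(maxOutputTokens, contextWindow = CONTEXT_WINDOW_TOKENS) {
  if (!Number.isInteger(maxOutputTokens) || maxOutputTokens < 0) {
    throw new RangeError("maxOutputTokens must be a non-negative integer");
  }
  return Math.max(0, contextWindow - maxOutputTokens);
}

// Hypothetical helper: does a request fit, given token counts you measured
// yourself (text tokens plus the tokens your images encode to)?
function fitsInContext(textTokens, imageTokens, maxOutputTokens) {
  return textTokens + imageTokens <= inputBudget(maxOutputTokens);
}

console.log(inputBudget(8192));                  // 253952
console.log(fitsInContext(200000, 40000, 8192)); // true
console.log(fitsInContext(240000, 20000, 8192)); // false
```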
Alibaba Qwen released the weights openly, allowing fine-tuning and deployment by researchers and developers. The instruct tuning emphasizes accurate responses to user directives involving visual content.
Common applications include visual question answering, image-grounded reasoning, and multimodal document analysis where both textual and visual information must be integrated.
How Qwen3 VL 235B A22B Instruct compares
Qwen3 VL 235B A22B Instruct compared with other multimodal models on intelligence, speed, and price.
Price (USD per 1M output tokens; lower is better): Qwen3 VL 235B A22B Instruct ranks #17 of 102.
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Long-context document understanding
Processes combined text and images across 262144 tokens for tasks like analyzing lengthy reports with embedded charts and diagrams.
Visual question answering
Delivers precise answers to questions about image content by grounding responses directly in visual details and accompanying text.
Multimodal instruction following
Executes complex instructions that require simultaneous interpretation of images and text, such as generating descriptions or performing visual reasoning.
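Multi-image tasks, such as analyzing a report with embedded charts, can send several images in one user message. The sketch below assumes the OpenAI-style content-part format (`type: "text"` and `type: "image_url"`) that OpenRouter's OpenAI-compatible API accepts. `reportMessage` and the example.com URLs are hypothetical placeholders. Putting the images before the instruction is a design choice here, not a documented requirement.

```javascript
// Hypothetical helper: one user message containing every page image,
// followed by a single text instruction.
function reportMessage(pageImageUrls, instruction) {
  const imageParts = pageImageUrls.map((url) => ({
    type: "image_url",
    image_url: { url },
  }));
  return {
    role: "user",
    content: [...imageParts, { type: "text", text: instruction }],
  };
}

// Placeholder URLs; replace with your own page images.
const msg = reportMessage(
  ["https://example.com/report/page-1.png", "https://example.com/report/page-2.png"],
  "Summarize the revenue chart on page 2 and relate it to the text on page 1.",
);
console.log(msg.content.length); // 3
```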
Strengths & limitations
Strengths
- Strong text-image integration
- Handles very long contexts
- Large-scale parameter capacity
- Effective instruction tuning
Limitations
- High inference compute cost
- Image-only vision support
- Risk of visual hallucinations
Cost calculator
Qwen3 VL 235B A22B Instruct is priced at $0.20 per 1M input tokens and $0.88 per 1M output tokens. Treat any cost figure as an estimate, because actual cost varies by provider and caching.
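The estimate itself is simple arithmetic: multiply each token count by its per-million price and add the results. The sketch below assumes the listed $0.20 input and $0.88 output prices and ignores caching discounts and provider differences. `estimateCostUSD` is a hypothetical name.

```javascript
// Listed prices for Qwen3 VL 235B A22B Instruct (USD per 1M tokens).
const PRICE_PER_MILLION = { input: 0.2, output: 0.88 };

// Rough USD cost for a workload; caching and provider pricing are ignored.
function estimateCostUSD(inputTokens, outputTokens, prices = PRICE_PER_MILLION) {
  return (inputTokens * prices.input + outputTokens * prices.output) / 1_000_000;
}

// Example: 1,000 requests, each with 50,000 input and 1,000 output tokens.
const requests = 1000;
const cost = estimateCostUSD(requests * 50_000, requests * 1_000);
console.log(cost.toFixed(2)); // "10.88"
```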
Quick start
OpenRouter's API is OpenAI-compatible, so most OpenAI SDKs work once you switch the base URL. Only the model slug changes between models.
```javascript
import OpenAI from "openai";

// OpenRouter exposes an OpenAI-compatible API, so only the base URL and key change.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Top-level await: run this file as an ES module (for example, a .mjs file).
const completion = await client.chat.completions.create({
  model: "qwen/qwen3-vl-235b-a22b-instruct",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);
```
Model slug: qwen/qwen3-vl-235b-a22b-instruct
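To ask about a local image instead of a public URL, OpenAI-compatible APIs commonly accept the image bytes embedded as a base64 data URL inside an `image_url` content part. The sketch assumes Node.js, for the global `Buffer`, and assumes the hosted endpoint accepts data URLs. `toDataUrl` and `localImageMessage` are hypothetical helpers.

```javascript
// Encode raw image bytes as a data URL (RFC 2397) for an "image_url" part.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Hypothetical helper: one user message pairing a question with a local image.
function localImageMessage(question, bytes, mimeType = "image/png") {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: toDataUrl(bytes, mimeType) } },
    ],
  };
}

// In practice `bytes` would come from fs.readFileSync("chart.png");
// a tiny placeholder buffer keeps this snippet self-contained.
const message = localImageMessage("What trend does this chart show?", Buffer.from("abc"));
console.log(message.content[1].image_url.url); // "data:image/png;base64,YWJj"
```

You can pass the resulting `message` as `messages: [message]` to `client.chat.completions.create` with the same model slug.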
Editor's verdict
Qwen3 VL 235B A22B Instruct is Alibaba Qwen's open-weight multimodal model with a 262,144-token context window.
At $0.88 per 1M output tokens, it ranks #17 of 102 multimodal models on output price, which makes it cost-efficient for its class.
As an open-weight model you can self-host it or call it through a hosted API.
It is best suited to tasks that need strong text-image integration and very long contexts.
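The weights are open, so the same OpenAI-compatible client code can target either OpenRouter or your own server. The sketch below assumes a self-hosted server with an OpenAI-compatible API. vLLM, for example, serves one at `http://localhost:8000/v1` by default. It also assumes the server registers the model under its Hugging Face repo ID, `Qwen/Qwen3-VL-235B-A22B-Instruct`. `endpointConfig` is a hypothetical helper, and your port and model name may differ.

```javascript
// Hypothetical switch between OpenRouter and a self-hosted OpenAI-compatible server.
function endpointConfig(target) {
  switch (target) {
    case "openrouter":
      return {
        baseURL: "https://openrouter.ai/api/v1",
        model: "qwen/qwen3-vl-235b-a22b-instruct",
      };
    case "self-hosted":
      // Assumption: local server on port 8000, model served under its repo ID.
      return {
        baseURL: "http://localhost:8000/v1",
        model: "Qwen/Qwen3-VL-235B-A22B-Instruct",
      };
    default:
      throw new Error(`Unknown target: ${target}`);
  }
}

console.log(endpointConfig("self-hosted").baseURL); // "http://localhost:8000/v1"
```

Pass `baseURL` to `new OpenAI({ baseURL, apiKey })` and `model` to `chat.completions.create`.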
Frequently asked questions
What is the context window of Qwen3 VL 235B A22B Instruct?
The model provides a context window of 262,144 tokens for extended multimodal inputs.