GLM 4.6V
Multimodal model for unified image, text, and video processing.
About GLM 4.6V
GLM 4.6V is engineered as a closed-weight multimodal system. It integrates processing across visual, textual, and video modalities. The design accommodates long contexts reaching 131072 tokens.
Its main strength is cross-modal understanding in a single model, offered without open-weight distribution. It aims for consistent performance across input types. Z.AI targets users who need reliable multimodal capabilities.
Common applications involve video analysis, image captioning, and text generation from mixed media. Researchers and developers use it for tasks needing extended context handling. It fits professional workflows that prioritize proprietary model access.
How GLM 4.6V compares
Chart: GLM 4.6V (striped bar) vs. other multimodal models on intelligence, speed, and price.
Price: USD per 1M output tokens (lower is better). GLM 4.6V ranks #12 of 63.
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Long Video Content Analysis
Processes extended video inputs with multimodal understanding and long-context reasoning to deliver detailed breakdowns and insights from lengthy footage.
Cross-Modal Instruction Tasks
Follows complex instructions that combine images, video, and text to produce accurate analyses and generated responses across modalities.
Visual Document Reasoning
Applies visual and text understanding over large contexts to handle multi-page documents containing charts, images, and supporting text.
Strengths & limitations
Strengths
- Native support for image, text, and video inputs
- Large context window for extended documents or conversations
- Unified multimodal processing in a single model
Limitations
- Video handling can be computationally intensive
- Performance varies across languages and domains
- Multimodal models may inherit vision or language biases
Cost calculator
Estimate what GLM 4.6V would cost for your usage.
Based on GLM 4.6V's $0.30/1M input · $0.90/1M output. Estimate only — actual cost varies by provider and caching.
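The estimate above is simple per-token arithmetic, which can be sketched as follows. This is a minimal illustration using only the listed prices ($0.30 per 1M input tokens, $0.90 per 1M output tokens). The function name `estimateCostUsd` is hypothetical and not part of any SDK, and the sketch ignores caching and provider-specific pricing.

```typescript
// Listed prices from this page, in USD per 1M tokens.
const INPUT_USD_PER_MILLION = 0.30;
const OUTPUT_USD_PER_MILLION = 0.90;

// Hypothetical helper (not part of any SDK): estimated cost in USD
// for a given number of input and output tokens.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  );
}

// Example: 2M input tokens and 500K output tokens.
// 2 * 0.30 + 0.5 * 0.90 = 1.05
console.log(estimateCostUsd(2_000_000, 500_000).toFixed(2));
```

Because output tokens cost three times as much as input tokens here, workloads that generate long responses will be dominated by the output term.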
Quick start
OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Send a single-turn chat request to GLM 4.6V.
const completion = await client.chat.completions.create({
  model: "z-ai/glm-4.6v",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: z-ai/glm-4.6v
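Since the model accepts image inputs, a request will often combine text with one or more images. The sketch below builds such a message using the content-part shapes (`text` and `image_url`) common to OpenAI-compatible chat APIs. It assumes the provider forwards these parts to GLM 4.6V unchanged. The helper `buildImageQuestion` and the example URL are hypothetical, and video inputs are not covered here.

```typescript
// Content-part shapes used by OpenAI-compatible chat APIs.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper: builds one user message containing a question
// followed by one image part per URL.
function buildImageQuestion(question: string, imageUrls: string[]): UserMessage {
  const imageParts = imageUrls.map(
    (url): ImagePart => ({ type: "image_url", image_url: { url } }),
  );
  return {
    role: "user",
    content: [{ type: "text", text: question }, ...imageParts],
  };
}

// Placeholder URL for illustration only.
const message = buildImageQuestion("What does this chart show?", [
  "https://example.com/chart.png",
]);
console.log(JSON.stringify(message, null, 2));
```

The resulting object can be passed in the `messages` array of a `client.chat.completions.create` call in place of the plain-string message.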
Editor's verdict
GLM 4.6V is Z.AI's proprietary multimodal model with a 131K-token context window.
At $0.90 per 1M output tokens, it ranks #12 of 63 multimodal models on output price, which makes it cost-efficient for its class.
It is available through Z.AI's API and aggregators like OpenRouter.
It is best suited to workloads that need native image, text, and video inputs and a large context window for extended documents or conversations.
Frequently asked questions
What is the context window of GLM 4.6V?
GLM 4.6V provides a context window of 131,072 tokens.
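In practice, the context window limits how much room is left for a response once the prompt is counted. The sketch below assumes prompt and output tokens share the same 131,072-token window, which is common but may differ by provider. The helper `remainingOutputBudget` is hypothetical, and token counts must come from the provider's own tokenizer or usage data.

```typescript
// Context window stated on this page, in tokens.
const CONTEXT_WINDOW_TOKENS = 131_072;

// Hypothetical helper: tokens left for the response after the prompt,
// assuming prompt and output share one window (an assumption).
function remainingOutputBudget(
  promptTokens: number,
  contextWindow: number = CONTEXT_WINDOW_TOKENS,
): number {
  if (promptTokens < 0) {
    throw new RangeError("promptTokens must be non-negative");
  }
  return Math.max(0, contextWindow - promptTokens);
}

// Example: a 100,000-token prompt leaves 31,072 tokens.
console.log(remainingOutputBudget(100_000));
```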