GLM 4.5V
Multimodal model for integrated text and image tasks.
About GLM 4.5V
GLM 4.5V features a context window of 65536 tokens. This supports extended multimodal inputs including detailed image analysis alongside text. The model is closed-weight with no public parameter count disclosed.
Its design emphasizes joint handling of text and images. Its main strength is giving coherent responses over combined inputs with a single model, rather than chaining separate text and vision models. Access remains restricted to authorized users through Z.AI.
Typical usage covers image captioning, visual question answering, and mixed-media document processing. Developers deploy it in systems needing unified text-image understanding. Workflows often include professional analysis and content generation tasks.
How GLM 4.5V compares
[Chart: GLM 4.5V compared with other multimodal models on intelligence, speed, and price.]
Price: USD per 1M output tokens (lower is better). GLM 4.5V ranks #32 of 97.
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Visual Question Answering
The model excels at answering questions that pair a textual query with image input.
Document Analysis with Visuals
It performs well on tasks involving document comprehension that include charts, diagrams, and other visual elements alongside text.
Extended Multimodal Reasoning
The model supports long-context reasoning across text and images within its 65536-token context window for cross-modal tasks.
Strengths & limitations
Strengths
- Strong vision-language integration
- Handles extended 64K-token contexts
- Effective for real-world image+text tasks
- Flexible multimodal input processing
Limitations
- Limited to text and image modalities
- Can struggle with highly complex or ambiguous visuals
- Vision performance depends on image quality and clarity
Cost calculator
Estimate what GLM 4.5V would cost for your usage.
Based on GLM 4.5V's pricing of $0.60 per 1M input tokens and $1.80 per 1M output tokens. This is an estimate only; actual cost varies by provider and caching.
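The arithmetic behind such an estimate can be sketched in a few lines. This is a minimal illustration, assuming the two list prices quoted above apply uniformly and ignoring caching and provider differences; `estimateCostUsd` is a hypothetical helper name, not part of any SDK.

```typescript
// Prices quoted on this page, in USD per 1M tokens (assumed flat, no caching discounts).
const INPUT_USD_PER_MILLION = 0.6;
const OUTPUT_USD_PER_MILLION = 1.8;

// Hypothetical helper: estimate the cost of a workload from its token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Example: 2M input tokens and 500K output tokens.
console.log(estimateCostUsd(2_000_000, 500_000).toFixed(2)); // prints "2.10"
```

Output tokens cost three times as much as input tokens at these rates, so workloads that generate long responses are dominated by the output term.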
Quick start
OpenRouter's API is OpenAI-compatible, so most SDKs work once you swap the base URL. Only the model slug changes between models.
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Send a single text-only user message to GLM 4.5V.
const completion = await client.chat.completions.create({
  model: "z-ai/glm-4.5v",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: z-ai/glm-4.5v
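The quick start sends text only. For visual question answering, OpenAI-compatible chat APIs generally accept a user message whose `content` is an array of text and `image_url` parts. The sketch below builds such a message. `buildVqaMessage` is a hypothetical helper, the image URL is a placeholder, and whether a given provider accepts remote URLs or expects base64 data URLs for this model is an assumption to check.

```typescript
// Content-part shapes used by OpenAI-compatible chat APIs for mixed text and image input.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper: pair one question with one image in a single user message.
function buildVqaMessage(question: string, imageUrl: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

const message = buildVqaMessage(
  "What does this chart show?",
  "https://example.com/chart.png", // placeholder URL, replace with a real image
);
console.log(JSON.stringify(message, null, 2));
```

To send it, pass `messages: [message]` to `client.chat.completions.create` with the model slug `z-ai/glm-4.5v`.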
Editor's verdict
GLM 4.5V is Z.AI's proprietary multimodal model with a 64K-token (65,536) context window.
At $1.80 per 1M output tokens, it is mid-priced for its class.
It is available through Z.AI's API and aggregators like OpenRouter.
It is best suited to tasks that need strong vision-language integration and extended contexts of up to 64K tokens.
Frequently asked questions
What is GLM 4.5V's context window?
The model provides a context window of 65,536 tokens.