Gemma 4 31B
Google's open multimodal model for long-context image, text and video tasks.
About Gemma 4 31B
Gemma 4 31B integrates multimodal processing so that images, video frames and text can be handled together. Its 262144-token context window permits analysis of lengthy sequences without truncation. The open-weight release enables direct access for research and customization.
Strengths include coherent handling of mixed visual and textual streams over long spans. This supports detailed video summarization, image-grounded dialogue and cross-modal retrieval. Typical usage covers media analysis tools, educational content platforms and developer prototypes needing extended multimodal context.
How Gemma 4 31B compares
Gemma 4 31B compared with other multimodal models on intelligence, speed and price.
Price: USD per 1M output tokens (lower is better). Gemma 4 31B ranks #17 of 132.
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Long-form Document Analysis
The 262144-token context window supports processing entire books or lengthy reports while maintaining coherence across multimodal elements like embedded images.
Video Scene Interpretation
Video processing combined with cross-modal reasoning allows the model to analyze footage sequences and generate accurate textual summaries or answer queries about visual events.
Image-Based Research Queries
Image analysis and multimodal understanding enable detailed examination of visual data alongside text prompts for tasks such as scientific figure interpretation.
Strengths & limitations
Strengths
- Strong integration across image, text, and video inputs
- Effective use of extended context windows
- Versatile for mixed-modality tasks
- Backed by Google's model development
Limitations
- High computational demands at 31B scale
- Video comprehension may degrade over very long sequences
- Primarily optimized for multimodal rather than pure text workloads
Pricing by provider
Live per-provider pricing & uptime, routed via OpenRouter. Prices are USD per 1M tokens.
| Provider | Input /1M | Output /1M | Context | Uptime |
|---|---|---|---|---|
| WandB(bf16) | $0.12 | $0.35 | 262K | 100.0% |
| Venice(bf16) | $0.12 | $0.36 | 256K | 99.8% |
| DeepInfra(fp4) | $0.12 | $0.37 | 262K | 95.8% |
| DeepInfra(fp8) | $0.13 | $0.38 | 262K | 97.7% |
| SiliconFlow(fp8) | $0.13 | $0.40 | 262K | 99.8% |
| Novita(bf16) | $0.14 | $0.40 | 262K | 99.9% |
| Parasail(fp8) | $0.15 | $0.40 | 262K | 94.1% |
| Chutes(fp4) | $0.15 | $0.42 | 131K | 98.4% |
| Phala | $0.15 | $0.46 | 262K | 98.5% |
| Ambient | $0.20 | $0.80 | 66K | 97.9% |
| Together | $0.28 | $0.86 | 262K | 95.6% |
| Together | $0.39 | $0.97 | 262K | 79.7% |
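As a rough illustration of how the table above could be used, the sketch below picks the cheapest listed provider that meets a minimum context size and uptime. The rows are copied from the table as a snapshot; the `pickProvider` helper and its parameter names are hypothetical and not part of any OpenRouter API.

```javascript
// Snapshot of the pricing table above (USD per 1M tokens, context in K tokens, uptime in %).
// Live values will differ; treat these as sample data.
const providers = [
  { name: "WandB", quant: "bf16", input: 0.12, output: 0.35, contextK: 262, uptime: 100.0 },
  { name: "Venice", quant: "bf16", input: 0.12, output: 0.36, contextK: 256, uptime: 99.8 },
  { name: "DeepInfra", quant: "fp4", input: 0.12, output: 0.37, contextK: 262, uptime: 95.8 },
  { name: "DeepInfra", quant: "fp8", input: 0.13, output: 0.38, contextK: 262, uptime: 97.7 },
  { name: "SiliconFlow", quant: "fp8", input: 0.13, output: 0.40, contextK: 262, uptime: 99.8 },
  { name: "Novita", quant: "bf16", input: 0.14, output: 0.40, contextK: 262, uptime: 99.9 },
  { name: "Parasail", quant: "fp8", input: 0.15, output: 0.40, contextK: 262, uptime: 94.1 },
  { name: "Chutes", quant: "fp4", input: 0.15, output: 0.42, contextK: 131, uptime: 98.4 },
  { name: "Phala", quant: null, input: 0.15, output: 0.46, contextK: 262, uptime: 98.5 },
  { name: "Ambient", quant: null, input: 0.20, output: 0.80, contextK: 66, uptime: 97.9 },
  { name: "Together", quant: null, input: 0.28, output: 0.86, contextK: 262, uptime: 95.6 },
  { name: "Together", quant: null, input: 0.39, output: 0.97, contextK: 262, uptime: 79.7 },
];

// Hypothetical helper: returns the cheapest row for the expected token mix
// among rows that satisfy the context and uptime floors, or null if none do.
function pickProvider(rows, inputTokens, outputTokens, minContextK, minUptime) {
  const eligible = rows.filter(
    (r) => r.contextK >= minContextK && r.uptime >= minUptime
  );
  if (eligible.length === 0) return null;
  const cost = (r) => (inputTokens * r.input + outputTokens * r.output) / 1e6;
  return eligible.reduce((best, r) => (cost(r) < cost(best) ? r : best));
}

// Example: full 262K context and at least 99% uptime.
const choice = pickProvider(providers, 1_000_000, 1_000_000, 262, 99);
console.log(choice ? `${choice.name} (${choice.quant})` : "no provider matches");
```

Filtering before ranking keeps the price comparison limited to providers that can actually serve the request, which matters here because the listed context sizes range from 66K to 262K.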
Cost calculator
Estimate what Gemma 4 31B would cost for your usage.
Based on Gemma 4 31B's lowest listed rates of $0.12/1M input and $0.35/1M output. This is an estimate only; actual cost varies by provider and caching.
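The arithmetic behind such an estimate is simple enough to sketch. The function below assumes the $0.12 and $0.35 per 1M token rates quoted above and ignores caching and provider differences; `estimateCostUsd` is a hypothetical name chosen for this example.

```javascript
// Rates quoted above, in USD per 1M tokens (assumed flat, no caching discounts).
const INPUT_USD_PER_M = 0.12;
const OUTPUT_USD_PER_M = 0.35;

// Hypothetical helper: estimated cost in USD for a given number of tokens.
function estimateCostUsd(
  inputTokens,
  outputTokens,
  inputRate = INPUT_USD_PER_M,
  outputRate = OUTPUT_USD_PER_M
) {
  return (inputTokens * inputRate + outputTokens * outputRate) / 1_000_000;
}

// Example: 100 requests, each with 200,000 input tokens and 2,000 output tokens.
const total = estimateCostUsd(100 * 200_000, 100 * 2_000);
console.log(`Estimated cost: $${total.toFixed(2)}`);
```

For this example workload the estimate comes to roughly $2.47, almost all of it from input tokens, which is typical of long-context usage.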
Quick start
OpenRouter's API is OpenAI-compatible, so most SDKs work once the base URL is swapped. Only the model slug changes between models.
```javascript
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter instead of the default endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Top-level await requires an ES module context.
const completion = await client.chat.completions.create({
  model: "google/gemma-4-31b-it",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);
```
Model slug: google/gemma-4-31b-it
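Since the model is described as multimodal, an image-grounded request may be of more interest than a plain text one. The sketch below builds a user message in the OpenAI chat format with a text part and an `image_url` part; it assumes the routed provider accepts image parts for this model, and both the `buildImageMessages` helper and the example URL are hypothetical.

```javascript
// Hypothetical helper: builds an OpenAI-style chat message that pairs a text
// prompt with an image reference (a public URL or a data: URL).
function buildImageMessages(prompt, imageUrl) {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Placeholder URL for illustration only.
const messages = buildImageMessages(
  "Describe the trend shown in this figure.",
  "https://example.com/figure.png"
);
console.log(JSON.stringify(messages, null, 2));
```

The resulting array can be passed as `messages` to `client.chat.completions.create` with the same model slug as in the quick start.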
Editor's verdict
Gemma 4 31B is Google's open-weight multimodal model with a 262K-token context window.
At $0.35 per 1M output tokens, it is very cost-efficient for its class, served by 12 providers.
As an open-weight model you can self-host it or call it through a hosted API.
Best suited to workloads that combine image, text, and video inputs and that make use of the extended context window.
Frequently asked questions
What is the context window of Gemma 4 31B?
The model provides a context window of 262144 tokens for handling extended inputs in reasoning tasks.