Gemma 4 31B

Verified

Google's open multimodal model for long-context image, text and video tasks.

Google · Multimodal · Open
Function calling · JSON mode · Structured outputs · Reasoning · Vision
Updated 2026-06-15

About Gemma 4 31B

Gemma 4 31B processes images, video frames and text together in a single model. Its 262,144-token context window lets it analyze long sequences in one pass, provided the input fits within that limit. Because the weights are openly released, researchers and developers can download, fine-tune and self-host the model.

Its main strength is coherent handling of interleaved visual and textual input over long spans, which supports detailed video summarization, image-grounded dialogue and cross-modal retrieval. Typical uses include media analysis tools, educational content platforms and developer prototypes that need extended multimodal context.

Capabilities

Multimodal understanding
Long-context reasoning
Image analysis
Video processing
Cross-modal reasoning
Text generation

How Gemma 4 31B compares

Gemma 4 31B vs other multimodal models on intelligence, speed and price.

Price

USD per 1M output tokens · Lower is better · Gemma 4 31B ranks #17 of 132

  • Qwen3.5-Flash: $0.26
  • MiMo-V2.5: $0.28
  • Llama 4 Scout: $0.30
  • Seed 1.6 Flash: $0.30
  • Voxtral Small 24B 2507: $0.30
  • Gemma 4 26B A4B: $0.33
  • Gemma 4 31B (this model): $0.35
  • GPT-5 Nano: $0.40
  • Gemini 2.5 Flash Lite: $0.40
  • GPT-4.1 Nano: $0.40
  • Gemini 2.5 Flash Lite Preview 09-2025: $0.40
  • Seed-2.0-Mini: $0.40
  • Qwen3 VL 32B Instruct: $0.42

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

Long-form Document Analysis

The 262,144-token context window can hold full-length books or lengthy reports in a single prompt, letting the model stay coherent across the whole document, including multimodal elements such as embedded images.
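Before sending a long document, a rough budget check can flag inputs that clearly will not fit. This is a minimal sketch, assuming about 4 characters per token for English text (a common rule of thumb, not Gemma's actual tokenizer) and a hypothetical 8,192-token reserve for the reply; `estimateTokens` and `fitsInContext` are illustrative names, and the estimate ignores tokens consumed by embedded images.

```javascript
// Context window of Gemma 4 31B, as stated on this page.
const CONTEXT_WINDOW = 262144;

// Assumption: roughly 4 characters per token for English prose.
// This is a rule of thumb, not Gemma's tokenizer; image tokens are not counted.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if the estimated prompt plus the tokens reserved for the reply fit.
// The 8192-token default reserve is a hypothetical choice, not a model limit.
function fitsInContext(text, reservedOutputTokens = 8192, contextWindow = CONTEXT_WINDOW) {
  return estimateTokens(text) + reservedOutputTokens <= contextWindow;
}

// A 1,000,000-character manuscript (placeholder text).
const manuscript = "x".repeat(1_000_000);
console.log(estimateTokens(manuscript));       // 250000
console.log(fitsInContext(manuscript));        // true  (250000 + 8192 <= 262144)
console.log(fitsInContext(manuscript, 16384)); // false (250000 + 16384 > 262144)
```

For exact counts, tokenize with the model's own tokenizer; the heuristic is only good for spotting inputs that are obviously too large.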

Video Scene Interpretation

Video processing combined with cross-modal reasoning allows the model to analyze footage sequences and generate accurate textual summaries or answer queries about visual events.
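The page does not say how video reaches the model through the API. One common approach with OpenAI-compatible chat endpoints, assumed here, is to extract still frames (for example with ffmpeg, not shown) and send them as images. A minimal sketch of choosing evenly spaced frame times; `sampleFrameTimestamps` is a hypothetical helper and the frame budget is yours to tune.

```javascript
// Hypothetical helper: pick up to maxFrames timestamps (seconds) spread evenly
// across a clip, taking the midpoint of each equal-length segment.
function sampleFrameTimestamps(durationSec, maxFrames) {
  if (durationSec <= 0 || maxFrames <= 0) return [];
  const step = durationSec / maxFrames; // length of each segment
  const stamps = [];
  for (let i = 0; i < maxFrames; i++) {
    stamps.push((i + 0.5) * step); // midpoint of segment i
  }
  return stamps;
}

// A 60-second clip sampled into 4 frames.
console.log(sampleFrameTimestamps(60, 4)); // [ 7.5, 22.5, 37.5, 52.5 ]
```

Capping the frame count keeps request size bounded for long clips, at the cost of missing events that fall between samples.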

Image-Based Research Queries

Image analysis and multimodal understanding enable detailed examination of visual data alongside text prompts for tasks such as scientific figure interpretation.
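For image-based queries, OpenRouter accepts OpenAI-style messages whose `content` is an array of parts, so text and an image URL can be sent side by side. A minimal sketch, assuming the provider serving the request accepts image input for this model; `buildImageQuestion` and `askAboutImage` are hypothetical helpers, and `client` is an OpenAI SDK client configured for OpenRouter.

```javascript
// Builds a single user message pairing a text question with one image,
// using the OpenAI Chat Completions content-part format.
function buildImageQuestion(imageUrl, question) {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Sends the question to Gemma 4 31B and returns the reply text.
async function askAboutImage(client, imageUrl, question) {
  const completion = await client.chat.completions.create({
    model: "google/gemma-4-31b-it",
    messages: buildImageQuestion(imageUrl, question),
  });
  return completion.choices[0].message.content;
}
```

For example, `await askAboutImage(client, "https://example.com/figure.png", "What trend does this chart show?")` with the client from the quick start; the URL is a placeholder.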

Strengths & limitations

Strengths

  • Strong integration across image, text, and video inputs
  • Effective use of extended context windows
  • Versatile for mixed-modality tasks
  • Developed and backed by Google

Limitations

  • High computational demands at 31B scale
  • Video comprehension may degrade over very long sequences
  • Primarily optimized for multimodal rather than pure text workloads
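The "high computational demands" point can be made concrete with weight-memory arithmetic. A minimal sketch, assuming exactly 31 billion parameters (the page gives only the nominal "31B") and the bf16, fp8 and fp4 precisions that appear in the provider table. It counts weights only: real deployments also need memory for the KV cache (which grows with context length), activations and runtime overhead, and fp4 formats add small per-block scale factors that are ignored here.

```javascript
// Bytes per parameter for the precisions listed by providers.
const BYTES_PER_PARAM = { bf16: 2, fp8: 1, fp4: 0.5 };

// Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).
function weightMemoryGB(paramCount, precision) {
  const bytes = BYTES_PER_PARAM[precision];
  if (bytes === undefined) throw new Error(`unknown precision: ${precision}`);
  return (paramCount * bytes) / 1e9;
}

// Assumption: 31e9 parameters, taken from the model name.
for (const precision of Object.keys(BYTES_PER_PARAM)) {
  console.log(precision, weightMemoryGB(31e9, precision), "GB");
}
// bf16 62 GB
// fp8 31 GB
// fp4 15.5 GB
```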

Pricing by provider

Per-provider pricing and uptime as listed on OpenRouter; these figures change over time. Prices are USD per 1M tokens.

Provider (precision)   Input /1M   Output /1M   Context   Uptime
WandB (bf16)           $0.12       $0.35        262K      100.0%
Venice (bf16)          $0.12       $0.36        256K      99.8%
DeepInfra (fp4)        $0.12       $0.37        262K      95.8%
DeepInfra (fp8)        $0.13       $0.38        262K      97.7%
SiliconFlow (fp8)      $0.13       $0.40        262K      99.8%
Novita (bf16)          $0.14       $0.40        262K      99.9%
Parasail (fp8)         $0.15       $0.40        262K      94.1%
Chutes (fp4)           $0.15       $0.42        131K      98.4%
Phala                  $0.15       $0.46        262K      98.5%
Ambient                $0.20       $0.80        66K       97.9%
Together               $0.28       $0.86        262K      95.6%
Together               $0.39       $0.97        262K      79.7%
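The provider table can drive a simple selection rule: filter listings by minimum context and uptime, then rank by the cost of your typical request. A minimal sketch using a snapshot of the figures above; `providers`, `blendedCost` and `cheapestProvider` are hypothetical names, context is kept in the table's "K" units, and live prices and uptime will drift from these values.

```javascript
// Snapshot of the table: USD per 1M tokens, context in K tokens, uptime in %.
const providers = [
  { name: "WandB (bf16)", input: 0.12, output: 0.35, contextK: 262, uptime: 100.0 },
  { name: "Venice (bf16)", input: 0.12, output: 0.36, contextK: 256, uptime: 99.8 },
  { name: "DeepInfra (fp4)", input: 0.12, output: 0.37, contextK: 262, uptime: 95.8 },
  { name: "DeepInfra (fp8)", input: 0.13, output: 0.38, contextK: 262, uptime: 97.7 },
  { name: "SiliconFlow (fp8)", input: 0.13, output: 0.40, contextK: 262, uptime: 99.8 },
  { name: "Novita (bf16)", input: 0.14, output: 0.40, contextK: 262, uptime: 99.9 },
  { name: "Parasail (fp8)", input: 0.15, output: 0.40, contextK: 262, uptime: 94.1 },
  { name: "Chutes (fp4)", input: 0.15, output: 0.42, contextK: 131, uptime: 98.4 },
  { name: "Phala", input: 0.15, output: 0.46, contextK: 262, uptime: 98.5 },
  { name: "Ambient", input: 0.20, output: 0.80, contextK: 66, uptime: 97.9 },
  { name: "Together", input: 0.28, output: 0.86, contextK: 262, uptime: 95.6 },
  { name: "Together", input: 0.39, output: 0.97, contextK: 262, uptime: 79.7 },
];

// USD cost of one request with the given token mix at a listing's rates.
function blendedCost(p, inputTokens, outputTokens) {
  return (inputTokens * p.input + outputTokens * p.output) / 1e6;
}

// Cheapest listing meeting the context and uptime floors, or null if none do.
function cheapestProvider(list, { minContextK, minUptime, inputTokens, outputTokens }) {
  const eligible = list.filter((p) => p.contextK >= minContextK && p.uptime >= minUptime);
  eligible.sort(
    (a, b) => blendedCost(a, inputTokens, outputTokens) - blendedCost(b, inputTokens, outputTokens),
  );
  return eligible[0] ?? null;
}

const pick = cheapestProvider(providers, { minContextK: 200, minUptime: 99, inputTokens: 1000, outputTokens: 500 });
console.log(pick.name); // WandB (bf16)
```

Ranking by per-request cost rather than by output price alone matters for input-heavy workloads such as long-document analysis, where the input rate dominates.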

Cost calculator

Example estimate for Gemma 4 31B: $0.00030 per request, $2.95 per month.

Based on the lowest listed rates of $0.12 per 1M input tokens and $0.35 per 1M output tokens (WandB). This is an estimate only; actual cost varies by provider and caching.
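The calculator's arithmetic is a weighted sum of input and output tokens at the per-million rates. A minimal sketch using the $0.12 and $0.35 rates above; the token counts and request volume are hypothetical, chosen because 1,000 input and 500 output tokens per request at 10,000 requests per month give $0.000295 per request (about $0.00030) and $2.95 per month, consistent with the example figures, although the page does not state its calculator's inputs.

```javascript
// Rates in USD per 1M tokens (cheapest listing).
const INPUT_RATE = 0.12;
const OUTPUT_RATE = 0.35;

// Cost of one request: tokens times rate, scaled from "per million".
function costPerRequest(inputTokens, outputTokens) {
  return (inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE) / 1_000_000;
}

function monthlyCost(inputTokens, outputTokens, requestsPerMonth) {
  return costPerRequest(inputTokens, outputTokens) * requestsPerMonth;
}

// Hypothetical workload: 1,000 input + 500 output tokens, 10,000 requests/month.
const perRequest = costPerRequest(1000, 500);
const perMonth = monthlyCost(1000, 500, 10000);
console.log(perRequest.toFixed(6), perMonth.toFixed(2)); // 0.000295 2.95
```

Swap in another provider's rates from the table to compare; prompt caching, where a provider offers it, lowers the effective input rate.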

Quick start

OpenRouter's API is OpenAI-compatible, so most OpenAI SDKs work once you swap in OpenRouter's base URL. Only the model slug changes between models.

JavaScript · openai
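// Run as an ES module (e.g. a .mjs file); top-level await requires it.
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter; the API key is read from the environment.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Send a single-turn chat request to Gemma 4 31B.
const completion = await client.chat.completions.create({
  model: "google/gemma-4-31b-it",
  messages: [{ role: "user", content: "Hello!" }],
});

// Print the reply text.
console.log(completion.choices[0].message.content);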

Model slug: google/gemma-4-31b-it
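For long outputs, streaming shows tokens as they arrive instead of waiting for the full reply. With the `openai` SDK, passing `stream: true` returns an async iterable of chunks whose text sits in `choices[0].delta.content`. A minimal sketch; `collectStream` and `streamReply` are hypothetical helpers, and `client` is the OpenRouter-configured client from the quick start.

```javascript
// Concatenates text deltas from any async iterable of OpenAI-style chunks,
// calling onToken for each non-empty piece. Chunks without choices are skipped.
async function collectStream(stream, onToken = () => {}) {
  let text = "";
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta?.content ?? "";
    if (delta) {
      onToken(delta);
      text += delta;
    }
  }
  return text;
}

// Requests a streamed reply from Gemma 4 31B, echoing tokens to stdout (Node.js).
async function streamReply(client, prompt) {
  const stream = await client.chat.completions.create({
    model: "google/gemma-4-31b-it",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  return collectStream(stream, (t) => process.stdout.write(t));
}
```

Keeping the chunk handling in `collectStream` makes it testable with a fake async generator, independent of the network call.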

Editor's verdict

Our take on Gemma 4 31B

Gemma 4 31B is Google's open-weight multimodal model with a 262K-token context window.

At $0.35 per 1M output tokens from its cheapest listed provider, it ranks #17 of 132 on output price, and it is served through 12 provider listings from 10 distinct providers.

Because it is open-weight, you can self-host it or call it through a hosted API.

It is best suited to workloads that combine image, text and video inputs or that need a long context window.


Frequently asked questions

The model provides a context window of 262144 tokens for handling extended inputs in reasoning tasks.
