Gemma 3 4B

Verified

Google's open multimodal model for efficient text and image understanding.

Google · Multimodal · Open · Vision
Updated 2026-06-15

About Gemma 3 4B

Gemma 3 4B is a transformer-based model from Google that processes images alongside long text sequences. Its weights are openly released, so it can be fine-tuned or self-hosted, and its compact size targets efficient deployment.

Users commonly apply Gemma 3 4B to image captioning, visual question answering, and document analysis involving both text and visuals. It suits prototyping, fine-tuning experiments, and integration into applications needing extended context handling without proprietary restrictions.

Capabilities

Multimodal text and image understanding
Long-context processing
Instruction following and chat
Visual question answering
Basic reasoning and summarization
Efficient on-device inference

How Gemma 3 4B compares

Gemma 3 4B compared with other multimodal models on intelligence, speed, and price.

Price

USD per 1M output tokens · Lower is better · Gemma 3 4B ranks #1 of 139

Model                     Price
Gemma 3 4B                $0.10
Ministral 3 3B 2512       $0.10
Reka Edge                 $0.10
Ministral 3 8B 2512       $0.15
Qwen3.5-9B                $0.15
Gemma 3 12B               $0.15
Gemma 3 27B               $0.16
Llama Guard 4 12B         $0.18
Ministral 3 14B 2512      $0.20
Mistral Small 3.2 24B     $0.20
UI-TARS 7B                $0.20
Qwen3.5-Flash             $0.26
MiMo-V2.5                 $0.28

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

Long-form document analysis with visuals

Processes reports or research papers exceeding 100k tokens that incorporate charts, diagrams, and images to extract integrated insights across text and visuals.

Multimodal educational content review

Reviews textbooks or lecture materials combining extensive text passages with figures and illustrations for accurate summarization and question answering.

Technical documentation with embedded graphics

Handles large codebases or engineering specs up to 131k tokens that include screenshots and diagrams for debugging or compliance checks.

Strengths & limitations

Strengths

  • Compact size enables fast local deployment
  • Strong context length for a small model
  • Open weights support fine-tuning
  • Balanced multimodal capabilities

Limitations

  • Limited depth on complex multi-step tasks
  • Weaker performance than larger models on advanced reasoning
  • Supports only text and image inputs

Cost calculator

Estimate what Gemma 3 4B would cost for your usage.

$0.00010 per request · $1 estimated / month

Based on Gemma 3 4B's $0.05/1M input · $0.10/1M output. Estimate only — actual cost varies by provider and caching.
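
The arithmetic behind an estimate like this can be sketched as follows. The per-token prices are the ones quoted above; the token counts and request volume in the usage lines are illustrative assumptions rather than values taken from the calculator, and `estimateCost` is a hypothetical helper name.

```javascript
// Prices quoted above, in USD per 1M tokens.
const INPUT_PRICE_PER_M = 0.05;
const OUTPUT_PRICE_PER_M = 0.10;

// Hypothetical helper: estimates the cost of one request and of a month of requests.
function estimateCost(inputTokens, outputTokens, requestsPerMonth) {
  const perRequest =
    (inputTokens * INPUT_PRICE_PER_M + outputTokens * OUTPUT_PRICE_PER_M) / 1_000_000;
  return { perRequest, perMonth: perRequest * requestsPerMonth };
}

// Assumed workload: 1,000 input tokens, 500 output tokens, 10,000 requests per month.
const estimate = estimateCost(1000, 500, 10000);
console.log(estimate.perRequest.toFixed(5)); // "0.00010"
console.log(estimate.perMonth.toFixed(2));   // "1.00"
```

Actual charges may differ by provider and caching, so treat the result as a rough planning figure.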

Quick start

OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.

JavaScript · openai
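import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY, // read the key from the environment
});

// Top-level await requires an ES module (for example, a .mjs file).
const completion = await client.chat.completions.create({
  model: "google/gemma-3-4b-it",
  messages: [{ role: "user", content: "Hello!" }],
});

// Print the text of the first returned choice.
console.log(completion.choices[0].message.content);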

Model slug: google/gemma-3-4b-it
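
Because Gemma 3 4B also accepts images, a request can mix text and image parts. The sketch below builds such a request body using the OpenAI-style `image_url` content part, which OpenRouter's chat completions endpoint is assumed to accept for this model. `buildVisionRequest` and the example image URL are hypothetical.

```javascript
// Hypothetical helper: builds a chat completion request that pairs a question with an image.
// Assumes the OpenAI-style content-part format ("text" and "image_url" parts).
function buildVisionRequest(question, imageUrl) {
  return {
    model: "google/gemma-3-4b-it",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Placeholder URL for illustration only.
const request = buildVisionRequest(
  "What does this chart show?",
  "https://example.com/chart.png"
);
console.log(JSON.stringify(request, null, 2));
```

The resulting object can be passed to `client.chat.completions.create(request)` using a client configured as in the quick start.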

Editor's verdict

Our take on Gemma 3 4B

Gemma 3 4B is Google's open-weight multimodal model with a 131K-token context window.

At $0.10 per 1M output tokens, it ranks first on price among the 139 models compared above.

Because it is an open-weight model, you can self-host it or call it through a hosted API.

Its main strengths are a compact size that enables fast local deployment and a long context window for a small model.

Frequently asked questions

Gemma 3 4B supports a context window of 131,072 tokens for processing long inputs.
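
A simple pre-flight check against this limit might look like the following. It assumes the window is shared between the prompt and the generated output, and that a token count is already available from a tokenizer. `fitsInContext` is a hypothetical helper, and image inputs would also consume tokens that this sketch does not count.

```javascript
// Context window stated above, in tokens.
const CONTEXT_WINDOW = 131072;

// Hypothetical helper: checks whether a prompt plus a reserved output budget fits.
// Assumes prompt and output tokens share the same window.
function fitsInContext(promptTokens, maxOutputTokens) {
  return promptTokens + maxOutputTokens <= CONTEXT_WINDOW;
}

console.log(fitsInContext(100000, 4096)); // true: 104,096 tokens in total
console.log(fitsInContext(130000, 4096)); // false: 134,096 tokens in total
```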
