Gemma 3 4B
Google's open multimodal model for efficient text and image understanding.
About Gemma 3 4B
The model combines Google's transformer-based design with multimodal capabilities to process images alongside long text sequences. Its open-weight release allows customization while maintaining strong performance across varied inputs. This architecture emphasizes efficiency for practical deployment scenarios.
Users commonly apply Gemma 3 4B to image captioning, visual question answering, and document analysis involving both text and visuals. It suits prototyping, fine-tuning experiments, and integration into applications needing extended context handling without proprietary restrictions.
How Gemma 3 4B compares
Gemma 3 4B compared with other multimodal models on intelligence, speed, and price.
Price (USD per 1M output tokens, lower is better): Gemma 3 4B ranks #1 of 139.
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Long-form document analysis with visuals
Processes reports or research papers exceeding 100k tokens that incorporate charts, diagrams, and images to extract integrated insights across text and visuals.
Multimodal educational content review
Reviews textbooks or lecture materials combining extensive text passages with figures and illustrations for accurate summarization and question answering.
Technical documentation with embedded graphics
Handles large codebases or engineering specs up to 131k tokens that include screenshots and diagrams for debugging or compliance checks.
Strengths & limitations
Strengths
- Compact size enables fast local deployment
- Strong context length for a small model
- Open weights support fine-tuning
- Balanced multimodal capabilities
Limitations
- Limited depth on complex multi-step tasks
- Weaker performance than larger models on advanced reasoning
- Supports only text and image inputs
Cost calculator
Estimate what Gemma 3 4B would cost for your usage.
Based on Gemma 3 4B's $0.05/1M input · $0.10/1M output. Estimate only — actual cost varies by provider and caching.
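The arithmetic behind such an estimate can be sketched in a few lines. This is only an illustration using the listed rates of $0.05 per 1M input tokens and $0.10 per 1M output tokens; the function name `estimateCostUsd` and the example token volumes are hypothetical, and real bills depend on the provider and on caching.

```typescript
// Rates copied from the listing; actual provider pricing may differ.
const INPUT_USD_PER_MILLION = 0.05;
const OUTPUT_USD_PER_MILLION = 0.10;

// Hypothetical helper (not part of any SDK): estimated cost in USD
// for a given number of input and output tokens.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("Token counts must be non-negative");
  }
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Example volumes (assumed): 2M input tokens and 500k output tokens.
const exampleCost = estimateCostUsd(2_000_000, 500_000);
console.log(`Estimated cost: $${exampleCost.toFixed(2)}`); // prints "Estimated cost: $0.15"
```

Because output tokens are billed at twice the input rate, workloads that generate long responses cost more than ones that mostly read long documents.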
Quick start
OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "google/gemma-3-4b-it",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: google/gemma-3-4b-it
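Since the model accepts images as well as text, a request will often carry both. The sketch below builds such a message in the OpenAI-style "content parts" format, which OpenAI-compatible APIs generally accept. The helper name `buildImageQuestion` and the example URL are hypothetical, and whether a given provider enables image input for this slug is an assumption worth checking before relying on it.

```typescript
// Shapes follow the OpenAI-style chat "content parts" format.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper (not part of any SDK): pairs a question with one image.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder URL for illustration only.
const imageMessage = buildImageQuestion(
  "What does this chart show?",
  "https://example.com/chart.png",
);
console.log(JSON.stringify(imageMessage, null, 2));
```

The resulting object would be passed as `messages: [imageMessage]` in the same `client.chat.completions.create` call, with the model slug unchanged.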
Editor's verdict
Gemma 3 4B is Google's open-weight multimodal model with a 131K-token context window.
At $0.10 per 1M output tokens, it is very cost-efficient for its class.
Because it is open-weight, you can self-host it or call it through a hosted API.
It is best suited to fast local deployment and to long-context work where a small model is sufficient.
Frequently asked questions
The model supports a context window of 131072 tokens for processing long inputs.
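A rough pre-flight check can indicate whether a prompt is likely to fit in that window. The sketch below is a heuristic only: the ratio of about 4 characters per token is an assumption for English text and not the model's actual tokenizer, image tokens are not counted, and the function names are hypothetical.

```typescript
// Context window from the listing.
const CONTEXT_WINDOW_TOKENS = 131_072;

// Assumption: roughly 4 characters per token for English text.
// Real tokenizer counts will differ, so treat the result as an estimate.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Hypothetical helper: crude token estimate from character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Hypothetical helper: true if the estimated prompt tokens plus the tokens
// reserved for the model's reply stay within the context window.
function fitsInContext(promptText: string, reservedOutputTokens: number): boolean {
  return estimateTokens(promptText) + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS;
}

const sampleDocument = "a".repeat(400_000); // about 100,000 estimated tokens
console.log(fitsInContext(sampleDocument, 1_024)); // prints true
```

For anything close to the limit, counting tokens with the model's own tokenizer is the safer choice.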