Skip to content
Gemma 3 12B logo

Gemma 3 12B

Verified

Google's open multimodal model for text and image understanding.

GoogleMultimodalOpen
Vision
Model page
Updated 2026-06-15

About Gemma 3 12B

Gemma 3 12B uses a transformer-based architecture that integrates vision encoders with language modeling layers. This design allows the model to process image and text data jointly within a single forward pass. The open-weight release gives researchers direct access to model parameters for inspection and modification.

Its strengths include native multimodal support and a large context window that accommodates lengthy documents paired with images. Because the weights are openly available, the model can be fine-tuned or quantized for deployment on consumer or enterprise hardware. Google provides it as part of the Gemma family to encourage experimentation and local inference.

Typical usage covers visual question answering, image captioning, and multimodal chat interfaces. Developers also apply it to document analysis tasks where both textual content and visual layout must be understood together. The 12B scale offers a balance between capability and the ability to run on mid-range GPUs or CPUs.

Capabilities

Long-context reasoning
Vision understanding
Multimodal instruction following
Text generation
Code generation
Visual question answering

How Gemma 3 12B compares

Gemma 3 12B (striped bar) vs other multimodal on intelligence, speed and price.

Price

USD per 1M output tokens · Lower is better · Gemma 3 12B ranks #6 of 124

$0.10
Gemma 3 4B
$0.10
Ministral 3 3B 2512
$0.10
Reka Edge
$0.15
Ministral 3 8B 2512
$0.15
Qwen3.5-9B
$0.15
Gemma 3 12B
$0.20
Ministral 3 14B 2512
$0.20
Mistral Small 3.2 24B
$0.26
Qwen3.5-Flash
$0.28
MiMo-V2.5
$0.30
Llama 4 Scout
$0.30
Seed 1.6 Flash
$0.30
Voxtral Small 24B 2507

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

Long-context multimodal document analysis

Processes 128k-token inputs combining text and images, such as full research papers with embedded figures and charts.

Extended visual reasoning tasks

Handles sequences of images alongside lengthy instructions for tasks like storyboarding or technical diagram interpretation.

Large-scale code and UI review

Reviews entire repositories or multi-screen app designs by ingesting both code files and screenshots in one context window.

Strengths & limitations

Strengths

  • +Efficient 12B scale for deployment
  • +Strong context window utilization
  • +Native text and image support
  • +Open-weight accessibility

Limitations

  • Smaller scale than frontier models
  • Multimodal depth constrained by size
  • Performance varies on complex tasks

Cost calculator

Estimate what Gemma 3 12B would cost for your usage.

$0.00013
per request
$1.25
estimated / month

Based on Gemma 3 12B's $0.05/1M input · $0.15/1M output. Estimate only — actual cost varies by provider and caching.

Download & self-host Gemma 3 12B

This is an open-weight model. Download the weights from Hugging Face or load it directly with Transformers.

12B
Parameters (safetensors)
1,694,469
Monthly downloads
753
Hugging Face likes
Download · transformers
# Install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"

# Download the model weights
hf download google/gemma-3-12b-it

# Or load it directly in Python
from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("google/gemma-3-12b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-12b-it", device_map="auto")
View google/gemma-3-12b-it on Hugging Face

Inference providers

Hosted APIs that serve Gemma 3 12B (via Hugging Face Inference Providers).

featherless-ai

Quick start

OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.

JavaScript · openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "google/gemma-3-12b-it",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: google/gemma-3-12b-it

Editor's verdict

Our take on Gemma 3 12B

Gemma 3 12B is Google's open-weight multimodal with a 131K-token context window.

At $0.15 per 1M output tokens, it is very cost-efficient for its class.

As an open-weight model you can self-host it (12B parameters) or call it through a hosted API.

Best suited to efficient 12b scale for deployment and strong context window utilization.

Did you find this helpful?

Frequently asked questions

The model supports a context length of 131072 tokens.

User reviews

Real, verified reviews from the community shape this model's rating.

Loading reviews…

Sign in to review

Other Gemma models

Sibling versions in the Gemma family from Google.

Promote Gemma 3 12B

Add this badge to your website, or share the tool.

DFeatured on DhanasviGemma 3 12B 1