Llama 4 Scout
Meta's open multimodal model for long text and image sequences.
About Llama 4 Scout
Llama 4 Scout uses a multimodal architecture designed by Meta to accept both text and image inputs. Its 10 million token context window allows the model to handle very long combined sequences without truncation. The weights are openly available for inspection and modification.
The design emphasizes flexibility for tasks that require sustained attention across large volumes of mixed media. Open-weight release lowers barriers for academic and commercial experimentation. Users can fine-tune or deploy the model in environments where data privacy or customization is important.
Typical applications include document analysis that incorporates diagrams, long-form visual storytelling, and research involving extensive image-text corpora. Developers often integrate it into pipelines that need to maintain coherence over thousands of pages or image collections.
How Llama 4 Scout compares
[Chart: Llama 4 Scout vs. other multimodal models on intelligence, speed, and price]
Price (USD per 1M output tokens, lower is better): Llama 4 Scout ranks #11 of 124.
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Long-Document Multimodal Analysis
Llama 4 Scout excels at ingesting and reasoning over entire books or research archives that combine text with images, thanks to its 10 million token context window.
Extended Multimodal Reasoning Tasks
The model handles complex queries that span thousands of pages of mixed text and visual data, such as reviewing technical manuals with diagrams in one pass.
Large-Scale Knowledge Integration
It supports synthesizing insights across massive multimodal collections like corporate archives containing reports, charts, and photographs.
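Whether a given archive fits in a single pass can be estimated before sending anything. The sketch below is a rough budget check only: the 10,000,000-token limit comes from this page, while `CHARS_PER_TOKEN`, `TOKENS_PER_IMAGE`, and the reserved output size are hypothetical placeholders, not figures from Meta's tokenizer or image encoder. Replace them with measured values for real workloads.

```javascript
// Context window size stated on this page.
const CONTEXT_WINDOW_TOKENS = 10000000;

// ASSUMPTIONS: illustrative heuristics, not Llama 4 Scout's real tokenizer.
const CHARS_PER_TOKEN = 4; // hypothetical average for English text
const TOKENS_PER_IMAGE = 1500; // hypothetical flat cost per image

// Rough token estimate for a mixed text-and-image collection.
function estimateTokens(totalChars, imageCount) {
  return Math.ceil(totalChars / CHARS_PER_TOKEN) + imageCount * TOKENS_PER_IMAGE;
}

// Check the estimate against the window, leaving room for the reply.
function fitsInContext(totalChars, imageCount, reservedOutputTokens = 4096) {
  const needed = estimateTokens(totalChars, imageCount) + reservedOutputTokens;
  return { needed, fits: needed <= CONTEXT_WINDOW_TOKENS };
}

// Example: 3,000 pages of about 3,000 characters each, plus 400 diagrams.
const manualCheck = fitsInContext(3000 * 3000, 400);
console.log(manualCheck); // { needed: 2854096, fits: true }
```

Under these assumed ratios the example manual uses under a third of the window. The heuristic says nothing about latency or compute cost at that size.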
Strengths & limitations
Strengths
- Extremely large context window
- Native multimodal input support
- Strong reasoning over long inputs
Limitations
- High compute cost at maximum context
- Limited to text and image modalities only
- May exhibit latency on very long sequences
Cost calculator
Estimate what Llama 4 Scout would cost for your usage.
Based on Llama 4 Scout's listed price of $0.10 per 1M input tokens and $0.30 per 1M output tokens. Estimate only; actual cost varies by provider and caching.
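The same estimate can be reproduced by hand. This minimal sketch uses the two per-million rates quoted above; the function name `estimateCostUSD` is illustrative, and the result ignores caching discounts and provider-specific pricing.

```javascript
// Rates quoted on this page, in USD per 1M tokens.
const INPUT_USD_PER_MILLION = 0.10;
const OUTPUT_USD_PER_MILLION = 0.30;

// Estimated cost in USD for a given number of input and output tokens.
function estimateCostUSD(inputTokens, outputTokens) {
  return (
    (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) /
    1e6
  );
}

// 2M input tokens and 500K output tokens.
console.log(estimateCostUSD(2000000, 500000).toFixed(2)); // "0.35"

// One request that fills the full 10M-token window, with no output counted.
console.log(estimateCostUSD(10000000, 0).toFixed(2)); // "1.00"
```

At these rates, filling the entire context window costs about one dollar in input tokens per request, so repeated long-context calls add up quickly.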
Quick start
OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Send a single-turn chat request to Llama 4 Scout.
const completion = await client.chat.completions.create({
  model: "meta-llama/llama-4-scout",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: meta-llama/llama-4-scout
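Since the model accepts images as well as text, a request can mix both. The sketch below only builds the request body and makes no network call. It assumes the provider accepts OpenAI-style `image_url` content parts for this model. `buildImageMessage` is a hypothetical helper, and the example URL is a placeholder.

```javascript
// Build an OpenAI-style user message that combines a text prompt with images.
// ASSUMPTION: the provider accepts `image_url` content parts for this model.
function buildImageMessage(prompt, imageUrls) {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      // One content part per image, in the order given.
      ...imageUrls.map((url) => ({ type: "image_url", image_url: { url } })),
    ],
  };
}

// Request body using the model slug from the Quick start.
const imageRequest = {
  model: "meta-llama/llama-4-scout",
  messages: [
    buildImageMessage("Describe the diagram.", [
      "https://example.com/diagram.png", // placeholder URL
    ]),
  ],
};

console.log(JSON.stringify(imageRequest, null, 2));
```

The resulting object can be passed to `client.chat.completions.create(...)` in place of the text-only request from the Quick start.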
Editor's verdict
Llama 4 Scout is Meta's open-weight multimodal model with a 10-million-token context window.
At $0.30 per 1M output tokens, it is very cost-efficient for its class.
Because it is an open-weight model, you can self-host it or call it through a hosted API.
It is best suited to workloads that need an extremely large context window and native multimodal input.
Frequently asked questions
Llama 4 Scout provides a context window of 10,000,000 tokens.