GPT-4o (2024-08-06)
Multimodal model optimized for integrated text, image, and file tasks.
About GPT-4o (2024-08-06)
Built as a proprietary system, GPT-4o combines vision and language processing in a single architecture. It supports file uploads alongside images and text for coherent multi-turn interactions. The design prioritizes low-latency responses while maintaining a large context capacity.
Key strengths include accurate visual analysis paired with textual reasoning and document handling. It performs well on tasks that require cross-referencing images with long-form content. Common uses range from API-driven applications to chat interfaces for research, creative work, and data extraction.
Benchmarks & performance
Independent evaluation scores and measured speed.
Source: Artificial Analysis
How GPT-4o (2024-08-06) compares
GPT-4o (2024-08-06) compared with other multimodal models on intelligence, speed, and price.
Intelligence
Artificial Analysis Intelligence Index · Higher is better · GPT-4o (2024-08-06) ranks #76 of 88
Speed
Output tokens per second · Higher is better · GPT-4o (2024-08-06) ranks #39 of 76
Price
USD per 1M output tokens · Lower is better · GPT-4o (2024-08-06) ranks #112 of 155
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Multimodal Image and Text Analysis
The model excels at vision-based reasoning tasks that combine image inputs with textual queries, such as extracting insights from charts, diagrams, or photographs alongside explanatory text.
Long-Context Document Processing
It handles extended documents and conversations up to its full context length, enabling coherent summarization, analysis, and question-answering across large files or multi-turn interactions.
Code Generation and Technical Workflows
Strong performance in code generation, debugging, and analysis makes it suitable for software development tasks that also require interpreting related documentation or visual mockups.
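For the image-and-text use case described above, a request typically pairs a text part with an image part in a single user message. The sketch below is illustrative only: it assumes the OpenAI-compatible chat format for mixed content, and `buildImageQuestion` and the example URL are hypothetical names introduced here, not part of any SDK.

```typescript
// Shapes follow the OpenAI-compatible chat format for mixed text and image content.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper: pairs one question with one image URL in a single message.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

const message = buildImageQuestion(
  "What trend does this chart show?",
  "https://example.com/chart.png", // placeholder URL, replace with a real image
);
console.log(JSON.stringify(message, null, 2));
```

The resulting object can be placed in the `messages` array of a chat completions request for this model.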
Strengths & limitations
Strengths
- Strong cross-modal integration
- Versatile across creative and analytical tasks
- Handles complex multi-step instructions well
Limitations
- No native audio or video processing
- Knowledge cutoff at training date
- Can still produce hallucinations on edge cases
Cost calculator
Estimate what GPT-4o (2024-08-06) would cost for your usage.
Based on GPT-4o (2024-08-06)'s $2.50/1M input · $10.00/1M output. Estimate only — actual cost varies by provider and caching.
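The arithmetic behind such an estimate is simple enough to sketch. The example below uses the two listed rates; `estimateCostUsd` is a hypothetical helper, and it ignores caching discounts and provider-specific pricing, so treat the result as a rough figure.

```typescript
// Rates from the listing above, in USD per 1M tokens.
const INPUT_USD_PER_MILLION = 2.5;
const OUTPUT_USD_PER_MILLION = 10.0;

// Hypothetical helper: estimated spend for a given token volume.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Example: 1M input tokens and 500K output tokens.
console.log(estimateCostUsd(1_000_000, 500_000).toFixed(2)); // "7.50"
```

Because output tokens cost four times as much as input tokens at these rates, workloads that generate long responses dominate the bill.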
Quick start
OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-2024-08-06",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: openai/gpt-4o-2024-08-06
Editor's verdict
GPT-4o (2024-08-06) is OpenAI's proprietary multimodal model with a 128K-token context window.
On independent testing it scores 14.5 on the Artificial Analysis Intelligence Index, running at roughly 102 tokens per second with about 0.89s to first token.
At $10.00 per 1M output tokens, it is premium-priced for its class.
It is available through OpenAI's API and aggregators like OpenRouter.
It is best suited to work that needs strong cross-modal integration and versatility across creative and analytical tasks.
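The speed figures quoted above translate into a rough response-time estimate: time to first token plus output length divided by throughput. The sketch below assumes steady generation speed, which real deployments will not guarantee; `estimateResponseSeconds` is a hypothetical helper.

```typescript
// Figures quoted in the verdict above; actual values vary by provider and load.
const SECONDS_TO_FIRST_TOKEN = 0.89;
const OUTPUT_TOKENS_PER_SECOND = 102;

// Hypothetical helper: time to first token plus steady-state generation time.
function estimateResponseSeconds(outputTokens: number): number {
  return SECONDS_TO_FIRST_TOKEN + outputTokens / OUTPUT_TOKENS_PER_SECOND;
}

// Example: a 510-token reply.
console.log(estimateResponseSeconds(510).toFixed(2)); // "5.89"
```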
Frequently asked questions
The model supports a context window of 128,000 tokens.
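A quick pre-flight check can flag prompts that are unlikely to fit in that window. The sketch below is a rough approximation only: it assumes about 4 characters per token (a common rule of thumb for English text, not an exact tokenizer count) and that the reply counts against the same window. Both helper names are hypothetical.

```typescript
const CONTEXT_WINDOW_TOKENS = 128_000;
// Assumption: roughly 4 characters per token; use a real tokenizer for exact counts.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Hypothetical helper: approximate token count from character length.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Hypothetical helper: checks the prompt plus reserved reply space against the window.
function fitsInContext(prompt: string, reservedOutputTokens: number): boolean {
  return roughTokenCount(prompt) + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS;
}

console.log(fitsInContext("a".repeat(400_000), 4_000)); // true
console.log(fitsInContext("a".repeat(600_000), 4_000)); // false
```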