Is GPT-4o-mini multimodal?

Yes, it supports multimodal understanding of text, images, and files.

How is GPT-4o-mini accessed?

It is available through OpenAI's API and platform for developers and users.

What types of tasks does GPT-4o-mini handle well?

It performs long-context reasoning, code generation, image description, file summarization, and general instruction following.

GPT-4o-mini (2024-07-18)

Verified

Fast, affordable multimodal model for text and image tasks.

OpenAI · Multimodal · Closed · II 12.6

Vision

Updated 2026-06-15

About GPT-4o-mini (2024-07-18)

GPT-4o-mini is built as a smaller-scale multimodal system from OpenAI. It supports combined text and visual inputs along with file handling while remaining fully proprietary. The architecture emphasizes reduced computational demands compared with larger siblings.

Its strengths lie in balancing capability with speed and cost for everyday workloads. The model processes mixed media reliably without requiring open weights or local hosting. This design suits production environments where latency and pricing matter.

Developers commonly use it for chat interfaces, image analysis, and document summarization. It integrates well into applications needing quick multimodal responses. Typical deployments include customer support tools and content review pipelines.

Capabilities

Multimodal understanding (text, image, file)

Long-context reasoning

Code generation and analysis

Image description and visual reasoning

File content extraction and summarization

General instruction following and conversation

Benchmarks & performance

Independent evaluation scores and measured speed.

Intelligence Index: 12.6

Tokens / sec: roughly 55

Time to first token: 1.23s

Source: Artificial Analysis
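To make the speed figures concrete, a rough end-to-end latency can be estimated as time to first token plus the time needed to stream the output. The sketch below assumes the 1.23s time to first token and the roughly 55 tokens per second quoted in the editor's verdict on this page. The function name and the 500-token example are hypothetical, and real latency will vary with load, provider, and prompt size.

```javascript
// Rough latency estimate: time to first token + streaming time for the output.
// Defaults are the approximate figures quoted on this page, not guarantees.
function estimateResponseSeconds(
  outputTokens,
  tokensPerSecond = 55,
  timeToFirstTokenSeconds = 1.23
) {
  if (outputTokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError("outputTokens must be >= 0 and tokensPerSecond must be > 0");
  }
  return timeToFirstTokenSeconds + outputTokens / tokensPerSecond;
}

// Hypothetical example: a 500-token answer.
console.log(estimateResponseSeconds(500).toFixed(2)); // "10.32"
```

Under these assumptions, most of the wait for a long answer comes from streaming the output rather than from the initial delay.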

How GPT-4o-mini (2024-07-18) compares

GPT-4o-mini (2024-07-18) (striped bar) vs. other multimodal models on intelligence, speed, and price.

Intelligence

Artificial Analysis Intelligence Index · Higher is better · GPT-4o-mini (2024-07-18) ranks #82 of 88

Models shown in the chart, in chart order: GPT-4o, Qwen3 VL 8B Instruct, GPT-4 Turbo, Llama 4 Scout, GPT-4.1 Nano, GPT-4o-mini, Claude 3 Haiku, Saba, Gemma 3 27B, Gemma 3 12B, Gemma 3 4B.

Speed

Output tokens per second · Higher is better · GPT-4o-mini (2024-07-18) ranks #57 of 76

Models shown in the chart, in chart order: Claude Fable 5, Claude Opus 4.8, GPT-5.5, MiniMax M3, Qwen3.6 27B, GPT-4o-mini, Kimi K2.5, Qwen3.7 Plus, Qwen3 VL 235B A22B Instruct, Qwen3.6 Plus, Gemma 4 26B A4B.

Price

USD per 1M output tokens · Lower is better · GPT-4o-mini (2024-07-18) ranks #32 of 155

Qwen3 VL 32B Instruct · $0.42
Qwen3 VL 8B Instruct · $0.50
Qwen3 VL 30B A3B Instruct · $0.52
Mistral Small 3.1 24B · $0.55
Llama 4 Maverick · $0.60
Mistral Small 4 · $0.60
GPT-4o-mini · $0.60
GPT-4o-mini · $0.60
Saba · $0.60
Qwen3 VL 235B A22B Instruct · $0.88
Codestral 2508 · $0.90
GLM 4.6V · $0.90
Qwen3.6 35B A3B · $0.97

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

Long Document Analysis with Visuals

Handles extended documents of up to 128,000 tokens that include images or files, enabling summarization and extraction of insights from reports containing charts or diagrams.
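Before sending a long document, it can help to check whether it is likely to fit in the 128,000-token window. The sketch below is only a rough pre-check: it assumes about 4 characters per token, a common rule of thumb for English text rather than a figure from this page, and it reserves a hypothetical 4,000 tokens for the reply. It does not count tokens consumed by images or attached files, so an actual tokenizer should be used when accuracy matters.

```javascript
const CONTEXT_WINDOW_TOKENS = 128000; // context window stated on this page
const ASSUMED_CHARS_PER_TOKEN = 4;    // assumption: rough heuristic for English text

// Hypothetical helper: estimates whether a text prompt fits in the context window,
// leaving room for the model's reply. Images and files are not counted here.
function fitsInContext(text, reservedOutputTokens = 4000) {
  const estimatedInputTokens = Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
  return {
    estimatedInputTokens,
    fits: estimatedInputTokens + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS,
  };
}

// Example with a synthetic 400,000-character document.
console.log(fitsInContext("a".repeat(400000)));
```

If the check fails, the document would need to be split or summarized in stages before it is passed to the model.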

Code Generation and Review

Supports code generation, analysis, and debugging across languages while maintaining context over large codebases or multiple files.

Image Description and Reasoning

Delivers accurate visual reasoning and descriptions for images, supporting tasks like content analysis or accessibility features.

Strengths & limitations

Strengths

+ Fast and cost-efficient responses
+ Good balance of capability and speed
+ Handles mixed text and image inputs effectively
+ Suitable for high-volume or real-time use cases

Limitations

– Less depth on complex reasoning than larger models
– No audio or video modality support
– Can still hallucinate or miss nuances on edge cases

Cost calculator

Estimate what GPT-4o-mini (2024-07-18) would cost for your usage.

Input tokens / request · Output tokens / request · Requests / month

$0.00045 per request

$4.50 estimated / month

Based on GPT-4o-mini (2024-07-18)'s $0.15/1M input · $0.60/1M output. Estimate only — actual cost varies by provider and caching.
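The arithmetic behind the estimate can be sketched as follows, using the $0.15/1M input and $0.60/1M output rates above. The calculator's actual inputs are not shown on this page. The values used here (1,000 input tokens, 500 output tokens, 10,000 requests per month) are assumptions chosen because they reproduce the displayed figures, and the function name is hypothetical.

```javascript
const INPUT_USD_PER_MILLION = 0.15;  // rate stated on this page
const OUTPUT_USD_PER_MILLION = 0.60; // rate stated on this page

// Hypothetical helper: cost per request and per month at the listed rates.
function estimateCost(inputTokens, outputTokens, requestsPerMonth) {
  const perRequest =
    (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) / 1e6;
  return { perRequest, perMonth: perRequest * requestsPerMonth };
}

// Assumed usage: 1,000 input tokens, 500 output tokens, 10,000 requests per month.
const example = estimateCost(1000, 500, 10000);
console.log(example.perRequest.toFixed(5)); // "0.00045"
console.log(example.perMonth.toFixed(2));   // "4.50"
```

Because output tokens cost four times as much as input tokens at these rates, trimming response length usually reduces the bill more than trimming the prompt by the same number of tokens.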

Quick start

OpenRouter's API is OpenAI-compatible, so most SDKs work after swapping the base URL. Only the model slug changes between models.

JavaScript · openai

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini-2024-07-18",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: openai/gpt-4o-mini-2024-07-18
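Since the model accepts images as well as text, the quick start can be extended to send an image. The sketch below only builds the message payload, using the content-part shape of the OpenAI Chat Completions format (a "text" part and an "image_url" part). The helper name and the example URL are hypothetical, and it assumes the provider passes image parts through to this model unchanged.

```javascript
// Hypothetical helper: builds a user message that pairs a text prompt with an image URL,
// following the OpenAI Chat Completions content-part format.
function buildImageMessages(prompt, imageUrl) {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Placeholder URL for illustration only.
const messages = buildImageMessages(
  "Describe this chart.",
  "https://example.com/chart.png"
);
console.log(JSON.stringify(messages, null, 2));
```

The resulting array can be passed as the `messages` field of `client.chat.completions.create` together with the model slug shown above.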

Editor's verdict

Our take on GPT-4o-mini (2024-07-18)

GPT-4o-mini (2024-07-18) is OpenAI's proprietary multimodal model with a 128K-token context window.

In independent testing it scores 12.6 on the Artificial Analysis Intelligence Index, running at roughly 55 tokens per second with about 1.23s to first token.

At $0.60 per 1M output tokens, it is very cost-efficient for its class.

It is available through OpenAI's API and aggregators like OpenRouter.

It is best suited to workloads that need fast, cost-efficient responses and a good balance of capability and speed.

Frequently asked questions

What is the context length supported by GPT-4o-mini (2024-07-18)?

The model provides a context window of 128,000 tokens for processing extended inputs.

Other GPT models

Sibling versions in the GPT family from OpenAI.

GPT-5.4

OpenAI · Multimodal

Verified

Multimodal model excelling at large-scale text, image and file tasks.

Closed · II 56.8 · 1050K ctx · $15.00/1M out

GPT-5.3-Codex

OpenAI · Multimodal

Verified

Multimodal coding model with 400k-token context from OpenAI.

Closed · II 53.6 · 400K ctx · $14.00/1M out

GPT-5.5

OpenAI · Multimodal

Verified

OpenAI's multimodal model built for massive file, image, and text inputs.

Closed · II 50.8 · 1050K ctx · $30.00/1M out

GPT-5.2-Codex

OpenAI · Multimodal

Verified

Multimodal model handling text and images at scale.

Closed · II 49 · 400K ctx · $14.00/1M out

GPT-5.4 Mini

OpenAI · Multimodal

Verified

Multimodal model for large-scale file, image, and text processing.

Closed · II 48.9 · 400K ctx · $4.50/1M out

GPT-5.2

OpenAI · Multimodal

Verified

OpenAI's multimodal model for large-scale file, image, and text tasks.

Closed · II 46.6 · 400K ctx · $14.00/1M out

Similar models

Claude Opus 4.6

GPT-4.1 Nano

GPT-4.1