Does Mistral Medium 3 accept image inputs?

Yes, it offers multimodal understanding of text and images.

What is the pricing for using Mistral Medium 3?

The listing quotes $0.40 per 1M input tokens and $2.00 per 1M output tokens, with the price sourced from OpenRouter. Actual cost varies by provider and caching.

How do I access Mistral Medium 3?

Access is available through Mistral's API and through aggregators such as OpenRouter, where the model slug is mistralai/mistral-medium-3.

What capabilities make Mistral Medium 3 suitable for document work?

It supports file and document analysis along with long-context reasoning and vision-language tasks.

Mistral Medium 3 by Mistral — Specs, Pricing, Benchmarks (2026)

About Mistral Medium 3

Mistral Medium 3 uses a multimodal design that integrates text, visual, and file processing in a single model. Its large context window accommodates extended inputs such as lengthy documents paired with images. The model is proprietary rather than open-weight, so access runs through Mistral's API and aggregators such as OpenRouter.

Key strengths lie in handling mixed-modality queries in a single request. It maintains coherence across long conversations or multi-page files that include visual elements. This combination makes it suitable for tasks requiring both language understanding and image interpretation.

Common applications include document summarization with charts, visual question answering, and automated analysis of mixed media collections. Teams deploy it in research prototypes or production pipelines where customization matters. Its architecture supports fine-tuning on domain-specific multimodal datasets.

Capabilities

Multimodal understanding (text + image)

Long-context reasoning

File and document analysis

Vision-language tasks

General text generation and instruction following

How Mistral Medium 3 compares

Mistral Medium 3 versus other multimodal models on intelligence, speed and price.

Price

USD per 1M output tokens · Lower is better · Mistral Medium 3 ranks #61 of 139

GPT-5 Mini · $2.0
GPT-5.1-Codex-Mini · $2.0
Devstral 2 2512 · $2.0
Grok Build 0.1 · $2.0
Seed-2.0-Lite · $2.0
Seed 1.6 · $2.0
Mistral Medium 3 · $2.0
Mistral Medium 3.1 · $2.0
Kimi K2.5 · $2.0
Qwen3.5-122B-A10B · $2.1
Qwen3.5 397B A17B · $2.3
Gemini 2.5 Flash · $2.5
Grok 4.3 · $2.5

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

Long document analysis with embedded images

The model processes files and documents up to 131072 tokens that contain both text and visuals, supporting detailed reasoning across the combined inputs.
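Before sending a long document, it can help to check roughly whether it fits in the 131072-token window. The sketch below is only an approximation: the four-characters-per-token ratio and the reserved reply budget are assumptions rather than properties of the model's tokenizer, and images consume additional tokens that it does not count.

```javascript
// Context window size quoted on this page.
const CONTEXT_WINDOW_TOKENS = 131072;

// Assumption: roughly 4 characters per token for English text.
// The real tokenizer will give different counts.
const APPROX_CHARS_PER_TOKEN = 4;

// Rough token estimate from character length.
function estimateTokens(text) {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Check whether a prompt fits while leaving room for the reply.
// reservedOutputTokens is a hypothetical budget, not a model setting.
function fitsInContext(text, reservedOutputTokens = 4096) {
  const used = estimateTokens(text) + reservedOutputTokens;
  return { used, fits: used <= CONTEXT_WINDOW_TOKENS };
}

// Placeholder document of 400,000 characters.
const sampleDocument = "x".repeat(400000);
console.log(fitsInContext(sampleDocument));
```

If the estimate lands near the limit, counting tokens with the provider's own tooling is safer than relying on this heuristic.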

Vision-language reasoning tasks

It performs multimodal understanding by interpreting images alongside text for tasks such as scene description and visual question answering within extended contexts.

Complex instruction following over large inputs

Users can apply the model for general text generation and instruction following that incorporates long textual sequences with image references.

Strengths & limitations

Strengths

+ Native multimodal integration
+ Large 128k context window
+ Flexible input modalities including files

Limitations

– Medium-tier model may trail larger flagships on complex reasoning
– Performance depends on prompt quality for edge cases

Cost calculator

Estimate what Mistral Medium 3 would cost for your usage.

Input tokens / request · Output tokens / request · Requests / month

$0.00140 per request

$14 estimated / month

Based on Mistral Medium 3's $0.40/1M input · $2.00/1M output. Estimate only — actual cost varies by provider and caching.
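The same estimate can be reproduced by hand from the quoted rates. The following is a minimal sketch, assuming the $0.40/1M input and $2.00/1M output prices above apply unchanged. The function name and the sample workload (1,000 input tokens, 500 output tokens, 10,000 requests per month) are illustrative assumptions; that workload is one combination consistent with the per-request and monthly figures shown, not necessarily the one the calculator used.

```javascript
// Rates quoted on this page (USD per 1M tokens); providers may differ.
const INPUT_USD_PER_MILLION = 0.40;
const OUTPUT_USD_PER_MILLION = 2.00;

// Estimate the cost of one request and of a month of identical requests.
function estimateCost(inputTokens, outputTokens, requestsPerMonth) {
  const perRequest =
    (inputTokens * INPUT_USD_PER_MILLION +
      outputTokens * OUTPUT_USD_PER_MILLION) / 1_000_000;
  return { perRequest, perMonth: perRequest * requestsPerMonth };
}

// Hypothetical workload, chosen only for illustration.
const estimate = estimateCost(1000, 500, 10000);
console.log(`$${estimate.perRequest.toFixed(5)} per request`);
console.log(`$${estimate.perMonth.toFixed(2)} estimated / month`);
```

Because output tokens cost five times as much as input tokens at these rates, long generations dominate the bill more quickly than long prompts do.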

Quick start

OpenRouter's API is OpenAI-compatible — most SDKs work by just swapping the base URL. Only the model slug changes between models.

JavaScript · openai

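// Requires the `openai` npm package and an ES module context (top-level await).
import OpenAI from "openai";

// Point the OpenAI client at OpenRouter instead of the default endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Send a single user message to Mistral Medium 3.
const completion = await client.chat.completions.create({
  model: "mistralai/mistral-medium-3",
  messages: [{ role: "user", content: "Hello!" }],
});

// Print the text of the first returned choice.
console.log(completion.choices[0].message.content);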

Model slug: mistralai/mistral-medium-3
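Since the model accepts images alongside text, a request can carry both in one message. The sketch below builds such a payload using the OpenAI-style content-part format. It assumes OpenRouter forwards that format to this model unchanged; the helper name and the image URL are placeholders introduced for illustration.

```javascript
// Build an OpenAI-style user message that pairs a text prompt with one image.
// buildImageQuestion is a hypothetical helper, not part of any SDK.
function buildImageQuestion(question, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Request body for the chat completions endpoint.
// The image URL is a placeholder, not a real asset.
const request = {
  model: "mistralai/mistral-medium-3",
  messages: [
    buildImageQuestion(
      "What does this chart show?",
      "https://example.com/chart.png"
    ),
  ],
};

// Inspect the payload; pass it to client.chat.completions.create(request)
// with an OpenAI-compatible client to actually send it.
console.log(JSON.stringify(request, null, 2));
```

Keeping the payload construction separate from the network call makes it easy to log or validate the request before spending tokens on it.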

Editor's verdict

Our take on Mistral Medium 3

Mistral Medium 3 is Mistral's proprietary multimodal model with a 131K-token context window.

At $2.00 per 1M output tokens, it is mid-priced for its class.

It is available through Mistral's API and aggregators like OpenRouter.

It is best suited to workloads that need native multimodal input and a large 128k context window.

Frequently asked questions

What context length does Mistral Medium 3 support?

The model provides a context window of 131072 tokens.

User reviews

Real, verified reviews from the community shape this model's rating.

Other multimodal models worth comparing.

Gemini 2.5 Flash Lite

Google · Multimodal

Verified

Google's fast, lightweight multimodal model for text, image, audio, and video tasks.

Closed · 1049K ctx · $0.40/1M out

GPT-5.1

OpenAI · Multimodal

Verified

OpenAI's multimodal model for large-scale image, text, and file processing.

Closed · 400K ctx · $10.00/1M out

Gemini 2.5 Pro Preview 05-06

Google · Multimodal

Verified

Google's multimodal model processes text, images, audio, video and files over 1M tokens.

Closed · 1049K ctx · $10.00/1M out


Other Mistral models

Mistral Medium 3.5

Mistral Small 4

Mistral Large 3 2512

Mistral Medium 3.1

Mistral Small 3.2 24B

Mistral Small 3.1 24B
