Can MiniMax-01 process images along with text?

Yes, it is a multimodal model capable of text-image processing, visual content understanding, and image-text alignment.

Who created the MiniMax-01 model?

MiniMax-01 was developed by MiniMax.

What access options exist for using MiniMax-01?

The model is offered by MiniMax for applications involving long-context reasoning and extended document analysis.

MiniMax-01

Verified

Processes over one million tokens of text and images in a single context.

MiniMax · Multimodal · Closed

Vision

Updated 2026-06-15

About MiniMax-01

MiniMax-01 is built around a multimodal architecture that jointly processes text and images. Its context window of 1,000,192 tokens allows entire documents or image collections to remain in view during inference. The design prioritizes coherence across extended multimodal sequences without requiring external retrieval systems.

Because the weights are not publicly released, MiniMax retains full control over training data, safety filters, and deployment. This closed approach supports consistent performance on tasks that combine visual understanding with long-range textual reasoning. Typical uses include analyzing lengthy illustrated reports, generating stories from image sequences, and maintaining context across multi-turn visual conversations.
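As a rough illustration of what a 1,000,192-token window means in practice, the sketch below estimates whether a text input fits. The 4-characters-per-token ratio and the reserved output budget are assumptions for illustration only. The model's real tokenizer will give different counts, and the token cost of images is not covered here.

```javascript
// Context window size quoted on this page.
const CONTEXT_WINDOW_TOKENS = 1000192;

// Assumption: roughly 4 characters per token for English text.
// This is a heuristic, not the model's actual tokenizer.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Rough token estimate for a string.
function estimateTokens(text) {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Checks whether the prompt plus a reserved output budget fits in the window.
// reservedOutputTokens is a hypothetical default, not a documented limit.
function fitsInContext(text, reservedOutputTokens = 4096) {
  const promptTokens = estimateTokens(text);
  return {
    promptTokens,
    fits: promptTokens + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS,
  };
}

// Example: a 2,000,000-character document.
console.log(fitsInContext("a".repeat(2000000)));
```

For production use, a count from the provider's own tokenizer or usage metadata would be more reliable than this heuristic.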

Capabilities

Long-context reasoning

Multimodal text-image processing

Visual content understanding

Extended document analysis

Image-text alignment

Large-scale context retention

How MiniMax-01 compares

MiniMax-01 (striped bar) vs. other multimodal models on intelligence, speed, and price.

Price

USD per 1M output tokens · Lower is better · MiniMax-01 ranks #42 of 155

Codestral 2508 · $0.90
GLM 4.6V · $0.90
Qwen3.6 35B A3B · $0.97
Qwen3.5-35B-A3B · $1.0
Qwen2.5 VL 72B Instruct · $1.0
Sonar · $1.0
MiniMax-01 · $1.1
Qwen3.6 Flash · $1.1
Step 3.7 Flash · $1.1
MiniMax M3 · $1.2
GPT-5.4 Nano · $1.3
Claude 3 Haiku · $1.3
ERNIE 4.5 VL 424B A47B · $1.3

Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).

Best for

Extended multimodal document review

MiniMax-01 processes combined text and images across its full context length, supporting detailed analysis of lengthy reports or presentations.

Long-sequence visual reasoning

The model maintains image-text alignment while handling over one million tokens, enabling coherent interpretation of visual narratives spread across many pages.

Large-scale context retention projects

It performs long-context reasoning on massive multimodal inputs, making it suitable for tasks that require tracking details throughout extensive datasets or archives.

Strengths & limitations

Strengths

+ Exceptional handling of lengthy inputs
+ Seamless integration of vision and language
+ Strong coherence across extended contexts
+ Versatile for complex multimodal tasks

Limitations

– Limited to text and static image modalities
– High computational demands at maximum context
– Slower inference with very long inputs

Cost calculator

Estimate what MiniMax-01 would cost for your usage.

Input tokens / request · Output tokens / request · Requests / month

$0.00075

per request

$7.5

estimated / month

Based on MiniMax-01's $0.20/1M input · $1.10/1M output. Estimate only — actual cost varies by provider and caching.
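The arithmetic behind this estimate can be sketched as follows, using the $0.20 input and $1.10 output rates quoted above. The usage values (1,000 input tokens, 500 output tokens, 10,000 requests per month) are hypothetical; the page does not state which inputs the calculator used, although these values happen to reproduce the figures shown.

```javascript
// Rates quoted on this page, in USD per 1M tokens.
const INPUT_USD_PER_MILLION = 0.20;
const OUTPUT_USD_PER_MILLION = 1.10;

// Returns the estimated cost of one request and of a month of such requests.
function estimateCost(inputTokens, outputTokens, requestsPerMonth) {
  const perRequest =
    (inputTokens * INPUT_USD_PER_MILLION +
      outputTokens * OUTPUT_USD_PER_MILLION) / 1e6;
  return { perRequest, perMonth: perRequest * requestsPerMonth };
}

// Hypothetical usage values, chosen for illustration.
const estimate = estimateCost(1000, 500, 10000);
console.log(`$${estimate.perRequest.toFixed(5)} per request`);
console.log(`$${estimate.perMonth.toFixed(2)} estimated / month`);
```

As the page notes, this is only an estimate: it ignores provider differences and any caching discounts.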

Quick start

OpenRouter's API is OpenAI-compatible, so most OpenAI SDKs work once the base URL is swapped. Only the model slug changes between models.

JavaScript · openai

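// Requires the "openai" package and an ES module context (for top-level await).
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter instead of the default OpenAI endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY, // read the key from the environment
});

// Send a single-turn chat request to MiniMax-01.
const completion = await client.chat.completions.create({
  model: "minimax/minimax-01",
  messages: [{ role: "user", content: "Hello!" }],
});

// Print the text of the first returned choice.
console.log(completion.choices[0].message.content);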

Model slug: minimax/minimax-01
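Because the page describes MiniMax-01 as a text-and-image model, a request that pairs a prompt with an image may be useful. The sketch below only builds the request body, using the content-part layout of the OpenAI chat format. Treat it as an assumption that a given provider accepts image input for this slug; the helper name `buildVisionRequest` and the example URL are hypothetical.

```javascript
// Builds a chat-completions request body pairing a text prompt with one image.
// The { type: "text" } / { type: "image_url" } parts follow the OpenAI chat format.
// Assumption: the provider serving this slug accepts image inputs in that format.
function buildVisionRequest(prompt, imageUrl) {
  return {
    model: "minimax/minimax-01",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Placeholder URL, for illustration only.
const request = buildVisionRequest(
  "Summarize the chart in this image.",
  "https://example.com/chart.png"
);
console.log(JSON.stringify(request, null, 2));
```

The resulting object could then be passed to `client.chat.completions.create(request)` with the client configured in the quick start.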

Editor's verdict

Our take on MiniMax-01

MiniMax-01 is MiniMax's proprietary multimodal model with a 1000K-token context window (1,000,192 tokens).

At $1.10 per 1M output tokens, it is mid-priced for its class.

It is available through MiniMax's API and aggregators like OpenRouter.

It is best suited to workloads that involve very long inputs and combined vision-language understanding.

Frequently asked questions

What is the context window of MiniMax-01?

MiniMax-01 provides a context window of 1,000,192 tokens.

Other MiniMax models

Sibling models in the MiniMax family.

MiniMax M3

MiniMax · Multimodal

Verified

Processes long multimodal sequences across text, images, and video.

Closed · II 54.7 · 1049K ctx · $1.20/1M out

MiniMax M2.7

MiniMax · Language Models

Verified

MiniMax M2.7 handles massive text contexts up to 204,800 tokens.

Closed · II 49.6 · 205K ctx · $1.00/1M out

MiniMax M2.5

MiniMax · Language Models

Verified

MiniMax M2.5 processes up to 204,800 tokens for extended text tasks.

Closed · II 41.9 · 205K ctx · $0.90/1M out

MiniMax M2.1

MiniMax · Language Models

Verified

MiniMax M2.1 handles massive contexts as a closed-source text LLM.

Closed · II 39.4 · 205K ctx · $0.95/1M out

MiniMax M2

MiniMax · Language Models

Verified

MiniMax M2 processes up to 204,800 tokens of text in a single context.

Closed · II 36.1 · 205K ctx · $1.00/1M out

MiniMax M1

MiniMax · Language Models

Verified

Processes million-token contexts for deep text analysis.

Closed · 1000K ctx · $2.20/1M out
