Multimodal

Models that understand text plus images, audio, or video.

27 models

GPT-5.5 Pro

OpenAI · Multimodal

Multimodal model handling over a million tokens of context.

Closed · 1050K ctx · $180.00/1M out

Gemini 3.1 Flash Lite

Google · Multimodal

Google's fast multimodal model for efficient text, image, and video tasks.

Closed · II 33.5 · 1049K ctx · $1.50/1M out

Gemini 3.5 Flash

Google · Multimodal

Google's fast multimodal model for text, image, video and audio tasks.

Closed · II 54.8 · 1049K ctx · $9.00/1M out

Claude Fable 5

Anthropic · Multimodal

Multimodal model with a million-token context for complex inputs.

Closed · II 64.9 · 1000K ctx · $50.00/1M out

GPT Chat Latest

OpenAI · Multimodal

OpenAI's multimodal model for large-scale text, image and file tasks.

Closed · 400K ctx · $30.00/1M out

Grok 4.3

xAI · Multimodal

Multimodal model with 1M-token context for complex text and image tasks.

Closed · II 43.9 · 1000K ctx · $2.50/1M out

GPT-5.5

OpenAI · Multimodal

OpenAI's multimodal model built for massive file, image, and text inputs.

Closed · II 50.8 · 1050K ctx · $30.00/1M out

Claude Opus 4.7 (Fast)

Anthropic · Multimodal

Fast multimodal model handling massive text, image, and file inputs.

Closed · II 57.3 · 1000K ctx · $150.00/1M out

Claude Opus 4.8 (Fast)

Anthropic · Multimodal

Fast multimodal model with a 1M-token context window from Anthropic.

Closed · II 61.4 · 1000K ctx · $50.00/1M out

Claude Opus 4.8

Anthropic · Multimodal

Multimodal reasoning over million-token contexts.

Closed · II 61.4 · 1000K ctx · $25.00/1M out

Google Gemini Flash Latest

Google · Multimodal

Google's fast multimodal model for efficient text, image, video and audio tasks.

Closed · 1049K ctx · $9.00/1M out

MiniMax M3

MiniMax · Multimodal

Processes long multimodal sequences across text, images, and video.

Closed · II 54.7 · 1049K ctx · $1.20/1M out

OpenAI GPT Mini Latest

OpenAI · Multimodal

Multimodal model for large-scale file, image, and text tasks.

Closed · 400K ctx · $4.50/1M out

Anthropic Claude Sonnet Latest

Anthropic · Multimodal

Multimodal reasoning and long-context analysis from Anthropic.

Closed · 1000K ctx · $15.00/1M out

Google Gemini Pro Latest

Google · Multimodal

Google's multimodal model for long-context reasoning across media types.

Closed · 1049K ctx · $12.00/1M out

Claude Fable Latest

Anthropic · Multimodal

Anthropic's closed multimodal model with 1M-token context.

Closed · 1000K ctx · $50.00/1M out

Claude Opus Latest

Anthropic · Multimodal

Anthropic's multimodal model for large-scale text and image analysis.

Closed · 1000K ctx · $25.00/1M out

OpenAI GPT Latest

OpenAI · Multimodal

Multimodal model for massive text, image, and file inputs.

Closed · 1050K ctx · $30.00/1M out

MiMo-V2.5

Xiaomi · Multimodal

MiMo-V2.5 processes extended multimodal sequences across text, audio, image, and video.

Closed · II 49 · 1049K ctx · $0.28/1M out

Mistral Medium 3.5

Mistral · Multimodal

Mistral's closed multimodal model for long-context text, image, and file tasks.

Closed · II 39.2 · 262K ctx · $7.50/1M out

Grok Build 0.1

xAI · Multimodal

Multimodal AI from xAI for text and image tasks with large context.

Closed · 256K ctx · $2.00/1M out

MoonshotAI Kimi Latest

Moonshot AI · Multimodal

Excels at long-context multimodal text and image tasks.

Closed · 262K ctx · $3.41/1M out

Kimi K2.7 Code

Moonshot AI · Multimodal

Multimodal model specialized in code tasks with extensive context.

Closed · 262K ctx · $3.50/1M out

Kimi K2.6

Moonshot AI · Multimodal

Kimi K2.6 processes long text and image inputs with a 262k-token context.

Closed · II 42.9 · 262K ctx · $3.40/1M out

Step 3.7 Flash

Stepfun · Multimodal

Multimodal model for long-context text, image, and video tasks.

Closed · II 42.6 · 256K ctx · $1.15/1M out

Anthropic Claude Haiku Latest

Anthropic · Multimodal

Anthropic's fast multimodal model for efficient text and image processing.

Closed · 200K ctx · $5.00/1M out

Perceptron Mk1

Perceptron · Multimodal

Closed-source multimodal model handling text, image, and video inputs.

Closed · 33K ctx · $1.50/1M out