Multimodal

Models that understand text plus images, audio, or video.

32 models

Claude Opus 4.8

Anthropic · Multimodal

Verified

Multimodal reasoning over million-token contexts.

Closed · 1000K ctx · $25.00/1M out

Gemini 3.5 Flash

Google · Multimodal

Verified

Google's fast multimodal model for text, image, video and audio tasks.

Closed · 1049K ctx · $9.00/1M out

Gemini 3.1 Flash Lite

Google · Multimodal

Verified

Google's fast multimodal model for efficient text, image, and video tasks.

Closed · 1049K ctx · $1.50/1M out

Claude Fable 5

Anthropic · Multimodal

Verified

Multimodal model with a million-token context for complex inputs.

Closed · 1000K ctx · $50.00/1M out

Claude Opus 4.7 (Fast)

Anthropic · Multimodal

Verified

Fast multimodal model handling massive text, image, and file inputs.

Closed · 1000K ctx · $150.00/1M out

Claude Opus 4.8 (Fast)

Anthropic · Multimodal

Verified

Fast multimodal model with a 1M-token context window from Anthropic.

Closed · 1000K ctx · $50.00/1M out

GPT Chat Latest

OpenAI · Multimodal

Verified

OpenAI's multimodal model for large-scale text, image and file tasks.

Closed · 400K ctx · $30.00/1M out

GPT-5.5 Pro

OpenAI · Multimodal

Verified

Multimodal model handling over a million tokens of context.

Closed · 1050K ctx · $180.00/1M out

GPT-5.5

OpenAI · Multimodal

Verified

OpenAI's multimodal model built for massive file, image, and text inputs.

Closed · 1050K ctx · $30.00/1M out

Grok 4.3

xAI · Multimodal

Verified

Multimodal model with 1M-token context for complex text and image tasks.

Closed · 1000K ctx · $2.50/1M out

MiMo-V2.5

Xiaomi · Multimodal

Verified

MiMo-V2.5 processes extended multimodal sequences across text, audio, image, and video.

Closed · 1049K ctx · $0.28/1M out

Claude Fable Latest

Anthropic · Multimodal

Verified

Anthropic's closed multimodal model with 1M-token context.

Closed · 1000K ctx · $50.00/1M out

OpenAI GPT Mini Latest

OpenAI · Multimodal

Verified

Multimodal model for large-scale file, image, and text tasks.

Closed · 400K ctx · $4.50/1M out

Google Gemini Flash Latest

Google · Multimodal

Verified

Google's fast multimodal model for efficient text, image, video and audio tasks.

Closed · 1049K ctx · $9.00/1M out

Qwen3.5 Plus 2026-04-20

Alibaba Qwen · Multimodal

Verified

Open-weight multimodal model for long-context text, image, and video tasks.

Open · 1000K ctx · $1.80/1M out

Google Gemini Pro Latest

Google · Multimodal

Verified

Google's multimodal model for long-context reasoning across media types.

Closed · 1049K ctx · $12.00/1M out

OpenAI GPT Latest

OpenAI · Multimodal

Verified

Multimodal model for massive text, image, and file inputs.

Closed · 1050K ctx · $30.00/1M out

Anthropic Claude Sonnet Latest

Anthropic · Multimodal

Verified

Multimodal reasoning and long-context analysis from Anthropic.

Closed · 1000K ctx · $15.00/1M out

Claude Opus Latest

Anthropic · Multimodal

Verified

Anthropic's multimodal model for large-scale text and image analysis.

Closed · 1000K ctx · $25.00/1M out

Qwen3.6 Flash

Alibaba Qwen · Multimodal

Verified

Qwen3.6 Flash processes million-token multimodal inputs across text, image and video.

Open · 1000K ctx · $1.13/1M out

MiniMax M3

MiniMax · Multimodal

Verified

Processes long multimodal sequences across text, images, and video.

Closed · 1049K ctx · $1.20/1M out

Qwen3.7 Plus

Alibaba Qwen · Multimodal

Verified

Open-weight multimodal model for million-token text and image tasks.

Open · 1000K ctx · $1.28/1M out

Mistral Medium 3.5

Mistral · Multimodal

Verified

Mistral's closed multimodal model for long-context text, image, and file tasks.

Closed · 262K ctx · $7.50/1M out

Grok Build 0.1

xAI · Multimodal

Verified

Multimodal AI from xAI for text and image tasks with large context.

Closed · 256K ctx · $2.00/1M out

Kimi K2.7 Code

Moonshot AI · Multimodal

Verified

Multimodal model specialized in code tasks with extensive context.

Closed · 262K ctx · $3.50/1M out

Qwen3.6 35B A3B

Alibaba Qwen · Multimodal

Verified

Multimodal model for long-context text, image, and video analysis.

Open · 262K ctx · $1.00/1M out

MoonshotAI Kimi Latest

Moonshot AI · Multimodal

Verified

Excels at long-context multimodal text and image tasks.

Closed · 262K ctx · $3.41/1M out

Kimi K2.6

Moonshot AI · Multimodal

Verified

Kimi K2.6 processes long text and image inputs with a 262K-token context.

Closed · 262K ctx · $3.41/1M out

Qwen3.6 27B

Alibaba Qwen · Multimodal

Verified

Multimodal model for long-context text, image, and video processing.

Open · 262K ctx · $3.17/1M out

Step 3.7 Flash

Stepfun · Multimodal

Verified

Multimodal model for long-context text, image, and video tasks.

Closed · 256K ctx · $1.15/1M out

Anthropic Claude Haiku Latest

Anthropic · Multimodal

Verified

Anthropic's fast multimodal model for efficient text and image processing.

Closed · 200K ctx · $5.00/1M out

Perceptron Mk1

Perceptron · Multimodal

Verified

Closed-source multimodal model handling text, image, and video inputs.

Closed · 33K ctx · $1.50/1M out