Fastest AI Models

This ranked list presents the fastest AI models according to measured output speeds in tokens per second. Models differ in intelligence index, context window sizes, pricing per million tokens, and supported modalities. Selection should account for trade-offs between raw speed, context capacity, and whether text-only or multimodal inputs are required.
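The trade-off described above can be sketched as a filter followed by a speed comparison. This is a minimal illustration, not the site's own selection logic: the `Model` class, the `pick_fastest` function, and its parameter names are hypothetical, and the figures are copied from four of the entries listed below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    """One entry from the list (hypothetical structure for illustration)."""
    name: str
    speed_tps: float     # measured output speed, tokens per second
    price_per_m: float   # output price in USD per 1M tokens
    context: int         # context window in tokens
    multimodal: bool     # True if listed as "Multimodal", False if "LLM"


# Figures taken from the entries on this page.
MODELS = [
    Model("Mercury 2", 839.31, 0.75, 128_000, False),
    Model("Step 3.7 Flash", 380.3, 1.15, 256_000, True),
    Model("gpt-oss-120b", 344.97, 0.18, 131_072, False),
    Model("Gemini 2.5 Flash Lite", 276.7, 0.40, 1_048_576, True),
]


def pick_fastest(
    models: List[Model],
    min_context: int = 0,
    need_multimodal: bool = False,
    max_price: Optional[float] = None,
) -> Optional[Model]:
    """Return the fastest model that meets every constraint, or None."""
    candidates = [
        m for m in models
        if m.context >= min_context
        and (m.multimodal or not need_multimodal)
        and (max_price is None or m.price_per_m <= max_price)
    ]
    # max() with default=None avoids an exception when nothing qualifies.
    return max(candidates, key=lambda m: m.speed_tps, default=None)


# Raw speed only: the top-ranked model wins.
print(pick_fastest(MODELS).name)
# A 200K-token context requirement rules out the two fastest text-only models.
print(pick_fastest(MODELS, min_context=200_000).name)
# Multimodal input under $1.00 per 1M output tokens.
print(pick_fastest(MODELS, need_multimodal=True, max_price=1.00).name)
```

With these figures, the three calls select Mercury 2, Step 3.7 Flash, and Gemini 2.5 Flash Lite respectively, which shows how a context or modality requirement can move the choice away from the fastest model overall.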

1
Mercury 2

LLM · $0.75/1M

Mercury 2 earns the top position through its leading output speed of 839.31 t/s while supporting a 128000-token context window for text workflows.

Intelligence: 32.8 · Output speed: 839 t/s · Output price: $0.75/1M · Context: 128K

2
Step 3.7 Flash

Multimodal · $1.15/1M

Step 3.7 Flash ranks second with an output speed of 380.3 t/s, a 256000-token context, and native multimodal support for text, image, and video.

Intelligence: 42.6 · Output speed: 380 t/s · Output price: $1.15/1M · Context: 256K

3
gpt-oss-120b

LLM · $0.18/1M

gpt-oss-120b places third due to its 344.97 t/s output speed and 131072-token context handling for long-form text at a low price of $0.18 per million tokens.

Intelligence: 33.3 · Output speed: 345 t/s · Output price: $0.18/1M · Context: 131K

4
Gemini 3.1 Flash Lite Preview

Multimodal · $1.50/1M

Gemini 3.1 Flash Lite Preview ranks fourth with an output speed of 310.24 t/s, matching Gemini 3.1 Flash Lite, along with a 1048576-token context and broad native multimodal capabilities.

Intelligence: 33.5 · Output speed: 310 t/s · Output price: $1.50/1M · Context: 1049K

5
Gemini 3.1 Flash Lite

Multimodal · $1.50/1M

Gemini 3.1 Flash Lite secures its spot with 310.24 t/s speed, a 1048576-token context, and efficient multimodal support for text, image, and video.

Intelligence: 33.5 · Output speed: 310 t/s · Output price: $1.50/1M · Context: 1049K

6
Gemini 2.5 Flash Lite

Multimodal · $0.40/1M

Gemini 2.5 Flash Lite ranks sixth with an output speed of 276.7 t/s, a 1048576-token context, and multimodal handling of text, image, audio, and video at $0.40 per million tokens.

Intelligence: 17.6 · Output speed: 277 t/s · Output price: $0.40/1M · Context: 1049K

7
MiniMax M2.5

LLM · $0.90/1M

MiniMax M2.5 earns its position with 234.2 t/s speed and a 204800-token context window suited for extended text processing.

Intelligence: 41.9 · Output speed: 234 t/s · Output price: $0.90/1M · Context: 205K

8
MiniMax M2.1

LLM · $0.95/1M

MiniMax M2.1 places eighth due to its 233.39 t/s output speed and 204800-token context for long text sequences.

Intelligence: 39.4 · Output speed: 233 t/s · Output price: $0.95/1M · Context: 205K

9
o3 Mini

Multimodal · $4.40/1M

o3 Mini secures its spot with 230.92 t/s speed, a 200000-token context, and efficient reasoning for text and file tasks.

Intelligence: 25.9 · Output speed: 231 t/s · Output price: $4.40/1M · Context: 200K

10
o3 Mini High

Multimodal · $4.40/1M

o3 Mini High ranks tenth through its 226.72 t/s output speed and 200000-token context with strong STEM performance for text and file reasoning.

Intelligence: 25.2 · Output speed: 227 t/s · Output price: $4.40/1M · Context: 200K

11
gpt-oss-20b

LLM · $0.14/1M

OpenAI's gpt-oss-20b ranks eleventh with an output speed of 218 t/s, a 131K-token context for long text tasks, and an output price of $0.14 per million tokens.

Intelligence: 24.5 · Output speed: 218 t/s · Output price: $0.14/1M · Context: 131K

12
GPT-5.1-Codex-Mini

Multimodal · $2.00/1M

GPT-5.1-Codex-Mini, a multimodal coding model from OpenAI, ranks twelfth with an output speed of 215 t/s and a 400K-token context.

Intelligence: 38.6 · Output speed: 215 t/s · Output price: $2.00/1M · Context: 400K

How we ranked this list

Models are ranked by measured output speed in tokens per second, fastest first. Data is pulled from live sources and refreshed continuously by Dhanasvi's autonomous agents, so this ranking stays current as new options launch and prices change.
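As a rough sketch of this ranking rule, the snippet below sorts a handful of the measured speeds from this page in descending order and assigns ranks. The function name `rank_by_speed` is hypothetical, and the tie-breaking behavior is an assumption: the page does not say how equal speeds are ordered, so the sketch simply relies on Python's stable sort, which keeps tied entries in their input order.

```python
from typing import List, Tuple

# (name, measured output speed in tokens/sec), deliberately out of order.
MEASUREMENTS: List[Tuple[str, float]] = [
    ("gpt-oss-120b", 344.97),
    ("Gemini 3.1 Flash Lite Preview", 310.24),
    ("Mercury 2", 839.31),
    ("Gemini 3.1 Flash Lite", 310.24),
    ("Step 3.7 Flash", 380.3),
]


def rank_by_speed(
    measurements: List[Tuple[str, float]],
) -> List[Tuple[int, str, float]]:
    """Return (rank, name, speed) tuples, fastest first.

    sorted() is stable even with reverse=True, so models with identical
    speeds keep their relative input order (an assumed tie-break rule).
    """
    ordered = sorted(measurements, key=lambda item: item[1], reverse=True)
    return [
        (rank, name, speed)
        for rank, (name, speed) in enumerate(ordered, start=1)
    ]


for rank, name, speed in rank_by_speed(MEASUREMENTS):
    print(f"{rank}. {name}: {speed} t/s")
```

The two Gemini 3.1 Flash Lite variants share a speed of 310.24 t/s, so their relative order here depends only on the order in which they appear in the input.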

Frequently asked questions

Which model has the highest output speed?

Mercury 2 offers the highest output speed at 839.31 t/s.