Which models support the largest context window?

Gemini 3.1 Flash Lite, Gemini 3.1 Flash Lite Preview, and Gemini 2.5 Flash Lite each support a 1048576-token context window.

What is the lowest priced model per million output tokens?

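Among the ten ranked models, gpt-oss-120b is the lowest priced at $0.18 per million output tokens. gpt-oss-20b, listed eleventh, is lower still at $0.14 per million output tokens.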

Fastest AI Models

This ranked list presents the fastest AI models according to measured output speeds in tokens per second. Models differ in intelligence index, context window sizes, pricing per million tokens, and supported modalities. Selection should account for trade-offs between raw speed, context capacity, and whether text-only or multimodal inputs are required.
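As a rough illustration of that selection process, the sketch below filters a few entries from this list by minimum context window, required input modalities, and an optional price ceiling, then returns the fastest match. The `Model` record, its field names, and the `pick_fastest` helper are hypothetical and exist only for this example; the figures are copied from the entries on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    speed_tps: float       # measured output speed, tokens per second
    context_tokens: int    # context window size, in tokens
    price_per_m: float     # output price, USD per million tokens
    modalities: frozenset  # supported input types


# Figures copied from this list; the record layout is an assumption.
MODELS = [
    Model("Mercury 2", 839.31, 128_000, 0.75, frozenset({"text"})),
    Model("Step 3.7 Flash", 380.3, 256_000, 1.15,
          frozenset({"text", "image", "video"})),
    Model("gpt-oss-120b", 344.97, 131_072, 0.18, frozenset({"text"})),
    Model("Gemini 2.5 Flash Lite", 276.7, 1_048_576, 0.40,
          frozenset({"text", "image", "audio", "video"})),
]


def pick_fastest(models, min_context=0, needs=frozenset(), max_price=None):
    """Return the fastest model meeting every constraint, or None."""
    candidates = [
        m for m in models
        if m.context_tokens >= min_context
        and needs <= m.modalities  # all required modalities are supported
        and (max_price is None or m.price_per_m <= max_price)
    ]
    return max(candidates, key=lambda m: m.speed_tps, default=None)


print(pick_fastest(MODELS, needs=frozenset({"image"})).name)  # Step 3.7 Flash
```

Applying the constraints before comparing speed keeps the trade-off explicit: the fastest model overall is only the right choice when it also satisfies the context, modality, and budget requirements.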

Mercury 2

LLM · $0.75/1M

Mercury 2 earns the top position through its leading output speed of 839.31 t/s while supporting a 128000-token context window for text workflows.

Intelligence: 32.8 · Output speed: 839 t/s · Output price: $0.75/1M · Context: 128K

Step 3.7 Flash

Multimodal · $1.15/1M

Step 3.7 Flash ranks second with an output speed of 380.3 t/s, a 256000-token context, and native multimodal support for text, image, and video.

Intelligence: 42.6 · Output speed: 380 t/s · Output price: $1.15/1M · Context: 256K

gpt-oss-120b

LLM · $0.18/1M

gpt-oss-120b places third with an output speed of 344.97 t/s, a 131072-token context window for long-form text, and a low output price of $0.18 per million tokens.

Intelligence: 33.3 · Output speed: 345 t/s · Output price: $0.18/1M · Context: 131K

Gemini 3.1 Flash Lite Preview

Multimodal · $1.50/1M

Gemini 3.1 Flash Lite Preview ranks fourth with an output speed of 310.24 t/s, matching Gemini 3.1 Flash Lite, along with a 1048576-token context window and broad native multimodal capabilities.

Intelligence: 33.5 · Output speed: 310 t/s · Output price: $1.50/1M · Context: 1049K

Gemini 3.1 Flash Lite

Multimodal · $1.50/1M

Gemini 3.1 Flash Lite ranks fifth with an output speed of 310.24 t/s, a 1048576-token context window, and efficient multimodal support for text, image, and video.

Intelligence: 33.5 · Output speed: 310 t/s · Output price: $1.50/1M · Context: 1049K

Gemini 2.5 Flash Lite

Multimodal · $0.40/1M

Gemini 2.5 Flash Lite ranks sixth with an output speed of 276.7 t/s, a 1048576-token context window, and multimodal handling of text, image, audio, and video at $0.40 per million output tokens.

Intelligence: 17.6 · Output speed: 277 t/s · Output price: $0.40/1M · Context: 1049K

MiniMax M2.5

LLM · $0.90/1M

MiniMax M2.5 ranks seventh with an output speed of 234.2 t/s and a 204800-token context window suited to extended text processing.

Intelligence: 41.9 · Output speed: 234 t/s · Output price: $0.90/1M · Context: 205K

MiniMax M2.1

LLM · $0.95/1M

MiniMax M2.1 places eighth due to its 233.39 t/s output speed and 204800-token context for long text sequences.

Intelligence: 39.4 · Output speed: 233 t/s · Output price: $0.95/1M · Context: 205K

o3 Mini

Multimodal · $4.40/1M

o3 Mini ranks ninth with an output speed of 230.92 t/s, a 200000-token context window, and efficient reasoning for text and file tasks.

Intelligence: 25.9 · Output speed: 231 t/s · Output price: $4.40/1M · Context: 200K

o3 Mini High

Multimodal · $4.40/1M

o3 Mini High ranks tenth through its 226.72 t/s output speed and 200000-token context with strong STEM performance for text and file reasoning.

Intelligence: 25.2 · Output speed: 227 t/s · Output price: $4.40/1M · Context: 200K

gpt-oss-20b

LLM · $0.14/1M

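OpenAI's gpt-oss-20b ranks eleventh with an output speed of 218 t/s, a 131K-token context window for long-context text tasks, and the lowest output price on this list at $0.14 per million tokens.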

Intelligence: 24.5 · Output speed: 218 t/s · Output price: $0.14/1M · Context: 131K

GPT-5.1-Codex-Mini

Multimodal · $2.00/1M

GPT-5.1-Codex-Mini, a multimodal coding model from OpenAI, ranks twelfth with an output speed of 215 t/s and a 400K-token context window.

Intelligence: 38.6 · Output speed: 215 t/s · Output price: $2.00/1M · Context: 400K

How we ranked this list

Models are ranked by measured output speed in tokens per second, fastest first. Data is pulled from live sources and refreshed continuously by Dhanasvi's autonomous agents, so this ranking stays current as new options launch and prices change.
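A minimal sketch of ranking by output speed, assuming the measurements are available as a simple name-to-speed mapping. The `SPEEDS` dictionary and both helper functions are hypothetical, and only a subset of the models is included, so the positions printed here apply to that subset rather than to the full list. The per-1,000-token time is an idealized estimate that assumes a constant output speed and ignores latency before the first token.

```python
# Measured output speeds in tokens/sec, copied from the entries above.
# Deliberately unordered so the sort does the work.
SPEEDS = {
    "MiniMax M2.1": 233.39,
    "Mercury 2": 839.31,
    "o3 Mini": 230.92,
    "Step 3.7 Flash": 380.3,
    "MiniMax M2.5": 234.2,
    "gpt-oss-120b": 344.97,
}


def rank_by_speed(speeds):
    """Return (position, name, speed) tuples, fastest first."""
    ordered = sorted(speeds.items(), key=lambda item: item[1], reverse=True)
    return [(pos, name, tps) for pos, (name, tps) in enumerate(ordered, start=1)]


def seconds_for(tokens, tps):
    """Idealized time to emit `tokens` at a constant `tps` tokens/sec."""
    return tokens / tps


for pos, name, tps in rank_by_speed(SPEEDS):
    estimate = seconds_for(1000, tps)
    print(f"{pos}. {name}: {tps:.0f} t/s, ~{estimate:.1f} s per 1,000 tokens")
```

Sorting on the unrounded measurements matters for closely matched entries such as MiniMax M2.5 (234.2 t/s) and MiniMax M2.1 (233.39 t/s), whose rounded figures differ by only one token per second.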

Frequently asked questions

Which model offers the highest output speed?

Mercury 2 offers the highest output speed at 839.31 t/s.