AI Models Leaderboard
Live benchmark comparison of frontier AI models: context window, pricing, speed, and capability scores side by side.
| # | Model | Provider | Arena Elo | Context (tokens) | GPQA Diamond | SWE-Bench | Price In/Out (per 1M tokens) | Speed |
|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | Google | 1455 | 1,000K | 86% | 76% | $2/$12 | — |
| 2 | GPT-5 | OpenAI | 1450 | 400K | 85% | 75% | $1.25/$10 | — |
| 3 | Claude Opus 4.5 | Anthropic | 1440 | 200K | 87% | 80% | $5/$25 | — |
| 4 | Grok 4 | xAI | 1420 | 256K | 84% | 72% | $3/$15 | — |
| 5 | Claude Sonnet 4.5 | Anthropic | 1415 | 200K | 83% | 77% | $3/$15 | — |
| 6 | DeepSeek V3.2 (open source) | DeepSeek | 1390 | 128K | 79% | 66% | $0.28/$0.42 | — |
| 7 | Qwen3 Max (open source) | Alibaba | 1375 | 256K | 78% | 64% | $0.4/$1.2 | — |
| 8 | Mistral Large 3 | Mistral AI | 1360 | 256K | 75% | 60% | $2/$6 | — |
| 9 | Llama 4 Maverick (open source) | Meta | 1340 | 1,000K | 70% | 55% | $0.2/$0.6 | — |
| 10 | Amazon Nova Pro | Amazon | 1320 | 300K | 68% | 50% | $0.8/$3.2 | — |
Prices are per 1M tokens, shown as input/output. Benchmarks: GPQA Diamond, SWE-Bench, Chatbot Arena Elo. Figures are sourced from public provider data and may change.
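
Because prices are quoted per 1M tokens, the cost of a single request depends on how many input and output tokens it uses. The sketch below shows that arithmetic for three rows of the table. The function name `request_cost`, the `PRICES` dictionary, and the 20,000-input / 2,000-output request size are illustrative assumptions, not part of any provider's API, and the sketch ignores caching, batch discounts, and tiered pricing.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Return the dollar cost of one request.

    price_in and price_out are dollars per 1M tokens, as listed in the table.
    """
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# (input price, output price) per 1M tokens, copied from the table above.
# Hypothetical selection of rows for illustration.
PRICES = {
    "GPT-5": (1.25, 10.0),
    "Claude Opus 4.5": (5.0, 25.0),
    "DeepSeek V3.2": (0.28, 0.42),
}

# Assumed request size: 20,000 input tokens and 2,000 output tokens.
for model, (p_in, p_out) in PRICES.items():
    cost = request_cost(20_000, 2_000, p_in, p_out)
    print(f"{model}: ${cost:.4f}")
```

For GPT-5 at the listed $1.25/$10, that request works out to (20,000 × 1.25 + 2,000 × 10) / 1,000,000 = $0.045. Output-heavy workloads shift the comparison toward models with a lower output price.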