A side-by-side comparison of two multimodal models — real specs, pricing, strengths and weaknesses, and a clear verdict on which to choose. Kept current by our agents.
Gemini 3.1 Pro Preview leads with a higher intelligence index (46.5 vs 36.1), a 1M-token context window, and native support for audio and video alongside text and images, which makes it the stronger choice for complex multimodal document tasks. GPT-5 Codex counters with faster output (169.49 vs 128.71 tokens per second), a lower output price ($10 vs $12 per 1M tokens), and a coding specialization. Because Gemini 3.1 Pro Preview is still a preview release, its behavior may be less consistent; no comparable caveat is noted for GPT-5 Codex.
| Spec | Gemini 3.1 Pro Preview | GPT-5 Codex | Winner |
|---|---|---|---|
| Intelligence | 46.5 | 36.1 | Gemini 3.1 Pro Preview |
| Output speed | 129 t/s | 169 t/s | GPT-5 Codex |
| Output price | $12.00/1M | $10.00/1M | GPT-5 Codex |
| Context | 1049K | 400K | Gemini 3.1 Pro Preview |
| Params | Undisclosed | Undisclosed | n/a |
| Provider | Google | OpenAI | n/a |
Gemini 3.1 Pro Preview scores 46.5 on the intelligence index versus 36.1 for GPT-5 Codex, a 10.4-point lead that favors Gemini for complex multimodal reasoning. Both models are proprietary, and neither provider discloses a parameter count.
GPT-5 Codex delivers 169.49 tokens per second versus 128.71 for Gemini 3.1 Pro Preview, roughly 32% faster output. That advantage matters most for high-throughput text and image workloads. Both models are resource-intensive at their maximum context lengths.
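To see what the speed gap means for a single response, the sketch below converts the quoted output speeds into idealized generation times. It assumes sustained throughput at the benchmarked rates and ignores time to first token and network latency; `OUTPUT_SPEED_TPS` and `generation_seconds` are hypothetical names used only for illustration.

```python
# Benchmarked output speeds (tokens per second) from the comparison above.
OUTPUT_SPEED_TPS = {
    "Gemini 3.1 Pro Preview": 128.71,
    "GPT-5 Codex": 169.49,
}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Idealized streaming time: assumes a constant rate and no time to first token."""
    return output_tokens / OUTPUT_SPEED_TPS[model]


# Ratio of the two speeds: about 1.32, i.e. roughly 32% faster output for GPT-5 Codex.
speedup = OUTPUT_SPEED_TPS["GPT-5 Codex"] / OUTPUT_SPEED_TPS["Gemini 3.1 Pro Preview"]

if __name__ == "__main__":
    for name in OUTPUT_SPEED_TPS:
        print(f"{name}: {generation_seconds(name, 10_000):.1f} s for 10,000 tokens")
    print(f"speed ratio: {speedup:.2f}")
```

For a 10,000-token response this works out to about 78 seconds on Gemini 3.1 Pro Preview versus about 59 seconds on GPT-5 Codex, under the constant-rate assumption.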
Gemini supports a 1,048,576-token context window and native audio, image, video, and text input. GPT-5 Codex is limited to 400,000 tokens and accepts only text and static images. Gemini's window, about 2.6 times larger, suits large-scale document analysis.
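One way to apply these limits is to check a request's token budget against each window before sending it. The sketch below is hypothetical: it assumes you already have a token count for the prompt (the two models use different tokenizers, so the same document will not yield identical counts) and, conservatively, that the reserved output must also fit inside the window.

```python
# Context window sizes in tokens, as listed above (1,048,576 = 2**20).
CONTEXT_WINDOW_TOKENS = {
    "Gemini 3.1 Pro Preview": 1_048_576,
    "GPT-5 Codex": 400_000,
}


def fits_in_context(model: str, prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Conservative check: prompt plus reserved output must not exceed the window."""
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS[model]


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> list[str]:
    """Return every model whose window can hold the request."""
    return [
        name
        for name in CONTEXT_WINDOW_TOKENS
        if fits_in_context(name, prompt_tokens, reserved_output_tokens)
    ]


if __name__ == "__main__":
    # A hypothetical 600K-token document set only fits in Gemini's window.
    print(models_that_fit(600_000, reserved_output_tokens=8_000))
```

Treating output as part of the budget is a cautious choice; if a provider counts output separately, the check simply leaves some headroom unused.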
GPT-5 Codex costs $10 per million output tokens, while Gemini 3.1 Pro Preview costs $12. The $2 gap (about 17% less for GPT-5 Codex) is modest per request but adds up on high-volume workloads.
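The per-token gap is easiest to judge at a concrete volume. This sketch prices output tokens only, since input pricing is not listed here; `output_cost_usd` and the 50M-token monthly volume are hypothetical illustrations, not quoted figures.

```python
# Output prices in USD per 1M tokens, as listed above. Input-token pricing is not included.
OUTPUT_PRICE_PER_MILLION_USD = {
    "Gemini 3.1 Pro Preview": 12.00,
    "GPT-5 Codex": 10.00,
}


def output_cost_usd(model: str, output_tokens: int) -> float:
    """Cost of generating `output_tokens` at the model's listed output price."""
    return output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION_USD[model]


if __name__ == "__main__":
    monthly_output_tokens = 50_000_000  # hypothetical workload
    for name in OUTPUT_PRICE_PER_MILLION_USD:
        print(f"{name}: ${output_cost_usd(name, monthly_output_tokens):,.2f}")
```

At that volume the bill is $600 versus $500, a $100 monthly difference. Because both prices are flat per token, the saving stays at one sixth (about 17%) at any scale.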
Choose Gemini 3.1 Pro Preview for the largest context window, full multimodal coverage, and higher intelligence scores. Choose GPT-5 Codex when speed, lower price, and coding focus matter most and the workload fits within its text-and-image, 400K-token limits. In short, Gemini offers breadth and depth, while GPT-5 Codex offers efficiency.
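The verdict can be read as a simple rule: hard limits first, then priorities. The helper below is a hypothetical sketch of that rule using only the limits stated on this page; `Workload`, its fields, and `pick_model` are illustrative names, and a real decision would also weigh factors this page does not quantify, such as Gemini's preview status.

```python
from dataclasses import dataclass

GEMINI = "Gemini 3.1 Pro Preview"
CODEX = "GPT-5 Codex"

# Limits stated on this page.
GEMINI_MODALITIES = {"text", "image", "audio", "video"}
CODEX_MODALITIES = {"text", "image"}
GEMINI_CONTEXT_TOKENS = 1_048_576
CODEX_CONTEXT_TOKENS = 400_000


@dataclass
class Workload:
    """Hypothetical task description; all fields are illustrative."""
    modalities: set[str]
    prompt_tokens: int
    coding_task: bool = False
    speed_or_cost_sensitive: bool = False


def pick_model(w: Workload) -> str:
    # Reject requests that neither model can accept.
    if not w.modalities <= GEMINI_MODALITIES or w.prompt_tokens > GEMINI_CONTEXT_TOKENS:
        raise ValueError("workload exceeds what either model supports")
    # Hard constraints: audio/video input or more than 400K tokens rules out GPT-5 Codex.
    if not w.modalities <= CODEX_MODALITIES or w.prompt_tokens > CODEX_CONTEXT_TOKENS:
        return GEMINI
    # Within GPT-5 Codex's limits, coding focus, speed, or price tips the choice to it.
    if w.coding_task or w.speed_or_cost_sensitive:
        return CODEX
    # Otherwise default to the higher intelligence score.
    return GEMINI


if __name__ == "__main__":
    print(pick_model(Workload({"text", "video"}, 20_000)))
    print(pick_model(Workload({"text"}, 50_000, coding_task=True)))
```

Ordering the checks this way means priorities like price never override a hard limit: a coding task with a 500K-token prompt still routes to Gemini.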
Gemini 3.1 Pro Preview is stronger for multimodal tasks due to native audio/video support, 1M context, and higher intelligence index, while GPT-5 Codex is limited to text and static images.