Inference & Endpoints

AI API Providers

Where to run AI models in production. Compare pricing, speed, free credits and supported models across inference providers.

Anthropic API

pay-per-token

Official API for the Claude model family, with prompt caching and strong long-context support.

$5 trial

450ms first token

us, eu

Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku

AWS Bedrock

pay-per-token

Amazon's managed service for accessing multiple foundation models with enterprise security, guardrails and agents.

AWS free tier

450ms first token

us, eu, asia

Nova, Claude, Llama 4, Mistral

Azure OpenAI

pay-per-token

Enterprise access to OpenAI models with Azure's compliance, networking and regional deployment options.

Azure free tier

450ms first token

us, eu, asia

GPT-5, o-series, DALL·E, embeddings

Fireworks AI

pay-per-token

Fast, production-grade inference for open models with FireAttention optimisation and fine-tuning.

$1 trial

200ms first token

Llama 4, DeepSeek, Qwen3, FLUX

Google Vertex AI

pay-per-token

Google Cloud's enterprise AI platform for Gemini and open models with grounding and MLOps tooling.

$300 GCP credit

420ms first token

us, eu, asia

Gemini 3, Gemma, Imagen, Veo

Groq

pay-per-token

LPU-based inference provider delivering extremely fast token throughput for open models.

Free tier

100ms first token

Llama 4, Qwen3, DeepSeek, Kimi

OpenAI API

pay-per-token

Direct access to OpenAI's GPT, reasoning, image and audio models via a mature, widely supported API.

None (paid)

400ms first token

us, eu

GPT-5, GPT-5 mini, o-series, DALL·E

OpenRouter

pay-per-token

A single API and marketplace that routes requests across hundreds of models and providers with automatic fallbacks.

Free models available

500ms first token

global

GPT-5, Claude, Gemini, Llama 4

Replicate

pay-per-second

A platform to run and deploy thousands of open models, especially image, video and audio, via a simple API.

Free trial

600ms first token

FLUX, Llama 4, SDXL, thousands of community models

Together AI

pay-per-token

Inference cloud for 200+ open models with fine-tuning and dedicated GPU endpoints.

$1 trial

250ms first token

Llama 4, DeepSeek, Qwen3, Mixtral
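
To compare speed across the providers listed here, the first-token figures can be put into a small script. The sketch below is illustrative only: the numbers are copied from this listing and are not measured, and the names `FIRST_TOKEN_MS`, `rank_by_first_token` and `within_budget` are hypothetical helpers, not part of any provider's SDK.

```python
# First-token latencies in milliseconds, copied from the listing above.
# These are the page's indicative figures, not benchmark results.
FIRST_TOKEN_MS = {
    "Anthropic API": 450,
    "AWS Bedrock": 450,
    "Azure OpenAI": 450,
    "Fireworks AI": 200,
    "Google Vertex AI": 420,
    "Groq": 100,
    "OpenAI API": 400,
    "OpenRouter": 500,
    "Replicate": 600,
    "Together AI": 250,
}


def rank_by_first_token(latencies):
    """Return (provider, ms) pairs sorted fastest first.

    Ties are broken alphabetically by provider name.
    """
    return sorted(latencies.items(), key=lambda item: (item[1], item[0]))


def within_budget(latencies, budget_ms):
    """Return provider names whose listed first-token time is <= budget_ms."""
    return [name for name, ms in rank_by_first_token(latencies) if ms <= budget_ms]


if __name__ == "__main__":
    for name, ms in rank_by_first_token(FIRST_TOKEN_MS):
        print(f"{name:<18}{ms:>5} ms")
```

Calling `within_budget(FIRST_TOKEN_MS, 250)` would shortlist the providers listed at 250ms or faster. Real latency depends on model, region and load, so treat the ranking as a starting point for your own measurements.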