Inference & Endpoints

AI API Providers

Where to run AI models in production. Compare pricing, speed, free credits and supported models across inference providers.

Anthropic API

pay-per-token

Official API for the Claude model family, with prompt caching and strong long-context support.

$5 trial
450ms first token
us, eu
Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku

AWS Bedrock

pay-per-token

Amazon's managed service for accessing multiple foundation models with enterprise security, guardrails and agents.

AWS free tier
450ms first token
us, eu, asia
Nova, Claude, Llama 4, Mistral

Azure OpenAI

pay-per-token

Enterprise access to OpenAI models with Azure's compliance, networking and regional deployment options.

Azure free tier
450ms first token
us, eu, asia
GPT-5, o-series, DALL·E, embeddings

Fireworks AI

pay-per-token

Fast, production-grade inference for open models with FireAttention optimisation and fine-tuning.

$1 trial
200ms first token
us
Llama 4, DeepSeek, Qwen3, FLUX

Google Vertex AI

pay-per-token

Google Cloud's enterprise AI platform for Gemini and open models with grounding and MLOps tooling.

$300 GCP credit
420ms first token
us, eu, asia
Gemini 3, Gemma, Imagen, Veo

Groq

pay-per-token

LPU-based inference provider delivering extremely fast token throughput for open models.

Free tier
100ms first token
us
Llama 4, Qwen3, DeepSeek, Kimi

OpenAI API

pay-per-token

Direct access to OpenAI's GPT, reasoning, image and audio models via a mature, widely supported API.

None (paid)
400ms first token
us, eu
GPT-5, GPT-5 mini, o-series, DALL·E

OpenRouter

pay-per-token

A single API and marketplace that routes requests across hundreds of models and providers with automatic fallbacks.

Free models available
500ms first token
global
GPT-5, Claude, Gemini, Llama 4

Replicate

pay-per-second

A platform to run and deploy thousands of open models, especially image, video and audio models, via a simple API.

Free trial
600ms first token
us
FLUX, Llama 4, SDXL, thousands of community models

Together AI

pay-per-token

Inference cloud for 200+ open models with fine-tuning and dedicated GPU endpoints.

$1 trial
250ms first token
us
Llama 4, DeepSeek, Qwen3, Mixtral