GPT Audio

Verified

OpenAI's GPT Audio processes text and audio with a 128k token context.

OpenAI · Audio & Music · Closed
Updated 2026-06-14

About GPT Audio

GPT Audio combines text and audio capabilities in a single system developed by OpenAI. Its large context window supports extended conversations or audio sequences without losing earlier details. The design prioritizes integration of spoken content with written instructions.

The model handles both modalities natively, which supports consistent performance across audio transcription, speech generation, and text-based reasoning. It also maintains coherence over long inputs.

Typical usage covers voice interfaces, audio analysis, and mixed media applications. Developers integrate it into tools requiring speech understanding alongside textual context. Because the model is closed, access is through hosted APIs (OpenAI's platform or aggregators such as OpenRouter) rather than local deployment.

Capabilities

Audio input understanding and analysis
Natural speech generation and synthesis
Text-audio multimodal conversation
Audio transcription and summarization
Long-context text reasoning with audio
Voice-based task assistance

Best for

Podcast Content Analysis

Upload extended podcast episodes for transcription, summarization, and insight extraction while leveraging the full 128k token context for comprehensive coverage.

Interactive Voice Assistants

Build natural speech generation flows for real-time voice-based task assistance such as scheduling or information retrieval in conversational apps.

Educational Audio Tutoring

Combine audio input understanding with text reasoning to deliver multimodal lessons that include speech synthesis and long-context explanations.

Strengths & limitations

Strengths

  • High-quality, natural-sounding audio output
  • Strong integration of audio and text understanding
  • Large context window supporting extended interactions
  • Low-latency conversational audio responses

Limitations

  • No vision or image processing capabilities
  • Performance depends on audio input clarity
  • Audio-specific context handling more constrained than pure text

Cost calculator

Estimate what GPT Audio would cost for your usage.

$0.00750 per request
$75 estimated / month

Based on GPT Audio's $2.50/1M input · $10.00/1M output. Estimate only — actual cost varies by provider and caching.
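
The arithmetic behind these figures can be sketched as follows. The page does not state which token counts or request volume the calculator assumes, so the workload below (1,000 input tokens, 500 output tokens, 10,000 requests per month) is a hypothetical choice that happens to reproduce the displayed numbers. The function names are illustrative, and only the two per-token prices come from the page.

```javascript
// Prices quoted on this page, in USD per 1M tokens.
const INPUT_PRICE_PER_MILLION = 2.5;
const OUTPUT_PRICE_PER_MILLION = 10.0;

// Cost of one request in USD, given its input and output token counts.
function costPerRequest(inputTokens, outputTokens) {
  const totalPerMillion =
    inputTokens * INPUT_PRICE_PER_MILLION +
    outputTokens * OUTPUT_PRICE_PER_MILLION;
  return totalPerMillion / 1_000_000;
}

// Monthly cost in USD for a fixed number of identical requests.
function monthlyCost(inputTokens, outputTokens, requestsPerMonth) {
  return costPerRequest(inputTokens, outputTokens) * requestsPerMonth;
}

// Hypothetical workload (an assumption, not stated by the page).
const perRequest = costPerRequest(1000, 500);
const perMonth = monthlyCost(1000, 500, 10000);

console.log(`$${perRequest.toFixed(5)} per request`);
console.log(`$${perMonth.toFixed(2)} estimated / month`);
```

Output tokens cost four times as much as input tokens here, so response length tends to dominate the bill for generation-heavy workloads. Audio tokens may be billed differently from text tokens by some providers, which this sketch does not model.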

Quick start

OpenRouter's API is OpenAI-compatible, so most SDKs work after swapping the base URL. Only the model slug changes between models.

JavaScript · openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-audio",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: openai/gpt-audio
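
The quick start above sends text only. For audio input, a minimal sketch of the message shape is shown below, assuming the endpoint accepts OpenAI-style `input_audio` content parts with base64-encoded data. Whether `openai/gpt-audio` accepts this exact payload through a given provider should be checked against that provider's documentation. `buildAudioMessage` is a hypothetical helper, and the three sample bytes stand in for a real audio file.

```javascript
// Builds a user message that pairs a text prompt with base64-encoded audio.
// The content-part layout follows the OpenAI-style "input_audio" convention;
// treat it as an assumption until verified against your provider's docs.
function buildAudioMessage(prompt, audioBytes, format) {
  const base64Audio = Buffer.from(audioBytes).toString("base64");
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "input_audio", input_audio: { data: base64Audio, format } },
    ],
  };
}

// Placeholder bytes; in practice these would come from reading a WAV or MP3 file.
const sampleBytes = Uint8Array.from([1, 2, 3]);
const message = buildAudioMessage("Summarize this clip.", sampleBytes, "wav");

console.log(JSON.stringify(message, null, 2));
```

The resulting object can be passed in the `messages` array of the `client.chat.completions.create` call in place of the plain-text message.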

Editor's verdict

Our take on GPT Audio

GPT Audio is OpenAI's proprietary audio & music model with a 128K-token context window.

At $10.00 per 1M output tokens, it is premium-priced for its class.

It is available through OpenAI's API and aggregators like OpenRouter.

It is best suited to applications that need high-quality, natural-sounding audio output and tight integration of audio and text understanding.

Frequently asked questions

GPT Audio provides a 128,000-token context window for handling extended audio and text inputs together.
