Does GPT Audio perform audio transcription?

Yes, it includes built-in audio transcription and summarization as core capabilities.

How can developers access GPT Audio?

Access is available via OpenAI's standard API endpoints for audio-enabled models.

What conversation modes are supported?

It supports text-audio multimodal conversations with both input understanding and natural speech output.

Is pricing information published for GPT Audio?

Pricing follows OpenAI's standard audio model rates and is listed on the official API documentation page.

GPT Audio by OpenAI — Specs, Pricing, Benchmarks (2026)

About GPT Audio

GPT Audio combines text and audio capabilities in a single system developed by OpenAI. Its 128K-token context window supports extended conversations or audio sequences without losing earlier details. The design prioritizes integration of spoken content with written instructions.

Its main strength is handling both modalities natively in a single proprietary model. This allows consistent performance across audio transcription, speech generation, and text-based reasoning, and it helps the model maintain coherence over long inputs.

Typical usage covers voice interfaces, audio analysis, and mixed media applications. Developers integrate it into tools requiring speech understanding alongside textual context. The closed nature means access occurs through OpenAI's platforms rather than local deployment.

Capabilities

Audio input understanding and analysis

Natural speech generation and synthesis

Text-audio multimodal conversation

Audio transcription and summarization

Long-context text reasoning with audio

Voice-based task assistance

Best for

Podcast Content Analysis

Upload extended podcast episodes for transcription, summarization, and insight extraction while leveraging the full 128k token context for comprehensive coverage.

Interactive Voice Assistants

Build natural speech generation flows for real-time voice-based task assistance such as scheduling or information retrieval in conversational apps.

Educational Audio Tutoring

Combine audio input understanding with text reasoning to deliver multimodal lessons that include speech synthesis and long-context explanations.
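For the podcast use case above, a long episode may not fit in one request once it has been turned into text. The sketch below shows one way to split a transcript so each piece stays under the 128K-token window. It assumes roughly 4 characters per token, which is a rough heuristic and not the model's tokenizer, and the names `estimateTokens`, `splitToFit`, and `reservedTokens` are hypothetical helpers introduced for illustration. Raw audio input is tokenized differently and is not covered here.

```javascript
const CONTEXT_WINDOW_TOKENS = 128000; // context size stated on this page
const CHARS_PER_TOKEN = 4; // rough heuristic (assumption), not a real tokenizer

// Very rough token estimate for a piece of text.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Splits text into pieces whose estimated size stays under the budget.
// `reservedTokens` leaves room for instructions and the model's reply.
function splitToFit(text, reservedTokens = 8000) {
  const budgetChars =
    (CONTEXT_WINDOW_TOKENS - reservedTokens) * CHARS_PER_TOKEN;
  const chunks = [];
  for (let start = 0; start < text.length; start += budgetChars) {
    chunks.push(text.slice(start, start + budgetChars));
  }
  return chunks;
}

// Stand-in for a long transcript; a real one would come from a file.
const transcript = "word ".repeat(300000);
const chunks = splitToFit(transcript);
console.log(
  `${estimateTokens(transcript)} estimated tokens in ${chunks.length} chunk(s)`
);
```

This version cuts on raw character offsets, so a chunk can end mid-word. A production pipeline would more likely split at sentence or speaker boundaries and count tokens with a real tokenizer.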

Strengths & limitations

Strengths

+ High-quality, natural-sounding audio output
+ Strong integration of audio and text understanding
+ Large context window supporting extended interactions
+ Low-latency conversational audio responses

Limitations

– No vision or image processing capabilities
– Performance depends on audio input clarity
– Audio-specific context handling more constrained than pure text

Cost calculator

Estimate what GPT Audio would cost for your usage.

Input tokens / request · Output tokens / request · Requests / month

$0.00750 per request

$75 estimated / month

Based on GPT Audio's $2.50/1M input · $10.00/1M output. Estimate only — actual cost varies by provider and caching.
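The arithmetic behind the estimate can be sketched in a few lines. The two rates are the ones quoted above; the workload (1,000 input tokens, 500 output tokens, 10,000 requests per month) is an assumed example, since the page does not show the calculator's inputs, and the function names are hypothetical.

```javascript
// Rates quoted on this page, in USD per 1M tokens.
const INPUT_RATE_PER_MILLION = 2.5;
const OUTPUT_RATE_PER_MILLION = 10.0;

// Cost of a single request given its input and output token counts.
function costPerRequest(inputTokens, outputTokens) {
  return (
    (inputTokens * INPUT_RATE_PER_MILLION +
      outputTokens * OUTPUT_RATE_PER_MILLION) /
    1e6
  );
}

// Monthly cost for a fixed number of identically sized requests.
function costPerMonth(inputTokens, outputTokens, requestsPerMonth) {
  return costPerRequest(inputTokens, outputTokens) * requestsPerMonth;
}

// Hypothetical workload (assumed values, not taken from the page).
const perRequest = costPerRequest(1000, 500);
const perMonth = costPerMonth(1000, 500, 10000);

console.log(`$${perRequest.toFixed(5)} per request`);
console.log(`$${perMonth.toFixed(2)} estimated / month`);
```

With these assumed inputs the result is $0.00750 per request and $75.00 per month. The sketch ignores caching discounts and provider markups, which the note above says can change the actual cost.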

Quick start

OpenRouter's API is OpenAI-compatible, so most SDKs work after swapping the base URL. Only the model slug changes between models.

JavaScript · openai

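// Run as an ES module (for example a .mjs file) so top-level await works.
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY, // read the key from the environment
});

// Send a single text message to GPT Audio.
const completion = await client.chat.completions.create({
  model: "openai/gpt-audio",
  messages: [{ role: "user", content: "Hello!" }],
});

// Print the text of the first returned choice.
console.log(completion.choices[0].message.content);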

Model slug: openai/gpt-audio
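The quick start sends text only. For transcription or audio analysis, the request needs to carry the audio itself. The sketch below builds such a request body, assuming the endpoint accepts OpenAI-style `input_audio` content parts with base64-encoded data; whether a given provider supports this for `openai/gpt-audio` should be checked against its documentation. `buildAudioRequest` is a hypothetical helper, and the four bytes are placeholders for a real recording.

```javascript
// Builds a chat-completions request body that attaches one audio clip.
// Assumption: the endpoint accepts OpenAI-style `input_audio` content parts.
function buildAudioRequest(audioBytes, format, instruction) {
  return {
    model: "openai/gpt-audio", // model slug from this page
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: instruction },
          {
            type: "input_audio",
            input_audio: {
              // Audio is sent inline as base64, not as a URL.
              data: Buffer.from(audioBytes).toString("base64"),
              format, // for example "wav" or "mp3"
            },
          },
        ],
      },
    ],
  };
}

// Placeholder bytes stand in for a real recording in this sketch.
const body = buildAudioRequest(
  new Uint8Array([82, 73, 70, 70]),
  "wav",
  "Transcribe this clip, then summarize it in two sentences."
);
console.log(JSON.stringify(body, null, 2));
```

In a real application the bytes would come from a file (for example via `fs.readFileSync`), and the resulting object would be passed to `client.chat.completions.create(body)` using the client from the quick start.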

Editor's verdict

Our take on GPT Audio

GPT Audio is OpenAI's proprietary audio & music model with a 128K-token context window.

At $10.00 per 1M output tokens, it is premium-priced for its class.

It is available through OpenAI's API and aggregators like OpenRouter.

It is best suited to applications that need high-quality, natural-sounding audio output and tight integration of audio and text understanding.

Frequently asked questions

What context length does GPT Audio support?

GPT Audio provides a 128,000-token context window for handling extended audio and text inputs together.

Other GPT models

Sibling versions in the GPT family from OpenAI.

GPT-5.5

OpenAI · Multimodal

Verified

OpenAI's multimodal model built for massive file, image, and text inputs.

Closed · II 50.8 · 1050K ctx · $30.00/1M out

GPT-5.4

OpenAI · Multimodal

Verified

Multimodal model excelling at large-scale text, image and file tasks.

Closed · 1050K ctx · $15.00/1M out

GPT-5 Image Mini

OpenAI · Image Models

Verified

OpenAI's compact multimodal model for image and text tasks.

Closed · 400K ctx · $2.00/1M out

GPT-5 Codex

OpenAI · Multimodal

Verified

OpenAI's multimodal model for large-scale text and image tasks.

Closed · 400K ctx · $10.00/1M out

GPT-5.1-Codex-Mini

OpenAI · Multimodal

Verified

Multimodal coding model with 400k-token context from OpenAI.

Closed · 400K ctx · $2.00/1M out

GPT-5.1-Codex

OpenAI · Multimodal

Verified

OpenAI's closed multimodal model for large-scale text and image tasks.

Closed · 400K ctx · $10.00/1M out

Similar models

Other audio & music models worth comparing.

Lyria 3 Pro Preview

Google · Audio & Music

Verified

Google's advanced preview model for multimodal audio generation and editing.

Closed · 1049K ctx · Free

Lyria 3 Clip Preview

Google · Audio & Music

Verified

Google's multimodal preview model for generating audio clips from text and images.

Closed · 1049K ctx · Free

GPT Audio Mini

OpenAI · Audio & Music

Verified

OpenAI's compact model for seamless text and audio processing.

Closed · 128K ctx · $2.40/1M out
