Lyria 3 Clip Preview

Verified

Google's multimodal preview model for generating audio clips from text and images.

Google · Audio & Music · Closed
JSON mode · Vision
Updated 2026-06-14

About Lyria 3 Clip Preview

Designed as a multimodal system, Lyria 3 Clip Preview integrates text, image, and audio processing in a single large context window. Google developed it as a closed-weight model, so access is limited to the hosted preview release. This architecture supports extended prompts that combine multiple input types for audio output generation.

Its primary strength lies in handling very long multimodal sequences without requiring model weights to be publicly available. The preview format allows users to test clip creation capabilities while Google retains full control over the underlying system. Typical applications include rapid prototyping of audio ideas where extended context helps maintain coherence across complex instructions.

Users commonly employ it for creative workflows that start with descriptive text or reference images to guide short audio segments. The model suits scenarios needing quick iteration on sound design before full production. Because parameters are not disclosed, evaluations rely on output quality rather than architectural specifics.
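
The text-plus-reference-image workflow described above can be sketched as a single chat message. This is a minimal illustration that assumes the model accepts OpenAI-style content parts (`text` and `image_url`) through an OpenAI-compatible endpoint; the helper name `buildImagePromptMessage`, the prompt, and the image URL are hypothetical, and the page does not document how generated audio is returned.

```javascript
// Hypothetical helper: builds one user message that pairs a text prompt
// with a reference image, using OpenAI-style content parts.
// Assumption: the model accepts this message shape; the page does not confirm it.
function buildImagePromptMessage(promptText, imageUrl) {
  if (!promptText || !imageUrl) {
    throw new Error("Both a text prompt and an image URL are required.");
  }
  return {
    role: "user",
    content: [
      { type: "text", text: promptText },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder values for illustration only.
const message = buildImagePromptMessage(
  "A short ambient clip that matches the mood of this picture.",
  "https://example.com/reference.jpg"
);

console.log(JSON.stringify(message, null, 2));
```

The resulting object can be passed as one entry of the `messages` array in a chat completion request.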

Capabilities

Text-to-audio generation
Image-conditioned audio synthesis
Long-context audio modeling
Multimodal prompt understanding
Music and sound creation
Audio continuation

Best for

Long-form Music Generation

With a 1,048,576-token context window, the model maintains coherence across extended audio sequences for full track previews or soundtrack segments.

Music Production Clip Previews

It supports rapid generation of audio clips from detailed prompts, fitting iterative workflows in composition and editing.

Complex Audio Prompt Handling

The large context enables processing of intricate instructions involving multiple musical elements or layered audio references.

Strengths & limitations

Strengths

  • Strong multimodal audio generation from text and images
  • Very long context support for extended sequences
  • High-quality audio output from Google research

Limitations

  • Preview version with potential feature restrictions
  • Primarily audio-focused rather than general-purpose
  • May require careful prompting for complex outputs

Pricing by provider

Live per-provider pricing & uptime, routed via OpenRouter. Prices are USD per 1M tokens.

Provider          Input /1M   Output /1M   Context   Uptime
Google AI Studio  Free        Free         1049K     not listed
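
Because prices are quoted per 1M tokens, a request's cost is the token count divided by one million, times the listed price. The sketch below models the table's "Free" entries as 0; the function name `estimateCostUsd` and the paid prices in the second call are hypothetical and used only to show the arithmetic.

```javascript
// Prices are USD per 1M tokens; "Free" in the table is modeled as 0.
const PRICE_PER_MILLION = { input: 0, output: 0 };

// Hypothetical helper: estimates the USD cost of one request.
function estimateCostUsd(inputTokens, outputTokens, price = PRICE_PER_MILLION) {
  const inputCost = (inputTokens / 1_000_000) * price.input;
  const outputCost = (outputTokens / 1_000_000) * price.output;
  return inputCost + outputCost;
}

// With the listed free pricing, any token count costs nothing.
const freeCost = estimateCostUsd(250_000, 4_000);
console.log(freeCost); // 0

// Made-up paid prices, only to illustrate the per-1M calculation.
const paidCost = estimateCostUsd(2_000_000, 1_000_000, { input: 0.5, output: 2 });
console.log(paidCost); // 3
```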

Quick start

OpenRouter's API is OpenAI-compatible, so most SDKs work once you swap the base URL. Only the model slug changes between models.

JavaScript · openai
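import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter instead of the default OpenAI endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY, // read the key from the environment
});

// Select the model by its slug and send a single user message.
const completion = await client.chat.completions.create({
  model: "google/lyria-3-clip-preview",
  messages: [{ role: "user", content: "Hello!" }],
});

// Print the content of the first returned choice.
console.log(completion.choices[0].message.content);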

Model slug: google/lyria-3-clip-preview
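
For setups without the SDK, the same request can be assembled by hand, which also makes it clear that only the slug varies per model. This is a minimal sketch: the helper `buildChatRequest` and the placeholder key are hypothetical, and it only builds the request rather than sending it.

```javascript
// Hypothetical helper: assembles the URL and fetch options for an
// OpenAI-compatible chat completion request routed through OpenRouter.
function buildChatRequest(modelSlug, prompt, apiKey) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: modelSlug, // the only field that changes between models
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// "YOUR_API_KEY" is a placeholder; supply a real key before sending.
const request = buildChatRequest(
  "google/lyria-3-clip-preview",
  "Hello!",
  "YOUR_API_KEY"
);

console.log(request.url);
console.log(request.options.body);
```

To send it, pass the two parts to `fetch(request.url, request.options)` in any runtime that provides `fetch`.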

Editor's verdict

Our take on Lyria 3 Clip Preview

Lyria 3 Clip Preview is Google's proprietary audio and music model with a 1049K-token context window.

It is free to run: its single listed provider, Google AI Studio, charges nothing for input or output tokens.

It is available through Google's API and aggregators like OpenRouter.

It is best suited to multimodal audio generation from text and images, and to extended sequences that benefit from very long context.

Frequently asked questions

The model provides a context window of 1,048,576 tokens.
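
The exact figure and the "1049K" shorthand used elsewhere on this page describe the same window. The short check below shows the arithmetic; treating "1049K" as the token count divided by 1,000 and rounded is an assumption about how the label was produced, and `fitsInContext` is a hypothetical helper.

```javascript
// Exact context window size stated on this page.
const CONTEXT_TOKENS = 1048576;

// The figure is exactly 2 to the 20th power.
const isPowerOfTwo = CONTEXT_TOKENS === 2 ** 20;
console.log(isPowerOfTwo); // true

// Assumed derivation of the "1049K" label: divide by 1000 and round.
const contextLabel = `${Math.round(CONTEXT_TOKENS / 1000)}K`;
console.log(contextLabel); // "1049K"

// Hypothetical helper: checks whether a prompt plus any tokens reserved
// for output stays within the window.
function fitsInContext(promptTokens, reservedOutputTokens = 0) {
  return promptTokens + reservedOutputTokens <= CONTEXT_TOKENS;
}

console.log(fitsInContext(1_000_000, 40_000)); // true
console.log(fitsInContext(1_048_576, 1)); // false
```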
