Voxtral Small 24B 2507
Open-weight 24B multimodal model for text, audio, and file tasks.
About Voxtral Small 24B 2507
Voxtral Small 24B 2507 is a 24-billion-parameter model built for multimodal inputs. It accepts text, audio, and file data in a single context of up to 32,000 tokens, so mixed inputs are handled in one request rather than by separate models.
Because the weights are openly available, the model can be customized and deployed locally. Its 24B scale is intended to balance capability with efficiency for combined audio and text inputs.
Developers typically apply it to voice transcription pipelines and audio-enriched document analysis. It also suits research projects needing accessible multimodal foundations. Production use cases include file-based audio workflows and context-aware text generation.
How Voxtral Small 24B 2507 compares
[Chart: Voxtral Small 24B 2507 compared with other multimodal models on intelligence, speed, and price.]
Price (USD per 1M output tokens, lower is better): Voxtral Small 24B 2507 ranks #9 of 100.
Sources: Artificial Analysis (intelligence, speed) · OpenRouter (price).
Best for
Audio-Text Analysis Tasks
The multimodal design allows the model to process and reason over combined audio and textual inputs, such as recordings with accompanying notes or descriptions.
Extended Document Review
Its 32,000-token context window supports reviewing lengthy reports, articles, or transcripts alongside attached audio without losing earlier details.
Spoken-Content Question Answering
Users can upload audio and ask detailed questions about its content, receiving context-aware responses grounded in both modalities.
Strengths & limitations
Strengths
- Native support for audio and file modalities
- Efficient 24B scale for multimodal tasks
- Integrated handling of text, audio, and files
- Practical 32k token context window
Limitations
- Moderate context length compared to larger-window models
- Smaller parameter count may limit depth on complex reasoning
- Multimodal inputs can increase processing overhead
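To make the context-length limitation concrete, here is a minimal sketch of splitting a long text (for example, a transcript) into pieces that should fit the 32,000-token window. The 4-characters-per-token ratio and the 2,000-token reserve for the prompt and reply are assumptions chosen for illustration, not properties of the model's tokenizer, and `splitToFitContext` is a hypothetical helper.

```typescript
// Context window stated on this page.
const CONTEXT_TOKENS = 32_000;
// Assumption: a rough average of 4 characters per token. The model's real
// tokenizer will differ, so treat the result as a conservative estimate.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Hypothetical helper: split `text` into fixed-size character chunks that
// stay under the estimated budget. `reservedTokens` leaves room for the
// instructions and the model's reply.
function splitToFitContext(text: string, reservedTokens = 2_000): string[] {
  const budgetChars = (CONTEXT_TOKENS - reservedTokens) * ASSUMED_CHARS_PER_TOKEN;
  if (budgetChars <= 0) {
    throw new RangeError("reservedTokens must be smaller than the context window");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += budgetChars) {
    chunks.push(text.slice(start, start + budgetChars));
  }
  return chunks;
}
```

This version cuts at fixed character offsets, so it can split mid-sentence. A production pipeline would more likely break on paragraph or speaker boundaries and count tokens with the actual tokenizer.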
Cost calculator
Estimate what Voxtral Small 24B 2507 would cost for your usage.
Estimates are based on Voxtral Small 24B 2507's listed pricing of $0.10 per 1M input tokens and $0.30 per 1M output tokens. They are estimates only; actual cost varies by provider and caching.
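The arithmetic behind such an estimate can be sketched in a few lines. The two prices are the ones listed on this page; the function name `estimateCostUsd` and the example workload are hypothetical, and the sketch ignores caching and per-provider differences.

```typescript
// Listed prices for this model, in USD per 1M tokens.
const INPUT_USD_PER_MILLION = 0.10;
const OUTPUT_USD_PER_MILLION = 0.30;

// Hypothetical helper: estimated cost in USD for a given token volume.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Example workload (made up): 2M input tokens and 500k output tokens.
// 2 * $0.10 + 0.5 * $0.30 = $0.35
console.log(estimateCostUsd(2_000_000, 500_000).toFixed(2)); // prints "0.35"
```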
Quick start
OpenRouter's API is OpenAI-compatible, so most SDKs work once you swap the base URL. Only the model slug changes between models.
import OpenAI from "openai";

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Send a single user message to the model.
const completion = await client.chat.completions.create({
  model: "mistralai/voxtral-small-24b-2507",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Model slug: mistralai/voxtral-small-24b-2507
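Since the model is positioned for audio input, a request will often carry an audio clip alongside the text prompt. The sketch below builds such a message using the OpenAI-style `input_audio` content part. It assumes a Node.js runtime (for `Buffer`) and assumes the provider accepts this message shape for this model, which should be checked against the provider's documentation. `buildAudioMessage` and the local types are hypothetical names introduced for illustration.

```typescript
// Audio container formats used in this sketch (assumption: the provider
// accepts at least these two).
type AudioFormat = "wav" | "mp3";

// Local description of an OpenAI-style user message with text plus audio.
interface AudioUserMessage {
  role: "user";
  content: Array<
    | { type: "text"; text: string }
    | { type: "input_audio"; input_audio: { data: string; format: AudioFormat } }
  >;
}

// Hypothetical helper: wrap raw audio bytes and a text prompt in one message.
function buildAudioMessage(
  audio: Uint8Array,
  format: AudioFormat,
  prompt: string,
): AudioUserMessage {
  // Audio is sent inline as base64 (Buffer is a Node.js global).
  const data = Buffer.from(audio).toString("base64");
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "input_audio", input_audio: { data, format } },
    ],
  };
}
```

Under those assumptions, the returned object can be passed as the single entry of `messages` in the `client.chat.completions.create` call, with the audio bytes read from disk via `fs.readFileSync`.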
Editor's verdict
Voxtral Small 24B 2507 is Mistral's open-weight multimodal model with a 32K-token context window.
At $0.30 per 1M output tokens, it ranks #9 of 100 multimodal models on output price.
Because it is open-weight, you can self-host it or call it through a hosted API.
It is best suited to workloads that need native audio and file support at an efficient 24B scale.
Frequently asked questions
What is the context window of Voxtral Small 24B 2507?
The model provides a context window of 32,000 tokens.