MixedVoices
Analytics and evaluation toolkit for voice-based AI agents.
What is MixedVoices?
MixedVoices is an open-source Python package that helps developers track and improve voice agent behavior through detailed call analysis and testing. It processes recordings to generate transcripts, success classifications, flow diagrams, and scores on metrics such as empathy, latency, and interruptions.
Users configure models for analytics and transcription, then add recordings or run simulated evaluations against custom agents or services like Bland AI. Results are stored per project and version so teams can compare iterations and identify patterns in conversation paths.
It is intended for teams building or maintaining production voice agents who need quantitative feedback before deployment and ongoing monitoring after launch.
Capabilities
What you can build with MixedVoices
Post-call review
Upload recordings to obtain transcripts, success labels, and metric scores that highlight strengths and weaknesses in live conversations.
Pre-deployment testing
Generate test cases from transcripts or descriptions, then run simulated dialogues to validate agent responses and metric performance.
Version comparison
Create multiple agent versions with different prompts and review side-by-side flow charts and metric trends to select the best iteration.
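As a rough illustration of the comparison step, the sketch below averages per-call metric scores for each version and picks the version with the highest mean on one metric. It is plain Python and does not use the MixedVoices API. The function names `mean_scores` and `best_version`, the score scale, and the sample numbers are all made up for illustration.

```python
from statistics import mean


def mean_scores(calls):
    """Average each metric across a list of per-call score dicts."""
    metrics = sorted({name for call in calls for name in call})
    return {m: mean(call[m] for call in calls if m in call) for m in metrics}


def best_version(scores_by_version, metric):
    """Return the version name with the highest mean for one metric."""
    averages = {
        version: mean_scores(calls)[metric]
        for version, calls in scores_by_version.items()
    }
    return max(averages, key=averages.get)


# Illustrative numbers only, not real MixedVoices output.
sample = {
    "v1": [{"empathy": 6, "latency": 7}, {"empathy": 8, "latency": 5}],
    "v2": [{"empathy": 9, "latency": 6}, {"empathy": 7, "latency": 8}],
}

print(best_version(sample, "empathy"))
```

In practice the per-call scores would come from the stored analysis results for each version rather than from a hand-written dictionary.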
Install MixedVoices
pip install mixedvoices
1. Run pip install mixedvoices to add the package.
2. Execute mixedvoices config and supply the required API keys for OpenAI or Deepgram.
3. Create a project and version with your agent prompt using the Python API.
4. Add recordings or generate test cases, then run analysis or an evaluator.
5. Open the dashboard to inspect flow charts, metrics, and simulation outcomes.
MixedVoices: pros & cons
Pros
- Quick Python integration with both blocking and non-blocking recording analysis
- Built-in metrics plus support for user-defined binary or continuous scores
- Test case generation from multiple sources including existing transcripts
- Support for evaluating both custom agents and third-party services like Bland AI
Cons
- Analytics and transcription currently limited to OpenAI and Deepgram models
- Requires separate implementation of the BaseAgent respond method for custom agents
- Dashboard and analysis features are tied to a local or self-hosted setup
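The custom-agent requirement listed above can be pictured with a small stand-in. The sketch below does not import MixedVoices: `AgentBase` is a hypothetical substitute for the package's BaseAgent, and the `respond` signature shown here (caller text in, reply plus an end-of-conversation flag out) is an assumption, not the confirmed interface. `EchoClinicAgent` and `simulate` are likewise illustrative names.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class AgentBase(ABC):
    """Hypothetical stand-in for the package's BaseAgent."""

    @abstractmethod
    def respond(self, input_text: str) -> Tuple[str, bool]:
        """Return (reply, conversation_ended). Assumed signature."""


class EchoClinicAgent(AgentBase):
    """Toy rule-based agent; a real one would call your own model."""

    def respond(self, input_text: str) -> Tuple[str, bool]:
        if "bye" in input_text.lower():
            return "Goodbye!", True
        return f"You said: {input_text}", False


def simulate(agent: AgentBase, caller_lines: List[str]) -> List[Tuple[str, str]]:
    """Feed scripted caller lines to the agent until it ends the call."""
    transcript = []
    for line in caller_lines:
        reply, ended = agent.respond(line)
        transcript.append((line, reply))
        if ended:
            break
    return transcript


log = simulate(EchoClinicAgent(), ["I need an appointment", "Bye", "ignored"])
for caller, agent_reply in log:
    print(f"caller: {caller} | agent: {agent_reply}")
```

Keeping the agent behind a single `respond` method means the same scripted caller lines can be replayed against different agent implementations.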
Frequently asked questions
MixedVoices supports all OpenAI GPT models from gpt-3.5 onward by default, with transcription options including OpenAI Whisper and Deepgram Nova-2.