Voice Lab
Test and refine LLM voice agents using custom metrics and model comparisons.
What is Voice Lab?
Voice Lab provides a testing environment for the LLM agents that power voice applications. Users define metrics in JSON, run evaluations with an LLM-as-judge approach, and compare results across different language models and prompt versions.
The framework handles test scenario creation, execution of conversations, and generation of comparison tables. It currently covers only the underlying language model and prompt logic rather than full audio pipelines.
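To make the metric, judge, and comparison-table flow concrete, here is a minimal sketch. It does not use Voice Lab's actual API or its real JSON schema: the metric fields, the `keyword_judge` stand-in, and the `model-a` / `model-b` transcripts are all hypothetical, and a real judge would call a language model rather than match a keyword.

```python
import json
from typing import Callable

# Hypothetical metric definitions; the real schema lives in the repository's
# test_details.json and may look different.
METRICS_JSON = """
[
  {"name": "stays_on_topic", "prompt": "Did the agent stay on the user's topic?"},
  {"name": "confirms_booking", "prompt": "Did the agent confirm the booking details?"}
]
"""

Judge = Callable[[str, str], bool]


def keyword_judge(metric_prompt: str, transcript: str) -> bool:
    """Stand-in for an LLM judge: passes if the transcript mentions 'confirm'.

    A real judge would send metric_prompt and transcript to a language model.
    """
    return "confirm" in transcript.lower()


def evaluate(transcripts: dict[str, str], judge: Judge) -> dict[str, dict[str, bool]]:
    """Score every candidate transcript against every metric."""
    metrics = json.loads(METRICS_JSON)
    return {
        candidate: {m["name"]: judge(m["prompt"], text) for m in metrics}
        for candidate, text in transcripts.items()
    }


def comparison_table(results: dict[str, dict[str, bool]]) -> str:
    """Render results as a plain-text table, one row per candidate."""
    metric_names = list(next(iter(results.values())))
    header = ["candidate", *metric_names]
    rows = [
        [candidate, *("pass" if scores[m] else "fail" for m in metric_names)]
        for candidate, scores in results.items()
    ]
    table = [header, *rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in table
    )


# Hypothetical transcripts from two candidate models.
transcripts = {
    "model-a": "Agent: I can confirm your table for two at 7 pm.",
    "model-b": "Agent: Let me tell you about our specials.",
}
results = evaluate(transcripts, keyword_judge)
print(comparison_table(results))
```

Passing the judge in as a plain function keeps the scoring loop independent of any provider, so the keyword stand-in could be swapped for a model-backed judge without touching `evaluate` or `comparison_table`.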
Developers building or maintaining voice agents benefit from systematic testing that reduces manual log review and supports cost optimization when switching models.
What you can build with Voice Lab
Model Migration
Compare performance and cost when moving between models such as Claude Sonnet and GPT-4 variants to find the best balance between response quality and price.
Prompt Iteration
Test multiple prompt variations against defined metrics to identify which versions improve agent behavior.
Edge Case Validation
Simulate interactions with different user personas to verify how the agent handles diverse conversation styles.
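Prompt iteration and persona testing combine naturally into a test matrix with one scenario per prompt variant and persona pair. The sketch below shows one way to build such a matrix. The prompt texts, persona descriptions, and scenario field names are hypothetical and are not taken from Voice Lab's configuration format.

```python
from itertools import product

# Hypothetical prompt variants and personas, for illustration only.
PROMPT_VARIANTS = {
    "v1": "You are a booking assistant.",
    "v2": "You are a concise booking assistant. Always confirm details.",
}
PERSONAS = {
    "impatient": "Interrupts often and wants short answers.",
    "confused": "Unsure what they want and asks for clarification.",
}


def build_scenarios(prompts, personas):
    """Return one scenario dict per (prompt variant, persona) pair."""
    return [
        {
            "id": f"{prompt_id}-{persona_id}",
            "system_prompt": prompt_text,
            "persona": persona_text,
        }
        for (prompt_id, prompt_text), (persona_id, persona_text)
        in product(prompts.items(), personas.items())
    ]


scenarios = build_scenarios(PROMPT_VARIANTS, PERSONAS)
for scenario in scenarios:
    print(scenario["id"])
```

Each scenario can then be run and scored against the same metrics, so differences in the results can be attributed to either the prompt variant or the persona.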
Install Voice Lab
git clone https://github.com/saharmor/voice-lab.git
cd voice-lab
1. Clone the repository with git clone https://github.com/saharmor/voice-lab.git and enter the directory.
2. Create a Python virtual environment and install dependencies from requirements.txt.
3. Add your OpenAI API key to a .env file in the project root.
4. Run the example test script with python llm_testing/example_test.py.
5. Edit test_details.json or use the configuration editor to add new scenarios and metrics.
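Before running the example script, it can help to check that the .env file from the steps above actually defines a key. The following is a small standalone sketch using only the standard library. It assumes the variable is named `OPENAI_API_KEY`, which is the name the OpenAI SDK reads by default; whether Voice Lab expects exactly that name is an assumption to verify against the repository's README. The helper names are hypothetical.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_openai_key(path=".env"):
    """Return True if the file exists and defines a non-empty OPENAI_API_KEY."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    return bool(read_env_file(env_path).get("OPENAI_API_KEY"))


if __name__ == "__main__":
    if has_openai_key():
        print("OPENAI_API_KEY found in .env")
    else:
        print("OPENAI_API_KEY missing from .env")
```

The parser handles only plain KEY=VALUE lines; files that use multi-line values or `export` prefixes would need a dedicated .env library.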
Voice Lab: pros & cons
Pros
- Allows definition of custom evaluation metrics scored automatically
- Generates comparison tables to support model and prompt decisions
- Includes a UI editor for creating test configurations without manual JSON work
- Open source and focused on practical agent evaluation needs
Cons
- Only tests the text-based LLM component, not full voice audio handling
- Requires an OpenAI API key and is currently limited to that provider
- Setup involves multiple manual steps, including environment configuration
Frequently asked questions
It evaluates the language model responses and prompt behavior of agents using user-defined metrics and LLM-based judging.