ToolBench
Open-source framework for training LLMs to master thousands of real-world APIs.
What is ToolBench?
ToolBench is an open-source project that builds high-quality SFT datasets so language models can learn to call real APIs effectively. It gathers thousands of REST endpoints, generates single-tool and multi-tool instructions, and annotates complete solution paths that include reasoning, calls, and results.
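To make the idea of an annotated solution path concrete, here is a minimal sketch of how one record could be modelled. The class names, field names, and the two example APIs are hypothetical and chosen only for illustration; they are not the schema of the released dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One step of a solution path (hypothetical field names)."""
    thought: str      # the model's reasoning before acting
    api_name: str     # which API it decided to call
    arguments: dict   # arguments passed to that call
    observation: str  # what the API returned


@dataclass
class SolutionPath:
    """An instruction plus the ordered steps that solve it."""
    instruction: str
    steps: List[Step] = field(default_factory=list)

    def tools_used(self) -> List[str]:
        # Distinct API names, in order of first use.
        seen: List[str] = []
        for step in self.steps:
            if step.api_name not in seen:
                seen.append(step.api_name)
        return seen

    def is_multi_tool(self) -> bool:
        # A multi-tool instruction needs more than one distinct API.
        return len(self.tools_used()) > 1


# Made-up example: both API names and responses are invented.
path = SolutionPath(
    instruction="Find the weather in Paris and convert it to Fahrenheit.",
    steps=[
        Step("I need the current weather first.",
             "get_weather", {"city": "Paris"}, '{"temp_c": 20}'),
        Step("Now convert 20 C to Fahrenheit.",
             "convert_temperature", {"value": 20, "to": "F"}, '{"temp_f": 68.0}'),
    ],
)
print(path.tools_used(), path.is_multi_tool())
```

Keeping reasoning, call, and result together in each step is what lets a fine-tuned model learn when to call a tool as well as how.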
Data creation relies on ChatGPT (gpt-3.5-turbo-16k) with its function-calling capability, together with a depth-first search-based decision tree (DFSDT) that explores candidate tool sequences and backtracks from failed branches. An API retriever is included so that models can discover relevant tools at inference time rather than depending on a fixed list.
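The depth-first exploration of tool sequences can be illustrated with a small, generic sketch. It is not the project's implementation: in ToolBench an LLM proposes and judges each step, whereas here the function `dfs_tool_search`, its callback parameters, and the toy integer "tools" are all assumptions made to show the search-and-backtrack pattern.

```python
from typing import Callable, List, Optional, Sequence


def dfs_tool_search(
    state: int,
    candidates: Callable[[int], Sequence[str]],
    apply: Callable[[int, str], Optional[int]],
    is_goal: Callable[[int], bool],
    max_depth: int,
    path: Optional[List[str]] = None,
) -> Optional[List[str]]:
    """Return the first tool sequence (in DFS order) that reaches a goal state."""
    path = [] if path is None else path
    if is_goal(state):
        return list(path)
    if len(path) >= max_depth:
        return None  # depth budget exhausted on this branch
    for tool in candidates(state):
        next_state = apply(state, tool)
        if next_state is None:
            continue  # the call failed: abandon this branch
        path.append(tool)
        found = dfs_tool_search(next_state, candidates, apply, is_goal,
                                max_depth, path)
        if found is not None:
            return found
        path.pop()  # backtrack and try the next candidate
    return None


# Toy stand-ins for real APIs: the state is an integer, tools transform it.
def toy_candidates(state: int) -> Sequence[str]:
    return ["broken", "double", "add_three"]


def toy_apply(state: int, tool: str) -> Optional[int]:
    if tool == "double":
        return state * 2
    if tool == "add_three":
        return state + 3
    return None  # "broken" simulates an API call that errors out


result = dfs_tool_search(2, toy_candidates, toy_apply,
                         is_goal=lambda s: s == 10, max_depth=3)
print(result)  # ['double', 'add_three', 'add_three']
```

The search first exhausts the `double, double` branch, backtracks, and only then finds `2 -> 4 -> 7 -> 10`, which is the behaviour that lets a tree search recover from an early wrong call instead of failing outright.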
The release targets researchers and developers who want reproducible tool-use capabilities in open models without relying solely on closed APIs or manual annotation.
Capabilities
What you can build with ToolBench
Fine-tuning for API calling
Train or adapt models on the released dataset to improve accuracy on both simple and chained tool invocations.
Benchmarking tool-use agents
Apply the included ToolEval scripts to measure how well different LLMs plan and execute API sequences.
Open-domain tool retrieval
Combine the provided retriever with ToolLLaMA to let models fetch and use APIs they have never seen during training.
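As a rough illustration of open-domain tool retrieval, the sketch below ranks API descriptions against a user query. It uses plain bag-of-words cosine similarity as a stand-in for a trained retriever, and the function `retrieve`, the `API_DOCS` catalogue, and the API names in it are invented for the example rather than taken from ToolBench.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> Counter:
    # Lowercased word counts; a trained retriever would use embeddings instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, api_docs: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return up to top_k API names whose descriptions best match the query."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(doc)), name) for name, doc in api_docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical API catalogue, for illustration only.
API_DOCS = {
    "get_weather": "Return the current weather forecast for a city",
    "convert_currency": "Convert an amount of money between two currencies",
    "search_flights": "Search available flights between two airports on a date",
}

top = retrieve("What is the weather forecast in Paris today?", API_DOCS, top_k=1)
print(top)  # ['get_weather']
```

The retrieved names would then be passed to the model as its candidate tool list for that query, so the catalogue can grow without retraining the model on every new API.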
Install ToolBench
git clone git@github.com:OpenBMB/ToolBench.git
cd ToolBench
1. Clone the ToolBench GitHub repository
2. Download the latest data archive from the linked Google Drive folder
3. Install dependencies listed in the project requirements
4. Launch the RapidAPI backend service or use the hosted key after form approval
5. Run the supplied fine-tuning or evaluation scripts with your chosen base model
ToolBench: pros & cons
Pros
- Massive, automatically generated dataset with intact reasoning traces
- Open-source model checkpoints that reduce API hallucination
- Built-in support for both single-tool and multi-tool scenarios
- Public evaluation framework covering multiple model families
Cons
- Requires access to RapidAPI or a local simulation server
- Training still demands substantial GPU resources
- Periodic server IP updates needed for the hosted backend
Frequently asked questions
What is ToolLLaMA?
ToolLLaMA is a LLaMA-based model fine-tuned on ToolBench data to improve tool-use performance and reduce hallucinations.