How was the dataset created?

ChatGPT with function-calling support generated instructions and answers; a depth-first search tree guided annotation of complex cases.

Can I run everything locally?

Yes, a StableToolBench variant provides a simulated local server so no external API keys are required.

ToolBench

Open-source framework for training LLMs to master thousands of real-world APIs.

Autonomous Agents | Agent Frameworks | 5.7k | Open source

Updated 2026-06-16

What is ToolBench?

ToolBench is an open-source project that builds high-quality SFT datasets so language models can learn to call real APIs effectively. It gathers thousands of REST endpoints, generates single-tool and multi-tool instructions, and annotates complete solution paths that include reasoning, calls, and results.
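As an illustration of what a "complete solution path" could contain, here is a minimal sketch of one annotated record. The class and field names (`Step`, `SolutionPath`, `thought`, `observation`) and the two API names are hypothetical and chosen for readability; they are not the schema of the released dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    thought: str               # reasoning written before the call
    api_name: str              # endpoint chosen for this step
    arguments: Dict[str, str]  # parameters passed to the endpoint
    observation: str           # result returned by the API


@dataclass
class SolutionPath:
    instruction: str
    steps: List[Step] = field(default_factory=list)

    def tools_used(self) -> List[str]:
        """Return distinct API names in the order they were first called."""
        seen: List[str] = []
        for step in self.steps:
            if step.api_name not in seen:
                seen.append(step.api_name)
        return seen

    def is_multi_tool(self) -> bool:
        """A path is multi-tool when it calls more than one distinct API."""
        return len(self.tools_used()) > 1


# Hypothetical example record with made-up API names and responses.
example = SolutionPath(
    instruction="Find the weather in Paris.",
    steps=[
        Step("I need the city id first.", "search_city",
             {"name": "Paris"}, '{"city_id": "123"}'),
        Step("Now I can ask for the forecast.", "get_weather",
             {"city_id": "123"}, '{"forecast": "sunny"}'),
    ],
)
```

Keeping reasoning, call, and result together in each step is what lets a fine-tuned model learn when to call a tool as well as how.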

Data creation relies on an enhanced ChatGPT instance together with a depth-first search decision tree that explores tool sequences efficiently. An API retriever component is included so models can discover relevant tools at inference time rather than depending on a fixed list.
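The depth-first exploration of tool sequences can be sketched with a toy search. This is not the project's implementation: in ToolBench a language model decides which call to try next, whereas this sketch simply tries tools in a fixed order and backtracks when a call is unusable. The tool names and the `REQUIRES` and `PRODUCES` tables are assumptions made up for the example.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence

State = FrozenSet[str]

# Hypothetical toy tools: what each needs in the state and what it adds.
REQUIRES = {"get_weather": {"city_id"}, "search_city": set(), "get_news": set()}
PRODUCES = {"get_weather": "weather", "search_city": "city_id", "get_news": "news"}


def apply_tool(state: State, name: str) -> Optional[State]:
    """Return the new state, or None if the call fails or adds nothing new."""
    if not REQUIRES[name] <= state or PRODUCES[name] in state:
        return None
    return state | {PRODUCES[name]}


def dfs_tool_search(
    state: State,
    tools: Sequence[str],
    is_solved: Callable[[State], bool],
    max_depth: int,
    path: Optional[List[str]] = None,
) -> Optional[List[str]]:
    """Depth-first search for a tool sequence that reaches a solved state."""
    path = [] if path is None else path
    if is_solved(state):
        return path
    if len(path) >= max_depth:
        return None  # depth budget exhausted on this branch
    for name in tools:
        next_state = apply_tool(state, name)
        if next_state is None:
            continue  # unusable call: try the next tool
        found = dfs_tool_search(next_state, tools, is_solved, max_depth, path + [name])
        if found is not None:
            return found
    return None  # every branch failed: backtrack


result = dfs_tool_search(
    frozenset(),
    ["get_weather", "search_city", "get_news"],
    is_solved=lambda s: "weather" in s,
    max_depth=3,
)
print(result)  # ['search_city', 'get_weather']
```

Here `get_weather` is rejected at the root because no `city_id` exists yet, so the search moves on to `search_city` and then succeeds. Depth-first search returns the first valid path it finds, which is not guaranteed to be the shortest one.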

The release targets researchers and developers who want reproducible tool-use capabilities in open models without relying solely on closed APIs or manual annotation.

Capabilities

Generate tool-use instruction data

Fine-tune ToolLLaMA models

Evaluate API-calling performance

Support thousands of real-world APIs

Provide a RapidAPI backend service

What you can build with ToolBench

Fine-tuning for API calling

Train or adapt models on the released dataset to improve accuracy on both simple and chained tool invocations.

Benchmarking tool-use agents

Apply the included ToolEval scripts to measure how well different LLMs plan and execute API sequences.

Open-domain tool retrieval

Combine the provided retriever with ToolLLaMA to let models fetch and use APIs they have never seen during training.

Install ToolBench

Quick start

git clone git@github.com:OpenBMB/ToolBench.git
cd ToolBench

1. Clone the ToolBench GitHub repository
2. Download the latest data archive from the linked Google Drive folder
3. Install dependencies listed in the project requirements
4. Launch the RapidAPI backend service or use the hosted key after form approval
5. Run the supplied fine-tuning or evaluation scripts with your chosen base model

Works with

OpenAI API, Python, LLaMA-2, RapidAPI

ToolBench: pros & cons

Pros

+ Massive, automatically generated dataset with intact reasoning traces
+ Open-source model checkpoints that reduce API hallucination
+ Built-in support for both single-tool and multi-tool scenarios
+ Public evaluation framework covering multiple model families

Cons

– Requires access to RapidAPI or a local simulation server
– Training still demands substantial GPU resources
– Periodic server IP updates needed for the hosted backend

Frequently asked questions

What is ToolLLaMA?

A LLaMA-based model fine-tuned on ToolBench data to improve tool-use performance and reduce hallucinations.
