LLM-Agent-Benchmark-List

Curated collection of benchmarks for LLM agents and tool use.

Autonomous Agents · Research · 167 · Open source
Updated 2026-06-16

What is LLM-Agent-Benchmark-List?

LLM-Agent-Benchmark-List is a community-maintained directory that catalogs benchmarks designed to evaluate large language models when used as agents. It groups resources into categories such as surveys, tool-use evaluations, reasoning tasks, knowledge integration, and graph-based assessments.

Users browse the compiled list to discover relevant papers, GitHub repositories, and datasets without searching across scattered sources. Each entry includes publication dates, authors, arXiv links, and project pages to support quick access and citation.
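To make those entry fields concrete, here is a minimal sketch of how one might model and filter a local copy of such entries, for example to shortlist recent tool-use benchmarks. The `BenchmarkEntry` record, its field names, the `select` helper, and the sample entries are hypothetical placeholders, not code or data from the repository.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical record for one list entry; field names are illustrative only.
@dataclass
class BenchmarkEntry:
    title: str
    category: str                 # e.g. "ToolUse" or "Reasoning"
    published: date
    authors: List[str] = field(default_factory=list)
    arxiv_url: str = ""
    project_url: Optional[str] = None

def select(entries: List[BenchmarkEntry], category: str, since: date) -> List[BenchmarkEntry]:
    """Return entries in `category` published on or after `since`, newest first."""
    hits = [e for e in entries if e.category == category and e.published >= since]
    return sorted(hits, key=lambda e: e.published, reverse=True)

# Placeholder entries (not real benchmarks) to show the call.
entries = [
    BenchmarkEntry("Benchmark A", "ToolUse", date(2024, 3, 1), ["Author One"]),
    BenchmarkEntry("Benchmark B", "Reasoning", date(2023, 11, 5), ["Author Two"]),
    BenchmarkEntry("Benchmark C", "ToolUse", date(2022, 6, 20), ["Author Three"]),
]

for e in select(entries, "ToolUse", since=date(2023, 1, 1)):
    print(e.title, e.published.isoformat())
```

Keeping the date as a `date` object rather than a string makes the "published since" comparison and the newest-first sort straightforward.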

The resource primarily serves AI researchers, benchmark creators, and engineers who need standardized ways to measure agent capabilities before deploying models in real applications.

Capabilities

Curate LLM benchmarks
Support model evaluation
List agent tests
Provide evaluation resources

What you can do with LLM-Agent-Benchmark-List

Selecting evaluation suites

Researchers scan the categorized lists to identify suitable benchmarks for testing new agent frameworks on tool calling or multi-step reasoning.

Tracking recent progress

Developers review the latest survey papers and benchmark releases to stay current on evaluation methods in the fast-moving LLM field.

Contributing new entries

Contributors submit pull requests to add overlooked benchmarks, keeping the collection comprehensive for the broader community.
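Before opening such a pull request, a contributor may want to check whether a paper is already listed. The sketch below assumes entries link to arXiv abstract or PDF pages with new-style identifiers (YYMM.NNNN or YYMM.NNNNN); old-style IDs are not handled. The helper names and the README excerpt are hypothetical.

```python
import re

# Assumed link format: arxiv.org/abs/<id> or arxiv.org/pdf/<id>, where <id> is a
# new-style identifier (YYMM.NNNN or YYMM.NNNNN) with an optional version suffix.
ARXIV_LINK = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)

def listed_arxiv_ids(readme_text: str) -> set:
    """Collect version-less arXiv IDs linked anywhere in the README text."""
    return set(ARXIV_LINK.findall(readme_text))

def is_already_listed(readme_text: str, candidate_url: str) -> bool:
    """True if the candidate's arXiv ID (ignoring version) already appears in the README."""
    match = ARXIV_LINK.search(candidate_url)
    if match is None:
        raise ValueError(f"not a recognized arXiv link: {candidate_url}")
    return match.group(1) in listed_arxiv_ids(readme_text)

# Hypothetical README excerpt with placeholder IDs.
readme = """
- [Bench X](https://arxiv.org/abs/2401.12345)
- [Bench Y](https://arxiv.org/pdf/2312.01234v2)
"""

print(is_already_listed(readme, "https://arxiv.org/abs/2401.12345v3"))
print(is_already_listed(readme, "https://arxiv.org/abs/2402.00001"))
```

Comparing version-less IDs means a link to `v3` of a paper is still recognized as a duplicate of an existing link to its first version.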

Use LLM-Agent-Benchmark-List

  1. Visit the GitHub repository page for LLM-Agent-Benchmark-List.
  2. Review the README sections, organized by Survey, ToolUse, Reasoning, Knowledge, and Graph.
  3. Click any linked paper or project page to open the original benchmark materials.
  4. Fork the repository and open a pull request to suggest additions or corrections.
  5. Watch the repository to be notified of future updates, or star it to bookmark it.
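For readers who clone the repository and want a per-category view of the links, here is a minimal sketch. It assumes the README uses Markdown headings for sections such as Survey or ToolUse and inline `[text](url)` links for entries; that layout is an assumption, not something this page documents, and `links_by_section` and the sample text are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")               # e.g. "## ToolUse"
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")    # e.g. "[Paper](https://...)"

def links_by_section(markdown: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each heading to the (link text, URL) pairs found under it.

    A subheading starts a new section, text before the first heading is ignored,
    and only headings followed by at least one body line appear as keys.
    """
    sections: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    current = None
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            continue
        if current is not None:
            sections[current].extend(LINK.findall(line))
    return dict(sections)

# Hypothetical README layout with placeholder links.
sample = """# Benchmarks
## Survey
- [Paper S](https://arxiv.org/abs/placeholder-s) [Code](https://github.com/example/s)
## ToolUse
- [Paper T](https://example.org/placeholder-t)
"""

for name, links in links_by_section(sample).items():
    print(name, len(links))
```

A line-by-line scan keeps the sketch dependency-free; a full Markdown parser would be more robust if the real README nests entries in tables or HTML.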

LLM-Agent-Benchmark-List: pros & cons

Pros

  • Organizes scattered benchmarks into clear topical categories
  • Includes direct links to papers and code for fast follow-up
  • Actively maintained, with community contributions welcome
  • Covers multiple evaluation dimensions relevant to agents

Cons

  • Provides only links rather than runnable benchmark code
  • Quality and coverage depend on external submissions
  • No built-in tooling for running or comparing results

Frequently asked questions

It is a curated reading list of benchmarks with links, not executable software.
