Is SWE-bench Verified free to use?

Yes, it is publicly available on the Hugging Face Hub at no cost.

How do I access the dataset?

Load it directly with the Hugging Face datasets library using the identifier princeton-nlp/SWE-bench_Verified.

What license applies to this dataset?

Check the dataset card on Hugging Face for the exact license and usage terms.

SWE-bench_Verified — Free Dataset Docs, Examples & Alternatives (2026)

What is SWE-bench_Verified?

SWE-bench Verified contains 500 human-validated test instances drawn from the SWE-bench dataset. Each instance pairs a GitHub issue from a Python repository with the pull request that resolved it, and the task is to resolve the issue automatically.

It is useful for researchers and developers evaluating AI systems on real-world software engineering tasks, particularly those involving code changes verified by unit tests.

What you can build with SWE-bench_Verified

Benchmark LLM-based code repair agents

Run models on the 500 validated issue-PR pairs to measure how often generated patches pass the post-PR unit tests.
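As a rough illustration of the metric described here, the sketch below computes a resolve rate from per-instance outcomes. The `outcomes` mapping and the `resolve_rate` helper are hypothetical names introduced for this example; producing the pass/fail outcomes requires applying each patch and running the tests, which is outside this snippet.

```python
from typing import Dict

# Size of SWE-bench Verified as stated in the description above.
TOTAL_INSTANCES = 500


def resolve_rate(outcomes: Dict[str, bool], total: int = TOTAL_INSTANCES) -> float:
    """Fraction of benchmark instances whose generated patch passed the tests.

    `outcomes` maps an instance identifier to True if the patch passed.
    Instances missing from `outcomes` count as unresolved, because the
    denominator is the full benchmark size rather than the number attempted.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    resolved = sum(1 for passed in outcomes.values() if passed)
    return resolved / total
```

Dividing by the full benchmark size rather than by the number of attempted instances keeps scores comparable between runs that skip different samples.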

Compare agent performance on real GitHub issues

Use the dataset to evaluate different systems on their ability to resolve bugs from popular Python repositories with objective test verification.
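One simple way to compare two systems is to look at which instances each one resolved. The sketch below assumes you already have a set of resolved instance identifiers per system; `compare_systems` is a hypothetical helper, not part of any SWE-bench tooling.

```python
from typing import Dict, Set


def compare_systems(resolved_a: Set[str], resolved_b: Set[str]) -> Dict[str, Set[str]]:
    """Split resolved instances into shared wins and wins unique to each system."""
    return {
        "both": resolved_a & resolved_b,    # solved by both systems
        "only_a": resolved_a - resolved_b,  # solved by system A only
        "only_b": resolved_b - resolved_a,  # solved by system B only
    }
```

The "only" buckets are often more informative than the headline score, since two systems with similar totals can succeed on quite different issues.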

Develop and test issue-resolution pipelines

Feed issue descriptions into retrieval or generation pipelines and score outputs against the verified test suites included in each sample.
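A minimal sketch of the first pipeline stage might turn one sample into a prompt. The template wording and the `build_prompt` helper are illustrative assumptions; the `problem_statement` field comes from the dataset, and the `repo` field is assumed to be present (the code falls back to "unknown" if it is not).

```python
from typing import Any, Dict

# Hypothetical prompt wording; adjust it to whatever your model expects.
PROMPT_TEMPLATE = (
    "You are fixing a bug in the repository {repo}.\n\n"
    "Issue:\n{problem_statement}\n\n"
    "Return a unified diff that resolves the issue."
)


def build_prompt(row: Dict[str, Any]) -> str:
    """Format one dataset row into a text prompt for a generation model."""
    return PROMPT_TEMPLATE.format(
        repo=row.get("repo", "unknown"),  # 'repo' is an assumed field name
        problem_statement=row["problem_statement"].strip(),
    )
```

Scoring the model's output still requires applying the generated diff and running the tests for that sample, which this snippet does not attempt.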

Load SWE-bench_Verified

Python

from datasets import load_dataset

ds = load_dataset("princeton-nlp/SWE-bench_Verified")

1. Install the library: pip install datasets
2. Import the loader: from datasets import load_dataset
3. Load the dataset: ds = load_dataset('princeton-nlp/SWE-bench_Verified')
4. Select the 'test' split (ds['test']) to access the 500 samples.
5. Use fields such as 'problem_statement', 'patch', and 'test_patch' for evaluation.
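The loading steps can be sketched as follows. `load_test_split` wraps the download (it needs network access and the datasets package), while `summarize_instance` is a hypothetical helper for a quick look at one row. The `instance_id` field is an assumption here; `problem_statement`, `patch`, and `test_patch` are the fields named above.

```python
from typing import Any, Dict


def load_test_split():
    """Download the 'test' split from the Hugging Face Hub.

    Imported lazily so the helper below can be used without the
    datasets package installed.
    """
    from datasets import load_dataset

    return load_dataset("princeton-nlp/SWE-bench_Verified", split="test")


def summarize_instance(row: Dict[str, Any], max_chars: int = 80) -> Dict[str, Any]:
    """Return a compact view of one sample for quick inspection."""
    return {
        "instance_id": row.get("instance_id"),  # assumed field name
        "statement_preview": row["problem_statement"][:max_chars],
        "patch_lines": len(row["patch"].splitlines()),
        "test_patch_lines": len(row["test_patch"].splitlines()),
    }
```

With network access, `summarize_instance(load_test_split()[0])` would show a preview of the first sample.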

SWE-bench_Verified: pros & cons

Pros

+ Human-validated subset reduces noise
+ Objective unit-test scoring
+ Real issues from popular Python repos
+ Directly compatible with Hugging Face datasets

Cons

– Only 500 examples total
– Python-only repositories
– Requires full repo checkout and test execution for full evaluation

Frequently asked questions

What is SWE-bench Verified?

A curated 500-sample subset of SWE-bench with human-validated Issue-PR pairs from Python repositories, scored via unit tests.

Similar datasets

Other AI & machine learning options worth comparing.

FineNews

AI & Machine Learning · ksolovev

Verified

News dataset for AI and machine learning workflows.

Dataset · ↓ 1.5M · Free

hd_tmp

AI & Machine Learning · ayuo

Verified

Temporary AI/ML dataset for Hugging Face prototyping.

Dataset · ↓ 1.5M · Free

results

AI & Machine Learning · mteb

Verified

MTEB benchmark results for text embedding model evaluations.

Dataset · ↓ 1.3M · Free
