SWE-bench_Multilingual
Multilingual benchmark for AI models resolving GitHub issues in code repositories.
What is SWE-bench_Multilingual?
SWE-bench_Multilingual provides a collection of GitHub issues and associated code repositories for evaluating models on multilingual software engineering tasks.
It supports benchmark evaluations in NLP and code generation for researchers focused on multilingual capabilities in software issue resolution.
What you can build with SWE-bench_Multilingual
Benchmarking multilingual code agents
Measure how well LLMs resolve GitHub issues in repositories written in multiple programming languages.
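A benchmarking run typically ends with a predictions file that maps each task to the patch a model produced. The sketch below writes one JSON object per line using the keys `instance_id`, `model_name_or_path`, and `model_patch`, which is the layout the SWE-bench evaluation harness is understood to accept; treat that layout as an assumption and check it against the harness version you run. The helper names and the sample values are hypothetical.

```python
import json


def build_prediction(instance_id, model_patch, model_name):
    """Bundle one model output into a prediction record."""
    return {
        "instance_id": instance_id,
        "model_name_or_path": model_name,
        "model_patch": model_patch,
    }


def to_jsonl(predictions):
    """Serialize prediction records as JSON Lines (one object per line)."""
    return "".join(json.dumps(p) + "\n" for p in predictions)


# Made-up placeholder values, not real benchmark data.
predictions = [
    build_prediction("example__repo-1", "diff --git a/x.go b/x.go\n", "my-model"),
    build_prediction("example__repo-2", "diff --git a/y.rs b/y.rs\n", "my-model"),
]
jsonl_text = to_jsonl(predictions)
print(jsonl_text)
```

Writing `jsonl_text` to a file gives one record per task, which keeps partial runs easy to resume and inspect.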
Training cross-lingual repair models
Fine-tune models on issue-to-patch pairs from multiple programming languages to improve generalization.
Comparing language-specific performance
Run controlled experiments to quantify accuracy gaps between codebases written in different programming languages.
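One way to make the per-language comparison concrete is to group pass/fail outcomes by language and compute a resolve rate for each group. The sketch below assumes you already have a mapping from instance ID to outcome and a separate mapping from instance ID to language; both mappings, the function name, and the sample values are hypothetical, and this page does not say whether the dataset ships a language column.

```python
from collections import defaultdict


def resolve_rate_by_language(results, language_of):
    """Return the fraction of resolved instances for each language.

    results:     dict of instance_id -> bool (True if the issue was resolved)
    language_of: dict of instance_id -> language label
    """
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for instance_id, was_resolved in results.items():
        language = language_of[instance_id]
        totals[language] += 1
        if was_resolved:
            resolved[language] += 1
    return {language: resolved[language] / totals[language] for language in totals}


# Made-up placeholder outcomes, not real benchmark results.
results = {"a-1": True, "a-2": False, "b-1": True, "b-2": True}
language_of = {"a-1": "go", "a-2": "go", "b-1": "rust", "b-2": "rust"}
rates = resolve_rate_by_language(results, language_of)
print(rates)
```

Reporting the instance count next to each rate helps readers judge how much weight a gap between two small groups deserves.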
Load SWE-bench_Multilingual
- Install the library: pip install datasets
- Import the loader: from datasets import load_dataset
- Load the dataset: ds = load_dataset("SWE-bench/SWE-bench_Multilingual")
- Inspect the first test instance: print(ds["test"][0])
- Use the 'instance_id', 'problem_statement', and 'patch' fields for evaluation.
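Once a record is loaded, it can help to keep only the three fields named above before passing it on to an evaluation script. The following is a minimal sketch that runs on a made-up placeholder record rather than the real dataset, so it needs no download; `select_eval_fields` is a hypothetical helper, and any other columns the dataset may contain are not assumed here.

```python
# Fields named on this page; other dataset columns are not assumed.
EVAL_FIELDS = ("instance_id", "problem_statement", "patch")


def select_eval_fields(record):
    """Return only the evaluation fields from one dataset record."""
    missing = [name for name in EVAL_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return {name: record[name] for name in EVAL_FIELDS}


# Made-up placeholder record, shaped like one row of ds["test"].
sample = {
    "instance_id": "example__repo-1",
    "problem_statement": "Parser crashes on empty input.",
    "patch": "diff --git a/parser.go b/parser.go\n",
    "extra_column": "ignored",
}
fields = select_eval_fields(sample)
print(fields["instance_id"])
```

With the real dataset, the same call would be `select_eval_fields(ds["test"][0])`.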
SWE-bench_Multilingual: pros & cons
Pros
- Extends SWE-bench beyond Python to additional programming languages
- Real GitHub issues and patches
- Directly loadable via Hugging Face
- Supports standardized model comparisons
Cons
- Evaluation requires full repository setup
- Limited documentation on language coverage
- High compute cost for full runs
Frequently asked questions
SWE-bench_Multilingual is a multilingual version of the SWE-bench dataset, containing real software engineering tasks drawn from GitHub issues in repositories written in multiple programming languages.