Is HumanEval free to use?

Yes, it is publicly released and accessible via Hugging Face.

How do I access the dataset?

Load it directly with the Hugging Face datasets library using load_dataset("openai/openai_humaneval").

What license applies?

The dataset is released by OpenAI under the MIT license.

openai_humaneval

164 handwritten Python problems for evaluating code generation models.

Dataset | Text & NLP | ↓ 260K/mo | Free

Updated 2026-06-18

What is openai_humaneval?

The dataset consists of 164 handwritten Python problems, each with a function signature, docstring, reference implementation, and several unit tests. The problems were written by hand so that they would not already appear in the training sets of code generation models.

It is useful for researchers evaluating large language models on programming tasks, code completion, and functional correctness in Python.

What you can build with openai_humaneval

Benchmarking code generation models

Run pass@k evaluations on LLMs by prompting them with function signatures and docstrings, then checking the generated code against the included unit tests.
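For reference, pass@k is usually computed with the unbiased estimator from the paper that introduced HumanEval: generate n samples per problem, count the c samples that pass, and estimate the chance that at least one of k randomly drawn samples passes. The sketch below is a minimal version of that formula; the function names pass_at_k and mean_pass_at_k are illustrative and not part of the dataset or any library.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k): chance that a random size-k subset
    # of the n samples contains at least one passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts, k: int) -> float:
    """Average pass@k over problems; counts is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)
```

With k = 1 the estimate reduces to c / n, the plain fraction of passing samples for that problem.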

Comparing synthesis techniques

Test new prompting strategies or fine-tuned models on the 164 problems to measure improvements in functional correctness.

Validating code assistants

Integrate the dataset into CI pipelines to automatically score internal code-completion tools before deployment.
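One way such a CI check could look is a small gate that turns per-problem results into a process exit code. This is only a sketch: the gate function, the threshold value, and the task names are hypothetical, and a real pipeline would fill in the results from its own scoring step.

```python
def gate(results: dict[str, bool], threshold: float) -> int:
    """Return exit code 0 if the pass rate meets the threshold, else 1."""
    if not results:
        raise ValueError("no results to score")
    # True counts as 1 and False as 0, so the sum is the number of passes.
    pass_rate = sum(results.values()) / len(results)
    print(f"pass rate: {pass_rate:.3f} (threshold {threshold:.3f})")
    return 0 if pass_rate >= threshold else 1


# Hypothetical outcomes keyed by made-up task names, for illustration only.
outcomes = {"task_a": True, "task_b": True, "task_c": False}
exit_code = gate(outcomes, threshold=0.5)
```

In a CI job, the script would typically finish by passing exit_code to sys.exit so that a score below the threshold fails the build.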

Load openai_humaneval

Python

# Install the library first: pip install datasets
from datasets import load_dataset

ds = load_dataset("openai/openai_humaneval")
problems = ds["test"]  # all 164 problems are in the "test" split

# Score model completions using the provided unit tests
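The scoring step can be sketched as follows. Each record carries a prompt, a test field that defines a check(candidate) function, and an entry_point naming the function under test, so a completion can be scored by concatenating these pieces and running the result. The helper name passes_tests and the example record are hypothetical; the record is hand-written for illustration and is not taken from the dataset. Running model-generated code is unsafe outside an isolated environment, and a subprocess with a timeout is not a sandbox.

```python
import subprocess
import sys


def passes_tests(problem: dict, completion: str, timeout: float = 10.0) -> bool:
    """Run prompt + completion against the problem's tests in a child process."""
    program = (
        problem["prompt"] + completion + "\n\n"
        + problem["test"] + "\n\n"
        + f"check({problem['entry_point']})\n"
    )
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat non-terminating completions as failures.
        return False
    # A failed assertion or any other exception gives a non-zero exit code.
    return result.returncode == 0


# Illustrative record in the same layout as a dataset row (not a real row).
example = {
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
    "entry_point": "add",
}

print(passes_tests(example, "    return a + b\n"))  # correct body
print(passes_tests(example, "    return a - b\n"))  # incorrect body
```

The per-problem booleans produced this way are the pass counts that a pass@k calculation needs.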