ai2_arc

Verified

7,787 grade-school science questions for advanced QA research.

Dataset · Text & NLP · 461K/mo · Free
Updated 2026-06-15

What is ai2_arc?

AI2 ARC consists of 7,787 genuine grade-school multiple-choice science questions assembled for question-answering research. The dataset splits questions into Easy and Challenge sets and supplies a supporting corpus of more than 14 million science sentences.

It is useful for training and evaluating models on science reasoning tasks that require more than retrieval or simple co-occurrence methods.

What you can build with ai2_arc

Train science QA models

Develop and fine-tune multiple-choice question answering systems focused on elementary science topics using the Easy and Challenge splits.
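As a minimal sketch of preparing a question for a multiple-choice model, the snippet below renders one record as a prompt string. It assumes the record layout of the Hugging Face release ('question', 'choices' with parallel 'label' and 'text' lists, and 'answerKey'). The format_prompt helper and the sample record are hypothetical and for illustration only.

```python
def format_prompt(record):
    """Render one ARC-style record as a multiple-choice prompt string."""
    lines = [f"Question: {record['question']}"]
    # 'label' and 'text' are assumed to be parallel lists of equal length.
    for label, text in zip(record["choices"]["label"], record["choices"]["text"]):
        lines.append(f"{label}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


# Hypothetical record in the assumed layout (not taken from the dataset).
sample = {
    "question": "Which form of energy does a stretched rubber band store?",
    "choices": {
        "label": ["A", "B", "C", "D"],
        "text": ["elastic potential", "thermal", "chemical", "nuclear"],
    },
    "answerKey": "A",
}

prompt = format_prompt(sample)
print(prompt)
```

The prompt ends with "Answer:" so that a model's next token can be compared directly against the choice labels.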

Benchmark language model reasoning

Evaluate how well LLMs handle questions that defeat simple retrieval and word-overlap baselines by testing on the Challenge Set.
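Benchmarking on a multiple-choice set usually comes down to accuracy over predicted labels. The sketch below assumes each record exposes its gold label under 'answerKey'. The accuracy helper and the sample predictions are hypothetical, not results from any model.

```python
def accuracy(predictions, records):
    """Fraction of records whose predicted label equals the gold answerKey."""
    if not records:
        raise ValueError("records must not be empty")
    if len(predictions) != len(records):
        raise ValueError("predictions and records must have the same length")
    correct = sum(pred == rec["answerKey"] for pred, rec in zip(predictions, records))
    return correct / len(records)


# Hypothetical gold labels and model outputs, for illustration only.
records = [{"answerKey": "A"}, {"answerKey": "C"}, {"answerKey": "B"}, {"answerKey": "D"}]
predictions = ["A", "C", "D", "D"]

score = accuracy(predictions, records)
print(f"accuracy = {score:.2f}")  # prints "accuracy = 0.75"
```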

Build retrieval-augmented systems

Combine the 14-million-sentence science corpus with the questions to prototype retrieval-based or knowledge-enhanced QA pipelines.
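A retrieval step can be prototyped before committing to a real index. The sketch below ranks sentences by naive token overlap with the question, which is a deliberately simple stand-in for a proper retriever such as BM25 or dense embeddings. The three-sentence corpus is a hypothetical placeholder for the science corpus, and the tokenize and retrieve helpers are illustrative names.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, corpus, k=2):
    """Return up to k corpus sentences sharing the most tokens with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(sentence)), sentence) for sentence in corpus]
    # Highest overlap first; sorted() is stable, so ties keep corpus order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [sentence for overlap, sentence in ranked[:k] if overlap > 0]


# Hypothetical stand-in sentences, not drawn from the actual corpus.
corpus = [
    "Plants use sunlight to make food through photosynthesis.",
    "Magnets attract objects made of iron.",
    "Photosynthesis takes place in the leaves of green plants.",
]
question = "Where in plants does photosynthesis happen?"

hits = retrieve(question, corpus)
for sentence in hits:
    print(sentence)
```

Word overlap is exactly the kind of signal the Challenge Set is meant to defeat, so a pipeline like this is best treated as a baseline to improve on.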

Load ai2_arc

Python
from datasets import load_dataset

# A configuration name is required: "ARC-Challenge" or "ARC-Easy".
ds = load_dataset("allenai/ai2_arc", "ARC-Challenge")

  1. pip install datasets
  2. from datasets import load_dataset
  3. dataset = load_dataset("allenai/ai2_arc", "ARC-Challenge")
  4. Access the train, validation, and test splits, for example dataset["train"]
  5. Use the 'choices' and 'answerKey' fields for model training or evaluation
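To illustrate how the 'choices' and 'answerKey' fields fit together, the sketch below looks up the text of the correct choice. It assumes 'choices' holds parallel 'label' and 'text' lists, as in the Hugging Face release. The answer_text helper and the sample record are hypothetical.

```python
def answer_text(record):
    """Return the text of the choice whose label equals answerKey."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    # Labels are treated as opaque strings; index() raises ValueError if the key is absent.
    index = labels.index(record["answerKey"])
    return texts[index]


# Hypothetical record in the assumed layout (not taken from the dataset).
sample = {
    "question": "Which tool measures air temperature?",
    "choices": {
        "label": ["A", "B", "C", "D"],
        "text": ["barometer", "thermometer", "ruler", "balance"],
    },
    "answerKey": "B",
}

gold = answer_text(sample)
print(gold)  # prints "thermometer"
```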

ai2_arc: pros & cons

Pros

  • Includes a dedicated Challenge Set of questions that simple retrieval and word-overlap baselines answer incorrectly
  • Paired with a large 14M-sentence science corpus for additional context
  • Standardized splits and multiple-choice format simplify benchmarking
  • Covers a broad range of grade-school science concepts

Cons

  • Limited to multiple-choice format only
  • Questions are restricted to elementary level
  • Corpus sentences are provided without explicit alignment to individual questions

Frequently asked questions

A collection of 7,787 grade-school science multiple-choice questions split into Easy and Challenge sets, accompanied by over 14 million science sentences.
