
gsm8k

Verified

8.5K grade school math word problems requiring multi-step arithmetic reasoning.

Dataset · Text & NLP · 901K/mo · Free
Updated 2026-06-15

What is gsm8k?

GSM8K is a collection of 8.5K high-quality math word problems that require multi-step reasoning with elementary calculations.

Researchers and developers use it to train and benchmark models for text generation and mathematical problem solving in natural language processing.

What you can build with gsm8k

Train multi-step math reasoners

Fine-tune language models on the roughly 7.5K training examples to improve performance on arithmetic word problems that take 2 to 8 sequential steps to solve.

Benchmark LLM reasoning

Evaluate models on the held-out test set (1,319 problems in the Hugging Face release) to measure accuracy on grade-school math problems that require chaining basic arithmetic operations.
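One common way to score this kind of evaluation is exact match on the final number. The sketch below assumes the model's output ends with its numeric answer and treats the last number in the text as the prediction. The helper names `last_number` and `exact_match_accuracy` are hypothetical, and the extraction rule is an assumption for illustration, not an official scoring protocol.

```python
import re
from typing import List, Optional

# Matches integers and decimals, optionally negative, with optional commas.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def last_number(text: str) -> Optional[str]:
    """Return the last number found in text with commas removed, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")

def exact_match_accuracy(outputs: List[str], references: List[str]) -> float:
    """Fraction of outputs whose last number equals the reference answer."""
    if not references or len(outputs) != len(references):
        raise ValueError("outputs and references must be non-empty and equal in length")
    correct = 0
    for output, reference in zip(outputs, references):
        predicted = last_number(output)
        # Compare numerically so "72" and "72.0" count as the same answer.
        if predicted is not None and float(predicted) == float(reference.replace(",", "")):
            correct += 1
    return correct / len(references)

# Hypothetical model outputs paired with reference final answers.
outputs = ["48 + 24 = 72, so the answer is 72.", "She earns $10", "I am not sure."]
references = ["72", "10", "5"]
accuracy = exact_match_accuracy(outputs, references)
print(f"accuracy = {accuracy:.3f}")
```

Taking the last number is a simple heuristic. It will misread outputs that mention another figure after the answer, so a stricter answer format in the prompt may be worth enforcing.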

Build tutoring prototypes

Use the problems and solutions to prototype educational apps that generate step-by-step explanations for elementary math questions.

Load gsm8k

Python
from datasets import load_dataset

# The dataset has more than one configuration, so the config name ("main") must be given.
ds = load_dataset("openai/gsm8k", "main")
  1. pip install datasets
  2. from datasets import load_dataset
  3. ds = load_dataset('openai/gsm8k', 'main')
  4. Access the ds['train'] and ds['test'] splits
  5. Parse the 'question' and 'answer' fields for training or eval
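The parsing in step 5 can be sketched as follows. In the released data, each 'answer' string holds the worked solution, with calculator annotations written like `<<48/2=24>>`, followed by a final line of the form `#### 72`. The names `split_solution` and `ANNOTATION` are hypothetical, and the sample string is hard-coded in the dataset's format so the snippet runs without downloading anything.

```python
import re
from typing import Tuple

# Calculator annotations such as <<48/2=24>> are embedded in the reasoning text.
ANNOTATION = re.compile(r"<<[^>]*>>")

def split_solution(answer: str) -> Tuple[str, str]:
    """Split a GSM8K-style answer string into (reasoning, final_answer)."""
    reasoning, marker, final = answer.rpartition("####")
    if not marker:
        raise ValueError("no '####' final-answer marker found")
    # Remove the calculator annotations to leave plain-language reasoning.
    reasoning = ANNOTATION.sub("", reasoning).strip()
    # Drop thousands separators so "1,000" and "1000" compare equal.
    final = final.strip().replace(",", "")
    return reasoning, final

# Sample record written in the dataset's answer format.
sample = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)
reasoning, final = split_solution(sample)
print(reasoning)
print(final)
```

With a loaded dataset, the same function could be applied per record, for example `split_solution(ds['train'][0]['answer'])`.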

gsm8k: pros & cons

Pros

  • High-quality, human-written problems
  • Clear focus on multi-step reasoning
  • Standard benchmark with public train and test splits
  • Easy loading with the Hugging Face datasets library

Cons

  • Limited to grade-school difficulty
  • English language only
  • Requires custom parsing of solution strings

Frequently asked questions

A dataset of 8.5K grade-school math word problems designed for training and evaluating multi-step arithmetic reasoning.
