Real-world spreadsheet manipulation benchmark with 912 authentic questions for LLMs.
SpreadsheetBench is a benchmark of 912 real questions for spreadsheet manipulation, collected exclusively from practical user workflows and paired with corresponding spreadsheet files.
It supports evaluation of large language models on authentic spreadsheet tasks and is intended for researchers comparing model performance against existing synthesized benchmarks.
Test how well a new LLM or agent handles real user queries like formula creation, data filtering, and chart generation on authentic .xlsx files.
Use the 912 question-file pairs as supervised data to fine-tune models that output correct spreadsheet operations or Python code using pandas or openpyxl.
Run standardized evaluations against other LLMs to measure progress on practical spreadsheet manipulation without relying on synthetic test sets.
from datasets import load_dataset
ds = load_dataset("KAKA22/SpreadsheetBench")
A benchmark with 912 real-world spreadsheet questions that tests LLMs on manipulation tasks using actual unaltered files.
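For the evaluation use case described above, one simple way to score a model's output is to compare selected cell values in the produced workbook against a reference answer. The following is a minimal sketch under stated assumptions, not the benchmark's official metric: the helper names `read_cells`, `normalize`, and `cell_accuracy` are hypothetical, as are the sheet name and cell addresses a caller would pass in. The listing does not specify how answers are checked.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple


def read_cells(path: str, sheet_name: str, addresses: Iterable[str]) -> Dict[str, Any]:
    """Read the given cells from an .xlsx file (hypothetical helper).

    openpyxl is imported lazily so the scoring functions below work without it.
    data_only=True returns cached formula results rather than formula text;
    those are None if the file was never calculated and saved by a spreadsheet app.
    """
    from openpyxl import load_workbook

    workbook = load_workbook(path, data_only=True)
    sheet = workbook[sheet_name]
    return {address: sheet[address].value for address in addresses}


def normalize(value: Any) -> Tuple[str, Any]:
    """Map a cell value to a comparable (kind, value) pair.

    Assumption: numbers are compared after rounding to 6 decimals, strings
    after stripping whitespace, and empty cells count as empty strings.
    """
    if value is None:
        return ("str", "")
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return ("bool", value)
    if isinstance(value, (int, float)):
        return ("num", round(float(value), 6))
    return ("str", str(value).strip())


def cell_accuracy(predicted: Mapping[str, Any], expected: Mapping[str, Any]) -> float:
    """Return the fraction of expected cells whose predicted value matches."""
    if not expected:
        raise ValueError("expected must contain at least one cell")
    matches = sum(
        1
        for address, wanted in expected.items()
        if normalize(predicted.get(address)) == normalize(wanted)
    )
    return matches / len(expected)


# Usage with in-memory values; in practice both dicts could come from read_cells().
expected = {"A1": 10, "B1": "total", "C1": 3.5}
predicted = {"A1": 10.0, "B1": " total ", "C1": 4}
score = cell_accuracy(predicted, expected)  # A1 and B1 match, C1 does not
```

Comparing normalized values rather than raw ones avoids penalizing harmless differences such as `10` versus `10.0`, though a stricter or looser rule may suit a given task better.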