ArXiv paper metadata organized by publication year.
arxiv_metadata_by_year supplies arXiv paper metadata structured by publication year.
Researchers and developers use it for natural language processing and text analysis on scientific literature.
Analyze shifts in topics like machine learning or NLP by aggregating paper titles, abstracts, and categories across yearly splits.
Extract author lists and affiliations to build collaboration graphs filtered by publication year for network analysis.
Train and evaluate NLP models on paper metadata fields such as titles and primary categories using the year-based partitions.
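The first two use cases above can be sketched on a few in-memory records. This is a minimal illustration, not the dataset's confirmed schema: the field names `year`, `title`, `categories` (space-separated) and `authors` (a list of names) are assumptions, and the helper names `category_counts_by_year` and `coauthor_edges` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample records; field names are assumed for illustration only.
records = [
    {"year": 2019, "title": "A", "categories": "cs.CL cs.LG", "authors": ["Ada", "Bo"]},
    {"year": 2019, "title": "B", "categories": "cs.LG", "authors": ["Bo", "Cy"]},
    {"year": 2020, "title": "C", "categories": "cs.CL", "authors": ["Ada", "Bo", "Cy"]},
]


def category_counts_by_year(rows):
    """Count how often each category appears per publication year."""
    counts = defaultdict(Counter)
    for row in rows:
        for category in row["categories"].split():
            counts[row["year"]][category] += 1
    return counts


def coauthor_edges(rows, year):
    """Count co-authorship pairs among papers published in the given year."""
    edges = Counter()
    for row in rows:
        if row["year"] == year:
            # Sort the de-duplicated names so each pair has one canonical order.
            for a, b in combinations(sorted(set(row["authors"])), 2):
                edges[(a, b)] += 1
    return edges


trends = category_counts_by_year(records)
edges_2019 = coauthor_edges(records, 2019)
```

The edge counter can be fed into a graph library as weighted edges; on the real data, the field names would need to be adjusted to whatever columns the dataset actually exposes.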
from datasets import load_dataset
ds = load_dataset("bluuebunny/arxiv_metadata_by_year")
A collection of arXiv paper metadata records grouped by year, containing between one and ten million entries in the NLP-text category.
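Once the dataset is loaded, a quick per-split row count is a useful sanity check. The sketch below assumes the loaded object behaves like a mapping from split name to a sized collection, as a `DatasetDict` does; the helper `summarize_splits` and the split names in the stand-in data are hypothetical.

```python
def summarize_splits(splits):
    """Return (split_name, row_count) pairs sorted by split name."""
    return sorted((name, len(rows)) for name, rows in splits.items())


# Stand-in for the loaded object; the split names "2020" and "2021" are assumed.
sample_splits = {
    "2021": [{"title": "x"}, {"title": "y"}],
    "2020": [{"title": "z"}],
}
summary = summarize_splits(sample_splits)

for name, count in summary:
    print(f"{name}: {count} rows")
```

Passing the real `ds` object in place of `sample_splits` should work the same way if the dataset is returned as a dictionary of splits.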