
arxiv-papers-by-subject

Verified

Reorganized arXiv metadata partitioned by subject, year, and month.

Dataset · Text & NLP · 424K/mo · Free
Open dataset
Updated 2026-06-15

What is arxiv-papers-by-subject?

It is a reorganized version of the nick007x/arxiv-papers dataset with entries partitioned into directories by subject, year, and month.

It is useful for researchers needing targeted access to arXiv paper metadata without retrieving the full collection.

What you can build with arxiv-papers-by-subject

Subject-specific text generation

Train language models on subsets of arXiv abstracts filtered by subject code and date range to generate domain-specific scientific text.
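Building such a training subset comes down to filtering on subject code and publication date. The sketch below assumes each record is a dict with hypothetical fields `categories` (space-separated arXiv codes), `published` (ISO date string), and `abstract`. These names are illustrative, not the dataset's confirmed columns.

```python
from datetime import date
from typing import Iterable, Iterator


def select_abstracts(records: Iterable[dict], subject: str,
                     start: date, end: date) -> Iterator[str]:
    """Yield abstracts tagged with `subject` and published in [start, end].

    Field names "categories", "published", and "abstract" are assumptions.
    """
    for rec in records:
        published = date.fromisoformat(rec["published"])
        # A paper can be cross-listed, so check the full list of codes
        if subject in rec["categories"].split() and start <= published <= end:
            yield rec["abstract"].strip()


records = [
    {"categories": "cs.AI cs.LG", "published": "2023-01-15", "abstract": " A planning method. "},
    {"categories": "math.CO", "published": "2023-01-20", "abstract": "A graph bound."},
    {"categories": "cs.AI", "published": "2022-12-30", "abstract": "An older agent paper."},
]
corpus = list(select_abstracts(records, "cs.AI", date(2023, 1, 1), date(2023, 3, 31)))
print(corpus)  # ['A planning method.']
```

Because the files are already split by subject and month, an in-memory filter like this is mainly useful when merging several partitions or handling cross-listed papers.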

Feature extraction pipelines

Load monthly batches of paper metadata to compute embeddings or extract keywords for downstream academic search or recommendation systems.
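For keyword extraction, a simple stand-in for a learned embedding model is TF-IDF computed over one monthly batch of abstracts. The sketch below uses only the standard library. The stopword list and the `top_keywords` helper are hypothetical, and a production pipeline would more likely use a dedicated NLP library.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not exhaustive)
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "we", "to", "with", "is"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(abstracts, k=3):
    """Return the k highest TF-IDF terms for each abstract in a batch."""
    docs = [Counter(tokenize(a)) for a in abstracts]
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)  # documents containing each term
    result = []
    for doc in docs:
        total = sum(doc.values())
        scores = {t: (c / total) * math.log(n / doc_freq[t]) for t, c in doc.items()}
        # Highest score first; ties broken alphabetically for deterministic output
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


batch = [
    "Graph neural networks for molecule property prediction.",
    "Transformer language models for code generation.",
    "Graph transformer models for traffic forecasting.",
]
print(top_keywords(batch))
```

Terms that appear in every abstract of the batch get an IDF of zero, so batch-wide vocabulary is pushed down in favor of terms specific to each paper.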

Temporal topic analysis

Analyze trends by loading year/month slices within a subject to track research evolution and build visualization dashboards.
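A basic trend signal is the share of papers in each month that mention a term. The sketch below assumes hypothetical `published` (ISO date) and `abstract` fields and uses a plain case-insensitive substring match. It is a rough proxy for topic prevalence, not a topic model.

```python
from collections import Counter


def term_share_by_month(records, term):
    """Fraction of papers per YYYY-MM slice whose abstract mentions `term`."""
    totals, hits = Counter(), Counter()
    for rec in records:
        month = rec["published"][:7]  # assumes ISO dates such as "2023-01-15"
        totals[month] += 1
        if term.lower() in rec["abstract"].lower():
            hits[month] += 1
    # Sorted keys give a chronological series ready for plotting
    return {m: hits[m] / totals[m] for m in sorted(totals)}


records = [
    {"published": "2023-01-03", "abstract": "Diffusion models for images."},
    {"published": "2023-01-19", "abstract": "A study of reinforcement learning."},
    {"published": "2023-02-07", "abstract": "Latent diffusion at scale."},
    {"published": "2023-02-21", "abstract": "Diffusion policies for robots."},
]
print(term_share_by_month(records, "diffusion"))  # {'2023-01': 0.5, '2023-02': 1.0}
```

Normalizing by the monthly total, rather than reporting raw counts, keeps the trend comparable across months with very different submission volumes.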

Load arxiv-papers-by-subject

Python

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub
ds = load_dataset("permutans/arxiv-papers-by-subject")
```

  1. Install the library: `pip install datasets`
  2. Import the loader: `from datasets import load_dataset`
  3. Load one subject and month: `ds = load_dataset('permutans/arxiv-papers-by-subject', 'cs.AI', split='2023-01')`
  4. Filter or iterate over the returned metadata columns.
  5. Use abstracts or titles for model training or feature extraction.
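Another way to fetch a date range within one subject is to pass file patterns through the `data_files` argument of `load_dataset`. The sketch below assumes a hypothetical `<subject>/<YYYY>/<MM>/*.parquet` layout and Parquet format. Check the repository's actual file tree before relying on it. The helpers `month_range`, `partition_globs`, and `load_subject_months` are illustrative names.

```python
from datetime import date


def month_range(start: date, end: date):
    """Yield (year, month) pairs from start to end, inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield y, m
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def partition_globs(subject: str, start: date, end: date) -> list:
    # HYPOTHETICAL directory layout and file extension; verify against the repo
    return [f"{subject}/{y:04d}/{m:02d}/*.parquet" for y, m in month_range(start, end)]


def load_subject_months(subject: str, start: date, end: date):
    """Download only the matching monthly partitions (requires network access)."""
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`
    # A list passed as data_files is assigned to the "train" split
    return load_dataset(
        "permutans/arxiv-papers-by-subject",
        data_files=partition_globs(subject, start, end),
        split="train",
    )


print(partition_globs("cs.AI", date(2022, 11, 1), date(2023, 2, 1)))
```

Generating the patterns separately from the download call makes it easy to inspect exactly which partitions will be fetched before any bandwidth is spent.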

arxiv-papers-by-subject: pros & cons

Pros

  • 2.5M+ papers with hierarchical slicing by subject, year, and month
  • Selective subset downloads reduce bandwidth
  • Direct integration with the Hugging Face datasets library
  • Clean metadata suited for NLP tasks

Cons

  • Contains only metadata, no full-text PDFs
  • Subject codes follow arXiv taxonomy only
  • Requires an internet connection for the initial download

Frequently asked questions

A Hugging Face dataset of over 2.5 million arXiv paper metadata records organized by subject, year, and month.

