arxiv-cs-2020-2025-pdfs

Verified

arXiv computer science PDFs from 2020 to 2025.

Dataset · AI & Machine Learning · 439K/mo · Free
Updated 2026-06-15

What is arxiv-cs-2020-2025-pdfs?

The arxiv-cs-2020-2025-pdfs dataset consists of PDF files from arXiv computer science submissions dated 2020 through 2025.

It supports researchers and model developers working with academic literature in AI and machine learning.

What you can build with arxiv-cs-2020-2025-pdfs

Training PDF parsing models

Developers can fine-tune layout detection or OCR models on the raw PDF files to improve extraction of equations, tables, and figures from scientific documents.

Domain-specific LLM pretraining

Use the full-text content of recent CS papers to continue pretraining language models on technical vocabulary and research writing styles.

Building academic search tools

Index the papers to create semantic search or citation recommendation systems focused on 2020-2025 computer science literature.
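
As a rough illustration of the indexing idea, the sketch below ranks documents against a query with plain TF-IDF and cosine similarity, using only the standard library. This is a lexical stand-in, not true semantic search; a real system would likely swap the vectors for learned embeddings. The function names (`build_index`, `search`) and the toy documents are hypothetical and are not part of the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]):
    """Return (TF-IDF vector per document, idf table) for a {doc_id: text} mapping."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n_docs = len(docs)
    # Smoothed idf so that no term ends up with a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in term_counts.items()}
        for doc_id, term_counts in counts.items()
    }
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, vectors, idf, top_k: int = 3):
    """Return up to top_k (doc_id, score) pairs with a positive score, best first."""
    q_counts = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scored = sorted(
        ((cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()),
        reverse=True,
    )
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0.0]


# Toy stand-ins for text extracted from papers (hypothetical content).
docs = {
    "a": "graph neural networks for molecules",
    "b": "transformer language model pretraining",
    "c": "image segmentation with convolutional networks",
}
vectors, idf = build_index(docs)
results = search("language model pretraining", vectors, idf)
```

In practice the `docs` mapping would be filled with text extracted from the PDFs, keyed by whatever identifier the dataset exposes.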

Load arxiv-cs-2020-2025-pdfs

Python
from datasets import load_dataset

ds = load_dataset("Chelsea707/arxiv-cs-2020-2025-pdfs")
  1. Install the datasets library via pip install datasets
  2. Import load_dataset from the datasets package
  3. Load with load_dataset('Chelsea707/arxiv-cs-2020-2025-pdfs')
  4. Iterate over the dataset to access individual PDF files
  5. Extract text using pdfplumber or PyMuPDF for downstream tasks
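
The iteration and extraction steps above might look roughly like the following sketch. The listing does not document the dataset's splits or column names, so several things here are assumptions: the `train` split name, the idea that each record carries raw PDF bytes (either directly or under a `bytes` key), and the helper names `pdf_bytes_from_record`, `extract_text`, and `preview_dataset`, which are hypothetical. Text extraction uses PyMuPDF (`pip install pymupdf`), one of the two libraries the steps mention.

```python
from typing import Any, Optional


def _looks_like_pdf(blob: Any) -> bool:
    """True if blob is a bytes-like object starting with the PDF magic header."""
    return isinstance(blob, (bytes, bytearray)) and blob[:5] == b"%PDF-"


def pdf_bytes_from_record(record: dict[str, Any]) -> Optional[bytes]:
    """Find raw PDF bytes in a record without assuming a column name.

    Assumption: the PDF is stored either as a bytes value or as a dict
    with a "bytes" key. Returns None if neither shape is present.
    """
    for value in record.values():
        if _looks_like_pdf(value):
            return bytes(value)
        if isinstance(value, dict) and _looks_like_pdf(value.get("bytes")):
            return bytes(value["bytes"])
    return None


def extract_text(pdf_bytes: bytes) -> str:
    """Extract plain text from every page of an in-memory PDF with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)


def preview_dataset(limit: int = 3) -> None:
    """Stream a few records and report how much text each PDF yields."""
    from datasets import load_dataset

    # Assumption: a "train" split exists. Streaming avoids downloading everything.
    ds = load_dataset(
        "Chelsea707/arxiv-cs-2020-2025-pdfs", split="train", streaming=True
    )
    for i, record in enumerate(ds):
        if i >= limit:
            break
        pdf_bytes = pdf_bytes_from_record(record)
        if pdf_bytes is None:
            # Print the field names so the actual schema can be inspected.
            print(f"record {i}: no PDF bytes found; fields = {list(record)}")
            continue
        print(f"record {i}: {len(extract_text(pdf_bytes))} characters extracted")
```

Calling `preview_dataset()` prints either a character count per record or the record's field names, which is a quick way to check whether the assumed schema matches the real one before processing at scale.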

arxiv-cs-2020-2025-pdfs: pros & cons

Pros

  • Full PDFs of recent arXiv CS papers
  • Straightforward Hugging Face loading
  • Covers six calendar years (2020 through 2025) of computer science output
  • Ready for large-scale document AI experiments

Cons

  • No size, splits, or metadata details provided
  • PDFs need extra processing before text use
  • License and redistribution terms unspecified

Frequently asked questions

A collection of PDF files containing computer science articles from arXiv published between 2020 and 2025.
