ACL-OCL

Full-text ACL Anthology papers with Grobid extractions, PDFs, and metadata.

Dataset · Text & NLP · 391K/mo · Free
Updated 2026-06-15

What is ACL-OCL?

ACL-OCL provides the ACL Anthology corpus together with PDF files, full text, references, and additional fields obtained by processing the original PDFs with Grobid.

It supports NLP research on scholarly documents, citation analysis, and information extraction from scientific publications.

What you can build with ACL-OCL

Train domain-specific language models

Use the full-text extractions to fine-tune BERT-style models on computational linguistics papers for tasks like scientific entity recognition.

Build citation and reference graphs

Leverage Grobid-extracted references and metadata to construct citation networks for analyzing research trends in NLP.
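As a rough illustration of the citation-network idea, the sketch below links each paper's references to other papers in the collection by matching normalized titles. The record keys `id`, `title`, and `reference_titles` are hypothetical placeholders, not the dataset's documented column names, and title matching is only one possible linking strategy; inspect the actual schema before adapting it.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def build_citation_graph(papers):
    """Return {citing_id: set(cited_ids)} for references that resolve to a known paper.

    Each paper is a dict with the hypothetical keys "id", "title",
    and "reference_titles" (a list of cited-work titles).
    """
    title_to_id = {normalize_title(p["title"]): p["id"] for p in papers}
    graph = defaultdict(set)
    for p in papers:
        for ref_title in p.get("reference_titles", []):
            cited = title_to_id.get(normalize_title(ref_title))
            # Skip references outside the collection and self-matches.
            if cited is not None and cited != p["id"]:
                graph[p["id"]].add(cited)
    return dict(graph)


def citation_counts(graph):
    """Count how many papers in the graph cite each paper."""
    counts = defaultdict(int)
    for cited_ids in graph.values():
        for cited in cited_ids:
            counts[cited] += 1
    return dict(counts)


# Toy records standing in for dataset rows (made-up data).
papers = [
    {"id": "A", "title": "Parsing with Trees", "reference_titles": []},
    {"id": "B", "title": "Neural Parsing", "reference_titles": ["Parsing With Trees."]},
    {"id": "C", "title": "A Survey",
     "reference_titles": ["Neural parsing", "parsing with trees", "Unknown Work"]},
]
graph = build_citation_graph(papers)
counts = citation_counts(graph)
```

Exact title matching after normalization will miss references with truncated or mis-parsed titles, so a real pipeline would likely need fuzzy matching or identifier-based linking where identifiers are available.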

Develop PDF parsing benchmarks

Compare custom PDF-to-text pipelines against the provided Grobid outputs on the 80k ACL articles.
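One simple way to run such a comparison is a token-overlap F1 score between a custom pipeline's output and the Grobid text for the same paper. This is a minimal sketch under the assumption that both texts are available as plain strings; the function name `token_f1` and the sample strings are illustrative only. Because Grobid output can itself contain parsing errors, the score measures agreement with Grobid rather than correctness.

```python
from collections import Counter


def token_f1(candidate, reference):
    """Token-overlap F1 between two texts, using lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        # Two empty texts agree perfectly; one empty text agrees with nothing.
        return 1.0 if cand == ref else 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up strings standing in for one paper's extracted text.
grobid_text = "we present a new parser for english"
custom_text = "we present a new parser"
score = token_f1(custom_text, grobid_text)
```

Token overlap ignores word order, so it is best paired with an order-sensitive measure (for example, an edit-distance ratio) when layout or reading-order errors matter.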

Load ACL-OCL

Python
from datasets import load_dataset

ds = load_dataset("WINGNUS/ACL-OCL")
  1. Install the library: pip install datasets
  2. Import the loader: from datasets import load_dataset
  3. Load the dataset: dataset = load_dataset('WINGNUS/ACL-OCL')
  4. Access the 'train' split for the full corpus, including PDFs and Grobid fields
  5. Filter by year or venue metadata to get targeted subsets
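The filtering step above can be sketched as a reusable predicate. The field names `year` and `venue` are assumptions for illustration and may not match the dataset's actual columns, so check the loaded dataset's features first. The example runs on in-memory records so it works without downloading anything.

```python
def make_subset_filter(min_year=None, venues=None):
    """Build a predicate that keeps records from selected years and venues.

    Assumes each record is a dict with hypothetical "year" and "venue" keys.
    """
    wanted = {v.lower() for v in venues} if venues else None

    def keep(record):
        if min_year is not None:
            try:
                if int(record.get("year")) < min_year:
                    return False
            except (TypeError, ValueError):
                # Missing or non-numeric year: exclude rather than guess.
                return False
        if wanted is not None and str(record.get("venue", "")).lower() not in wanted:
            return False
        return True

    return keep


# Made-up records standing in for dataset rows.
records = [
    {"id": "p1", "year": "2019", "venue": "ACL"},
    {"id": "p2", "year": "2021", "venue": "EMNLP"},
    {"id": "p3", "year": "2022", "venue": "acl"},
    {"id": "p4", "year": None, "venue": "ACL"},
]
keep = make_subset_filter(min_year=2020, venues=["ACL"])
subset_ids = [r["id"] for r in records if keep(r)]
```

If the column names match, the same predicate could be passed to the loaded split, for example `dataset["train"].filter(keep)`, since the `filter` method of a Hugging Face `Dataset` accepts a function that takes one example and returns a boolean.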

ACL-OCL: pros & cons

Pros

  • Includes full PDFs and Grobid extractions, not just abstracts
  • Large scale: 80k ACL articles as of 2022
  • Ready to use with the Hugging Face datasets library
  • Provides references and structured metadata

Cons

  • Grobid extractions can contain parsing errors
  • Storing the PDFs requires significant disk space
  • Updates depend on external ACL Anthology releases

Frequently asked questions

A Hugging Face dataset providing full-text, PDFs, and Grobid extractions for the ACL Anthology collection of 80k papers.
