doc-build-dev
Docs from Hugging Face PRs updating official documentation.
What is doc-build-dev?
This dataset contains documentation files extracted from pull requests that modify the official Hugging Face documentation site.
It gives developers and contributors who work on documentation for Hugging Face libraries and models a historical record of documentation updates.
What you can build with doc-build-dev
Track Documentation Evolution
Analyze how documentation for Hugging Face libraries changes across PRs over time.
Train Documentation Models
Fine-tune NLP models on real doc diffs to suggest improvements or detect outdated sections.
Build Change Monitoring Tools
Create scripts that summarize or alert on documentation modifications from open-source PR activity.
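As a rough illustration of the change-monitoring idea, the sketch below counts added and removed lines between two versions of a doc page using Python's standard difflib module. The function name summarize_change and the sample texts are hypothetical, and nothing here reflects the dataset's actual schema; it only assumes you can obtain the old and new text of a page as strings.

```python
import difflib
from typing import Dict


def summarize_change(old_text: str, new_text: str) -> Dict[str, int]:
    """Count lines added and removed between two versions of a page."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" contributes to both counts; "equal" contributes to neither.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return {"added": added, "removed": removed}


# Hypothetical before/after text for a single doc page.
old_page = "Install with pip.\nRun the quicktour.\n"
new_page = "Install with pip or conda.\nRun the quicktour.\nSee the FAQ.\n"

summary = summarize_change(old_page, new_page)
print(summary)
```

A monitoring script could run a helper like this over each changed page and raise an alert when the counts cross a threshold of your choosing.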
Load doc-build-dev
1. pip install datasets
2. from datasets import load_dataset
3. ds = load_dataset("hf-doc-build/doc-build-dev")
4. Explore splits and filter by PR or doc page
5. Process diffs with pandas or Hugging Face tokenizers
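The steps above can be sketched roughly as follows. The listing does not document the dataset's columns, so "pr_number" and "doc_page" below are hypothetical names used only for illustration, and the toy rows stand in for a real split. The loader is wrapped in a function because it needs network access; the filtering helper works on any iterable of dictionaries.

```python
from typing import Any, Dict, Iterable, List


def load_doc_build_dev():
    """Load the dataset named in the steps above (requires network access)."""
    # Imported lazily so the offline helper below works without the library.
    from datasets import load_dataset
    return load_dataset("hf-doc-build/doc-build-dev")


def filter_rows(rows: Iterable[Dict[str, Any]], column: str, value: Any) -> List[Dict[str, Any]]:
    """Keep rows whose `column` equals `value`; rows lacking the column are skipped."""
    return [row for row in rows if row.get(column) == value]


# Toy rows; the column names are assumptions, not the real schema.
sample_rows = [
    {"pr_number": 101, "doc_page": "quicktour", "text": "Install the library."},
    {"pr_number": 102, "doc_page": "installation", "text": "Use pip."},
    {"pr_number": 101, "doc_page": "installation", "text": "Use pip or conda."},
]

pr_101 = filter_rows(sample_rows, "pr_number", 101)
print([row["doc_page"] for row in pr_101])
```

With the real data, printing the object returned by load_doc_build_dev() should show the available splits and column names, which you can then substitute for the hypothetical ones here.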
doc-build-dev: pros & cons
Pros
- Automatically updated via GitHub Actions
- Contains real PR-based documentation changes
- Spans multiple Hugging Face libraries
- Structured for direct NLP or analysis use
Cons
- Scope limited to Hugging Face docs only
- Data quality tied to original PR content
- May need extra parsing for complex diffs
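To give a sense of what the extra parsing might involve, here is a minimal sketch that separates added and removed lines in a diff, assuming the diff text is in unified format (this listing does not confirm the format). It uses the line counts in each hunk header rather than prefixes alone, so a removed line that itself begins with dashes is not mistaken for a file header. The function name and the sample diff, including its file path, are hypothetical.

```python
import re
from typing import Dict, List

# Matches hunk headers such as "@@ -1,3 +1,3 @@"; the counts are optional.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def split_unified_diff(diff_text: str) -> Dict[str, List[str]]:
    """Collect added and removed lines from a unified diff."""
    added: List[str] = []
    removed: List[str] = []
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            match = HUNK_HEADER.match(line)
            if match:
                # An omitted count means 1 in the unified format.
                old_left = int(match.group(1)) if match.group(1) is not None else 1
                new_left = int(match.group(2)) if match.group(2) is not None else 1
            continue  # file headers and other metadata outside hunks are ignored
        if line.startswith("+"):
            added.append(line[1:])
            new_left -= 1
        elif line.startswith("-"):
            removed.append(line[1:])
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both the old and the new version.
            old_left -= 1
            new_left -= 1
    return {"added": added, "removed": removed}


# Hypothetical diff; the removed line starts with dashes on purpose.
sample_diff = """--- a/docs/installation.md
+++ b/docs/installation.md
@@ -1,3 +1,3 @@
 # Installation
--- legacy note
+Use pip or conda.
 Run the quicktour.
"""

parts = split_unified_diff(sample_diff)
print(parts)
```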
Frequently asked questions
A dataset that aggregates documentation changes from pull requests targeting Hugging Face docs.