
banned-historical-archives

Verified

Archive of banned Chinese historical documents, newspapers and images.

Dataset · Images & Vision · 1.3M/mo · Free
Open dataset
Updated 2026-06-15

What is banned-historical-archives?

It organizes content already entered on the website, the raw original files, configuration data, and pending items that have not yet been published on the source site.

Useful for researchers studying restricted Chinese historical publications and for vision tasks involving scanned newspapers and archival imagery.

What you can build with banned-historical-archives

Train OCR models on historical Chinese print

Use the raw image scans to fine-tune OCR pipelines for extracting text from mid-20th-century Chinese newspapers and periodicals.
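When fine-tuning OCR on these scans, character error rate (CER) is a common yardstick. It suits Chinese text because it is counted per character rather than per whitespace-separated word. The sketch below computes CER from a plain Levenshtein edit distance. It is a generic evaluation helper, not something shipped with the dataset, and it assumes you already have a ground-truth transcription for each page. The mis-read headline in the example is hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # delete ref_char
                current[j - 1] + 1,                   # insert hyp_char
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one look-alike substitution (日 read as 曰) in a four-character headline.
print(character_error_rate("人民日报", "人民曰报"))  # 0.25
```

Averaging CER over a held-out set of pages gives a single number to compare fine-tuning runs.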

Study visual patterns in censored materials

Analyze layout, redactions, and typography across the archived pages to identify characteristics of banned or restricted publications.
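One crude way to begin looking for blacked-out passages is to flag runs of rows that are almost entirely dark. The sketch below assumes each page has already been converted to grayscale and is given as rows of 0 to 255 pixel values (for example via Pillow's `Image.convert("L")`). The threshold of 40 and the 90% row ratio are illustrative guesses, not values derived from this archive, and dark borders or heavy headline type will also trigger the heuristic.

```python
from typing import List, Optional, Sequence, Tuple


def dark_ratio(row: Sequence[int], threshold: int = 40) -> float:
    """Fraction of pixels in a row at or below the darkness threshold."""
    return sum(1 for value in row if value <= threshold) / len(row)


def find_dark_bands(
    image: Sequence[Sequence[int]],
    threshold: int = 40,      # illustrative: grayscale value treated as "ink-black"
    min_ratio: float = 0.9,   # illustrative: share of dark pixels for a row to count
) -> List[Tuple[int, int]]:
    """Return inclusive (first_row, last_row) ranges of consecutive mostly-dark rows."""
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, row in enumerate(image):
        if dark_ratio(row, threshold) >= min_ratio:
            if start is None:
                start = y                 # a new dark band begins
        elif start is not None:
            bands.append((start, y - 1))  # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(image) - 1))  # band runs to the bottom edge
    return bands


# Tiny synthetic "page": white, two black rows, white, one black row.
page = [[255] * 10, [0] * 10, [10] * 10, [255] * 10, [0] * 10]
print(find_dark_bands(page))  # [(1, 2), (4, 4)]
```

Candidate bands found this way still need manual review before being treated as redactions.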

Build small-scale document classification prototypes

Create proof-of-concept classifiers that label scanned pages by source publication or topic using the under-1K image collection.
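With fewer than 1K images, a purely random train/test split can leave a rare publication or topic out of the test set entirely; a per-label (stratified) split avoids that. This sketch assumes you have attached your own labels to each image identifier. The listing does not say the dataset ships such labels, so the `(item_id, label)` pairs and all names below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, str]  # (item_id, label): hypothetical annotation format


def stratified_split(
    items: Sequence[Item], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Item], List[Item]]:
    """Split items so each label contributes roughly test_fraction of its items to the test set."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for item_id, label in items:
        by_label[label].append(item_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[Item] = []
    test: List[Item] = []
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        # Keep at least one test example per label, unless the label has only one item.
        n_test = max(1, round(len(ids) * test_fraction)) if len(ids) > 1 else 0
        test.extend((item_id, label) for item_id in ids[:n_test])
        train.extend((item_id, label) for item_id in ids[n_test:])
    return train, test


items = [(f"paper_a_{i}", "A") for i in range(10)] + [(f"paper_b_{i}", "B") for i in range(5)]
train, test = stratified_split(items)
print(len(train), len(test))  # 12 3
```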

Load banned-historical-archives

Python
from datasets import load_dataset

ds = load_dataset("banned-historical-archives/banned-historical-archives")
  1. pip install datasets
  2. from datasets import load_dataset
  3. ds = load_dataset('banned-historical-archives/banned-historical-archives')
  4. Access image files through the 'image' column (or other file columns) of each split
  5. Inspect the contents of the 'todo' directory for unprocessed materials
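If you work from a local copy of the repository files rather than the loaded splits, a small helper can separate the unprocessed `todo` material from the image scans. This is a sketch under assumptions: it presumes a directory literally named `todo` somewhere in the tree and treats common raster extensions as images; the real repository layout may differ. A local copy could be fetched with `huggingface_hub.snapshot_download(repo_id="banned-historical-archives/banned-historical-archives", repo_type="dataset")`.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: these suffixes cover the scans; adjust after inspecting the files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp"}


def partition_files(root: Path, pending_dir: str = "todo") -> Dict[str, List[Path]]:
    """Group files under root into pending (inside a 'todo' dir), images, and other."""
    groups: Dict[str, List[Path]] = {"pending": [], "images": [], "other": []}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root)
        if pending_dir in relative.parts[:-1]:  # any parent directory named 'todo'
            groups["pending"].append(relative)
        elif path.suffix.lower() in IMAGE_SUFFIXES:
            groups["images"].append(relative)
        else:
            groups["other"].append(relative)
    return groups
```

For example, `partition_files(Path(local_dir))["pending"]` would list what still sits in the queue, where `local_dir` is the path returned by `snapshot_download`.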

banned-historical-archives: pros & cons

Pros

  • Direct sync of raw files from the original source
  • Includes additional unprocessed items
  • Compact size (under 1K entries)
  • Vision-oriented image collection

Cons

  • Small total entry count limits scale
  • Portion of data remains unprocessed
  • Full scans require separate linked repositories

Frequently asked questions

A vision dataset of raw image files synced from banned-historical-archives.github.io plus unprocessed materials, focused on historical Chinese publications.
