Is this dataset free to use?

Yes, it is publicly available through the Hugging Face Hub at no cost.

How do I access the dataset?

Load it directly with the datasets library using load_dataset('banned-historical-archives/zhongyangribao').

What is the license?

The repository does not specify a license. Check the source for redistribution terms before reusing the data.

zhongyangribao — Free Dataset Docs, Examples & Alternatives (2026)

What is zhongyangribao?

The zhongyangribao dataset consists of text from the historical Chinese newspaper Zhongyang Ribao archived by banned-historical-archives.

It is useful for NLP research and machine learning work involving Chinese historical newspaper and archival text data.

What you can build with zhongyangribao

Historical Chinese NLP training

Fine-tune language models on authentic mid-20th-century newspaper text for improved handling of classical-modern Chinese transitions and period-specific vocabulary.
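Long newspaper articles usually need to be cut into fixed-size pieces before fine-tuning. The sketch below is one simple way to do that with overlapping character windows. The function name and the default sizes are illustrative assumptions, and a real pipeline would more likely measure length in tokenizer tokens than in characters.

```python
def chunk_text(text, size=512, overlap=64):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]
```

For example, chunk_text("abcdefghij", size=4, overlap=1) yields three windows that each share one character with the next.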

Topic modeling of political discourse

Run LDA or BERTopic pipelines to track evolving themes such as propaganda, international relations, and domestic policy across decades of articles.
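Before running a full LDA or BERTopic pipeline, a rough first look at how vocabulary shifts over time can be had by counting character bigrams per year. This is not topic modeling itself, only a lightweight precursor. The field names "text" and "date" are assumptions, since the repository has no dataset card, and the sample records are toy data made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Runs of common CJK ideographs; punctuation and Latin text break a run.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_bigrams(text):
    """Yield adjacent two-character pairs inside each run of CJK characters."""
    for run in CJK_RUN.findall(text):
        for i in range(len(run) - 1):
            yield run[i:i + 2]


def bigram_counts_by_year(records, text_field="text", date_field="date"):
    """Group bigram counts by the first four characters of the date field."""
    counts = defaultdict(Counter)
    for rec in records:
        year = str(rec[date_field])[:4]  # assumes dates start with the year
        counts[year].update(char_bigrams(rec[text_field]))
    return counts


# Toy records with hypothetical field names, not taken from the dataset.
sample = [
    {"date": "1950-01-01", "text": "中央日報，中央社"},
    {"date": "1951-06-01", "text": "國際關係"},
]
counts = bigram_counts_by_year(sample)
print(counts["1950"].most_common(3))
```

The per-year counters can then be compared to spot terms that rise or fall, which helps when choosing the time slices to feed into a proper topic model.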

OCR and layout analysis benchmarking

Evaluate document-understanding models on noisy historical print layouts and traditional Chinese characters, using the transcripts as reference text and page scans where the repository provides them.
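A common way to score OCR output against a reference transcript is character error rate (CER): the Levenshtein edit distance divided by the reference length. The sketch below is a plain-Python version under that definition. The function names are illustrative, and the example strings are made up rather than drawn from the dataset.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the length of the reference transcript."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One visually similar character misread out of four.
print(character_error_rate("中央日報", "中央曰報"))
```

CER suits Chinese text better than word error rate because it needs no word segmentation step.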

Load zhongyangribao

Python

# Install the library first: pip install datasets
from datasets import load_dataset

ds = load_dataset("banned-historical-archives/zhongyangribao")

# The split name "train" is an assumption; print(ds) lists the splits that exist.
print(ds["train"][0])
ds["train"].to_pandas().head()
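Because the repository has no dataset card, the split names and columns are not documented. The helper below is a small sketch for summarizing whatever splits come back without assuming a "train" split exists. The name describe_splits is illustrative, and the demo uses a plain dict standing in for the loaded dataset.

```python
def describe_splits(ds):
    """Return {split_name: (row_count, sorted column names)} for a mapping of splits."""
    summary = {}
    for name, split in ds.items():
        n = len(split)
        # Read column names from the first row; an empty split has none to show.
        columns = sorted(split[0].keys()) if n else []
        summary[name] = (n, columns)
    return summary


# Stand-in data with hypothetical column names, not the real schema.
fake = {"train": [{"text": "a", "date": "b"}], "empty": []}
print(describe_splits(fake))
```

With the real data, the result of load_dataset("banned-historical-archives/zhongyangribao") could be passed in directly, assuming it returns a dict-like set of splits as the datasets library usually does when no split is requested.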

zhongyangribao: pros & cons

Pros

+ Large collection of real historical Chinese newspaper text
+ Directly loadable via Hugging Face datasets
+ Useful for period-specific language and political analysis
+ Maintained under the banned-historical-archives organization

Cons

– No dataset card or description provided
– Content is likely Chinese-only, which limits use for other languages
– Possible copyright or sensitivity restrictions on redistribution


Frequently asked questions

What is the zhongyangribao dataset?

A Hugging Face dataset containing issues of the historical Chinese newspaper Zhongyang Ribao, hosted by banned-historical-archives.


Similar datasets

Other AI & machine learning options worth comparing.

FineNews

AI & Machine Learning · ksolovev

Verified

News dataset for AI and machine learning workflows.

Dataset · ↓ 1.5M · Free

hd_tmp

AI & Machine Learning · ayuo

Verified

Temporary AI/ML dataset for Hugging Face prototyping.

Dataset · ↓ 1.5M · Free

results

AI & Machine Learning · mteb

Verified

MTEB benchmark results for text embedding model evaluations.

Dataset · ↓ 1.3M · Free
