Over 300k news articles for summarization and comprehension tasks.
The dataset consists of just over 300k unique news articles written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization.
It is useful for training and evaluating models on summarization, machine reading comprehension, and question answering.
Develop and fine-tune models like BART or T5 to generate concise summaries from full news articles using the provided highlights as targets.
Build models that select key sentences from CNN and Daily Mail articles to create summaries without generating new text.
Train reading comprehension models on the original CNN/DailyMail articles to answer questions derived from the article content.
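For the extractive use case above, a common point of comparison on news data is a "lead-n" baseline that copies the first few sentences of the article. The following is a minimal sketch of that idea, not a method prescribed by the dataset. The helper names `split_sentences` and `lead_n_summary` are hypothetical, and the regex splitter is a deliberate simplification that will mishandle abbreviations such as "U.S.".

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lead_n_summary(article, n=3):
    """Extractive baseline: return the first n sentences, copied verbatim."""
    return " ".join(split_sentences(article)[:n])


# Made-up text standing in for an article; not taken from the dataset.
example = (
    "The city council approved a new budget on Monday. "
    "The plan raises spending on public transport. "
    "Critics say the increase is too small. "
    "A final vote is expected next month."
)

print(lead_n_summary(example, n=2))
```

Because the output only ever contains sentences copied from the input, it gives a quick reference score to compare trained extractive or abstractive models against.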
from datasets import load_dataset
# "3.0.0" selects a dataset configuration; a configuration name is required
ds = load_dataset("abisee/cnn_dailymail", "3.0.0")
A collection of over 300,000 English news articles from CNN and the Daily Mail, originally created for summarization and reading comprehension tasks.
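Once records are loaded, abstractive training typically needs each one turned into a source/target text pair. The sketch below assumes records expose `article` and `highlights` string fields and uses a T5-style `"summarize: "` task prefix. The function name `to_seq2seq_pair`, the output keys `source` and `target`, and the sample record are all hypothetical, and models such as BART would normally use an empty prefix.

```python
def to_seq2seq_pair(record, prefix="summarize: "):
    """Build a (source, target) text pair from one record.

    Whitespace, including the newlines that separate highlight
    sentences, is collapsed to single spaces.
    """
    article = " ".join(record["article"].split())
    highlights = " ".join(record["highlights"].split())
    return {"source": prefix + article, "target": highlights}


# Made-up record with the assumed field names; not real dataset content.
sample = {
    "id": "example-0",
    "article": "Heavy rain flooded several streets on Tuesday.\nNo injuries were reported.",
    "highlights": "Heavy rain floods streets.\nNo injuries reported.",
}

pair = to_seq2seq_pair(sample)
print(pair["source"])
print(pair["target"])
```

With a loaded split, the same function could be passed to the `map` method of a `datasets.Dataset` before tokenization.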