Multilingual prompt collection across 277 languages and 16 NLP tasks.
xP3x provides prompts and task data spanning 277 languages and 16 NLP tasks. It incorporates the full prior xP3 set along with further examples, resulting in a total size between 100 million and 1 billion instances.
The dataset supports training of multilingual models and has been used for developing successors to mT0 and BLOOMZ within the Aya project at Cohere Labs.
Fine-tune models like mT5 or BLOOM on the prompt-task pairs to improve zero-shot performance across 277 languages.
Measure how well a model trained on high-resource languages generalizes to the 200+ lower-resource languages included in the collection.
Extract and adapt subsets covering the 16 tasks to create targeted training data for specific languages or domains.
from datasets import load_dataset
ds = load_dataset("CohereLabs/xP3x")
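The subset-extraction use case above can be sketched in plain Python. This is a minimal illustration, not the dataset's official tooling: the field names `language`, `dataset`, `inputs`, and `targets`, the language codes, and the helper `select_subset` are all assumptions, and the records are toy placeholders rather than real xP3x rows.

```python
from typing import Iterable, Iterator, Optional, Set


def select_subset(
    records: Iterable[dict],
    languages: Set[str],
    datasets: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Yield records matching the wanted languages and, optionally, source datasets.

    Hypothetical helper: the "language" and "dataset" keys are assumed field names.
    """
    for rec in records:
        if rec.get("language") not in languages:
            continue
        if datasets is not None and rec.get("dataset") not in datasets:
            continue
        yield rec


# Toy placeholder records for illustration only (not real xP3x content).
toy_records = [
    {"language": "eng_Latn", "dataset": "task_a", "inputs": "prompt 1", "targets": "answer 1"},
    {"language": "swh_Latn", "dataset": "task_a", "inputs": "prompt 2", "targets": "answer 2"},
    {"language": "swh_Latn", "dataset": "task_b", "inputs": "prompt 3", "targets": "answer 3"},
]

# Keep every record for one language.
swahili_only = list(select_subset(toy_records, {"swh_Latn"}))

# Narrow further to a single task within that language.
swahili_task_b = list(select_subset(toy_records, {"swh_Latn"}, {"task_b"}))
```

If the loaded data exposes fields like these, the same condition could be passed as a predicate to the `filter` method of the `datasets` library; check the actual column names before relying on it.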