Is xP3x free to use?

Yes, it is publicly available on the Hugging Face Hub at no cost.

How do I access the dataset?

Load it directly with the Hugging Face datasets library using load_dataset('CohereLabs/xP3x').

What is the license?

Check the dataset page on Hugging Face for the specific license terms set by Cohere Labs.

xP3x

Multilingual prompt collection across 277 languages and 16 NLP tasks.

Dataset · Text & NLP · 222K downloads/month · Free

Updated 2026-06-18

What is xP3x?

xP3x is a collection of prompts and task data spanning 277 languages and 16 NLP tasks. It contains the complete original xP3 collection plus additional examples, for a total of between 100 million and 1 billion examples.

The dataset supports training of multilingual models and has been used to develop successors to mT0 and BLOOMZ within the Aya project at Cohere Labs.

What you can build with xP3x

Train multilingual instruction-following models

Fine-tune models such as mT5 or BLOOM on the prompt and target pairs to improve zero-shot performance across the 277 languages.
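A minimal sketch of turning xP3x records into training examples for the two model families mentioned above. It assumes each record is a dict with "inputs" (the prompt) and "targets" fields. ToyTokenizer is a hypothetical whitespace tokenizer standing in for a real subword tokenizer, and IGNORE_INDEX = -100 matches the default ignore_index of PyTorch's CrossEntropyLoss. Masking the prompt positions in the decoder-only case is one common choice, not necessarily the exact recipe used for mT0 or BLOOMZ.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


class ToyTokenizer:
    """Hypothetical whitespace tokenizer that assigns ids on first sight."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        ids = []
        for token in text.split():
            if token not in self.vocab:
                self.vocab[token] = len(self.vocab)
            ids.append(self.vocab[token])
        return ids


def seq2seq_example(record, tok):
    """Encoder-decoder (mT5-style): prompt feeds the encoder, target is the decoder label."""
    return {
        "input_ids": tok.encode(record["inputs"]),
        "labels": tok.encode(record["targets"]),
    }


def causal_lm_example(record, tok):
    """Decoder-only (BLOOM-style): concatenate prompt and target, then mask the
    prompt positions so the loss is computed on target tokens only."""
    prompt_ids = tok.encode(record["inputs"])
    target_ids = tok.encode(record["targets"])
    return {
        "input_ids": prompt_ids + target_ids,
        "labels": [IGNORE_INDEX] * len(prompt_ids) + target_ids,
    }


# Hypothetical record in the assumed xP3x shape.
record = {"inputs": "Translate to French: good morning", "targets": "bonjour"}
tok = ToyTokenizer()
print(seq2seq_example(record, tok))
print(causal_lm_example(record, tok))
```

In a real pipeline the toy tokenizer would be replaced by the model's own tokenizer, and examples would be padded into batches before training.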

Evaluate cross-lingual transfer

Measure how well a model trained on high-resource languages generalizes to the 200+ lower-resource languages included in the collection.
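One simple way to quantify cross-lingual transfer, sketched under assumptions: score a model per language, then compare mean accuracy on languages seen during fine-tuning with mean accuracy on unseen ones. The (language, is_correct) result format, the language codes, and the numbers below are hypothetical.

```python
from collections import defaultdict


def per_language_accuracy(results):
    """results: iterable of (language_code, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for lang, ok in results:
        totals[lang] += 1
        correct[lang] += int(ok)
    return {lang: correct[lang] / totals[lang] for lang in totals}


def transfer_gap(accuracy, seen_languages):
    """Mean accuracy on seen languages minus mean accuracy on unseen ones.
    A smaller gap suggests better cross-lingual transfer."""
    seen = [a for lang, a in accuracy.items() if lang in seen_languages]
    unseen = [a for lang, a in accuracy.items() if lang not in seen_languages]
    if not seen or not unseen:
        raise ValueError("need at least one seen and one unseen language")
    return sum(seen) / len(seen) - sum(unseen) / len(unseen)


# Hypothetical evaluation results.
results = [
    ("eng_Latn", True), ("eng_Latn", True), ("eng_Latn", False), ("eng_Latn", True),
    ("swh_Latn", True), ("swh_Latn", False),
    ("yor_Latn", False), ("yor_Latn", True), ("yor_Latn", False), ("yor_Latn", False),
]
acc = per_language_accuracy(results)
print(acc)
print(transfer_gap(acc, seen_languages={"eng_Latn"}))
```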

Build prompt datasets for low-resource NLP tasks

Extract and adapt subsets covering the 16 tasks to create targeted training data for specific languages or domains.
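A sketch of carving a targeted subset out of xP3x records and saving it as JSON Lines. The field names "language", "dataset", "inputs", and "targets" are assumptions; verify them against real rows on the dataset page. The sample records are made up, and filtering by source dataset stands in for filtering by task.

```python
import io
import json


def select_subset(records, languages, source_datasets=None):
    """Yield prompt/target pairs for the requested languages.

    If `source_datasets` is given, only records from those datasets are kept.
    """
    for rec in records:
        if rec["language"] not in languages:
            continue
        if source_datasets is not None and rec["dataset"] not in source_datasets:
            continue
        yield {"inputs": rec["inputs"], "targets": rec["targets"]}


def write_jsonl(rows, fp):
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    for row in rows:
        # ensure_ascii=False keeps non-Latin scripts human-readable in the file
        fp.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical records; real rows would come from load_dataset.
records = [
    {"inputs": "Q1", "targets": "A1", "language": "yor_Latn", "dataset": "example/qa"},
    {"inputs": "Q2", "targets": "A2", "language": "eng_Latn", "dataset": "example/qa"},
    {"inputs": "Q3", "targets": "A3", "language": "yor_Latn", "dataset": "example/nli"},
]
buf = io.StringIO()  # stands in for open("subset.jsonl", "w", encoding="utf-8")
n = write_jsonl(select_subset(records, {"yor_Latn"}, {"example/qa"}), buf)
print(n, buf.getvalue())
```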

Load xP3x

Python

from datasets import load_dataset

ds = load_dataset("CohereLabs/xP3x")

1. pip install datasets
2. from datasets import load_dataset
3. dataset = load_dataset('CohereLabs/xP3x')
4. Select language or task subsets via config names.
5. Preprocess prompts and labels for your training loop.
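Because the full collection is very large, step 4 is easier to try by streaming a single configuration instead of downloading everything. The sketch uses the real datasets arguments split and streaming=True, but the config name "eng_Latn" and the "train" split are assumptions; check the dataset page for the actual names. The optional loader parameter exists only so the function can be tested without network access.

```python
from itertools import islice


def peek_xp3x(config, n=3, loader=None):
    """Return the first `n` rows of one xP3x configuration via streaming.

    Assumptions: `config` is a language code such as "eng_Latn" and the data
    lives in a "train" split. `loader` defaults to datasets.load_dataset and
    can be replaced with a stub for testing.
    """
    if loader is None:
        # Imported lazily so the function can be defined without the library.
        from datasets import load_dataset as loader
    # streaming=True returns an IterableDataset whose rows are fetched lazily
    stream = loader("CohereLabs/xP3x", config, split="train", streaming=True)
    return list(islice(stream, n))
```

For example, peek_xp3x("eng_Latn", n=5) would return the first five rows of that configuration as dicts, assuming the config name exists.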

xP3x: pros & cons

Pros

+ Covers 277 languages and 16 tasks
+ Includes the full xP3 collection plus additional data
+ Prompts are already formatted and ready to use
+ Loads directly with the Hugging Face datasets library

Cons

– The full collection is very large and needs substantial storage
– Data quality and quantity vary across languages
– Reproducing the collection from scratch requires running its creation script
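Uneven per-language quantity is often handled at training time with exponent-smoothed sampling, where a language with n_l examples is drawn with probability proportional to n_l ** alpha. This is a general multilingual-training technique, not something xP3x itself prescribes; the default alpha and the counts below are illustrative only.

```python
def sampling_weights(counts, alpha=0.3):
    """Return per-language sampling probabilities p_l proportional to n_l ** alpha.

    alpha = 1 keeps the natural distribution; alpha < 1 flattens it, which
    upsamples low-resource languages. Languages with zero examples are dropped.
    """
    scaled = {lang: n ** alpha for lang, n in counts.items() if n > 0}
    total = sum(scaled.values())
    if total == 0:
        return {}
    return {lang: s / total for lang, s in scaled.items()}


# Hypothetical per-language example counts.
counts = {"eng_Latn": 1_000_000, "yor_Latn": 10_000}
for a in (1.0, 0.5):
    weights = sampling_weights(counts, alpha=a)
    print(a, {lang: round(p, 3) for lang, p in weights.items()})
```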

Frequently asked questions

What is xP3x?

A large collection of prompts and datasets spanning 277 languages and 16 NLP tasks, extending the original xP3 for multilingual model training.

Similar datasets

KakologArchives

wikitext

gsm8k
