SynData
Large-scale multimodal dataset covering vision, language, and action data.
What is SynData?
SynData is a large-scale real-world multimodal dataset that covers vision, language, and action.
It supplies human data for embodied intelligence training and is intended for machine learning researchers working on multimodal models.
Data preview
A real sample from the dataset, which has 14 columns; six are shown here.
| subset (string) | clip_id (string) | task_key (string) | task_name (string) | volume_id (string) | rel_path (string) |
|---|---|---|---|---|---|
| ego | clip_pfmzgf2s2dwgrmxgex37 | task_0001 | Sort clothes | 000001 | tasks/task_0001/000001.zarr |
| ego | clip_xl4kfibjax442gnvmqax | task_0001 | Sort clothes | 000001 | tasks/task_0001/000001.zarr |
| ego | clip_elhvafiqotdzim5uybss | task_0001 | Sort clothes | 000001 | tasks/task_0001/000001.zarr |
| ego | clip_byhszwgx2oju3tvv3ss7 | task_0001 | Sort clothes | 000001 | tasks/task_0001/000001.zarr |
| ego | clip_6d4b6ka7arh3354nlho4 | task_0001 | Sort clothes | 000001 | tasks/task_0001/000001.zarr |
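All five preview rows point at the same `rel_path`, which suggests that several clips are stored in one volume. The sketch below groups clip IDs by `rel_path` so that each volume needs to be opened only once. The `ClipRow` class and `group_clips_by_volume` function are hypothetical names introduced for illustration, and treating `rel_path` as a Zarr store is an assumption based only on the `.zarr` extension.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipRow:
    """The six columns visible in the preview; the other eight are not shown."""
    subset: str
    clip_id: str
    task_key: str
    task_name: str
    volume_id: str
    rel_path: str


# Rows copied from the preview table.
PREVIEW_ROWS = [
    ClipRow("ego", "clip_pfmzgf2s2dwgrmxgex37", "task_0001", "Sort clothes",
            "000001", "tasks/task_0001/000001.zarr"),
    ClipRow("ego", "clip_xl4kfibjax442gnvmqax", "task_0001", "Sort clothes",
            "000001", "tasks/task_0001/000001.zarr"),
    ClipRow("ego", "clip_elhvafiqotdzim5uybss", "task_0001", "Sort clothes",
            "000001", "tasks/task_0001/000001.zarr"),
    ClipRow("ego", "clip_byhszwgx2oju3tvv3ss7", "task_0001", "Sort clothes",
            "000001", "tasks/task_0001/000001.zarr"),
    ClipRow("ego", "clip_6d4b6ka7arh3354nlho4", "task_0001", "Sort clothes",
            "000001", "tasks/task_0001/000001.zarr"),
]


def group_clips_by_volume(rows):
    """Map each rel_path to the list of clip_ids that reference it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.rel_path].append(row.clip_id)
    return dict(groups)


volumes = group_clips_by_volume(PREVIEW_ROWS)
for rel_path, clip_ids in volumes.items():
    print(rel_path, len(clip_ids))  # tasks/task_0001/000001.zarr 5
```

The same grouping would apply to rows loaded from the full dataset, provided the column names match the preview.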
Dataset structure
| Subset | Split | Rows |
|---|---|---|
| all_clips | train | 449,363 |
What you can build with SynData
Train multimodal language models
Develop models that jointly process language with vision and action sequences from real human interactions for improved context understanding.
Build robotics simulation agents
Create agents that learn action prediction alongside language generation using the dataset's combined vision-language-action samples.
Evaluate cross-modal transfer
Test how well NLP models generalize when fine-tuned on multimodal human data spanning 100k-1M samples.
Load SynData
from datasets import load_dataset
ds = load_dataset("PsiBotAI/SynData")

1. pip install datasets
2. from datasets import load_dataset
3. dataset = load_dataset("PsiBotAI/SynData")
4. print(dataset["train"].features) to inspect modalities
5. Split into train/test and preprocess for your pipeline
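For the final step, one possible approach is a deterministic hash-based split, sketched below. It is not a split defined by the dataset itself: the listing shows only a `train` split, and the `assign_split` helper, the 10% test fraction, and the choice of `rel_path` as the key are all assumptions made for illustration. Keying on `rel_path` keeps clips from the same volume in the same split, which may or may not suit your evaluation.

```python
import hashlib


def assign_split(key: str, test_fraction: float = 0.1) -> str:
    """Return "train" or "test" for a key, the same way on every run."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "test" if bucket < test_fraction else "train"


# Example keys; the first is taken from the data preview, the second is made up.
for rel_path in ["tasks/task_0001/000001.zarr", "tasks/task_0001/000002.zarr"]:
    print(rel_path, assign_split(rel_path))
```

With the Hugging Face `datasets` library, a function like this could be passed to `Dataset.filter`, for example `ds["train"].filter(lambda row: assign_split(row["rel_path"]) == "test")`. If a plain random row-level split is enough, `Dataset.train_test_split` is the simpler option.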
SynData: pros & cons
Pros
- Multimodal coverage across vision, language, and action
- Real-world human data samples
- Accessible through Hugging Face datasets library
- Size range supports mid-scale experiments
Cons
- Listing metadata gives only a 100k-1M size range, although the dataset structure table reports 449,363 rows
- No license or usage terms detailed
- Category listed as nlp-text despite multimodal description
Frequently asked questions
What is SynData?
SynData is a multimodal dataset from PsiBotAI containing real-world human data across vision, language, and action, with between 100,000 and 1 million samples.