Is this dataset free to use?

Yes. It is free to download from the Hugging Face Hub, and its use is governed by the licenses of its source datasets.

How do I access the dataset?

Load it directly with the Hugging Face datasets library using the identifier mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M.

What license applies to this collection?

The dataset inherits the licenses of its constituent sources (ImageNet-21k, LAION, COYO, SA-1B, etc.); users must comply with each.

LLaVA-OneVision-1.5-Mid-Training-85M

85M samples from eight vision datasets for LLaVA-OneVision-1.5 mid-training.

Dataset · Images & Vision · ↓ 328K/mo · Free

Updated 2026-06-18

What is LLaVA-OneVision-1.5-Mid-Training-85M?

LLaVA-OneVision-1.5-Mid-Training-85M consists of image and text data compiled from eight public sources, including ImageNet-21k, LAION, COYO, SA-1B, and DataComp-1B, for use in multimodal model training.

It is intended for researchers developing or reproducing open multimodal large language models that require large-scale mid-training stages.

What you can build with LLaVA-OneVision-1.5-Mid-Training-85M

Pre-train vision-language models

Use the 85M aggregated samples to continue pre-training models like LLaVA-OneVision-1.5 on diverse image-text pairs from multiple public sources.

Benchmark data scaling experiments

Measure the impact of mid-training data scale on model performance by subsampling the 85M samples and comparing against smaller curated sets.
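One way to draw reproducible subsets for such a scaling study is to hash a stable per-sample key and keep the samples whose hash falls below a threshold. The sketch below is illustrative only: the `id` field name is an assumption (this listing does not document the dataset's columns), and `keep_sample` and `subsample` are hypothetical helpers, not part of any library.

```python
import hashlib


def keep_sample(key: str, fraction: float) -> bool:
    """Return True for a stable, pseudo-random `fraction` of keys."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def subsample(records, fraction, key_field="id"):
    """Lazily yield the records selected by keep_sample.

    `key_field` is a placeholder; replace it with whatever unique
    column the dataset actually exposes.
    """
    for record in records:
        if keep_sample(str(record[key_field]), fraction):
            yield record
```

Because the decision depends only on the key, a 1% subset drawn this way is always contained in the 10% subset, which keeps runs at different scales comparable.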

Build multimodal retrieval systems

Leverage the combined ImageNet-21k, SA-1B, and web-scale sources to train or evaluate image-text retrieval components.

Load LLaVA-OneVision-1.5-Mid-Training-85M

Python

from datasets import load_dataset

ds = load_dataset("mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M")

1. pip install datasets
2. from datasets import load_dataset
3. ds = load_dataset('mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M')
4. Access the 'train' split and iterate over image-text pairs
5. Filter or subsample as needed for your training pipeline
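At this scale it may be more practical to stream a few records than to download everything first. The sketch below uses the `streaming=True` option of `load_dataset` and assumes, per the steps above, that a 'train' split exists. Whether streaming works well for this particular repository has not been verified here, and `take` and `preview` are hypothetical helper names.

```python
from itertools import islice

REPO_ID = "mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M"


def take(samples, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(samples, n))


def preview(n=3, split="train"):
    """Fetch the first n records of the split without a full download."""
    # Imported lazily so `take` stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True returns an iterable that reads remote files on demand.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return take(stream, n)
```

Calling `preview()` and printing the keys of each returned record is a quick way to discover the column names before writing a training pipeline.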

LLaVA-OneVision-1.5-Mid-Training-85M: pros & cons

Pros

+ Large scale (85M samples) from eight established sources
+ Released specifically for LLaVA-OneVision-1.5 mid-training
+ Directly loadable via Hugging Face datasets library
+ Covers broad visual domains including segmentation and captioning data

Cons

- Size requires substantial storage and compute
- Quality and license consistency vary across source datasets
- No built-in train/validation split provided
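Since no validation split is provided, a held-out set has to be carved out by the user. A minimal sketch, assuming the records arrive in a stable order on every pass, is to route every k-th record to validation. `assign_splits` is a hypothetical helper and the default of 1000 is an arbitrary illustration, not a recommendation from the dataset authors.

```python
def assign_splits(records, val_every=1000):
    """Yield (split_name, record) pairs.

    Every `val_every`-th record (indices 0, val_every, ...) is labeled
    "validation"; all others are labeled "train".
    """
    if val_every < 2:
        raise ValueError("val_every must be at least 2")
    for index, record in enumerate(records):
        split = "validation" if index % val_every == 0 else "train"
        yield split, record
```

This works on a stream without holding the data in memory, but the assignment is positional: if the iteration order changes between runs, the split changes too.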

Frequently asked questions

What is LLaVA-OneVision-1.5-Mid-Training-85M?

An 85-million-sample vision dataset aggregated from ImageNet-21k, LAIONCN, DataComp-1B, and other public collections to support mid-training of the LLaVA-OneVision-1.5 framework.


