Why add noise in the first place?

Adding noise creates a smooth path from data to pure noise; learning the reverse path lets the model generate new data by starting from noise.

Are diffusion models only for images?

No, they are also applied to audio, video, 3D shapes, and even text or molecular structures.

What is a Diffusion Model?

A diffusion model is a generative AI technique that creates new data, such as images, by learning to reverse a gradual noising process applied to training examples.

In the forward process, random noise is slowly added to data over many steps until it becomes pure noise. The model is trained to predict and remove this noise at each step.
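The forward process can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation: the linear noise schedule, its default values, and the names `make_schedule` and `add_noise` are assumptions chosen for the example. It relies on the fact that, when each step adds Gaussian noise, the noisy sample at any step t can be drawn directly from the clean data without simulating every intermediate step.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule: betas[t] is the noise variance added at step t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    # alpha_bars[t] is the fraction of the original signal variance left after step t.
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight from clean data x0 to its noisy version at step t."""
    eps = rng.standard_normal(x0.shape)  # the Gaussian noise being mixed in
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

betas, alpha_bars = make_schedule()
rng = np.random.default_rng(0)
x0 = np.ones((4, 4))                                   # stand-in for a tiny "image"
x_early, _ = add_noise(x0, 10, alpha_bars, rng)        # still close to x0
x_late, _ = add_noise(x0, len(betas) - 1, alpha_bars, rng)  # almost pure noise
```

With this schedule, `alpha_bars` shrinks from just under 1 toward 0, which is what "slowly added until it becomes pure noise" means in practice.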

During generation, the model starts from random noise and iteratively denoises it, guided by learned patterns, to produce coherent new samples.
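The generation loop can be sketched as follows, assuming a DDPM-style update rule and a linear noise schedule (both assumptions for illustration). A real system would call a trained neural network to predict the noise. Here `oracle_noise` is a hypothetical stand-in that is only correct for a toy dataset in which every training example equals the constant `TARGET`, which keeps the example self-contained.

```python
import numpy as np

def sample(predict_noise, shape, betas, rng):
    """Start from pure noise and denoise step by step (DDPM-style update)."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps_hat = predict_noise(x, t)  # the model's guess of the noise in x
        # Remove the predicted noise to get the mean of the previous step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # Re-inject a little fresh noise, except on the final step.
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Toy setup (assumption): all "training data" equals TARGET, so the ideal
# noise predictor has a closed form and no network is needed.
betas = np.linspace(1e-4, 0.02, 200)
alpha_bars = np.cumprod(1.0 - betas)
TARGET = 3.0

def oracle_noise(x, t):
    return (x - np.sqrt(alpha_bars[t]) * TARGET) / np.sqrt(1.0 - alpha_bars[t])

result = sample(oracle_noise, (4,), betas, np.random.default_rng(0))
```

In this toy case every entry of `result` ends at `TARGET` up to floating-point error, because the oracle removes exactly the noise that remains at the final step. With a learned predictor, the samples would instead vary and follow the patterns of the training data.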

Key ideas include using a Markov chain for the diffusion steps and optimizing a simple noise-prediction objective that enables high-quality, stable training.
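The noise-prediction objective is commonly written as a mean squared error between the noise that was actually added and the noise the model predicts. The sketch below assumes that form, plus a linear schedule and uniformly sampled time steps. `noise_prediction_loss` and the `predict_noise` callable are hypothetical names, and a real training loop would backpropagate this loss through a neural network rather than just evaluate it.

```python
import numpy as np

def noise_prediction_loss(predict_noise, x0_batch, alpha_bars, rng):
    """Mean squared error between the true and the predicted noise."""
    batch = x0_batch.shape[0]
    # Pick a random diffusion step for each example in the batch.
    t = rng.integers(0, len(alpha_bars), size=batch)
    eps = rng.standard_normal(x0_batch.shape)
    # Reshape so the per-example factor broadcasts over the data dimensions.
    a = alpha_bars[t].reshape(batch, *([1] * (x0_batch.ndim - 1)))
    xt = np.sqrt(a) * x0_batch + np.sqrt(1.0 - a) * eps  # noised inputs
    eps_hat = predict_noise(xt, t)
    return np.mean((eps - eps_hat) ** 2)

alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
rng = np.random.default_rng(0)
x0_batch = np.zeros((1000, 8))

# An untrained "model" that always predicts zero noise.
zero_model = lambda xt, t: np.zeros_like(xt)
loss = noise_prediction_loss(zero_model, x0_batch, alpha_bars, rng)
```

A predictor that always outputs zero scores a loss near 1, the variance of the noise itself, so training amounts to pushing the loss below that baseline.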

Example

Stable Diffusion uses a diffusion model to turn a text prompt like 'a cat astronaut' into a detailed image by starting from noise and gradually refining it into a recognizable picture.

Why it matters

Diffusion models power many of the highest-quality image and video generators used in creative tools, research, and applications like design and entertainment.

Frequently asked questions

How do diffusion models differ from GANs?

Diffusion models train by reversing noise addition rather than using an adversarial game between generator and discriminator, often yielding more stable training and higher sample quality.

Related terms

Generative Adversarial Network

A Generative Adversarial Network (GAN) is a machine learning model made of two neural networks that compete against each other to generate realistic new data, such as images or text.

Variational Autoencoder

A Variational Autoencoder (VAE) is a neural network that learns a compressed probabilistic representation of data and can generate new similar examples by sampling from that space. It combines autoencoders with variational inference to enable both reconstruction and generation.

Multimodal Model

A multimodal model is a generative AI system that can process and create content across multiple data types, such as text, images, audio, or video, within a single model.