Why does it take many steps to generate an image?

Each denoising step removes only a small amount of noise, so many steps are needed to turn pure noise into a coherent sample while keeping the process stable and preserving detail.

What is a noise schedule?

A noise schedule defines how much noise is added or removed at each timestep, controlling the speed and quality of the diffusion and denoising processes.
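
As a rough illustration, a schedule can be stored as a list of per-step noise variances. The sketch below assumes a linear schedule with 1000 steps and endpoints of 1e-4 and 0.02, values common in the DDPM literature but not specified here. The function names are hypothetical.

```python
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly from beta_start to beta_end.

    The default endpoints are assumed values, chosen for illustration.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta): the share of the original signal's
    variance that survives after each step of noising."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)
print(alpha_bars[0], alpha_bars[499], alpha_bars[-1])
```

The printed values fall from just under 1 toward 0, showing how the schedule decides how quickly the data is replaced by noise.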

What is Diffusion?

Diffusion is a generative modeling approach that creates new data samples by learning to reverse a gradual noising process. It starts from pure random noise and iteratively removes noise to produce realistic outputs like images or audio.

In the forward diffusion process, noise is systematically added to training data over many steps until the original data becomes indistinguishable from random Gaussian noise. This creates a Markov chain that gradually destroys structure in the data.
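
A minimal sketch of one way to noise a sample, assuming the Gaussian forward process used in DDPM-style models, where a noisy version at any step can be drawn directly as sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise. Here alpha_bar is the fraction of signal variance remaining at that step. The function name add_noise and the toy data are hypothetical.

```python
import math
import random


def add_noise(x0, alpha_bar_t, rng):
    """Draw a noisy version of x0 with signal fraction alpha_bar_t.

    Returns the noisy sample and the Gaussian noise that was mixed in.
    """
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return x_t, noise


rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]                       # toy "data" vector
x_early, _ = add_noise(x0, alpha_bar_t=0.99, rng=rng)    # mostly signal
x_late, _ = add_noise(x0, alpha_bar_t=0.0001, rng=rng)   # almost pure noise
print(x_early)
print(x_late)
```

With alpha_bar_t near 1 the output stays close to x0; near 0 it is essentially a draw from a standard Gaussian, which is the end state the forward process aims for.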

A neural network is trained to predict and remove noise at each step, effectively learning the reverse denoising process. At inference time, the model starts from pure noise and applies the learned denoising steps to generate new samples.
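
The sampling loop can be sketched on a toy problem. The example below assumes DDPM-style ancestral sampling with a linear schedule, and it swaps the trained neural network for an analytic "oracle": when the data is a one-dimensional Gaussian, the expected noise given a noisy sample has a closed form, so no training is needed. All names, the schedule values, and the toy data distribution are assumptions made for illustration.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear betas plus the running product alpha_bar = prod(1 - beta)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def oracle_noise_prediction(x_t, alpha_bar, data_mean, data_std):
    """Stand-in for the trained network: exact expected noise given x_t
    when the clean data is Gaussian with the given mean and std."""
    marginal_var = alpha_bar * data_std ** 2 + (1.0 - alpha_bar)
    centered = x_t - math.sqrt(alpha_bar) * data_mean
    return math.sqrt(1.0 - alpha_bar) * centered / marginal_var


def sample(num_steps, data_mean, data_std, rng):
    """Start from standard Gaussian noise and denoise one step at a time."""
    betas, alpha_bars = make_schedule(num_steps)
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps_hat = oracle_noise_prediction(x, alpha_bar, data_mean, data_std)
        # Subtract this step's share of the predicted noise, then rescale.
        x = (x - beta / math.sqrt(1.0 - alpha_bar) * eps_hat) / math.sqrt(1.0 - beta)
        # Add a small amount of fresh noise on every step except the last.
        if t > 0:
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
samples = [sample(1000, data_mean=2.0, data_std=0.5, rng=rng) for _ in range(300)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(mean, std)
```

The generated samples should cluster around the assumed data mean and spread, even though every one of them began as pure noise. In a real model the oracle function is replaced by a network trained to predict the noise mixed into noised training examples.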

Two key ideas are the noise schedule, which controls how much noise is added or removed at each timestep, and, in latent diffusion models, running the diffusion process in a compressed latent space rather than on raw data for efficiency.
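
To give a sense of the efficiency gain from a latent space, the arithmetic below compares the number of values in an image with the number in its latent representation. The figures are assumptions for illustration: a 512x512 RGB image, an encoder that downsamples each side by a factor of 8, and 4 latent channels, a configuration commonly associated with early Stable Diffusion releases.

```python
def tensor_size(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels


# Assumed figures, chosen for illustration only.
image_side = 512
downsample_factor = 8
latent_channels = 4

image_values = tensor_size(image_side, image_side, 3)
latent_side = image_side // downsample_factor
latent_values = tensor_size(latent_side, latent_side, latent_channels)
reduction = image_values / latent_values

print(image_values, latent_values, reduction)  # 786432 16384 48.0
```

Under these assumptions each denoising step operates on 48 times fewer values than it would in pixel space.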

Example

Stable Diffusion uses a diffusion model to generate images from text prompts. It begins with random noise and iteratively denoises it over dozens of steps, guided by an embedding of the text prompt, to produce a coherent image.

Why it matters

Diffusion models currently power many state-of-the-art generative systems for images, video, and audio, often outperforming GANs in sample quality and training stability. They underpin widely used tools like Stable Diffusion and DALL-E 3.

Frequently asked questions

Unlike GANs, which train through an adversarial game between a generator and a discriminator, diffusion models train by learning to reverse a noise-adding process. This often leads to more stable training and to more diverse, higher-quality outputs.

Related terms

Generative Adversarial Network

A Generative Adversarial Network (GAN) is a machine learning model made of two neural networks that compete against each other to generate realistic new data, such as images or text.

Variational Autoencoder

A Variational Autoencoder (VAE) is a neural network that learns a compressed probabilistic representation of data and can generate new similar examples by sampling from that space. It combines autoencoders with variational inference to enable both reconstruction and generation.

Diffusion Model

A diffusion model is a generative AI technique that creates new data, such as images, by learning to reverse a gradual noising process applied to training examples.

Generative AI

Generative AI (GenAI) is artificial intelligence that learns patterns from data to create new, original content such as text, images, audio, or code.

Multimodal Model

A multimodal model is a generative AI system that can process and create content across multiple data types, such as text, images, audio, or video, within a single model.

Stable Diffusion

Stable Diffusion is a latent diffusion model that creates images from text prompts by iteratively denoising random noise, guided by the prompt.