What is Stable Diffusion?
Stable Diffusion is a generative AI model that creates images from text prompts by reversing a gradual noising process.
It is based on diffusion models, which learn to remove noise from data step by step. Starting from random noise, the model iteratively denoises the image while being guided by a text embedding.
To make it efficient, Stable Diffusion operates in a compressed latent space rather than pixel space, using a U-Net architecture conditioned on text encodings from models like CLIP.
The approach allows high-quality image synthesis with relatively modest compute, and the open release of its weights enabled widespread community use and fine-tuning.
Example
A user types the prompt "a watercolor painting of a fox in a snowy forest at dusk" and receives a detailed, original image matching the description within seconds.
Why it matters
Stable Diffusion made high-quality text-to-image generation freely accessible and locally runnable, accelerating creative tools, research, and the broader adoption of generative AI.
Frequently asked questions
Yes, the core model weights are publicly available under an open license, though some web services charge for hosted access.
Related terms
A diffusion model is a generative AI technique that creates new data like images by learning to reverse a gradual noising process applied to training examples.
Diffusion is a generative modeling approach that creates new data samples by learning to reverse a gradual noising process. It starts from pure random noise and iteratively removes noise to produce realistic outputs like images or audio.
A Generative Adversarial Network (GAN) is a machine learning model made of two neural networks that compete against each other to generate realistic new data, such as images or text.
Generative AI (GenAI) is artificial intelligence that learns patterns from data to create new, original content such as text, images, audio, or code.
A multimodal model is a generative AI system that can process and create content across multiple data types, such as text, images, audio, or video, within a single model.
Text-to-Image is a generative AI technique that creates visual images from natural language text prompts.