How does Stable Diffusion differ from DALL-E?

Stable Diffusion's model weights are openly released and can run on personal hardware, whereas DALL-E is a closed model from OpenAI that is available only as a hosted cloud service.

What kind of prompt works best?

Detailed, descriptive prompts that specify style, lighting, and composition tend to produce more predictable and higher-quality results.
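As a rough illustration of this advice, a prompt can be assembled from separate descriptive parts. The helper below is hypothetical: build_prompt and its parameter names are not part of any Stable Diffusion tool, and the comma-separated layout is only one common convention.

```python
def build_prompt(subject, style=None, lighting=None, composition=None):
    """Join the non-empty descriptive parts into one comma-separated prompt.

    Hypothetical helper for illustration; not part of any Stable Diffusion API.
    """
    parts = [subject, style, lighting, composition]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "a fox in a snowy forest",
    style="watercolor painting",
    lighting="soft dusk light",
    composition="wide shot",
)
print(prompt)
```

Keeping each aspect in its own argument makes it easy to vary one element, such as the lighting, while holding the rest of the prompt fixed.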

What is Stable Diffusion?

Stable Diffusion is a generative AI model that creates images from text prompts by reversing a gradual noising process.

It is based on diffusion models, which learn to remove noise from data step by step. Starting from random noise, the model iteratively denoises the image while being guided by a text embedding.
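The loop described above can be sketched in a few lines. This is a toy under stated assumptions, not Stable Diffusion's actual sampler: the trained, text-conditioned noise predictor is replaced by a hypothetical oracle that already knows the target, the data is a short list of numbers rather than an image, the noise schedule is an arbitrary linear one, and the update is a deterministic DDIM-style step.

```python
import math
import random


def make_alpha_bars(num_steps):
    """Cumulative signal fraction per step for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = 1e-4 + (0.2 - 1e-4) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise(x_t, alpha_bar_t, target):
    """Stand-in for the trained network: returns the noise separating x_t from target.

    A real model predicts this from x_t, the step, and the text embedding;
    here the 'conditioning' is simply the known target (an assumption).
    """
    scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - math.sqrt(alpha_bar_t) * c) / scale for x, c in zip(x_t, target)]


def sample(target, num_steps=50, seed=0):
    """Start from random noise and denoise step by step (deterministic DDIM-style)."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure random noise
    for t in range(num_steps - 1, -1, -1):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0  # no noise left after the last step
        eps = oracle_noise(x, a_t, target)
        # Estimate the clean sample, then re-noise it to the previous (lower) noise level.
        x0_est = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(x0_est, eps)]
    return x


target = [1.0, -2.0, 0.5]
result = sample(target)
print(result)
```

Because the oracle is exact, the loop lands on the target up to floating-point error; with a learned predictor the same loop structure produces a sample that is merely plausible given the conditioning.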

To keep generation efficient, Stable Diffusion operates in a compressed latent space rather than pixel space: an autoencoder maps images to and from that latent space, and a U-Net, conditioned on text encodings from models like CLIP, performs the denoising there.
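The saving from working in latent space can be estimated with simple arithmetic. The figures below are assumptions not stated in the text: a 512x512 RGB image, 8x downsampling per side, and 4 latent channels, which are the values commonly cited for Stable Diffusion v1.

```python
def element_count(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels


# Assumed figures for Stable Diffusion v1 (not given in the text above).
IMAGE_SIDE = 512
DOWNSAMPLE = 8
LATENT_CHANNELS = 4

pixel_elems = element_count(IMAGE_SIDE, IMAGE_SIDE, 3)
latent_elems = element_count(IMAGE_SIDE // DOWNSAMPLE, IMAGE_SIDE // DOWNSAMPLE, LATENT_CHANNELS)
ratio = pixel_elems / latent_elems

print(pixel_elems, latent_elems, ratio)
```

Under these assumptions the denoising network processes 48 times fewer values per step than it would in pixel space.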

The approach allows high-quality image synthesis with relatively modest compute, and the open release of its weights enabled widespread community use and fine-tuning.

Example

A user types the prompt "a watercolor painting of a fox in a snowy forest at dusk" and receives a detailed, original image matching the description within seconds.

Why it matters

Stable Diffusion made high-quality text-to-image generation freely accessible and locally runnable, accelerating creative tools, research, and the broader adoption of generative AI.

Frequently asked questions

Stable Diffusion itself is free to use: the core model weights are publicly available under an open license, though some web services charge for hosted access.

Related terms

Diffusion Model

A diffusion model is a generative AI technique that creates new data, such as images, by learning to reverse a gradual noising process applied to training examples.

Diffusion

Diffusion is a generative modeling approach that creates new data samples by learning to reverse a gradual noising process. It starts from pure random noise and iteratively removes noise to produce realistic outputs like images or audio.
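The "gradual noising process" can be sketched as follows. This is a minimal illustration assuming the common Gaussian formulation, in which a sample at a given noise level is a weighted mix of the clean data and random noise; the linear schedule, the function names, and the three-number "sample" are all hypothetical choices made for the example.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.2):
    """Fraction of signal variance remaining after each step (assumed linear betas)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar_t, rng):
    """Noise clean data x0 to a given level: sqrt(a) * x0 + sqrt(1 - a) * noise."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(50)
clean = [1.0, -2.0, 0.5]

slightly_noisy = add_noise(clean, alpha_bars[0], rng)   # early step: almost all signal
mostly_noise = add_noise(clean, alpha_bars[-1], rng)    # last step: almost all noise

print(alpha_bars[0], alpha_bars[-1])
```

A model trained to undo these steps can then be run in the opposite direction, starting from pure noise, to generate new samples.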

Generative Adversarial Network

A Generative Adversarial Network (GAN) is a machine learning model made of two neural networks that compete against each other to generate realistic new data, such as images or text.

Generative AI

Generative AI (GenAI) is artificial intelligence that learns patterns from data to create new, original content such as text, images, audio, or code.

Multimodal Model

A multimodal model is a generative AI system that can process and create content across multiple data types, such as text, images, audio, or video, within a single model.

Text-to-Image

Text-to-Image is a generative AI technique that creates visual images from natural language text prompts.