Diffusion Models — How DALL-E and Stable Diffusion Create Images

Understand the revolutionary diffusion models behind modern image generation. Learn how these models gradually denoise random noise into coherent images.

Forward Diffusion — The process of adding noise to data
Reverse Diffusion — Learning to denoise and generate new samples
Conditional Generation — Guiding generation with text or other inputs

"Creation is just destruction in reverse."

Diffusion Models — Complete Guide

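Diffusion models are the current state of the art for image generation and power systems such as DALL-E 2, DALL-E 3, and Stable Diffusion. Midjourney is widely believed to use them as well, though its architecture is not publicly documented.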

How Diffusion Works

Definition: Diffusion Model

A diffusion model learns to generate data by reversing a gradual noising process. It consists of a fixed forward process that adds noise and a learned reverse process that removes noise to generate samples.

Forward Process (fixed): Image -> Add noise -> Add noise -> ... -> Pure noise

x_0 \rightarrow x_1 \rightarrow x_2 \rightarrow \cdots \rightarrow x_T

Reverse Process (learned): Pure noise -> Remove noise -> ... -> Image

x_T \rightarrow x_{T-1} \rightarrow \cdots \rightarrow x_0

The model LEARNS to reverse the noise!

Figure: Forward and Reverse Diffusion Process

How diffusion models generate images: This diagram shows the two core processes of diffusion models. The forward process (top, flowing right) gradually adds Gaussian noise to a clean image x₀ over T timesteps, progressively destroying information until the result is approximately pure noise, x_T ~ N(0,I). This process is fixed rather than learned: the noise schedule β_t controls how quickly noise is added. The reverse process (bottom, flowing left) is learned by a neural network: starting from random noise x_T, it denoises one step at a time, progressively recovering a clean image. At each step, the model predicts the noise ε_θ(x_t, t) present in x_t and removes a scaled portion of it to get a slightly cleaner version, x_{t-1}. After T denoising steps, the result is a generated image x₀. The key insight: by learning to reverse a simple, known corruption process, we get a powerful generative model that can create realistic images from pure noise.

DDPM (Denoising Diffusion Probabilistic Models)

Definition: DDPM

DDPM is a foundational diffusion model that learns to denoise data by predicting the noise added at each timestep during the forward process.

Training:

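Take a clean image $x_0$ from the training set
Sample a timestep $t$ uniformly from $\{1, \ldots, T\}$
Add noise: $x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1-\bar{\alpha}_t} \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$ and $\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)$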
Model predicts noise: $\epsilon_\theta(x_t, t)$
Loss: $\|\epsilon - \epsilon_\theta(x_t, t)\|^2$
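
The training steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a real training loop: the linear schedule values, the helper names `q_sample` and `ddpm_loss`, and the placeholder `toy_eps_model` (which stands in for the neural network and always predicts zero noise) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)     # alpha_bar_t for t = 1..T

def q_sample(x0, t, eps):
    """Noise x0 directly to timestep t (1-indexed) using the closed form."""
    a_bar = alpha_bars[t - 1]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

def toy_eps_model(x_t, t):
    """Hypothetical stand-in for eps_theta; a real model is a trained network."""
    return np.zeros_like(x_t)

def ddpm_loss(x0, eps_model):
    t = int(rng.integers(1, T + 1))      # timestep sampled uniformly from {1..T}
    eps = rng.standard_normal(x0.shape)  # the noise the model should recover
    x_t = q_sample(x0, t, eps)
    # Mean squared error; equals the squared norm up to a constant factor.
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))

x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # fake "image" scaled to [-1, 1]
loss = ddpm_loss(x0, toy_eps_model)
```

In practice the loss would be averaged over a batch and minimized with gradient descent in an autodiff framework. NumPy is used here only to keep the arithmetic visible.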

Generation:

Start with random noise $x_T \sim \mathcal{N}(0, I)$
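For $t = T, T-1, \ldots, 1$ : $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$, where $z \sim \mathcal{N}(0, I)$ for $t > 1$ and $z = 0$ for $t = 1$, and $\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(x_t, t) \right)$ with $\alpha_t = 1 - \beta_t$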
Return $x_0$
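
The generation loop can be sketched as follows. The sketch assumes the mean is computed from the predicted noise in the usual DDPM way and that $\sigma_t^2 = \beta_t$, which is one common choice among several. The function name `ddpm_sample` and the zero-predicting `toy_eps_model` are hypothetical, so the output is not a meaningful image. Only the control flow is the point.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    """Ancestral sampling: draw x_T ~ N(0, I), then apply T reverse steps."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    T = len(betas)
    x = rng.standard_normal(shape)                  # x_T
    for t in range(T, 0, -1):                       # t = T, T-1, ..., 1
        i = t - 1                                   # 0-based index into the schedule
        eps_hat = eps_model(x, t)
        # mu_theta(x_t, t), expressed through the predicted noise
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_hat) / np.sqrt(alphas[i])
        # Fresh noise on every step except the last one
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = mean + np.sqrt(betas[i]) * z            # assumes sigma_t^2 = beta_t
    return x

def toy_eps_model(x_t, t):
    """Hypothetical placeholder for a trained noise-prediction network."""
    return np.zeros_like(x_t)

betas = np.linspace(1e-4, 0.02, 50)                 # short schedule so the demo is fast
sample = ddpm_sample(toy_eps_model, (3, 8, 8), betas, np.random.default_rng(0))
```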

Figure: DDPM Noise Schedule
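
As a rough illustration of what a noise schedule looks like, the sketch below builds a linear schedule and tracks $\bar{\alpha}_t$, the cumulative product of $(1 - \beta_s)$ that appears in the noising formula. The range 1e-4 to 0.02 over 1000 steps is a commonly used DDPM setting and is an assumption here, as is the function name `linear_beta_schedule`.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, linearly spaced (assumed values)."""
    return np.linspace(beta_start, beta_end, T)

betas = linear_beta_schedule()
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t = prod_{s<=t} (1 - beta_s)

# sqrt(alpha_bar_t) is the weight left on the clean image x_0 after t steps;
# it starts near 1 and decays toward 0 as t approaches T.
for t in (1, 250, 500, 1000):
    print(t, float(np.sqrt(alpha_bars[t - 1])))
```

Because $\bar{\alpha}_T$ is close to zero under this schedule, $x_T$ retains almost none of the original image, which is why generation can start from pure Gaussian noise.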

Classifier-Free Guidance

Definition: Classifier-Free Guidance

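A sampling technique that strengthens adherence to the conditioning signal (for example, a text prompt) by extrapolating from the model's unconditional noise prediction toward its conditional one. The strength is set by a guidance scale parameter.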

\epsilon_{\text{guided}} = \epsilon_{\text{uncond}} + w \times (\epsilon_{\text{cond}} - \epsilon_{\text{uncond}})

Where $w$ = guidance scale:

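$w = 1$ : No guidance; the formula reduces to the plain conditional prediction $\epsilon_{\text{cond}}$
$w = 7.5$ : A common default (used by Stable Diffusion, for example) that balances prompt adherence and diversity
$w = 15+$ : Very strong prompt adherence, but low diversity and a risk of oversaturated or distorted images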
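
The guidance formula is a single line of arithmetic, sketched below on toy two-element vectors. The function name `classifier_free_guidance` and the example values are assumptions chosen to make the effect of $w$ easy to see. In a real sampler both predictions come from the same network, run once with and once without the conditioning.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, w):
    """Move from the unconditional prediction toward the conditional one by w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_uncond = np.array([0.0, 1.0])  # toy unconditional noise prediction
eps_cond = np.array([1.0, 1.0])    # toy conditional noise prediction

g0 = classifier_free_guidance(eps_uncond, eps_cond, 0.0)   # [0.0, 1.0]: unconditional
g1 = classifier_free_guidance(eps_uncond, eps_cond, 1.0)   # [1.0, 1.0]: conditional
g75 = classifier_free_guidance(eps_uncond, eps_cond, 7.5)  # [7.5, 1.0]: extrapolated
```

Only the component where the two predictions disagree is amplified; where they agree, the guidance scale has no effect.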

Figure: Classifier-Free Guidance Visualization

Latent Diffusion (Stable Diffusion)

Definition: Latent Diffusion Model

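A latent diffusion model runs the diffusion process in the compressed latent space of a pretrained autoencoder instead of in pixel space, dramatically reducing compute cost while maintaining quality. The autoencoder's decoder maps the final denoised latent back to an image.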

\text{Pixel Diffusion: } \mathcal{O}(H \times W \times C \times T)

\text{Latent Diffusion: } \mathcal{O}(h \times w \times c \times T) \quad \text{where } h \ll H, w \ll W
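
To put rough numbers on the comparison above, the sketch below counts tensor elements for a 512×512 RGB image and for a 64×64×4 latent, the sizes commonly cited for Stable Diffusion v1 (8× spatial downsampling, 4 latent channels). The helper name `tensor_elements` is hypothetical, and element count is only a proxy: actual compute per step also depends on the network architecture.

```python
def tensor_elements(height, width, channels):
    """Number of values the denoiser must process at each diffusion step."""
    return height * width * channels

pixel = tensor_elements(512, 512, 3)  # 786432 values in pixel space
latent = tensor_elements(64, 64, 4)   # 16384 values in latent space
ratio = pixel / latent                # 48.0
```

Under these assumed sizes, each denoising step touches about 48 times fewer values in latent space than in pixel space.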

Figure: Latent Diffusion Architecture

Key Takeaways

Summary: Diffusion Models

Diffusion models learn to reverse a noise process
DDPM is the foundational algorithm
Classifier-free guidance trades diversity for stronger prompt adherence and sample quality
Latent diffusion (Stable Diffusion) operates in compressed space
Score matching is an alternative formulation
Diffusion models have largely displaced GANs as the leading approach to high-fidelity image generation, at the cost of slower sampling
Text-to-image uses cross-attention for conditioning
Diffusion models are the basis of Sora (video generation)

What to Learn Next

-> GANs — Generative Adversarial Networks Complete Guide

-> Autoencoders — Encoding, Decoding and Representation Learning

-> Variational Autoencoders

-> Neural Networks Fundamentals — Perceptrons to Deep Learning

-> Transformers — Attention Is All You Need Complete Guide

-> Self-Supervised Learning — Pre-training Revolution
