
Diffusion Models — State-of-the-Art Generative AI


Diffusion Models — How DALL-E and Stable Diffusion Create Images

Understand the revolutionary diffusion models behind modern image generation. Learn how these models gradually denoise random noise into coherent images.

  • Forward Diffusion — The process of adding noise to data
  • Reverse Diffusion — Learning to denoise and generate new samples
  • Conditional Generation — Guiding generation with text or other inputs

"Creation is just destruction in reverse."

Diffusion Models — Complete Guide

Diffusion models are the state of the art for image generation and are used in systems such as DALL-E, Stable Diffusion, and Midjourney.


How Diffusion Works

Definition: Diffusion Model

A diffusion model learns to generate data by reversing a gradual noising process. It consists of a fixed forward process that adds noise and a learned reverse process that removes noise to generate samples.

Forward Process (fixed): Image -> Add noise -> Add noise -> ... -> Pure noise

$x_0 \rightarrow x_1 \rightarrow x_2 \rightarrow \cdots \rightarrow x_T$

Reverse Process (learned): Pure noise -> Remove noise -> ... -> Image

$x_T \rightarrow x_{T-1} \rightarrow \cdots \rightarrow x_0$

The model LEARNS to reverse the noise!

Forward and Reverse Diffusion Process

[Figure: Diffusion Model: Forward and Reverse Process. Top row, forward process (fixed), $q(x_t \mid x_{t-1})$: Gaussian noise is added gradually, from the clean image $x_0$ through $x_1$ (slight noise), $x_2$ (more noise), $x_3$ (even more), and so on to pure noise $x_T$. Bottom row, reverse process (learned), $p_\theta(x_{t-1} \mid x_t)$: noise is removed gradually, from pure noise $x_T$ through the denoising steps $x_{T-1}$, $x_{T-2}$, ..., $x_1$ to the generated image $x_0$.]

How diffusion models generate images: This diagram shows the two core processes of diffusion models. The forward process (top, flowing right) gradually adds Gaussian noise to a clean image x₀ over T timesteps, progressively destroying information until it becomes pure noise x_T ~ N(0,I). This process is fixed and non-learned — the noise schedule β_t controls how quickly noise is added. The reverse process (bottom, flowing left) is learned by a neural network: starting from random noise x_T, it iteratively denoises one step at a time, progressively recovering the clean image. At each step, the model predicts the noise ε_θ(x_t, t) that was added, and subtracts it to get a slightly cleaner version. After T denoising steps, the result is a generated image x₀. The key insight: by learning to reverse a simple, known corruption process, we get a powerful generative model that can create realistic images from pure noise.
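The forward process described above can be simulated directly. The following is a minimal NumPy sketch, assuming the step rule $x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon$ and a linear schedule from 1e-4 to 0.02 over 1000 steps; the function name `forward_diffuse` and the toy constant image are illustrative choices, not part of the lesson.

```python
import numpy as np

def forward_diffuse(x0, betas, rng):
    """Apply the fixed forward step q(x_t | x_{t-1}) once per beta and return x_T."""
    x = x0
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # One forward step: shrink the signal slightly, then add Gaussian noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

rng = np.random.default_rng(0)
x0 = np.ones((64, 64))                 # toy "image": constant, so mean 1 and std 0
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule, T = 1000
x_T = forward_diffuse(x0, betas, rng)

print(f"x_0: mean={x0.mean():.3f}, std={x0.std():.3f}")
print(f"x_T: mean={x_T.mean():.3f}, std={x_T.std():.3f}")
```

After all steps the sample statistics should be close to those of a standard normal (mean near 0, standard deviation near 1), which is what "pure noise" means here.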


DDPM (Denoising Diffusion Probabilistic Models)

Definition: DDPM

DDPM is a foundational diffusion model that learns to denoise data by predicting the noise added at each timestep during the forward process.

Training:

  1. Take a clean image $x_0$
  2. Sample a random timestep $t$
  3. Add noise: $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$
  4. Model predicts noise: $\epsilon_\theta(x_t, t)$
  5. Loss: $\|\epsilon - \epsilon_\theta(x_t, t)\|^2$
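The five training steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not a full training loop: `zero_model` is a hypothetical stand-in for the network $\epsilon_\theta$, the linear schedule is assumed, timesteps are 0-indexed, and the loss is averaged over elements (which differs from the squared norm only by a constant factor).

```python
import numpy as np

def q_sample(x0, t, alpha_bar, noise):
    """Closed-form forward sample: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bar[t].reshape(-1, 1, 1)  # broadcast one scalar per example over H, W
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def ddpm_loss(eps_model, x0, alpha_bar, rng):
    """One Monte Carlo estimate of the simplified loss on a batch x0 of shape (B, H, W)."""
    batch = x0.shape[0]
    t = rng.integers(0, len(alpha_bar), size=batch)  # step 2: random timestep per example
    noise = rng.standard_normal(x0.shape)            # the epsilon the model must recover
    x_t = q_sample(x0, t, alpha_bar, noise)          # step 3: noise the clean image
    pred = eps_model(x_t, t)                         # step 4: model predicts the noise
    return np.mean((noise - pred) ** 2)              # step 5: mean squared error

def zero_model(x_t, t):
    """Hypothetical placeholder for a neural network: always predicts zero noise."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)        # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)
x0 = rng.uniform(-1.0, 1.0, size=(8, 16, 16))  # step 1: a batch of toy "clean images"
loss = ddpm_loss(zero_model, x0, alpha_bar, rng)
print(f"loss of the zero predictor: {loss:.3f}")
```

Because the true noise has unit variance, a predictor that always outputs zero scores a loss near 1; a trained network should do better than that baseline.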

Generation:

  1. Start with random noise $x_T \sim \mathcal{N}(0, I)$
  2. For $t = T, T-1, \ldots, 1$: $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$
  3. Return $x_0$
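The generation loop can be sketched as below. Several choices here are assumptions rather than something the lesson specifies: $z \sim \mathcal{N}(0, I)$ with no noise added on the final step, $\sigma_t^2 = \beta_t$ (one common choice), the standard DDPM mean $\mu_\theta = \frac{1}{\sqrt{\alpha_t}}\big(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta\big)$, and 0-indexed timesteps. In place of a trained network, the hypothetical `make_point_mass_oracle` returns the exact noise for a dataset whose only "image" is the constant `c`, so the loop has a known correct answer.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    """Ancestral sampling: start from Gaussian noise and denoise one step at a time."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):          # 0-indexed: T-1, ..., 0
        eps = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        # Fresh noise on every step except the last one.
        z = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * z           # assumes sigma_t^2 = beta_t
    return x

def make_point_mass_oracle(c, betas):
    """Hypothetical exact noise predictor when every training image equals the constant c."""
    alpha_bar = np.cumprod(1.0 - betas)
    def eps_model(x_t, t):
        # Invert x_t = sqrt(abar_t) * c + sqrt(1 - abar_t) * eps for eps.
        return (x_t - np.sqrt(alpha_bar[t]) * c) / np.sqrt(1.0 - alpha_bar[t])
    return eps_model

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)              # assumed linear schedule
sample = ddpm_sample(make_point_mass_oracle(0.5, betas), (4, 4), betas, rng)
print(sample.round(4))
```

With this oracle the sampler recovers the constant image 0.5 from pure noise. A real model replaces the oracle with a network trained on the noise-prediction loss.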

DDPM Noise Schedule

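[Figure: DDPM Noise Schedule and Training Objective]

Forward process: $q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\big)$, with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)$. Here $\beta_t$ is the noise schedule (linear, cosine, etc.), and $T = 1000$ timesteps is typical.

Reverse process: $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\big)$, with $\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \Big( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \Big)$. Here $\epsilon_\theta$ is a U-Net that predicts the noise at each step, and a timestep embedding tells the network which noise level it is looking at.

Training loss (simplified): $L_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon} \big[ \|\epsilon - \epsilon_\theta(x_t, t)\|^2 \big]$. The network predicts the noise that was added, not the denoised image directly.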
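To make the role of the schedule concrete, this sketch computes $\bar{\alpha}_t$ for two schedules. The linear endpoints (1e-4 and 0.02) and the cosine form with offset `s = 0.008` are commonly used values assumed here for illustration; the cosine version omits the clipping of $\beta_t$ that practical implementations usually apply.

```python
import numpy as np

def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction abar_t for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def cosine_alpha_bar(T, s=0.008):
    """abar_t defined directly by a squared-cosine curve, normalized so abar_0 = 1."""
    steps = np.arange(T + 1)
    f = np.cos((steps / T + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f[1:] / f[0]

T = 1000
schedules = {"linear": linear_alpha_bar(T), "cosine": cosine_alpha_bar(T)}
for name, abar in schedules.items():
    # Print abar_t at a few timesteps to compare how fast the signal decays.
    print(name, [f"{abar[t]:.4f}" for t in (0, 249, 499, 749, 999)])
```

Since $x_t$ keeps a fraction $\sqrt{\bar{\alpha}_t}$ of the original image, comparing the two printed rows shows how each schedule spreads the destruction of signal across the timesteps.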

Classifier-Free Guidance

Definition: Classifier-Free Guidance

A technique that improves sample quality by combining conditional and unconditional predictions, controlled by a guidance scale parameter.

$\epsilon_{\text{guided}} = \epsilon_{\text{uncond}} + w \times (\epsilon_{\text{cond}} - \epsilon_{\text{uncond}})$

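Where $w$ is the guidance scale:

  • $w = 1$: No extra guidance; the formula reduces to the plain conditional prediction (standard sampling)
  • $w = 7.5$: Good balance between prompt adherence and diversity
  • $w = 15+$: Very strong prompt adherence, low diversity; values this high can also oversaturate or distort images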
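The guidance formula is a one-line extrapolation between two noise predictions. In this small sketch the two-element arrays are made-up stand-ins for the network's unconditional and conditional outputs, and `guided_eps` is an illustrative name.

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: move from the unconditional prediction toward the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy predictions: the two differ only in the first component.
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])

for w in (0.0, 1.0, 7.5):
    print(f"w={w}: {guided_eps(eps_uncond, eps_cond, w)}")
```

With `w = 0` the result is the unconditional prediction, with `w = 1` it is the conditional one, and larger values push past the conditional prediction along the same direction, which is why high scales trade diversity for prompt adherence.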

Classifier-Free Guidance Visualization

[Figure: Classifier-Free Guidance: Quality vs Diversity Trade-off. Three panels along a guidance-scale axis running from low to high: $w = 1.0$ (standard sampling: high diversity, lower quality), $w = 7.5$ (balanced: good quality, good diversity), and $w = 15+$ (high guidance: very high quality, low diversity). Higher $w$ pushes generations toward the conditioning signal (text prompt).]

Latent Diffusion (Stable Diffusion)

Definition: Latent Diffusion Model

Operates in compressed latent space instead of pixel space, dramatically reducing compute cost while maintaining quality.

Pixel diffusion: $\mathcal{O}(H \times W \times C \times T)$
Latent diffusion: $\mathcal{O}(h \times w \times c \times T)$, where $h \ll H$ and $w \ll W$

Latent Diffusion Architecture

[Figure: Latent Diffusion Model (Stable Diffusion). A VAE encoder compresses a 512×512×3 image into a 64×64×4 latent z (48× smaller). A U-Net diffusion model denoises in latent space, using cross-attention with the output of a CLIP text encoder (example prompt: "a photo of a cat"). A VAE decoder maps the denoised latent z₀ back to a 512×512×3 output image.]
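The 48× figure follows from the tensor shapes alone, as this short check shows. It counts elements only; actual speedups depend on the network, since layers such as attention do not scale linearly with the number of spatial positions.

```python
from math import prod

pixel_shape = (512, 512, 3)   # H x W x C in pixel space
latent_shape = (64, 64, 4)    # h x w x c in the VAE latent space

pixel_elems = prod(pixel_shape)
latent_elems = prod(latent_shape)
ratio = pixel_elems / latent_elems

print(f"pixel elements:  {pixel_elems}")
print(f"latent elements: {latent_elems}")
print(f"compression:     {ratio:.0f}x")
```

Every one of the T denoising steps therefore touches 48 times fewer values than it would in pixel space.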

Key Takeaways

Summary: Diffusion Models

  • Diffusion models learn to reverse a noise process
  • DDPM is the foundational algorithm
  • Classifier-free guidance improves quality
  • Latent diffusion (Stable Diffusion) operates in compressed space
  • Score matching is an alternative formulation
  • Diffusion models now generally outperform GANs on image quality and diversity, at the cost of slower multi-step sampling
  • Text-to-image uses cross-attention for conditioning
  • Diffusion models are the basis of Sora (video generation)

What to Learn Next

-> GANs — Generative Adversarial Networks Complete Guide

-> Autoencoders — Encoding, Decoding and Representation Learning

-> Variational Autoencoders

-> Neural Networks Fundamentals — Perceptrons to Deep Learning

-> Transformers — Attention Is All You Need Complete Guide

-> Self-Supervised Learning — Pre-training Revolution
