Diffusion Models — How DALL-E and Stable Diffusion Create Images
Understand the revolutionary diffusion models behind modern image generation. Learn how these models gradually denoise random noise into coherent images.
- Forward Diffusion — The process of adding noise to data
- Reverse Diffusion — Learning to denoise and generate new samples
- Conditional Generation — Guiding generation with text or other inputs
"Creation is just destruction in reverse."
Diffusion Models — Complete Guide
Diffusion models are the state of the art for image generation: DALL-E 2, DALL-E 3, and Stable Diffusion are built on them, and Midjourney is widely reported to be as well.
How Diffusion Works
Definition: Diffusion Model
A diffusion model learns to generate data by reversing a gradual noising process. It consists of a fixed forward process that adds noise and a learned reverse process that removes noise to generate samples.
Forward Process (fixed): Image -> Add noise -> Add noise -> ... -> Pure noise
Reverse Process (learned): Pure noise -> Remove noise -> ... -> Image
The model LEARNS to reverse the noise!
Forward and Reverse Diffusion Process
How diffusion models generate images: This diagram shows the two core processes of diffusion models. The forward process (top, flowing right) gradually adds Gaussian noise to a clean image x₀ over T timesteps, progressively destroying information until x_T is approximately distributed as pure noise, N(0, I). This process is fixed and not learned; the noise schedule β_t controls how quickly noise is added. The reverse process (bottom, flowing left) is learned by a neural network: starting from random noise x_T, it denoises one step at a time, progressively recovering a clean image. At each step, the model predicts the noise ε_θ(x_t, t) contained in x_t and uses that prediction to compute a slightly less noisy x_{t−1}. After T denoising steps, the result is a generated image x₀. The key insight: by learning to reverse a simple, known corruption process, we get a powerful generative model that can create realistic images from pure noise.
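To make the forward process concrete, here is a minimal NumPy sketch (an illustration, not a reference implementation). It assumes the linear $\beta_t$ schedule from the original DDPM paper ($\beta$ rising from $10^{-4}$ to $0.02$ over $T = 1000$ steps) and a toy batch of single-pixel "images" with value 1.0; the helper names are hypothetical. It applies the one-step noising rule $x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon$ repeatedly and checks how much signal survives.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style schedule)."""
    return np.linspace(beta_start, beta_end, T)

def forward_diffuse_stepwise(x0, betas, rng):
    """Apply q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I) once per step."""
    x = x0.copy()
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

rng = np.random.default_rng(0)
betas = linear_beta_schedule()

# Toy data: 10,000 single-pixel "images", all with clean value 1.0
x0 = np.ones(10_000)
xT = forward_diffuse_stepwise(x0, betas, rng)

# Fraction of the original signal left after T steps: sqrt(alpha_bar_T)
alpha_bar_T = np.prod(1.0 - betas)
print(f"signal coefficient sqrt(alpha_bar_T) = {np.sqrt(alpha_bar_T):.5f}")
print(f"empirical mean {xT.mean():.3f}, empirical std {xT.std():.3f}")
```

After 1,000 steps the signal coefficient $\sqrt{\bar{\alpha}_T}$ is below 0.01 and the samples have mean near 0 and standard deviation near 1, so $x_T$ is essentially standard Gaussian noise, as the diagram describes.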
DDPM (Denoising Diffusion Probabilistic Models)
Definition: DDPM
DDPM is a foundational diffusion model that learns to denoise data by predicting the noise added at each timestep during the forward process.
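Training:
- Take a clean image $x_0$ from the training set
- Sample a random timestep $t \sim \mathrm{Uniform}\{1, \dots, T\}$
- Add noise: $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$ and $\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)$
- Model predicts noise: $\epsilon_\theta(x_t, t)$
- Loss: $L = \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2$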
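The training steps above can be sketched for a single loss evaluation in NumPy. This is a minimal illustration under stated assumptions: the linear DDPM schedule, a toy batch of random 8×8 RGB arrays in $[-1, 1]$, and `zero_eps_model`, a hypothetical placeholder standing in for the real noise-prediction network (usually a U-Net). A real training loop would use an autodiff framework and take a gradient step on this loss; the sketch also averages the squared error instead of summing it, which only rescales the loss.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed linear schedule (DDPM paper values)
alpha_bars = np.cumprod(1.0 - betas)    # alpha_bar_t = prod_{s<=t} (1 - beta_s)

def ddpm_training_loss(x0, eps_model, rng):
    """One DDPM loss evaluation (noise-prediction MSE) on a batch x0."""
    batch = x0.shape[0]
    # One timestep per example; array index 0..T-1 stands for t = 1..T
    t = rng.integers(0, T, size=batch)
    eps = rng.standard_normal(x0.shape)
    # Reshape alpha_bar_t so it broadcasts over the non-batch dimensions
    ab = alpha_bars[t].reshape(batch, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps   # closed-form forward noising
    eps_pred = eps_model(x_t, t)
    return np.mean((eps - eps_pred) ** 2)

def zero_eps_model(x_t, t):
    """Hypothetical placeholder network that always predicts zero noise."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 3, 8, 8))   # toy batch of 8x8 RGB "images"
loss = ddpm_training_loss(x0, zero_eps_model, rng)
print(f"loss with a zero predictor: {loss:.3f}")   # roughly E[eps^2] = 1
```

Because the placeholder always predicts zero, the loss is roughly $\mathbb{E}[\epsilon^2] = 1$; a trained network should push it below that.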
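Generation:
- Start with random noise $x_T \sim \mathcal{N}(0, I)$
- For $t = T, \dots, 1$: $x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \sigma_t z$, where $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, $z \sim \mathcal{N}(0, I)$ for $t > 1$, $z = 0$ for $t = 1$, and a common choice is $\sigma_t^2 = \beta_t$
- Return $x_0$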
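The generation loop can be sketched the same way. To check it without a trained network, this sketch assumes a toy data distribution in which every "image" equals one fixed vector `target`; for that case the exact noise predictor is known in closed form, $\epsilon = (x_t - \sqrt{\bar{\alpha}_t}\,\mathrm{target}) / \sqrt{1 - \bar{\alpha}_t}$, and `oracle_eps_model` (a hypothetical helper) implements it. It also assumes the linear DDPM schedule and $\sigma_t^2 = \beta_t$.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # beta_1..beta_T stored at indices 0..T-1
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def ddpm_sample(eps_model, shape, rng):
    """Ancestral DDPM sampling with sigma_t^2 = beta_t."""
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for i in reversed(range(T)):                   # i = t - 1, so t runs T..1
        eps = eps_model(x, i)
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])
        if i > 0:
            z = rng.standard_normal(shape)
            x = mean + np.sqrt(betas[i]) * z
        else:
            x = mean                               # no noise on the final step (t = 1)
    return x

# Toy data distribution: every sample equals this single point
target = np.array([0.5, -0.25, 1.0])

def oracle_eps_model(x_t, i):
    """Hypothetical exact noise predictor for the single-point data distribution."""
    return (x_t - np.sqrt(alpha_bars[i]) * target) / np.sqrt(1.0 - alpha_bars[i])

rng = np.random.default_rng(0)
sample = ddpm_sample(oracle_eps_model, target.shape, rng)
print(sample)
```

With the oracle predictor, the final noise-free step returns `target` (up to floating-point error) whatever path the earlier noisy steps took. With a real model, `eps_model` would be the trained $\epsilon_\theta$ and each run would produce a new sample.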
DDPM Noise Schedule
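As a sketch of what the noise schedule controls, the snippet below computes the linear schedule used in the original DDPM paper ($\beta_1 = 10^{-4}$ rising to $\beta_T = 0.02$, $T = 1000$) and prints how much of the original image ($\sqrt{\bar{\alpha}_t}$) and how much noise ($\sqrt{1 - \bar{\alpha}_t}$) make up $x_t$ at a few timesteps, where $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$. Other schedules are used in practice; this one is assumed only because it is the DDPM default.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # beta_t grows linearly with t
alpha_bars = np.cumprod(1.0 - betas)    # fraction of signal variance left at step t

for t in [1, 250, 500, 750, 1000]:
    ab = alpha_bars[t - 1]              # array index t-1 holds timestep t
    print(f"t={t:4d}  signal coef sqrt(alpha_bar)={np.sqrt(ab):.4f}  "
          f"noise coef sqrt(1-alpha_bar)={np.sqrt(1 - ab):.4f}")
```

The signal coefficient falls monotonically from almost 1 at $t = 1$ to nearly 0 at $t = T$, which is why sampling can start from pure noise.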
Classifier-Free Guidance
Definition: Classifier-Free Guidance
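A technique that improves sample quality by combining conditional and unconditional predictions, controlled by a guidance scale parameter. During training, the condition is randomly dropped (replaced by an empty input), so a single network learns to make both predictions.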
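$\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + w \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)$
Where $w$ = guidance scale, $c$ is the condition (e.g., a text prompt), and $\varnothing$ is the empty condition:
- $w = 1$: No guidance (standard conditional prediction)
- $w \approx 7.5$: Good balance (a common default for Stable Diffusion)
- $w \gtrsim 15$: Very strong prompt adherence, low diversity; very high values can also oversaturate images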
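A minimal sketch of the guidance combination $\epsilon_\theta(x_t, \varnothing) + w\,(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing))$. The two input vectors are made-up toy values standing in for the network's unconditional and conditional noise predictions at one denoising step, and `cfg_combine` is a hypothetical helper name. In practice both predictions come from the same network, often computed in one batched forward pass, so guidance roughly doubles the cost of each step.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: eps_u + w * (eps_c - eps_u)."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.2, -0.1, 0.0])   # toy stand-in for eps_theta(x_t, empty condition)
eps_c = np.array([0.5, 0.3, -0.2])   # toy stand-in for eps_theta(x_t, prompt)

for w in [0.0, 1.0, 7.5]:
    print(w, cfg_combine(eps_u, eps_c, w))
```

$w = 0$ returns the unconditional prediction, $w = 1$ the plain conditional one, and $w > 1$ extrapolates beyond the conditional prediction, away from the unconditional one.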
Classifier-Free Guidance Visualization
Latent Diffusion (Stable Diffusion)
Definition: Latent Diffusion Model
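Operates in the compressed latent space of a pretrained autoencoder instead of pixel space: an encoder maps the image to a much smaller latent, the denoiser works on that latent, and a decoder maps the final denoised latent back to pixels. This dramatically reduces compute cost while maintaining quality.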
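Much of the saving comes from the size of the tensor the denoiser processes. The arithmetic below assumes Stable Diffusion v1-style settings: 512×512 RGB images and an autoencoder that downsamples by a factor of 8 into 4 latent channels. `latent_shape` is a hypothetical helper, not a library function.

```python
import math

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Latent (channels, h, w) for an image, assuming SD v1-style autoencoder settings."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)

pixel_shape = (3, 512, 512)          # RGB image, channels first
lat = latent_shape(512, 512)         # (4, 64, 64) under these assumptions
ratio = math.prod(pixel_shape) / math.prod(lat)

print(f"pixel space:  {pixel_shape} = {math.prod(pixel_shape):,} values")
print(f"latent space: {lat} = {math.prod(lat):,} values")
print(f"the denoiser's input is {ratio:.0f}x smaller")
```

Under these assumptions each denoising step operates on 48 times fewer values (16,384 instead of 786,432). The exact compute saving also depends on the denoiser architecture, but the reduced spatial resolution is the main source of it.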
Latent Diffusion Architecture
Key Takeaways
Summary: Diffusion Models
- Diffusion models learn to reverse a noise process
- DDPM is the foundational algorithm
- Classifier-free guidance improves quality and prompt adherence at the cost of diversity
- Latent diffusion (Stable Diffusion) operates in compressed space
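- Score matching (learning the gradient of the log data density) is an alternative, closely related formulation
- Diffusion models have surpassed GANs on standard image-synthesis benchmarks, at the cost of slower, multi-step sampling
- Text-to-image models such as Stable Diffusion condition on the prompt through cross-attention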
- Diffusion models are the basis of Sora (video generation)
What to Learn Next
-> GANs — Generative Adversarial Networks Complete Guide
-> Autoencoders — Encoding, Decoding and Representation Learning
-> Variational Autoencoders
-> Neural Networks Fundamentals — Perceptrons to Deep Learning
-> Transformers — Attention Is All You Need Complete Guide
-> Self-Supervised Learning — Pre-training Revolution