Generative Models
VAEs — Learning Probabilistic Latent Representations
VAEs combine autoencoders with variational inference to learn smooth, continuous latent spaces. By encoding to distributions instead of points, they enable principled generation, meaningful interpolation, and disentangled representations — all with a stable training objective.
- Key point 1 — ELBO loss balances reconstruction quality with latent space regularization
- Key point 2 — Reparameterization trick enables backpropagation through stochastic sampling
- Key point 3 — Beta-VAE and VQ-VAE extend the framework for disentanglement and discrete latents
"In the latent space, every point tells a story."
Variational Autoencoders — Deep Dive
VAEs are generative models that learn a smooth latent space by combining autoencoders with variational inference. Unlike GANs, they provide a principled probabilistic framework for generation.
From Autoencoder to VAE
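Definition: Autoencoder
A standard autoencoder compresses input into a latent code (encoder) and reconstructs it (decoder). It learns a deterministic mapping, but the latent space may have gaps: regions that no training input maps to, where the decoder's output is unconstrained. Sampling a code from such a region tends to produce implausible outputs, which makes generation difficult.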
Definition: Variational Autoencoder (VAE)
A VAE (Kingma and Welling, 2014) learns a probabilistic latent space:
- Encoder $q_\phi(z \mid x)$: Maps input $x$ to a distribution (mean $\mu$, variance $\sigma^2$)
- Decoder $p_\theta(x \mid z)$: Reconstructs input from sampled latent $z$
- Prior $p(z)$: Standard Gaussian $\mathcal{N}(0, I)$
Instead of encoding to a point, a VAE encodes to a distribution. Sampling from this distribution forces the latent space to be smooth and continuous.
How this diagram works: This diagram shows the VAE architecture, which differs from a standard autoencoder by encoding inputs to a probability distribution rather than a fixed point. The encoder (green) maps input $x$ to the parameters of a Gaussian distribution: a mean $\mu$ and variance $\sigma^2$. Instead of sampling $z$ directly (which is non-differentiable), the reparameterization trick samples $z = \mu + \sigma \odot \epsilon$ where $\epsilon \sim \mathcal{N}(0, I)$, making the process differentiable for backpropagation. The decoder (red) then reconstructs the input from this sampled latent vector. The loss combines reconstruction quality with a KL divergence term that regularizes the latent space toward a standard Gaussian prior, so that it remains smooth and continuous. This enables meaningful interpolation and generation by sampling from the prior.
Evidence Lower Bound (ELBO)
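Reconstruction Loss
$$\mathcal{L}_{\text{recon}} = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]$$
Here,
- $q_\phi(z \mid x)$ = Encoder distribution (approximate posterior)
- $p_\theta(x \mid z)$ = Decoder distribution (likelihood)
- $\log p_\theta(x \mid z)$ = Log-likelihood of reconstruction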
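The reconstruction term is usually estimated from one or a few decoded samples. The sketch below assumes a Bernoulli decoder over binary inputs, which the text does not specify, and uses hypothetical helper names (`bernoulli_log_likelihood`, `reconstruction_loss`). It uses plain Python to show the arithmetic and is not a training-ready implementation.

```python
import math

def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    """log p(x | z) for binary values x under decoder probabilities x_hat.

    Assumption: the decoder outputs one Bernoulli probability per input element.
    """
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def reconstruction_loss(x, decoded_samples):
    """Monte Carlo estimate of -E_q[log p(x | z)].

    decoded_samples holds one decoder output per latent sample z ~ q(z | x).
    """
    log_liks = [bernoulli_log_likelihood(x, x_hat) for x_hat in decoded_samples]
    return -sum(log_liks) / len(log_liks)

x = [1.0, 0.0, 1.0]
confident = reconstruction_loss(x, [[0.9, 0.1, 0.9]])   # close to x
uninformed = reconstruction_loss(x, [[0.5, 0.5, 0.5]])  # ignores x
```

A decoder output close to `x` gives a smaller loss than an uninformative one, so `confident` is lower than `uninformed`. For real-valued inputs, a Gaussian decoder would replace the Bernoulli log-likelihood with a squared-error term.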
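KL Divergence
$$D_{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right) = -\frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$
Here,
- $\mu_j$ = Mean of encoder output for dimension j
- $\sigma_j^2$ = Variance of encoder output for dimension j
- $J$ = Latent dimension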
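For a diagonal Gaussian encoder and a standard Gaussian prior, the KL term has the closed form $-\frac{1}{2}\sum_j (1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2)$. The sketch below evaluates it in plain Python. It assumes the encoder emits the log-variance rather than the variance, a common convention that the text does not state, and the function name is hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumption: log_var holds log(sigma^2) for each dimension.
    """
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv)
        for m, lv in zip(mu, log_var)
    )

# The encoder matches the prior exactly, so the penalty vanishes.
kl_match = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Shifting one mean by 1 with unit variance costs 0.5 nats.
kl_shift = kl_to_standard_normal([1.0], [0.0])
```

The term is zero only when every dimension has zero mean and unit variance, and it grows as the encoder distribution moves away from the prior.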
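ELBO Derivation
$$\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{ELBO}} + D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)$$
Since KL divergence is always non-negative, the first term (ELBO) is a lower bound on $\log p_\theta(x)$. The gap between the two is exactly the KL divergence from the approximate posterior $q_\phi(z \mid x)$ to the true posterior $p_\theta(z \mid x)$. Maximizing the ELBO therefore raises a lower bound on the data log-likelihood and, for fixed decoder parameters, shrinks that gap.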
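The decomposition of the log-likelihood into the ELBO plus a KL gap to the true posterior can be checked numerically. The sketch below assumes a toy model with a two-valued discrete latent, chosen only so every quantity can be computed exactly. A VAE uses a continuous latent, where the true posterior is intractable. All numbers and function names are illustrative.

```python
import math

def kl(q, p):
    """KL(q || p) for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def elbo(q, prior, likelihood):
    """E_q[log p(x | z)] - KL(q || prior) for one fixed observation x."""
    expected_log_lik = sum(qz * math.log(lx) for qz, lx in zip(q, likelihood))
    return expected_log_lik - kl(q, prior)

prior = [0.5, 0.5]        # p(z) for z in {0, 1}
likelihood = [0.8, 0.1]   # p(x | z) for the observed x
q = [0.6, 0.4]            # approximate posterior q(z | x)

evidence = sum(pz * lx for pz, lx in zip(prior, likelihood))        # p(x)
posterior = [pz * lx / evidence for pz, lx in zip(prior, likelihood)]

log_evidence = math.log(evidence)
elbo_value = elbo(q, prior, likelihood)
gap = kl(q, posterior)    # KL(q(z | x) || p(z | x)), never negative
```

Here `elbo_value + gap` equals `log_evidence` up to floating-point error, and setting `q` equal to `posterior` closes the gap so that the ELBO matches the log-likelihood.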
Reparameterization Trick
Theorem: Reparameterization Trick
To backpropagate through a stochastic sampling operation, reparameterize the sampling as a deterministic transformation of noise:
$$z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$
This makes the sampling differentiable with respect to $\mu$ and $\sigma$ while maintaining the stochastic nature through $\epsilon$.
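Reparameterization
$$z = \mu + \sigma \odot \epsilon$$
Here,
- $\mu$ = Mean vector from encoder
- $\sigma$ = Standard deviation from encoder
- $\epsilon$ = Random noise, $\epsilon \sim \mathcal{N}(0, I)$ (sampled in the forward pass; no gradient flows into it)
- $\odot$ = Element-wise multiplication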
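A minimal sketch of the trick in plain Python, with hypothetical function names. The noise is drawn separately, so the map from the mean and standard deviation to the latent is an ordinary deterministic function. Its derivatives are 1 with respect to the mean and the noise value with respect to the standard deviation, which the finite-difference check confirms. In practice an autodiff framework computes these gradients.

```python
import random

def sample_noise(dim, rng):
    """Draw eps ~ N(0, I); this is the only stochastic step."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def transform(mu, sigma, eps):
    """Deterministic part: z = mu + sigma * eps, element-wise."""
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

rng = random.Random(0)     # seeded only to make the sketch repeatable
mu = [1.0, -2.0]
sigma = [0.5, 0.1]

eps = sample_noise(len(mu), rng)
z = transform(mu, sigma, eps)

# With eps held fixed, nudging sigma changes z at a rate equal to eps.
h = 1e-6
z_nudged = transform(mu, [s + h for s in sigma], eps)
dz_dsigma = [(zn - zi) / h for zn, zi in zip(z_nudged, z)]
```

Each entry of `dz_dsigma` matches the corresponding entry of `eps`, so the gradient reaches the encoder outputs even though the latent is random.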