Generative Adversarial Networks — AI Creates Art, Faces, and More
Learn how GANs use adversarial training to generate realistic synthetic data from noise.
- Adversarial training — generator vs. discriminator dynamic
- Image synthesis — create photorealistic faces and art
- Training dynamics — Nash equilibrium, mode collapse, and convergence
Creativity is intelligence having fun.
GANs — Generative Adversarial Networks
GANs (Goodfellow et al., 2014) learn to generate data by framing generation as a two-player minimax game between a generator and a discriminator.
GAN Architecture
How GAN training works: The diagram shows the two-player game at the heart of GANs. The Generator (green, left) takes random noise z (typically a 100 to 512 dimensional vector sampled from a standard normal distribution) and transforms it through transposed convolution layers into a fake image. The Discriminator (yellow, right) receives both real images from the dataset and fake images from the generator, and outputs a probability D(x) ∈ [0,1]: real images should score near 1, fake images near 0.
The training dynamics box at the bottom explains the adversarial loop: the generator learns from gradients flowing back through the discriminator ("make more realistic images"), while the discriminator learns to distinguish better ("catch the fakes"). At the theoretical equilibrium, the generator's samples follow the data distribution and the discriminator outputs 0.5 for every input, because it can no longer tell real from fake. This minimax game drives both networks to improve together.
Loss Functions
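Definition: Original GAN Loss (Minimax)
The GAN objective is a two-player minimax game:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
Optimal discriminator (for fixed G), where $p_g$ is the distribution of generated samples $G(z)$:
$$D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$
The global optimum is achieved when $p_g = p_{\text{data}}$ and $D^*_G(x) = \tfrac{1}{2}$.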
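As a quick numerical check, the sketch below evaluates the value function on a toy discrete distribution. It assumes the standard result from Goodfellow et al. (2014) that the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)); the three-outcome distributions and the helper names are made up for illustration.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for each outcome x."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value_function(d, p_data, p_g):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]."""
    real_term = sum(pd * math.log(dx) for pd, dx in zip(p_data, d))
    fake_term = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d))
    return real_term + fake_term

p_data = [0.5, 0.3, 0.2]     # toy data distribution (illustrative values)
p_g_bad = [0.2, 0.3, 0.5]    # generator that does not match the data
p_g_good = list(p_data)      # generator that matches the data exactly

d_bad = optimal_discriminator(p_data, p_g_bad)
d_good = optimal_discriminator(p_data, p_g_good)

v_bad = value_function(d_bad, p_data, p_g_bad)
v_good = value_function(d_good, p_data, p_g_good)

print("D* when p_g = p_data:", d_good)
print("V at the optimum:", v_good, "vs -log 4 =", -math.log(4))
print("V for the mismatched generator:", v_bad)
```

When the generator matches the data, every entry of D* is 0.5 and the value equals -log 4. Any mismatch gives the optimal discriminator a larger value, which is what the generator is trying to push back down.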
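Definition: Non-Saturating Loss
In practice, minimize $-\log D(G(z))$ instead of $\log(1 - D(G(z)))$ for the generator. This provides stronger gradients early in training, when the discriminator easily rejects fakes and $D(G(z))$ is close to 0:
$$\mathcal{L}_G = -\mathbb{E}_{z \sim p_z}[\log D(G(z))]$$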
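To see why the non-saturating form helps, the sketch below compares the gradient of each generator loss with respect to the discriminator's logit a, assuming D(G(z)) = sigmoid(a). The function names are illustrative, and the logit values are arbitrary test points.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the original minimax generator loss."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the non-saturating generator loss."""
    return sigmoid(a) - 1.0

# a = -5 models early training: the discriminator confidently rejects the fake.
for a in (-5.0, 0.0, 5.0):
    print(f"logit={a:+.1f}  D(G(z))={sigmoid(a):.4f}  "
          f"saturating={grad_saturating(a):+.4f}  "
          f"non-saturating={grad_non_saturating(a):+.4f}")
```

At a = -5 the saturating gradient has magnitude below 0.01 while the non-saturating gradient is close to 1 in magnitude, so the generator still receives a usable learning signal when its samples are easy to spot.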
Training Process
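One common way to organize training is to alternate a discriminator update with a generator update on each batch. The sketch below is a minimal PyTorch version under several assumptions: small fully connected networks on 2-D toy data instead of images, the non-saturating generator loss, and Adam settings (lr=2e-4, beta1=0.5) borrowed from common DCGAN practice. The names `train_step`, `G`, and `D` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

latent_dim, data_dim = 8, 2  # toy sizes chosen for illustration

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# D returns a raw logit; BCEWithLogitsLoss applies the sigmoid internally.
D = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    batch = real.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    # 1) Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()  # detach so this step does not touch G
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator update (non-saturating): push D(G(z)) toward 1.
    z = torch.randn(batch, latent_dim)
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    return loss_d.item(), loss_g.item()

real_batch = torch.randn(64, data_dim) * 0.5 + 2.0  # stand-in for real data
loss_d, loss_g = train_step(real_batch)
print(loss_d, loss_g)
```

A real run would call `train_step` inside a loop over a data loader. The two losses are not expected to fall steadily, because each network's progress makes the other's task harder.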
Mode Collapse
DCGAN Architecture
Definition: DCGAN Guidelines (Radford et al., 2015)
The Deep Convolutional GAN established architectural guidelines for stable training:
- Replace pooling with strided convolutions (D) and transposed convolutions (G)
- Use batch normalization in both G and D (except G output and D input)
- Remove fully connected layers
- G: ReLU activation (output layer: Tanh)
- D: LeakyReLU activation (α=0.2)
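The first guideline works because strides can do the resizing that pooling would otherwise do. The sketch below applies the usual output-size formulas for convolutions and transposed convolutions (dilation 1, no output padding) to a kernel-4 layer stack; the layer settings are an assumption chosen to reach 32x32, and the helper names are illustrative.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# Generator path: a 1x1 latent "image" grows to 32x32.
g_sizes = [1]
for kernel, stride, padding in [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]:
    g_sizes.append(conv_transpose_out(g_sizes[-1], kernel, stride, padding))

# Discriminator path: 32x32 shrinks back to 1x1 with mirrored settings.
d_sizes = [32]
for kernel, stride, padding in [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]:
    d_sizes.append(conv_out(d_sizes[-1], kernel, stride, padding))

print(g_sizes)  # [1, 4, 8, 16, 32]
print(d_sizes)  # [32, 16, 8, 4, 1]
```

With kernel 4, stride 2, and padding 1, each transposed convolution exactly doubles the spatial size and each convolution exactly halves it, so no pooling or upsampling layer is needed.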
Stability tricks: Spectral normalization, progressive growing, two-timescale update rule.
Example: DCGAN Generator
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, latent_dim=100, channels=3):
        super().__init__()
        self.latent_dim = latent_dim
        self.gen = nn.Sequential(
            # z: (B, latent_dim, 1, 1) → (B, 512, 4, 4)
            nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(True),
            # → (B, 256, 8, 8)
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),
            # → (B, 128, 16, 16)
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(True),
            # → (B, channels, 32, 32); no batch norm on the output layer
            nn.ConvTranspose2d(128, channels, 4, 2, 1, bias=False),
            nn.Tanh()  # outputs in [-1, 1]
        )

    def forward(self, z):
        # Reshape flat noise (B, latent_dim) into a 1x1 feature map.
        # Use the stored latent_dim rather than a hard-coded 100.
        return self.gen(z.view(-1, self.latent_dim, 1, 1))
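A matching discriminator can be sketched by mirroring the generator and applying the same guidelines: strided convolutions instead of pooling, LeakyReLU with slope 0.2, no batch normalization on the input layer, and a final 4x4 convolution in place of a fully connected head. The class name `DCGANDiscriminator` and the channel widths are assumptions for illustration, not part of the original example.

```python
import torch
import torch.nn as nn

class DCGANDiscriminator(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.disc = nn.Sequential(
            # (B, channels, 32, 32) -> (B, 128, 16, 16); no batch norm on the input layer
            nn.Conv2d(channels, 128, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 256, 8, 8)
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 512, 4, 4)
            nn.Conv2d(256, 512, 4, 2, 1, bias=False),
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 1, 1, 1): a 4x4 convolution replaces a fully connected head
            nn.Conv2d(512, 1, 4, 1, 0, bias=False),
        )

    def forward(self, x):
        # Return one raw logit per image; apply a sigmoid (or use
        # BCEWithLogitsLoss) to turn it into a probability.
        return self.disc(x).view(-1)

images = torch.randn(2, 3, 32, 32)  # stand-in batch of 32x32 RGB images
logits = DCGANDiscriminator()(images)
print(logits.shape)
```

Returning logits rather than probabilities is a design choice here: it pairs with `nn.BCEWithLogitsLoss`, which is numerically more stable than applying a sigmoid followed by a plain binary cross-entropy.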
GAN Variants
WGAN (Wasserstein GAN)
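Definition: Wasserstein Distance
The Earth Mover's Distance (Wasserstein-1) replaces JS divergence:
$$W(p_{\text{data}}, p_g) = \inf_{\gamma \in \Pi(p_{\text{data}}, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\|x - y\|\big]$$
WGAN loss (using Kantorovich-Rubinstein duality):
$$\min_G \max_{\|D\|_L \le 1} \; \mathbb{E}_{x \sim p_{\text{data}}}[D(x)] - \mathbb{E}_{z \sim p_z}[D(G(z))]$$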
The discriminator (called the critic) must be 1-Lipschitz, which is enforced approximately via weight clipping or spectral normalization.
Advantages: a loss value that tracks sample quality, reduced mode collapse, and more stable training.
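The "meaningful loss" claim can be illustrated with two point masses on a 1-D grid, a simplified version of the example used to motivate WGAN. In the sketch below, the JS divergence is stuck at log 2 as soon as the distributions stop overlapping, while the Wasserstein-1 distance keeps growing with the gap. The grid size and helper names are assumptions for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1(p, q, spacing=1.0):
    """W1 on an evenly spaced 1-D grid, computed from the CDF difference."""
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap) * spacing
    return total

def point_mass(index, size=10):
    """All probability on a single grid cell."""
    return [1.0 if i == index else 0.0 for i in range(size)]

real = point_mass(0)
for shift in (1, 3, 6):
    fake = point_mass(shift)
    print(f"shift={shift}  JS={js_divergence(real, fake):.4f}  "
          f"W1={wasserstein_1(real, fake):.1f}")
```

Because JS is constant for every non-zero shift, it gives the generator no signal about which direction reduces the gap. W1 shrinks as the fake mass moves toward the real mass, which is the property the critic's loss is meant to approximate.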
Key Takeaways
Summary: GANs
- GANs consist of Generator vs Discriminator in adversarial training
- Objective is a minimax game — Nash equilibrium at D(x) = 0.5
- Mode collapse: G produces limited variety — use WGAN, minibatch discrimination
- DCGAN established stable convolutional architecture
- WGAN uses Wasserstein distance for stable training and meaningful loss
- StyleGAN produces photorealistic faces via style injection
- Training is unstable — requires careful hyperparameter tuning
- GANs are being replaced by diffusion models for many tasks
- Still useful for style transfer, image editing, super-resolution
What to Learn Next
-> Autoencoders: Learn about compressed representations.
-> Variational Autoencoders: Generate data with probabilistic models.
-> Diffusion Models Deep Dive: Master modern generative AI techniques.
-> Neural Networks: Understand the foundation of deep learning.
-> CNNs: Learn the convolutional architectures used in GANs.
-> Training Deep Networks: Master training techniques for unstable models.