
Generative Adversarial Networks — AI Creates Art, Faces, and More

Learn how GANs use adversarial training to generate realistic synthetic data from noise.

Adversarial training — generator vs. discriminator dynamic
Image synthesis — create photorealistic faces and art
Training dynamics — Nash equilibrium, mode collapse, and convergence

Creativity is intelligence having fun.

GANs — Generative Adversarial Networks

GANs (Goodfellow et al., 2014) learn to generate data by framing generation as a two-player minimax game between a generator $G$ and a discriminator $D$.

GAN Architecture

How GAN training works: The diagram shows the two-player game at the heart of GANs. The Generator (green, left) takes random noise z (a latent vector, typically 100 to 512 dimensions, sampled from a standard normal distribution) and transforms it through transposed convolution layers (often loosely called deconvolutions) into a fake image. The Discriminator (yellow, right) receives both real images from the dataset and fake images from the generator and outputs a probability D(x) ∈ [0,1]: real images should score near 1, fakes near 0.

The training dynamics box at the bottom explains the adversarial loop: the generator learns from gradients flowing back through the discriminator ("make more realistic images"), while the discriminator learns to distinguish better ("catch the fakes"). At the theoretical equilibrium, the generator's samples are indistinguishable from real data and the discriminator outputs 0.5 for every input, since it can no longer tell real from fake. In practice training rarely reaches this point exactly, but the minimax game pushes both networks to improve together.

Loss Functions

Definition: Original GAN Loss (Minimax)

The GAN objective is a two-player minimax game:

\min_G \max_D \; V(D, G) = \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_z}[\log(1 - D(G(\mathbf{z})))]

Optimal discriminator (for fixed G):

D^*_G(\mathbf{x}) = \frac{p_{\text{data}}(\mathbf{x})}{p_{\text{data}}(\mathbf{x}) + p_g(\mathbf{x})}

The global optimum is achieved when $p_g = p_{\text{data}}$, at which point $D^*(\mathbf{x}) = \frac{1}{2}$ for every $\mathbf{x}$.
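A short sketch of why these two results hold, following the argument in Goodfellow et al. (2014). Here $p_g$ denotes the distribution of $G(\mathbf{z})$ and $\mathrm{JSD}$ the Jensen-Shannon divergence; the symbols $a$, $b$, and $y$ are introduced only for this derivation. For a fixed $G$, the value function can be written as a single integral over $\mathbf{x}$:

V(D, G) = \int \left[ p_{\text{data}}(\mathbf{x}) \log D(\mathbf{x}) + p_g(\mathbf{x}) \log(1 - D(\mathbf{x})) \right] d\mathbf{x}

For any $a, b > 0$, the function $y \mapsto a \log y + b \log(1 - y)$ attains its maximum on $(0, 1)$ at $y = \frac{a}{a + b}$. Applying this pointwise with $a = p_{\text{data}}(\mathbf{x})$ and $b = p_g(\mathbf{x})$ gives $D^*_G$. Substituting $D^*_G$ back into $V$ yields:

V(D^*_G, G) = -\log 4 + 2 \, \mathrm{JSD}(p_{\text{data}} \,\|\, p_g)

Because $\mathrm{JSD} \geq 0$ with equality only when the two distributions coincide, the generator's best achievable value is $-\log 4$, reached at $p_g = p_{\text{data}}$, where $D^*_G(\mathbf{x}) = \frac{1}{2}$.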

Definition: Non-Saturating Loss

In practice, minimize $-\log D(G(\mathbf{z}))$ instead of $\log(1 - D(G(\mathbf{z})))$ for the generator. This provides stronger gradients early in training:

\mathcal{L}_G = -\mathbb{E}_{\mathbf{z} \sim p_z}[\log D(G(\mathbf{z}))]

\mathcal{L}_D = -\mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[\log D(\mathbf{x})] - \mathbb{E}_{\mathbf{z} \sim p_z}[\log(1 - D(G(\mathbf{z})))]
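To make the "stronger gradients" claim concrete, the sketch below compares the two generator losses as a function of the discriminator's logit $a$, assuming $D = \sigma(a)$ with a sigmoid output. The function names are illustrative, not part of any library. Early in training the discriminator rejects fakes confidently, so $a$ is very negative.

```python
import math

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a: float) -> float:
    """d/da of log(1 - sigmoid(a)), the minimax generator loss."""
    return -sigmoid(a)

def non_saturating_grad(a: float) -> float:
    """d/da of -log(sigmoid(a)), the non-saturating generator loss."""
    return -(1.0 - sigmoid(a))

# Compare gradient magnitudes when D is confident the sample is fake (a << 0),
# undecided (a = 0), and fooled (a >> 0).
for a in (-5.0, 0.0, 5.0):
    print(f"logit={a:+.1f}  D(G(z))={sigmoid(a):.4f}  "
          f"saturating={saturating_grad(a):+.4f}  "
          f"non-saturating={non_saturating_grad(a):+.4f}")
```

At $a = -5$ the minimax gradient has magnitude below 0.01 while the non-saturating gradient is close to 1, which is why the second form gives the generator a usable learning signal when its samples are still poor.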

Training Process
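One way to turn the losses above into an alternating update loop is sketched here in PyTorch. The tiny MLPs, the 2-D toy data, and the `train_step` helper are assumptions made for illustration; the Adam settings (learning rate 2e-4, beta1 0.5) follow the values commonly used with DCGAN. The discriminator outputs logits, so `binary_cross_entropy_with_logits` with targets 1 and 0 computes $\mathcal{L}_D$, and with target 1 on fake samples it computes the non-saturating $\mathcal{L}_G$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
latent_dim, data_dim = 8, 2  # toy sizes chosen for illustration

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))  # logits

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real):
    """One discriminator update followed by one generator update."""
    batch = real.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    # Discriminator: real samples should score 1, fakes 0.
    # detach() stops gradients from reaching G during this update.
    fake = G(torch.randn(batch, latent_dim)).detach()
    loss_D = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake), zeros))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator: non-saturating loss, i.e. label fresh fakes as real.
    loss_G = F.binary_cross_entropy_with_logits(
        D(G(torch.randn(batch, latent_dim))), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()

real_batch = torch.randn(64, data_dim) * 0.5 + 2.0  # stand-in for a real data batch
loss_D, loss_G = train_step(real_batch)
print(f"loss_D={loss_D:.3f}  loss_G={loss_G:.3f}")
```

A full training run would call `train_step` over many batches and epochs; a single step is shown only to keep the sketch short.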

Mode Collapse

DCGAN Architecture

Definition: DCGAN Guidelines (Radford et al., 2015)

The Deep Convolutional GAN (DCGAN) paper established architectural guidelines for stable training:

Replace pooling with strided convolutions (D) and transposed convolutions (G)
Use batch normalization in both G and D (except G output and D input)
Remove fully connected layers
G: ReLU activation (output layer: Tanh)
D: LeakyReLU activation (α=0.2)

Later stability tricks: spectral normalization, progressive growing, and the two-timescale update rule (TTUR, which uses separate learning rates for G and D).

Example: DCGAN Generator

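import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Maps a latent vector z to a 32x32 image with pixel values in [-1, 1]."""

    def __init__(self, latent_dim=100, channels=3):
        super().__init__()
        self.latent_dim = latent_dim
        self.gen = nn.Sequential(
            # z: (B, latent_dim, 1, 1) → (B, 512, 4, 4)
            nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(True),
            # → (B, 256, 8, 8)
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),
            # → (B, 128, 16, 16)
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(True),
            # → (B, channels, 32, 32); no batch norm on the output layer
            nn.ConvTranspose2d(128, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        # Reshape (B, latent_dim) noise to (B, latent_dim, 1, 1) for the first layer.
        # Uses the stored latent_dim rather than a hard-coded 100.
        return self.gen(z.view(-1, self.latent_dim, 1, 1))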
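A matching discriminator for 32×32 images can be sketched by applying the same guidelines in the opposite direction: strided convolutions instead of pooling, LeakyReLU with slope 0.2, batch normalization everywhere except the input layer, and a final 4×4 convolution in place of a fully connected layer. The class name `DCGANDiscriminator` and the channel widths are assumptions chosen to mirror the generator, not a reference implementation.

```python
import torch
import torch.nn as nn

class DCGANDiscriminator(nn.Module):
    """Maps a (B, channels, 32, 32) image batch to B probabilities of being real."""

    def __init__(self, channels=3):
        super().__init__()
        self.disc = nn.Sequential(
            # x: (B, channels, 32, 32) -> (B, 128, 16, 16); no batch norm on the input layer
            nn.Conv2d(channels, 128, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 256, 8, 8)
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 512, 4, 4)
            nn.Conv2d(256, 512, 4, 2, 1, bias=False),
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 1, 1, 1): a 4x4 convolution stands in for a fully connected layer
            nn.Conv2d(512, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.disc(x).view(-1)  # flatten to shape (B,)

D = DCGANDiscriminator()
scores = D(torch.randn(8, 3, 32, 32))  # random tensors stand in for images
print(scores.shape)
```

If the discriminator is trained with a logits-based loss such as `binary_cross_entropy_with_logits`, the final `nn.Sigmoid()` would be dropped.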

GAN Variants

WGAN (Wasserstein GAN)

Definition: Wasserstein Distance

The Earth Mover's Distance (Wasserstein-1) replaces the Jensen-Shannon (JS) divergence that the original GAN objective implicitly minimizes:

W(p_{\text{data}}, p_g) = \inf_{\gamma \in \Pi(p_{\text{data}}, p_g)} \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \gamma}[\|\mathbf{x} - \mathbf{y}\|]
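A small worked example, in the spirit of the motivating example from the WGAN paper, shows why this swap matters. Take a point mass at 0 and a point mass at $\theta$: the JS divergence is stuck at $\log 2$ for every $\theta \neq 0$, while the Wasserstein-1 distance equals $|\theta|$ and shrinks smoothly as the distributions approach each other. The helper functions below are illustrative; `wasserstein_1d` relies on the fact that, for equal-size 1-D samples, the optimal transport plan pairs sorted values.

```python
import math

def wasserstein_1d(xs, ys):
    """W1 between two equal-size empirical 1-D samples: mean gap between sorted values."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Point mass at 0 versus point mass at theta, written on the support {0, theta}.
for theta in (5.0, 1.0, 0.1):
    w = wasserstein_1d([0.0], [theta])
    js = js_divergence([1.0, 0.0], [0.0, 1.0])  # disjoint supports for any theta != 0
    print(f"theta={theta:>4}  W1={w:.3f}  JS={js:.3f}")
```

The JS column does not change as $\theta$ shrinks, so it gives the generator no direction to move in, whereas W1 decreases with $\theta$.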

WGAN loss (using Kantorovich-Rubinstein duality):

\min_G \max_{D \in 1\text{-Lip}} \; \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[D(\mathbf{x})] - \mathbb{E}_{\mathbf{z} \sim p_z}[D(G(\mathbf{z}))]

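The discriminator (called the critic in WGAN) must be 1-Lipschitz. The original WGAN enforces this approximately via weight clipping; a gradient penalty (WGAN-GP) and spectral normalization are common alternatives.

Advantages: a loss value that tracks sample quality, reduced mode collapse in practice, and more stable training.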
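A single critic update with weight clipping might look like the sketch below. The toy networks and the `critic_step` helper are assumptions for illustration; the clip value 0.01, RMSprop, and learning rate 5e-5 follow the defaults reported in the WGAN paper. Note that the critic has no sigmoid: it outputs an unbounded score rather than a probability.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, clip = 8, 2, 0.01  # toy sizes; clip follows the WGAN paper default

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
critic = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
opt_C = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def critic_step(real):
    """One critic update; returns the current estimate E[D(x)] - E[D(G(z))]."""
    fake = G(torch.randn(real.size(0), latent_dim)).detach()
    # The critic maximizes E[D(x)] - E[D(G(z))], so minimize its negative.
    loss_C = -(critic(real).mean() - critic(fake).mean())
    opt_C.zero_grad()
    loss_C.backward()
    opt_C.step()
    # Weight clipping: a crude way to keep the critic in a Lipschitz-bounded family.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)
    return -loss_C.item()

real_batch = torch.randn(64, data_dim) * 0.5 + 2.0  # stand-in for a real data batch
gap = critic_step(real_batch)
print(f"critic gap estimate: {gap:.4f}")
```

In the paper's training schedule the critic is updated several times per generator update, and the generator then minimizes `-critic(G(z)).mean()`.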

Key Takeaways

Summary: GANs

GANs consist of Generator vs Discriminator in adversarial training
Objective is a minimax game — Nash equilibrium at D(x) = 0.5
Mode collapse: G produces limited variety — use WGAN, minibatch discrimination
DCGAN established stable convolutional architecture
WGAN uses Wasserstein distance for stable training and meaningful loss
StyleGAN produces photorealistic faces via style injection
Training is unstable — requires careful hyperparameter tuning
GANs are being replaced by diffusion models for many tasks
Still useful for style transfer, image editing, super-resolution

What to Learn Next

-> Autoencoders Learn about compressed representations.

-> Variational Autoencoders Generate data with probabilistic models.

-> Diffusion Models Deep Dive Master modern generative AI techniques.

-> Neural Networks Understand the foundation of deep learning.

-> CNNs Learn the convolutional architectures used in GANs.

-> Training Deep Networks Master training techniques for unstable models.
