
GANs — Generative Adversarial Networks Complete Guide



Generative Adversarial Networks — AI Creates Art, Faces, and More

Learn how GANs use adversarial training to generate realistic synthetic data from noise.

  • Adversarial training — generator vs. discriminator dynamic
  • Image synthesis — create photorealistic faces and art
  • Training dynamics — Nash equilibrium, mode collapse, and convergence

Creativity is intelligence having fun.

GANs — Generative Adversarial Networks

GANs (Goodfellow et al., 2014) learn to generate data by framing generation as a two-player minimax game between a generator $G$ and a discriminator $D$.


GAN Architecture

[Figure: GAN architecture. Noise z ~ p(z) (128-512 dim) feeds the Generator G(z) (DeConv layers, BN + ReLU, Tanh output), which produces a fake image. The Discriminator D(x) (Conv layers, LeakyReLU + BN, Sigmoid output) receives real images from the dataset and generated images, and outputs D(x) ∈ [0,1] (real → 1, fake → 0). Training dynamics panel: the Generator learns to fool D using gradients from D; the Discriminator learns to classify real vs fake; at equilibrium D(x) = 0.5 for all x.]

How GAN training works: The diagram shows the two-player game at the heart of GANs. The Generator (left) takes random noise z, a 128-512 dimensional vector sampled from a normal distribution, and transforms it through transposed-convolution ("deconvolution") layers into a fake image. The Discriminator (right) receives both real images from the dataset and fake images from the generator and outputs a probability D(x) ∈ [0,1]: real images should score near 1, fake images near 0. The training dynamics box at the bottom explains the adversarial loop. The generator learns from gradients flowing back through the discriminator ("make more realistic images"), while the discriminator learns to distinguish better ("catch the fakes"). At equilibrium, the generator's samples are indistinguishable from real data and the discriminator outputs 0.5 for everything. In practice this equilibrium is rarely reached exactly, but the minimax game pushes both networks to improve together.


Loss Functions

Definition: Original GAN Loss (Minimax)

The GAN objective is a two-player minimax game:

$$\min_G \max_D \; V(D, G) = \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_z}[\log(1 - D(G(\mathbf{z})))]$$

Optimal discriminator (for fixed G):

$$D^*_G(\mathbf{x}) = \frac{p_{\text{data}}(\mathbf{x})}{p_{\text{data}}(\mathbf{x}) + p_g(\mathbf{x})}$$

The global optimum is achieved when $p_g = p_{\text{data}}$, at which point $D^*_G(\mathbf{x}) = \frac{1}{2}$ for all $\mathbf{x}$.
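The optimal-discriminator formula can be checked numerically on a small discrete example. The sketch below is illustrative only: the three-outcome distributions and the helper names `optimal_discriminator` and `value` are assumptions made for this example, not part of the original lesson.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for each discrete outcome x."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value(p_data, p_g, d):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]."""
    real_term = sum(pd * math.log(dx) for pd, dx in zip(p_data, d))
    fake_term = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d))
    return real_term + fake_term

# Hypothetical distributions over three outcomes (all probabilities positive).
p_data = [0.5, 0.3, 0.2]
p_g_bad = [0.8, 0.1, 0.1]   # generator that does not match the data yet

d_bad = optimal_discriminator(p_data, p_g_bad)
d_match = optimal_discriminator(p_data, p_data)   # case p_g = p_data

v_bad = value(p_data, p_g_bad, d_bad)
v_match = value(p_data, p_data, d_match)

print(d_match)            # every entry is 0.5
print(v_bad, v_match)     # v_match equals -log 4, the minimum over generators
```

When the generator matches the data, the value evaluated at the optimal discriminator is $-\log 4$; any mismatch makes it larger, which is what the generator tries to reduce.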

Definition: Non-Saturating Loss

In practice, the generator minimizes $-\log D(G(\mathbf{z}))$ instead of $\log(1 - D(G(\mathbf{z})))$. This provides stronger gradients early in training, when $D$ rejects generated samples with high confidence:

$$\mathcal{L}_G = -\mathbb{E}_{\mathbf{z} \sim p_z}[\log D(G(\mathbf{z}))]$$

$$\mathcal{L}_D = -\mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[\log D(\mathbf{x})] - \mathbb{E}_{\mathbf{z} \sim p_z}[\log(1 - D(G(\mathbf{z})))]$$
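The gradient claim can be made concrete by writing $D = \sigma(a)$ for a discriminator logit $a$ and differentiating both generator losses with respect to $a$. This is a minimal sketch under that assumption; the logit value and the function names are chosen only for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da [-log sigmoid(a)] = -(1 - sigmoid(a))
    return -(1.0 - sigmoid(a))

# Early in training D confidently rejects fakes: D(G(z)) = sigmoid(-6) is close to 0.
logit = -6.0
g_sat = grad_saturating(logit)
g_ns = grad_non_saturating(logit)

print(abs(g_sat))  # tiny: the minimax generator loss has almost no gradient here
print(abs(g_ns))   # close to 1: the non-saturating loss still gives a strong signal
```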

Training Process

GAN Training Loop

Step 1: Train Discriminator (k steps; usually k = 1, but can be more)

  1. Sample real batch {x₁,...,x_B} ~ p_data
  2. Sample noise {z₁,...,z_B} ~ p_z
  3. Generate fakes: x̃ᵢ = G(zᵢ)
  4. Minimize: L_D = -(1/B) Σ [log D(xᵢ) + log(1 - D(G(zᵢ)))]
  5. Update θ_D ← θ_D - α·∇L_D

Step 2: Train Generator (1 step; D parameters frozen during the G update)

  1. Sample noise {z₁,...,z_B} ~ p_z
  2. Minimize: L_G = -(1/B) Σ log D(G(zᵢ))
  3. Update θ_G ← θ_G - α·∇L_G (gradient flows through D)

Convergence:

  • Epoch 1: G generates noise and D easily classifies, so L_D ≈ 0 and L_G is large
  • Epoch N: G improves and D struggles, so D(x) → 0.5, L_D → 2 log 2 ≈ 1.386, and L_G → log 2 ≈ 0.693 (Nash equilibrium)
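The two steps of the training loop might look roughly like this in PyTorch. This is a sketch on toy 2-D data, not a tuned implementation: the small MLPs, the Adam settings, and the synthetic "real" batch are assumptions for illustration. The discriminator here returns raw logits, and `nn.BCEWithLogitsLoss` applies the sigmoid internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, batch = 8, 2, 64

# Hypothetical toy networks; D outputs logits (no Sigmoid layer).
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Step 1: discriminator update (k = 1). detach() keeps gradients out of G.
    fake = G(torch.randn(n, latent_dim))
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Step 2: generator update with the non-saturating loss -log D(G(z)).
    fake = G(torch.randn(n, latent_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad()
    loss_g.backward()   # gradients flow through D, but only G's optimizer steps
    opt_g.step()
    return loss_d.item(), loss_g.item()

real_batch = torch.randn(batch, data_dim) * 0.5 + 2.0  # stand-in for real data
loss_d, loss_g = train_step(real_batch)
print(loss_d, loss_g)
```

Only `opt_g.step()` is called in step 2, so the discriminator weights stay fixed even though gradients pass through them; the stale gradients on D are cleared by `opt_d.zero_grad()` at the next discriminator update.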

Mode Collapse

[Figure: Mode collapse. Panel "Good: p_g ≈ p_data": generated samples cover both Mode 1 and Mode 2. Panel "Bad: Mode Collapse": all generated samples land on a single mode.]

Mode collapse: G learns to produce only 1-2 outputs that fool D. This is a common failure mode.

Solutions: WGAN loss, unrolled GAN, minibatch discrimination, spectral normalization.
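On toy data where the modes are known in advance, collapse can be spotted by counting how many modes the generated samples reach. The helper below is a hypothetical diagnostic written for this illustration (the name `mode_coverage`, the radius, and the sample values are all assumptions); it does not apply directly to images, where the modes are not known.

```python
def mode_coverage(samples, modes, radius=0.5):
    """Fraction of known modes with at least one sample within `radius`."""
    covered = sum(any(abs(s - m) <= radius for s in samples) for m in modes)
    return covered / len(modes)

modes = [-2.0, 2.0]                      # two modes of a toy 1-D data distribution
healthy = [-2.1, -1.9, 2.05, 1.8]        # samples spread over both modes
collapsed = [1.9, 2.0, 2.1, 2.05]        # samples stuck on one mode

cov_healthy = mode_coverage(healthy, modes)
cov_collapsed = mode_coverage(collapsed, modes)
print(cov_healthy, cov_collapsed)
```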

DCGAN Architecture

Definition: DCGAN Guidelines (Radford et al., 2015)

The Deep Convolutional GAN (DCGAN) established a set of architecture guidelines for stable training:

  • Replace pooling with strided convolutions (D) and transposed convolutions (G)
  • Use batch normalization in both G and D (except G output and D input)
  • Remove fully connected layers
  • G: ReLU activation (output layer: Tanh)
  • D: LeakyReLU activation (α=0.2)

Stability tricks: Spectral normalization, progressive growing, two-timescale update rule.

Example: DCGAN Generator

import torch.nn as nn


class DCGANGenerator(nn.Module):
    """DCGAN-style generator mapping latent vectors to 32x32 images."""

    def __init__(self, latent_dim=100, channels=3):
        super().__init__()
        self.latent_dim = latent_dim
        self.gen = nn.Sequential(
            # z: (B, latent_dim, 1, 1) → (B, 512, 4, 4)
            nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(True),
            # → (B, 256, 8, 8)
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),
            # → (B, 128, 16, 16)
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(True),
            # → (B, channels, 32, 32); no BatchNorm on the output layer
            nn.ConvTranspose2d(128, channels, 4, 2, 1, bias=False),
            nn.Tanh()  # maps pixel values to [-1, 1]
        )

    def forward(self, z):
        # Reshape (B, latent_dim) noise to (B, latent_dim, 1, 1) for the first layer
        return self.gen(z.view(-1, self.latent_dim, 1, 1))
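A matching discriminator could follow the same guidelines in reverse: strided convolutions instead of pooling, LeakyReLU with slope 0.2, no batch normalization on the input layer, and a final convolution in place of a fully connected head. The class below is a sketch written for this lesson's 32×32 image size; the layer widths simply mirror the generator and are an assumption, not something the original lesson specifies.

```python
import torch
import torch.nn as nn


class DCGANDiscriminator(nn.Module):
    """DCGAN-style discriminator for 32x32 images (illustrative sketch)."""

    def __init__(self, channels=3):
        super().__init__()
        self.disc = nn.Sequential(
            # x: (B, channels, 32, 32) -> (B, 128, 16, 16); no BatchNorm on the input layer
            nn.Conv2d(channels, 128, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 256, 8, 8)
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 512, 4, 4)
            nn.Conv2d(256, 512, 4, 2, 1, bias=False),
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            # -> (B, 1, 1, 1): a 4x4 convolution replaces a fully connected layer
            nn.Conv2d(512, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Flatten (B, 1, 1, 1) to one probability per image
        return self.disc(x).view(-1)


disc = DCGANDiscriminator()
images = torch.randn(8, 3, 32, 32)  # stand-in for a batch of images
probs = disc(images)
print(probs.shape)
```

If this discriminator is trained with `nn.BCEWithLogitsLoss`, the final `nn.Sigmoid()` should be dropped so the sigmoid is not applied twice.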

GAN Variants

GAN Family Tree

  • GAN (2014): Minimax loss
  • DCGAN (2015): Conv architecture
  • WGAN (2017): Wasserstein loss
  • StyleGAN (2018): Style-based generator
  • ProGAN (2017): Progressive growing
  • cGAN: Conditional, label → image
  • Pix2Pix (2016): Paired translation
  • CycleGAN (2017): Unpaired translation
  • BigGAN (2018): Large scale, class-conditional
  • StyleGAN2 (2020): 1024×1024 faces

GANs are being largely replaced by diffusion models (DALL-E 2, Stable Diffusion, Imagen) for image generation, but remain relevant for style transfer and real-time applications.

WGAN (Wasserstein GAN)

Definition: Wasserstein Distance

The Earth Mover's Distance (Wasserstein-1) replaces the Jensen-Shannon (JS) divergence that the original GAN objective implicitly minimizes:

$$W(p_{\text{data}}, p_g) = \inf_{\gamma \in \Pi(p_{\text{data}}, p_g)} \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \gamma}[\|\mathbf{x} - \mathbf{y}\|]$$
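In one dimension, with two samples of equal size, the infimum over couplings is attained by pairing the sorted values, so the distance reduces to the mean absolute difference of sorted samples. The sketch below uses that special case; the function name and the toy samples are assumptions for illustration.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes equal sample sizes")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in paired) / len(xs)

real = [0.0, 1.0, 2.0, 3.0]
near = [x + 0.5 for x in real]   # generator slightly off
far = [x + 5.0 for x in real]    # generator far away, supports do not overlap

w_same = wasserstein_1d(real, real)
w_near = wasserstein_1d(real, near)
w_far = wasserstein_1d(real, far)
print(w_same, w_near, w_far)
```

The distance grows in proportion to the shift even when the two samples do not overlap, whereas the JS divergence saturates at $\log 2$ for non-overlapping supports and gives the generator no useful gradient.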

WGAN loss (using Kantorovich-Rubinstein duality):

$$\min_G \max_{D \in 1\text{-Lip}} \; \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}[D(\mathbf{x})] - \mathbb{E}_{\mathbf{z} \sim p_z}[D(G(\mathbf{z}))]$$

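The discriminator (called the critic in WGAN) must be 1-Lipschitz, which is enforced via weight clipping or spectral normalization. Because the critic outputs an unbounded score rather than a probability, it has no final Sigmoid.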

Advantages: a loss value that tracks sample quality, reduced mode collapse, and more stable training.
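A single critic update with weight clipping might be sketched as follows. The toy 2-D data, the small MLP critic, and the specific values for the learning rate and clipping threshold are assumptions for illustration; in a full WGAN the fake batch would come from the generator rather than from `torch.randn`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy critic: outputs an unbounded score, so there is no Sigmoid.
critic = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
clip = 0.01  # clipping threshold (assumed value)

def critic_step(real, fake):
    # The critic maximizes E[D(x)] - E[D(G(z))], so we minimize the negation.
    loss_c = -(critic(real).mean() - critic(fake).mean())
    opt_c.zero_grad()
    loss_c.backward()
    opt_c.step()
    # Weight clipping: a crude way to bound the critic's Lipschitz constant.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)
    # Score gap before the update; tracks the Wasserstein estimate up to scale.
    return -loss_c.item()

real = torch.randn(64, 2) + 3.0   # stand-in for real data
fake = torch.randn(64, 2)         # stand-in for generator output
w_est = critic_step(real, fake)
max_abs_weight = max(p.abs().max().item() for p in critic.parameters())
print(w_est, max_abs_weight)
```

The generator step would then minimize `-critic(G(z)).mean()`, typically after several critic updates per generator update.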


Key Takeaways

Summary: GANs

  • GANs consist of Generator vs Discriminator in adversarial training
  • Objective is a minimax game — Nash equilibrium at D(x) = 0.5
  • Mode collapse: G produces limited variety — use WGAN, minibatch discrimination
  • DCGAN established stable convolutional architecture
  • WGAN uses Wasserstein distance for stable training and meaningful loss
  • StyleGAN produces photorealistic faces via style injection
  • Training is unstable — requires careful hyperparameter tuning
  • GANs are being replaced by diffusion models for many tasks
  • Still useful for style transfer, image editing, super-resolution

What to Learn Next

-> Autoencoders Learn about compressed representations.

-> Variational Autoencoders Generate data with probabilistic models.

-> Diffusion Models Deep Dive Master modern generative AI techniques.

-> Neural Networks Understand the foundation of deep learning.

-> CNNs Learn the convolutional architectures used in GANs.

-> Training Deep Networks Master training techniques for unstable models.
