
GANs: Generator, Discriminator, Training Stability — Asked at NVIDIA & Meta


🎯 The Interview Question

"Explain the GAN training objective mathematically. What is the minimax game between generator and discriminator? Why is GAN training unstable, and what techniques exist to stabilize it? Describe the mode collapse problem and how modern GAN architectures address it."

This question tests understanding of generative models — critical for NVIDIA (image generation) and Meta (content creation).


📚 Detailed Answer

GAN: The Minimax Game

The GAN framework consists of:

  • Generator $G$: Maps noise $\mathbf{z} \sim p_z$ to fake samples
  • Discriminator $D$: Classifies real vs fake samples

Objective (minimax game):

$$\min_G \max_D \mathcal{L}(G, D) = \mathbb{E}_{\mathbf{x} \sim p_{data}}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_z}[\log(1 - D(G(\mathbf{z})))]$$

Discriminator maximizes: Correctly classify real as real and fake as fake.

Generator minimizes: Fool discriminator into classifying fake as real.

Training Dynamics

For a fixed generator $G$, the optimal discriminator is:

$$D^*_G(\mathbf{x}) = \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x}) + p_g(\mathbf{x})}$$

where $p_g$ is the distribution of generated samples $G(\mathbf{z})$. The global optimum is $p_g = p_{data}$, meaning the generator perfectly matches the data distribution and $D^*_G(\mathbf{x}) = 1/2$ everywhere.
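As a worked step (a standard result, sketched here rather than derived in full), substituting $D^*_G$ back into the objective shows why $p_g = p_{data}$ is the optimum. The Jensen-Shannon divergence (JSD) is non-negative and zero only when the two distributions coincide, so the minimum value of the objective is $-\log 4$:

```latex
\begin{aligned}
\mathcal{L}(G, D^*_G)
  &= \mathbb{E}_{\mathbf{x} \sim p_{data}}\left[\log \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x}) + p_g(\mathbf{x})}\right]
   + \mathbb{E}_{\mathbf{x} \sim p_g}\left[\log \frac{p_g(\mathbf{x})}{p_{data}(\mathbf{x}) + p_g(\mathbf{x})}\right] \\
  &= -\log 4 + 2\,\mathrm{JSD}(p_{data} \,\|\, p_g)
\end{aligned}
```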

Training algorithm:

for each training step:
    # Train discriminator (generator held fixed)
    Sample real batch {x_i}
    Sample noise batch {z_i}
    Generate fake batch: G(z_i)
    Update D to maximize: mean_i [ log D(x_i) + log(1 - D(G(z_i))) ]

    # Train generator (discriminator held fixed)
    Sample noise batch {z_i}
    Update G to minimize: mean_i [ log(1 - D(G(z_i))) ]
    # OR, the non-saturating alternative, maximize: mean_i [ log D(G(z_i)) ]
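The alternating updates can be made concrete with a deliberately tiny example. The sketch below is an illustration under stated assumptions, not a practical GAN: the data is 1-D Gaussian with mean 4, the generator is the linear map `G(z) = a*z + b`, the discriminator is the logistic model `D(x) = sigmoid(w*x + c)`, and the gradients are written out by hand instead of using an autodiff library. The name `train_toy_gan` and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=32, lr=0.05, seed=0):
    """Alternating GAN updates on a toy 1-D problem (illustrative only)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        # Discriminator step: minimize -[log D(x) + log(1 - D(G(z)))],
        # with the generator parameters (a, b) held fixed.
        grad_w = grad_c = 0.0
        for _ in range(batch):
            x = rng.gauss(4.0, 1.0)          # real sample
            g = a * rng.gauss(0.0, 1.0) + b  # fake sample
            d_real = sigmoid(w * x + c)
            d_fake = sigmoid(w * g + c)
            grad_w += (d_real - 1.0) * x + d_fake * g
            grad_c += (d_real - 1.0) + d_fake
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: minimize the non-saturating loss -log D(G(z)),
        # with the discriminator parameters (w, c) held fixed.
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            d_fake = sigmoid(w * (a * z + b) + c)
            grad_a += (d_fake - 1.0) * w * z
            grad_b += (d_fake - 1.0) * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return a, b, w, c
```

After a single step the discriminator weight `w` becomes positive (real samples sit to the right of the fakes) and the generator offset `b` moves toward the real mean. Over longer runs the parameters may keep oscillating around the data mean rather than settle, which is typical of adversarial training.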

⚠️ In practice, the generator is usually trained by maximizing $\log D(G(\mathbf{z}))$ (the non-saturating loss) instead of minimizing $\log(1 - D(G(\mathbf{z})))$. The two share the same fixed point but give different gradients: the latter saturates when the discriminator is strong.
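The difference between the two generator losses is easy to check numerically. The sketch below assumes the discriminator is written as `D = sigmoid(logit)` and compares the derivative of each loss with respect to that logit for a fake sample; the function names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(logit):
    """d/d(logit) of log(1 - sigmoid(logit)), the original minimax generator loss."""
    return -sigmoid(logit)


def non_saturating_grad(logit):
    """d/d(logit) of -log(sigmoid(logit)), the non-saturating generator loss."""
    return sigmoid(logit) - 1.0


# A very negative logit means the discriminator confidently rejects the fake.
for logit in (-10.0, -2.0, 0.0):
    print(
        f"D(G(z))={sigmoid(logit):.5f}  "
        f"saturating={saturating_grad(logit):+.5f}  "
        f"non_saturating={non_saturating_grad(logit):+.5f}"
    )
```

When the discriminator confidently rejects a fake (logit far below zero), the saturating gradient shrinks toward zero while the non-saturating gradient stays close to -1, so the generator keeps receiving a learning signal exactly when it is doing worst.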

Training Instability

Vanishing Gradients

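When the discriminator is too good, $D(G(\mathbf{z})) \approx 0$ and the gradient of the saturating generator loss vanishes:

$$\nabla_G \log(1 - D(G(\mathbf{z}))) \approx 0$$

Solution: Use the non-saturating generator loss, and train the discriminator to near-optimal but not perfect, so that it still provides a useful gradient.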

Mode Collapse

The generator produces a limited variety of outputs, ignoring parts of the data distribution:

$$p_g(\mathbf{x}) \neq p_{data}(\mathbf{x}) \text{ (missing modes)}$$

Symptoms:

  • Generated samples look similar
  • Low diversity despite good individual quality
  • Discriminator loss oscillates
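On synthetic benchmarks where the true modes are known (for example, a mixture of Gaussians), low diversity can be quantified by counting how many modes receive generated samples. The helper below is a hypothetical sketch: `mode_coverage`, `radius`, and `min_hits` are illustrative names and thresholds, not a standard API, and the approach does not transfer directly to natural images, where the modes are unknown.

```python
import math


def mode_coverage(samples, mode_centers, radius=0.5, min_hits=1):
    """Count how many known modes have at least `min_hits` samples within `radius`."""
    hits = [0] * len(mode_centers)
    for sample in samples:
        for i, center in enumerate(mode_centers):
            if math.dist(sample, center) <= radius:
                hits[i] += 1
    covered = sum(1 for h in hits if h >= min_hits)
    return covered, hits


# Four modes on a square; the "collapsed" generator only ever lands near one.
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
collapsed = [(0.1, -0.1), (0.05, 0.0), (-0.1, 0.1)]
covered, hits = mode_coverage(collapsed, centers)
print(f"modes covered: {covered} of {len(centers)}, hits per mode: {hits}")
```

A generator that covers one of four modes with sharp samples matches the symptom pattern above: good individual quality, low diversity.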

Training Oscillation

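The generator and discriminator are updated against each other, so each update shifts the other's objective. The losses can therefore oscillate indefinitely instead of converging to an equilibrium.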

Stabilization Techniques

1. Wasserstein GAN (WGAN)

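Replaces the Jensen-Shannon divergence with the Wasserstein distance. The discriminator becomes a critic with an unbounded real-valued output (no sigmoid); the critic maximizes and the generator minimizes:

$$\mathcal{L}_W = \mathbb{E}_{\mathbf{x} \sim p_{data}}[D(\mathbf{x})] - \mathbb{E}_{\mathbf{z} \sim p_z}[D(G(\mathbf{z}))]$$

with Lipschitz constraint: $\|D\|_L \leq 1$

Gradient penalty (WGAN-GP), a soft way to enforce the constraint that is added to the critic loss:

$$\mathcal{L}_{GP} = \lambda \mathbb{E}_{\hat{\mathbf{x}} \sim p_{\hat{x}}}\left[(\|\nabla_{\hat{\mathbf{x}}} D(\hat{\mathbf{x}})\|_2 - 1)^2\right]$$

where $\hat{\mathbf{x}} = \epsilon \mathbf{x} + (1 - \epsilon) G(\mathbf{z})$ with $\epsilon \sim U[0, 1]$ interpolates between real and generated samples.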
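The gradient penalty can be sketched without a deep learning framework if the critic's input gradient is available in closed form. In the example below, `critic_grad` is a hypothetical callable returning $\nabla_{\hat{\mathbf{x}}} D(\hat{\mathbf{x}})$; in a real implementation this gradient would come from automatic differentiation. The default `lam=10.0` is a commonly used value, stated here as an assumption.

```python
import math
import random


def gradient_penalty(critic_grad, real, fake, lam=10.0, rng=random):
    """Monte Carlo estimate of lam * E[(||grad D(x_hat)||_2 - 1)^2].

    critic_grad: callable mapping a point (list of floats) to the critic's
                 gradient at that point (hypothetical, supplied by the caller).
    real, fake:  equally sized batches, each sample a list of floats.
    """
    total = 0.0
    for x, g in zip(real, fake):
        eps = rng.random()  # uniform interpolation coefficient in [0, 1)
        x_hat = [eps * xi + (1.0 - eps) * gi for xi, gi in zip(x, g)]
        grad = critic_grad(x_hat)
        norm = math.sqrt(sum(v * v for v in grad))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)


# Linear critic D(x) = w . x has the constant gradient w.
real_batch = [[0.0, 0.0], [1.0, 2.0]]
fake_batch = [[1.0, 1.0], [3.0, 0.0]]
steep = gradient_penalty(lambda x: [3.0, 4.0], real_batch, fake_batch)
unit = gradient_penalty(lambda x: [0.6, 0.8], real_batch, fake_batch)
print(f"penalty with ||w||=5: {steep:.1f}, with ||w||=1: {unit:.6f}")
```

For the linear critic the penalty depends only on the weight norm: a gradient norm of 5 is penalized heavily, while a norm of exactly 1 incurs essentially no penalty.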

2. Progressive Growing

Train at low resolution first, then progressively increase it:

4×4 → 8×8 → 16×16 → 32×32 → ... → 1024×1024

Each resolution is trained for a period; the layers for the next resolution are then faded in smoothly.
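The fade-in step can be illustrated as a linear blend between the upsampled output of the old, lower-resolution path and the output of the newly added layers. This is a minimal sketch on single-channel images stored as nested lists; `upsample_nearest_2x` and `fade_in` are hypothetical names, and a real implementation would operate on batched tensors.

```python
def upsample_nearest_2x(img):
    """Double height and width by repeating each pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(low_res_img, high_res_img, alpha):
    """Blend the old path (upsampled) with the new layers' output.

    alpha ramps from 0 (only the old path) to 1 (only the new layers)
    while the new resolution is being introduced.
    """
    up = upsample_nearest_2x(low_res_img)
    return [
        [(1.0 - alpha) * u + alpha * h for u, h in zip(up_row, hi_row)]
        for up_row, hi_row in zip(up, high_res_img)
    ]


low = [[1.0]]                    # 1x1 output of the old path
high = [[0.0, 0.0], [0.0, 0.0]]  # 2x2 output of the new layers
print(fade_in(low, high, 0.25))
```

Ramping `alpha` gradually avoids a sudden shock to the already trained low-resolution layers when the new, randomly initialized layers are attached.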

3. Style-Based Generator (StyleGAN)

  • Mapping network transforms the latent $\mathbf{z}$ into an intermediate style vector $\mathbf{w}$
  • AdaIN (Adaptive Instance Normalization) injects style at each layer
  • Enables control over different aspects (pose, hair, background)
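AdaIN normalizes each feature channel to zero mean and unit variance, then rescales and shifts it with style-derived parameters: $\mathrm{AdaIN}(x, y) = y_s \frac{x - \mu(x)}{\sigma(x)} + y_b$. The sketch below applies this to a single channel flattened into a list. The function name and the `eps` stabilizer are illustrative assumptions; in StyleGAN the scale and bias come from a learned affine transform of the style vector.

```python
import math


def adain(channel, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalization for one feature channel.

    channel:     activations of a single channel, flattened to a list
    style_scale: y_s, the per-channel scale derived from the style
    style_bias:  y_b, the per-channel bias derived from the style
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    std = math.sqrt(var + eps)
    return [style_scale * (v - mean) / std + style_bias for v in channel]


out = adain([1.0, 2.0, 3.0, 4.0], style_scale=2.0, style_bias=5.0)
out_mean = sum(out) / len(out)
out_std = math.sqrt(sum((v - out_mean) ** 2 for v in out) / len(out))
print(f"mean={out_mean:.4f}, std={out_std:.4f}")
```

Whatever statistics the input channel had, the output takes on the mean and standard deviation dictated by the style, which is how the style overrides the feature statistics at each layer.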

Modern GAN Architectures

StyleGAN2/3

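  • Path length regularization: Encourages a smooth latent space
  • Skip connections in the generator and residual connections in the discriminator: Replace progressive growing
  • Lazy regularization: Apply regularization terms only every few steps (for example, every 16 minibatches) to reduce cost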

BigGAN

  • Class-conditional generation: Control output class
  • Truncation trick: Trade diversity for quality
  • Large-scale training: Batch sizes up to 2048, trained on Google TPU v3 Pods (128 to 512 cores)
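The truncation trick samples the latent from a truncated normal: any component whose magnitude exceeds a threshold is resampled until it falls inside. The sketch below shows only the sampling step; `truncated_normal` is a hypothetical helper and `threshold` must be positive. A smaller threshold keeps latents near the high-density region, which tends to raise sample quality and lower diversity.

```python
import random


def truncated_normal(dim, threshold, rng):
    """Sample a latent vector, resampling components with |value| > threshold."""
    z = []
    for _ in range(dim):
        v = rng.gauss(0.0, 1.0)
        while abs(v) > threshold:
            v = rng.gauss(0.0, 1.0)
        z.append(v)
    return z


rng = random.Random(0)
z = truncated_normal(dim=128, threshold=0.5, rng=rng)
print(f"max |z_i| = {max(abs(v) for v in z):.3f}")
```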

VQGAN

  • Vector quantization: Discrete latent space
  • Transformer over codebook indices: Autoregressively models long-range spatial relationships
  • Perceptual loss: Better visual quality
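The vector quantization step replaces each encoder output vector with its nearest codebook entry, so an image becomes a grid of discrete indices. A minimal sketch of the lookup, with a hypothetical `quantize` helper and a toy three-entry codebook (a learned codebook would be far larger, and training would also need a gradient estimator to pass through the lookup):

```python
def quantize(vector, codebook):
    """Return (index, entry) of the codebook entry nearest to `vector`."""
    best_index, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vector, code))  # squared L2
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, codebook[best_index]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
index, entry = quantize([0.9, 1.2], codebook)
print(f"nearest code: index={index}, entry={entry}")
```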

Evaluation Metrics

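| Metric | What it Measures | How to Compute |
|---|---|---|
| FID | Quality + diversity | Fréchet distance between Gaussians fitted to Inception features of real and generated images |
| IS | Quality + class-level diversity | Exponential of the average KL divergence between per-image class predictions $p(y \mid \mathbf{x})$ and the marginal $p(y)$ |
| LPIPS | Perceptual similarity | Distance between deep features (e.g. VGG) of two images |
| Precision/Recall | Quality vs diversity | Fraction of generated samples on the real-data manifold (precision) and of real samples on the generated manifold (recall), estimated in feature space |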
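The Fréchet distance behind FID has a closed form for Gaussians: $\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2})$. The sketch below is a simplified 1-D version, where the formula reduces to $(\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$. It is an illustration only: real FID uses high-dimensional Inception features and a matrix square root, and `frechet_distance_1d` is a hypothetical name.

```python
import math


def frechet_distance_1d(real_feats, fake_feats):
    """Fréchet distance between 1-D Gaussians fitted to two feature lists."""

    def mean_and_std(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, math.sqrt(var)

    m1, s1 = mean_and_std(real_feats)
    m2, s2 = mean_and_std(fake_feats)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


same = frechet_distance_1d([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = frechet_distance_1d([0.0, 1.0, 2.0], [2.0, 3.0, 4.0])
collapsed = frechet_distance_1d([0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
print(f"same={same:.3f}, shifted={shifted:.3f}, collapsed={collapsed:.3f}")
```

The third case shows why FID reflects diversity as well as quality: a generator whose features have the right mean but no spread is still penalized through the variance term.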

Follow-Up Questions

Q: How does StyleGAN achieve style mixing? A: By feeding style vectors from two different latent codes to different layers of the generator. Styles at early (coarse) layers control high-level aspects such as pose; styles at later (fine) layers control details such as color.
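Style mixing amounts to choosing, per generator layer, which latent code supplies the style. A minimal sketch, assuming a hypothetical generator that accepts one style per layer; `mix_styles`, `num_layers`, and `crossover` are illustrative names:

```python
def mix_styles(w_a, w_b, num_layers, crossover):
    """Use style w_a for layers before `crossover` and w_b from there on."""
    return [w_a if layer < crossover else w_b for layer in range(num_layers)]


# Coarse layers (0-1) take source A's style, finer layers (2-3) take source B's.
per_layer = mix_styles("w_A", "w_B", num_layers=4, crossover=2)
print(per_layer)
```

Moving the crossover point earlier hands more of the coarse structure to the second source; moving it later restricts the second source to fine detail.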

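Q: What is the relationship between GANs and VAEs? A: Both are latent-variable generative models but with different objectives. GANs implicitly minimize a divergence between the generated and data distributions through the adversarial game; VAEs maximize a variational lower bound (the ELBO) on the data likelihood. GANs tend to produce sharper images; VAEs tend to provide better coverage of the data distribution, at the cost of blurrier samples.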

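Q: Can GANs generate text? A: It is difficult because text is discrete: sampling a token is non-differentiable, so the discriminator's gradient cannot flow back to the generator directly. Workarounds such as policy gradients (SeqGAN) or Gumbel-softmax relaxations exist, but most text generation uses autoregressive models (GPT). GANs have been applied to audio, for example GANSynth for music.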
