πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Dimensionality Reduction: t-SNE, UMAP & Autoencoders

Machine LearningDimensionality Reduction⭐ Premium

Advertisement

NVIDIA & Meta Interview

Dimensionality Reduction: t-SNE, UMAP & Autoencoders

Visualizing and compressing high-dimensional data

Interview Question

"Explain the differences between t-SNE, UMAP, and PCA for dimensionality reduction. When would you use each method? How do autoencoders compare to these methods?"

Difficulty: Hard | Frequently asked at NVIDIA, Meta, Google


Theoretical Foundation

PCA Recap (Linear)

PCA finds orthogonal directions of maximum variance. Fast and deterministic, but only captures linear relationships.

t-SNE (t-Distributed Stochastic Neighbor Embedding)

Goal: Preserve local structure (nearby points stay nearby).

Algorithm:

  1. Compute pairwise similarities in high-dimensional space:
pj∣i=exp⁑(βˆ’βˆ₯xiβˆ’xjβˆ₯2/2Οƒi2)βˆ‘kβ‰ iexp⁑(βˆ’βˆ₯xiβˆ’xkβˆ₯2/2Οƒi2)p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}
  1. Compute similarities in low-dimensional space using t-distribution:
qij=(1+βˆ₯yiβˆ’yjβˆ₯2)βˆ’1βˆ‘kβ‰ l(1+βˆ₯ykβˆ’ylβˆ₯2)βˆ’1q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}
  1. Minimize KL divergence:
KL(Pβˆ₯Q)=βˆ‘iβ‰ jpijlog⁑pijqijKL(P \| Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

Properties:

  • Preserves local structure well
  • Non-parametric (no explicit mapping)
  • Slow for large datasets (O(n2)O(n^2))
  • Non-deterministic (different runs give different results)
  • Good for visualization (2-3D)

UMAP (Uniform Manifold Approximation and Projection)

Goal: Preserve both local and global structure.

Algorithm:

  1. Build fuzzy simplicial set (neighborhood graph) in high-D
  2. Optimize low-D embedding to match the fuzzy set

Properties:

  • Faster than t-SNE (O(n1.14)O(n^{1.14}))
  • Better preserves global structure
  • Can be used for general dimensionality reduction
  • Has a parametric version
  • Deterministic with fixed random seed

t-SNE vs UMAP Comparison

Aspectt-SNEUMAP
SpeedSlowFast
Global StructurePoorGood
ScalabilityLimitedBetter
DeterministicNoYes (with seed)
Out-of-sampleNoYes (parametric)
Use CaseVisualization onlyVisualization + reduction

ℹ️

Key Insight: UMAP is generally preferred over t-SNE because it's faster, preserves global structure better, and can handle out-of-sample points. t-SNE is still useful for quick visualization.

Autoencoders

Architecture:

  • Encoder: Maps input to latent space z=f(x)z = f(x)
  • Decoder: Reconstructs input from latent space x^=g(z)\hat{x} = g(z)

Training:

min⁑βˆ₯xβˆ’g(f(x))βˆ₯2\min \|x - g(f(x))\|^2

Variants:

  • Variational Autoencoder (VAE): Adds probabilistic structure to latent space
  • Denoising Autoencoder: Trained to reconstruct from corrupted input
  • Sparse Autoencoder: Enforces sparsity in latent representation

Properties:

  • Learns non-linear mappings
  • Can generate new data (VAE)
  • Requires more data and tuning
  • Computationally expensive

Method Selection Guide

MethodSpeedNon-linearGenerationOut-of-sampleBest For
PCAFastNoNoYesLinear data, feature reduction
t-SNESlowYesNoNo2D/3D visualization
UMAPMediumYesNoYesVisualization + reduction
AutoencoderSlowYesYesYesComplex non-linear patterns

Code Implementation


Real-World Applications

NVIDIA: Feature Visualization

  • CNN Feature Maps: Visualizing learned representations
  • Latent Space Analysis: Understanding generative models
  • GPU-Accelerated Reduction: CUDA implementations of t-SNE/UMAP

Meta: Content Understanding

  • Embedding Visualization: Visualizing user/item embeddings
  • Topic Modeling: Reducing document dimensions
  • Anomaly Detection: Identifying outliers in embedding space

πŸ’‘

NVIDIA Interview Tip: Discuss GPU acceleration of t-SNE and UMAP. Mention libraries like FIt-SNE and cuML that provide GPU-accelerated implementations.


Common Follow-Up Questions

Q1: Why is t-SNE not suitable for high-dimensional data directly? t-SNE is slow (O(n2)O(n^2)) and struggles with high dimensions. Apply PCA first to reduce to ~50 dimensions, then apply t-SNE.

Q2: How does UMAP preserve global structure better than t-SNE? UMAP uses fuzzy topological representations and optimizes a cross-entropy loss, while t-SNE uses KL divergence which emphasizes local structure.

Q3: When would you use an autoencoder over t-SNE/UMAP? When you need: (1) a parametric mapping for out-of-sample points, (2) data generation capabilities (VAE), or (3) complex non-linear patterns.

Q4: How do you evaluate dimensionality reduction quality?

  • Trustworthiness: Are neighbors preserved?
  • Continuity: Is the mapping smooth?
  • Downstream task performance: Does reduction help classification/clustering?

Related Topics

Advertisement