
Dimensionality Reduction — PCA, t-SNE, UMAP Complete Guide


Unsupervised Learning

Curse of Dimensionality — When More Features Hurt, Not Help

Dimensionality reduction compresses high-dimensional data into fewer dimensions while preserving the most important structure and variance.

  • PCA — finds orthogonal axes of maximum variance for fast, linear dimensionality reduction
  • t-SNE — preserves local neighborhoods for intuitive 2D and 3D visualization
  • UMAP — faster than t-SNE with better global structure preservation

"Not everything that can be counted counts, and not everything that counts can be counted."


Why Reduce Dimensions?

Curse of Dimensionality

As the number of features increases, the volume of the space increases exponentially, causing data to become sparse. This makes distance metrics less meaningful and increases the risk of overfitting.
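The loss of meaning in distance metrics can be observed numerically. The sketch below is an illustration rather than a formal result: it draws uniform random points in a unit hypercube and measures how much farther the farthest point is from a query than the nearest one. The helper name `distance_contrast` and the sample sizes are assumptions made for this example.

```python
import numpy as np

def distance_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Relative gap between the farthest and nearest point from one random query."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform samples in the unit hypercube
    query = rng.random(n_dims)                # one random query point
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as the number of dimensions grows.
for d in (2, 10, 100, 1000):
    print(f"d={d:>4}  contrast={distance_contrast(1000, d):.2f}")
```

When the contrast approaches zero, the nearest and farthest neighbors are almost equally far away, which is why nearest-neighbor style reasoning tends to degrade in high dimensions.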

[Figure: Curse of Dimensionality, data sparsity. Panels: 2D space (N=10 points fill the space), 5D space (same N, sparser), 10D space (same N, very sparse). Caption: volume grows exponentially with the number of dimensions d, so distance metrics become less informative.]
Curse of Dimensionality:
  More dimensions = more data needed
  Distances become meaningless
  Models overfit
  Training becomes slow

Benefits:
  Faster training
  Less overfitting
  Better visualization (2D/3D)
  Removes noise
  Fewer features = simpler model

PCA (Principal Component Analysis)

Definition: PCA (Principal Component Analysis)

An unsupervised linear transformation technique that finds the directions of maximum variance in high-dimensional data and projects it onto a lower-dimensional space.

[Figure: PCA, finding principal components. Panels: original 2D data with axes PC1 (max variance) and PC2; data projected onto PC1 as a 1D representation preserving maximum variance. Explained variance: PC1 = 72%, PC2 = 15%.]
PCA finds the directions of MAXIMUM VARIANCE:

1. Standardize data
2. Compute covariance matrix
3. Find eigenvectors and eigenvalues of the covariance matrix (the eigenvectors are the principal components)
4. Project data onto the K eigenvectors with the largest eigenvalues

PC1: Direction of most variance
PC2: Direction of second most variance (orthogonal to PC1)
...

Explained variance ratio tells you how much info each PC captures:
PC1: 72%
PC2: 15%
PC3: 8%
PC4: 5% -> can probably drop this one
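The decision to drop PC4 can be made explicit by accumulating the ratios until a chosen target is reached. The snippet below is a small arithmetic sketch using the illustrative percentages listed above; the 95% target is an assumption chosen for the example.

```python
import numpy as np

# Explained variance percentages from the example above (illustrative values).
explained_pct = np.array([72, 15, 8, 5])
cumulative_pct = np.cumsum(explained_pct)

# Smallest number of components whose cumulative share reaches the target.
target_pct = 95  # assumed target for this example
k = int(np.argmax(cumulative_pct >= target_pct)) + 1

print(cumulative_pct.tolist())  # [72, 87, 95, 100]
print(k)                        # 3
```

With these numbers, the first three components already reach the target, so PC4 adds only the remaining 5%.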

PCA Mathematics

The covariance matrix, where X is the n × p data matrix (n samples, p features) with each column centered to zero mean:

$$\mathbf{C} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X}$$

Eigendecomposition:

$$\mathbf{C}\mathbf{v}_i = \lambda_i \mathbf{v}_i$$

Explained variance ratio:

$$\text{Explained}_i = \frac{\lambda_i}{\sum_{k=1}^{p} \lambda_k}$$
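The formulas above translate almost line by line into NumPy. The following is a minimal sketch for illustration, not a replacement for a library implementation: the function name `pca_eig` and the random toy data are assumptions made here, and it assumes no feature has zero variance.

```python
import numpy as np

def pca_eig(X: np.ndarray, k: int):
    """PCA via eigendecomposition of the covariance matrix of standardized data."""
    # Step 1: standardize each feature (zero mean, unit variance).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n = Xs.shape[0]
    # Step 2: covariance matrix C = X^T X / (n - 1) of the centered data.
    C = Xs.T @ Xs / (n - 1)
    # Step 3: eigendecomposition; eigh suits symmetric matrices and
    # returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Explained variance ratio: lambda_i / sum_k lambda_k.
    explained = eigvals / eigvals.sum()
    # Step 4: project onto the top-k eigenvectors.
    return Xs @ eigvecs[:, :k], explained

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated toy data
Z, explained = pca_eig(X, k=2)
print(Z.shape)      # (200, 2)
print(explained)    # ratios in descending order, summing to 1
```

In practice, library implementations usually compute PCA through a singular value decomposition of the centered data, which gives the same components with better numerical behavior.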

Explained Variance

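The explained variance ratio tells you what fraction of the total variance each principal component captures. A common rule of thumb is to keep enough components to retain about 95% of the total variance, although the right threshold depends on the task.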

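from sklearn.decomposition import PCA

# X: array of shape (n_samples, n_features), assumed to be standardized already
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
# Illustrative output: [0.72 0.15], i.e. the first 2 components explain 87% of the variance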
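Instead of fixing the number of components in advance, scikit-learn's `PCA` also accepts a variance fraction as `n_components` and then keeps as many components as needed to reach it. The sketch below applies the 95% rule of thumb mentioned above; the digits dataset is an assumption chosen only to make the example self-contained.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)           # 1797 samples, 64 pixel features
X_std = StandardScaler().fit_transform(X)     # standardize before PCA

# A float in (0, 1) asks for the smallest number of components that
# explains at least that fraction of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_std)

print(f"Components kept: {pca.n_components_} of {X.shape[1]}")
print(f"Variance retained: {pca.explained_variance_ratio_.sum():.3f}")
```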

t-SNE

Definition: t-SNE (t-Distributed Stochastic Neighbor Embedding)

A nonlinear technique for visualizing high-dimensional data by preserving local structure (neighborhoods). Best for visualization, not for feature reduction for training.

[Figure: t-SNE, preserving local neighborhoods. Panels: high-dimensional space (similar points are close in high-D); 2D embedding (clusters preserved in 2D).]
t-SNE preserves LOCAL structure (neighborhoods):

Best for: Visualization (2D/3D)
Not for: Feature reduction for training

How it works:
1. Compute similarities in high-D (Gaussian)
2. Compute similarities in low-D (Student-t)
3. Minimize KL divergence between them

Key parameters:
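  perplexity: Effective number of neighbors considered per point (typically 5-50)
  learning_rate: Step size of the optimization (10-1000)
  n_iter: Number of optimization iterations (1000+); newer scikit-learn releases call this max_iter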
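The parameters listed above map onto scikit-learn's `TSNE` class. The sketch below is one possible setup, assuming the digits dataset and a 500-sample subset to keep the run short. The iteration count is left at its default because the argument name depends on the scikit-learn version.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]            # small subset so the example runs quickly

tsne = TSNE(
    n_components=2,                # embed into 2D for plotting
    perplexity=30,                 # must be smaller than the number of samples
    learning_rate=200.0,
    init="pca",                    # PCA initialization tends to give more stable layouts
    random_state=0,                # t-SNE is stochastic; fix the seed to reproduce a run
)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)                  # (500, 2)
print(tsne.kl_divergence_)         # final value of the KL objective
```

Because the result depends on the perplexity and the random seed, it is usually worth comparing a few settings before reading much into the shape or spacing of clusters.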

UMAP

Definition: UMAP (Uniform Manifold Approximation and Projection)

A nonlinear dimensionality reduction technique that is typically faster than t-SNE and tends to preserve more of the global structure. Its embeddings can also be used as input to clustering algorithms.

UMAP vs t-SNE: Key Differences

t-SNE:
  Preserves mainly local structure
  Cannot transform new data
  O(n²) complexity for the exact algorithm (the Barnes-Hut approximation is O(n log n))
  Non-parametric
  Mainly useful for visualization
  Cluster sizes may distort

UMAP:
  Preserves local structure and more of the global structure
  Can transform new data
  Usually faster in practice, especially on large datasets
  Parametric variant available
  Useful for visualization and as a preprocessing step for ML
  Better cluster preservation
UMAP is often used as a faster alternative to t-SNE:

Advantages over t-SNE:
  Often much faster, especially on large datasets
  Better preserves global structure
  Can transform new data
  Better for clustering

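# UMAP embedding: aims to preserve both local and global structure
# Requires the umap-learn package (pip install umap-learn)
import umap

# X: array of shape (n_samples, n_features)
reducer = umap.UMAP(n_components=2, n_neighbors=15)
X_2d = reducer.fit_transform(X)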

Comparison

Method   Speed    Local structure   Global structure   Transforms new data
PCA      Fast     No                Yes                Yes
t-SNE    Slow     Yes               No                 No
UMAP     Medium   Yes               Yes                Yes
LDA      Fast     No                No                 Yes
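The last column of the table shows up directly in scikit-learn's API: `PCA` exposes a `transform` method for unseen data, while `TSNE` only offers `fit_transform`. The short check below is a sketch using the iris dataset, chosen here only as a convenient example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split

X, _ = load_iris(return_X_y=True)
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

# PCA learns a projection on the training data and reuses it on new data.
pca = PCA(n_components=2).fit(X_train)
X_test_2d = pca.transform(X_test)
print(X_test_2d.shape)               # (45, 2)

# t-SNE has no out-of-sample mapping in scikit-learn.
print(hasattr(pca, "transform"))     # True
print(hasattr(TSNE(), "transform"))  # False
```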

LDA vs PCA

LDA (Linear Discriminant Analysis) is supervised and uses class labels, while PCA is unsupervised and uses only features.
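The difference is visible in the function calls themselves: PCA is fitted on the features alone, while LDA also needs the labels. The sketch below uses the iris dataset as an assumed example; with 3 classes, LDA can produce at most 2 components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unsupervised: directions of maximum variance, labels are never used.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: directions that best separate the classes, so y is required.
# LDA yields at most (number of classes - 1) components.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```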


Key Takeaways

Summary: Dimensionality Reduction

  1. PCA is the standard — fast, interpretable, widely used
  2. t-SNE is best for visualization — preserves local structure
  3. UMAP is faster than t-SNE and preserves global structure
  4. Explained variance tells you how much info PCA preserves
  5. Standardize data before PCA
  6. Reduce to 2-3 dimensions for visualization
  7. Reduce to 10-50 dimensions for model training
  8. Dimensionality reduction can improve model performance
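Several of these takeaways can be combined in one scikit-learn pipeline: standardize, reduce with PCA, then train a model. The sketch below compares a classifier with and without the PCA step. The dataset, the choice of 30 components, and logistic regression are all assumptions made for illustration, and whether the reduction helps depends on the data.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Baseline: standardize, then classify on all 64 features.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))

# Reduced: standardize, keep 30 principal components, then classify.
reduced = make_pipeline(
    StandardScaler(), PCA(n_components=30), LogisticRegression(max_iter=2000)
)

for name, model in [("all 64 features", baseline), ("30 components", reduced)]:
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Keeping the scaler and PCA inside the pipeline means they are refitted on each training fold, so no information from the validation fold leaks into the reduction.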

What to Learn Next

-> Autoencoders: Learn the neural network approach to nonlinear dimensionality reduction and representation learning.

-> Clustering: Group similar data points using K-Means, DBSCAN, and hierarchical methods.

-> Feature Engineering: Create and transform features to improve model performance before dimensionality reduction.

-> Model Evaluation: Evaluate whether dimensionality reduction improved or hurt your model's predictive power.

-> Neural Networks: Understand the deep learning foundations that autoencoders are built upon.

-> CNNs: Apply convolutional architectures to image data where spatial dimensionality matters.
