Unsupervised Learning

Curse of Dimensionality — When More Features Hurt, Not Help

Dimensionality reduction compresses high-dimensional data into fewer dimensions while preserving the most important structure and variance.

PCA — finds orthogonal axes of maximum variance for fast, linear dimensionality reduction
t-SNE — preserves local neighborhoods for intuitive 2D and 3D visualization
UMAP — faster than t-SNE with better global structure preservation

"Not everything that can be counted counts, and not everything that counts can be counted."

Dimensionality Reduction — Complete Guide

Dimensionality reduction compresses high-dimensional data into fewer dimensions while preserving important information.

Why Reduce Dimensions?

Curse of Dimensionality

As the number of features increases, the volume of the space increases exponentially, causing data to become sparse. This makes distance metrics less meaningful and increases the risk of overfitting.
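The loss of distance contrast can be observed with a small experiment. The sketch below is illustrative only: it assumes uniformly random points, and the helper name `relative_contrast` is made up for this example. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import numpy as np

def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (max - min) / min of the distances from one random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform points in the unit cube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

# Same number of points, increasing number of dimensions.
contrasts = {d: relative_contrast(500, d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>4} dims: relative contrast = {c:.2f}")
```

As the number of dimensions grows, the contrast shrinks: the nearest and farthest neighbors end up at nearly the same distance, which is what makes distance-based methods less reliable.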

Curse of Dimensionality Visualization

Curse of Dimensionality:
  More dimensions = more data needed to cover the space
  Distances between points become less informative
  Models overfit more easily
  Training becomes slower

Benefits:
  Faster training
  Less overfitting
  Better visualization (2D/3D)
  Removes noise
  Fewer features = simpler model

PCA (Principal Component Analysis)

Definition: PCA (Principal Component Analysis)

An unsupervised linear transformation technique that finds the directions of maximum variance in high-dimensional data and projects it onto a lower-dimensional space.

PCA Projection Diagram

PCA finds the directions of MAXIMUM VARIANCE:

1. Standardize data
2. Compute covariance matrix
3. Find eigenvectors (principal components)
4. Project data onto top K eigenvectors

PC1: Direction of most variance
PC2: Direction of second most variance (orthogonal to PC1)
...

Explained variance ratio tells you how much info each PC captures:
PC1: 72%
PC2: 15%
PC3: 8%
PC4: 5% -> can probably drop this one
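The four steps above can be sketched directly in NumPy. This is a minimal illustration on synthetic data, not a replacement for a library implementation; the function name `pca_eig` and the demo data are assumptions made for this example.

```python
import numpy as np

def pca_eig(X: np.ndarray, k: int):
    """PCA via eigendecomposition of the covariance matrix."""
    # 1. Standardize: zero mean and unit variance per feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized features.
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvectors of the symmetric matrix; eigh returns ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Project onto the top k eigenvectors.
    projected = X_std @ eigvecs[:, :k]
    return projected, eigvals / eigvals.sum()

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated features
Z, ratio = pca_eig(X_demo, k=2)
print("Projected shape:", Z.shape)
print("Explained variance ratio:", np.round(ratio, 3))
```

The returned ratios are sorted from largest to smallest and sum to 1, so they can be read the same way as the PC1 to PC4 percentages above.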

PCA Mathematics

The covariance matrix, for a mean-centered data matrix \mathbf{X} with n samples (rows) and p features (columns):

\mathbf{C} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X}

Eigendecomposition:

\mathbf{C}\mathbf{v}_i = \lambda_i \mathbf{v}_i

Explained variance ratio:

\text{Explained}_i = \frac{\lambda_i}{\sum_{k=1}^{p} \lambda_k}
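A short derivation may help connect the eigenvalues to variance. Assuming \mathbf{X} is mean-centered and each eigenvector \mathbf{v}_i has unit length, the variance of the data projected onto \mathbf{v}_i is exactly \lambda_i:

```latex
\operatorname{Var}(\mathbf{X}\mathbf{v}_i)
  = \frac{1}{n-1}(\mathbf{X}\mathbf{v}_i)^T(\mathbf{X}\mathbf{v}_i)
  = \mathbf{v}_i^T \mathbf{C} \mathbf{v}_i
  = \lambda_i \mathbf{v}_i^T \mathbf{v}_i
  = \lambda_i
```

This is why dividing \lambda_i by the sum of all eigenvalues gives the share of total variance captured by component i. With the shares from the earlier example, the first two components together capture 0.72 + 0.15 = 0.87 of the variance.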

Explained Variance

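The explained variance ratio tells you what fraction of the total variance each principal component captures. A common rule of thumb is to keep enough components to retain about 95% of the total variance, though the right threshold depends on the task.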

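from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: feature matrix of shape (n_samples, n_features), assumed already loaded
X_scaled = StandardScaler().fit_transform(X)  # standardize before PCA

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(f"Explained variance: {pca.explained_variance_ratio_}")
# e.g. [0.72, 0.15] would mean the first 2 components explain 87% of the variance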
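To pick the number of components for a variance target such as 95%, one option is to look at the cumulative explained variance. The sketch below uses scikit-learn's bundled digits dataset purely as a stand-in for your own data; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_digits = load_digits().data                      # 64 pixel features per sample
X_scaled = StandardScaler().fit_transform(X_digits)

# Option 1: fit all components, then find the smallest k reaching the target.
full = PCA().fit(X_scaled)
cumulative = np.cumsum(full.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Option 2: pass the target as a fraction and let scikit-learn choose k.
pca_95 = PCA(n_components=0.95).fit(X_scaled)

print(f"Components needed for 95% variance: {k}")
print(f"Components chosen by PCA(n_components=0.95): {pca_95.n_components_}")
```

Passing a float between 0 and 1 as `n_components` tells scikit-learn to keep the smallest number of components whose cumulative explained variance exceeds that fraction.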

t-SNE

Definition: t-SNE (t-Distributed Stochastic Neighbor Embedding)

A nonlinear technique for visualizing high-dimensional data by preserving local structure (neighborhoods). Best for visualization, not for feature reduction for training.

t-SNE Visualization

t-SNE preserves LOCAL structure (neighborhoods):

Best for: Visualization (2D/3D)
Not for: Feature reduction for training

How it works:
1. Compute similarities in high-D (Gaussian)
2. Compute similarities in low-D (Student-t)
3. Minimize KL divergence between them

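Key parameters:
  perplexity: Roughly the effective number of neighbors per point (typically 5-50)
  learning_rate: Step size for the optimization (typically 10-1000)
  n_iter: Number of optimization iterations (1000+; called max_iter in recent scikit-learn releases)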

UMAP

Definition: UMAP (Uniform Manifold Approximation and Projection)

A nonlinear dimensionality reduction technique that is typically faster than t-SNE and tends to preserve more of the global structure. Its embeddings are also often used as a preprocessing step for clustering.

UMAP vs t-SNE Comparison

UMAP compared with t-SNE:

Advantages over t-SNE:
  Typically faster, especially on large datasets
  Tends to preserve more of the global structure
  Can transform new data
  Often a better preprocessing step for clustering

Aims to preserve both local and global structure

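import umap  # provided by the umap-learn package

# X: feature matrix of shape (n_samples, n_features), assumed already loaded
reducer = umap.UMAP(n_components=2, n_neighbors=15)
X_2d = reducer.fit_transform(X)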

Comparison

Method	Speed	Preserves local structure	Preserves global structure	Transforms new data
PCA	Fast	No	Yes	Yes
t-SNE	Slow	Yes	No	No
UMAP	Medium	Yes	Yes	Yes
LDA	Fast	No	No	Yes
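The last column of the table has a direct counterpart in scikit-learn's API: estimators that can embed unseen data expose a `transform` method. The check below is a small sketch; the random arrays are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Which estimators expose a transform() method for unseen data?
supports_transform = {
    "PCA": hasattr(PCA(), "transform"),
    "t-SNE": hasattr(TSNE(), "transform"),
}
print(supports_transform)

# With PCA, the projection learned on training data is reused for new rows.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
X_new = rng.normal(size=(10, 5))
pca = PCA(n_components=2).fit(X_train)
X_new_2d = pca.transform(X_new)
print(X_new_2d.shape)
```

In practice this means PCA can be fitted once on training data and applied unchanged to validation or production data, while a t-SNE embedding has to be recomputed when points are added.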

LDA vs PCA

LDA (Linear Discriminant Analysis) is supervised and uses class labels, while PCA is unsupervised and uses only features.
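The difference shows up in the call signatures. In the sketch below, which uses the bundled iris dataset only for illustration, PCA is fitted on the features alone while LDA also needs the labels.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_iris, y_iris = load_iris(return_X_y=True)   # 4 features, 3 classes

# PCA: unsupervised, looks only at the features.
X_pca = PCA(n_components=2).fit_transform(X_iris)

# LDA: supervised, uses the class labels to find discriminative directions.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_iris, y_iris)

print(X_pca.shape, X_lda.shape)
```

One practical consequence is that LDA can produce at most (number of classes - 1) components, so with three classes the maximum is two.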

Key Takeaways

Summary: Dimensionality Reduction

PCA is the standard — fast, interpretable, widely used
t-SNE is best for visualization — preserves local structure
UMAP is typically faster than t-SNE and preserves more global structure
Explained variance tells you how much info PCA preserves
Standardize data before PCA
Reduce to 2-3 dimensions for visualization
Reduce to roughly 10-50 dimensions for model training, as a starting point to tune
Dimensionality reduction can improve model performance
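The "standardize data before PCA" takeaway can be illustrated with a small synthetic example. The two features below are independent, but one is measured on a scale 1000 times larger; the data and variable names are made up for this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_mixed = np.column_stack([
    rng.normal(0.0, 1.0, 500),      # feature on a small scale
    rng.normal(0.0, 1000.0, 500),   # independent feature on a large scale
])

# Without scaling, the large-scale feature dominates the first component.
raw_ratio = PCA(n_components=2).fit(X_mixed).explained_variance_ratio_

# After standardization, both features contribute about equally.
X_mixed_scaled = StandardScaler().fit_transform(X_mixed)
scaled_ratio = PCA(n_components=2).fit(X_mixed_scaled).explained_variance_ratio_

print("Raw:   ", np.round(raw_ratio, 4))
print("Scaled:", np.round(scaled_ratio, 4))
```

On the raw data PC1 appears to explain almost everything, but that only reflects the choice of units, not real structure.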

What to Learn Next

-> Autoencoders: Learn the neural network approach to nonlinear dimensionality reduction and representation learning.

-> Clustering: Group similar data points using K-Means, DBSCAN, and hierarchical methods.

-> Feature Engineering: Create and transform features to improve model performance before dimensionality reduction.

-> Model Evaluation: Evaluate whether dimensionality reduction improved or hurt your model's predictive power.

-> Neural Networks: Understand the deep learning foundations that autoencoders are built upon.

-> CNNs: Apply convolutional architectures to image data where spatial dimensionality matters.
