Curse of Dimensionality — When More Features Hurt, Not Help
Dimensionality reduction compresses high-dimensional data into fewer dimensions while preserving the most important structure and variance.
- PCA — finds orthogonal axes of maximum variance for fast, linear dimensionality reduction
- t-SNE — preserves local neighborhoods for intuitive 2D and 3D visualization
- UMAP — faster than t-SNE with better global structure preservation
"Not everything that can be counted counts, and not everything that counts can be counted."
Dimensionality Reduction — Complete Guide
Why Reduce Dimensions?
Curse of Dimensionality
As the number of features increases, the volume of the space increases exponentially, causing data to become sparse. This makes distance metrics less meaningful and increases the risk of overfitting.
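Effects of the curse of dimensionality:
- More dimensions = more data needed to cover the space at the same density
- Distances between points become nearly indistinguishable, so nearest-neighbor comparisons lose meaning
- Models overfit more easily
- Training becomes slower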
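The claim that distances lose meaning can be checked numerically. The sketch below is an illustration under stated assumptions, not a formal result: it draws points uniformly from the unit hypercube and measures how much the farthest point differs from the nearest one, relative to the nearest. The helper name `relative_contrast` and the sample sizes are hypothetical choices made for this example.

```python
import numpy as np

def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (max - min) / min over distances from one query point to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform samples in the unit hypercube
    query = rng.random(n_dims)                # one random query point
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

# As the dimension grows, nearest and farthest neighbors end up at similar distances.
for d in (2, 10, 100, 1000):
    print(f"dims={d:5d}  relative contrast={relative_contrast(1000, d):.2f}")
```

A large contrast means "near" and "far" are clearly different. A contrast close to zero means every point is roughly equally far away, which is what weakens distance-based methods in high dimensions.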
Benefits of reducing dimensions:
- Faster training
- Less overfitting
- Easier visualization (2D/3D)
- Removes noise
- Fewer features = simpler model
PCA (Principal Component Analysis)
An unsupervised linear transformation technique that finds the directions of maximum variance in high-dimensional data and projects it onto a lower-dimensional space.
PCA finds the directions of maximum variance:
1. Standardize the data (center each feature, and usually scale it to unit variance)
2. Compute the covariance matrix
3. Compute its eigenvectors (the principal components) and eigenvalues, sorted by decreasing eigenvalue
4. Project the data onto the top K eigenvectors
- PC1: direction of most variance
- PC2: direction of second-most variance (orthogonal to PC1)
- ...
The explained variance ratio tells you what fraction of the total variance each PC captures, for example:
- PC1: 72%
- PC2: 15%
- PC3: 8%
- PC4: 5% -> can probably drop this one
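The four steps above can be sketched directly in NumPy. This is a minimal illustration, not how production libraries implement PCA (they typically use an SVD). The function name `pca_eig` and the synthetic data are assumptions made for this example, and the code assumes no feature is constant (otherwise the standardization step would divide by zero).

```python
import numpy as np

def pca_eig(X: np.ndarray, k: int):
    """PCA via eigendecomposition of the covariance matrix (illustrative sketch)."""
    # 1. Standardize: zero mean and unit variance per feature
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized features (d x d)
    cov = np.cov(Z, rowvar=False)
    # 3. Eigendecomposition; eigh returns ascending eigenvalues, so reorder to descending
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Project onto the top-k eigenvectors
    components = eigvecs[:, :k]
    projected = Z @ components
    explained_ratio = eigvals / eigvals.sum()
    return projected, components, explained_ratio

# Synthetic data: 5 observed features driven by 2 hidden factors plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

X_proj, components, ratio = pca_eig(X, k=2)
print(X_proj.shape)      # (200, 2)
print(ratio.round(3))    # per-component share of the total variance, largest first
```

Because the data is generated from two hidden factors, most of the variance should land in the first two components, which is the situation where dropping the remaining ones loses little.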
PCA Mathematics
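The covariance matrix of the standardized data matrix $X$ ($n$ samples as rows, $d$ features as columns):
$$\Sigma = \frac{1}{n-1} X^\top X$$
Eigendecomposition, where $v_i$ is the $i$-th principal component and $\lambda_i$ is the variance along it:
$$\Sigma v_i = \lambda_i v_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0$$
Explained variance ratio of component $i$:
$$\frac{\lambda_i}{\sum_{j=1}^{d} \lambda_j}$$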
Explained Variance
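The explained variance ratio tells you what fraction of the total variance each principal component captures. A common rule of thumb is to keep enough components to retain about 95% of the total variance, though the right threshold depends on the task.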
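from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X is assumed to be a numeric feature matrix of shape (n_samples, n_features)
X_std = StandardScaler().fit_transform(X)  # standardize before PCA
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)
print(f"Explained variance: {pca.explained_variance_ratio_}")
# e.g. [0.72, 0.15] would mean the first 2 components explain 87% of the variance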
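The 95% rule of thumb can be applied in two ways, sketched below with scikit-learn. The digits dataset is an assumption chosen only because it ships with the library; any numeric feature matrix would do. The first option inspects the cumulative explained variance by hand, and the second passes the target fraction as a float to `n_components`, which scikit-learn interprets as "keep enough components to reach this share of variance".

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_digits().data                        # 1797 samples, 64 pixel features
X_std = StandardScaler().fit_transform(X)

# Option 1: fit all components, then find where the cumulative curve crosses 95%
pca_full = PCA().fit(X_std)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Option 2: let PCA pick the number of components for a target variance fraction
pca_95 = PCA(n_components=0.95).fit(X_std)

print(f"components needed (manual): {k}")
print(f"components kept (n_components=0.95): {pca_95.n_components_}")
```

Plotting `cumulative` against the component index is a common way to judge whether 95% is a sensible cutoff or whether the curve flattens earlier.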
t-SNE (t-Distributed Stochastic Neighbor Embedding)
A nonlinear technique for visualizing high-dimensional data by preserving local structure (neighborhoods). Best for visualization, not for feature reduction for training.
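t-SNE preserves local structure (neighborhoods):
- Best for: visualization (2D/3D)
- Not for: feature reduction for training
How it works:
1. Compute pairwise similarities in the high-dimensional space (Gaussian kernel)
2. Compute pairwise similarities in the low-dimensional embedding (Student-t kernel)
3. Minimize the KL divergence between the two similarity distributions
Key parameters:
- perplexity: roughly the effective number of neighbors per point (typically 5-50)
- learning_rate: gradient descent step size (typically 10-1000)
- n_iter: number of optimization iterations (1000+); newer scikit-learn releases name this max_iter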
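A minimal scikit-learn sketch of the parameters above follows. The dataset (a 500-sample slice of the bundled digits data) and the specific parameter values are assumptions chosen to keep the run short, not recommendations. The iteration count is left at its default because the parameter name differs between scikit-learn versions.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X = StandardScaler().fit_transform(digits.data[:500])  # small subset keeps the run short

tsne = TSNE(
    n_components=2,       # 2D embedding for plotting
    perplexity=30,        # effective neighborhood size; must be below the sample count
    learning_rate=200.0,  # gradient descent step size
    init="pca",           # start from a PCA layout for more stable results
    random_state=0,       # t-SNE is stochastic; fix the seed for repeatability
)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)            # (500, 2)
print(tsne.kl_divergence_)   # final value of the objective being minimized
```

Note that scikit-learn's `TSNE` exposes `fit_transform` but no separate `transform`, so new points cannot be placed into an existing embedding. This is one reason it suits visualization better than feature reduction for training.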
UMAP (Uniform Manifold Approximation and Projection)
A nonlinear dimensionality reduction technique that is typically faster than t-SNE and tends to preserve more of the global structure. Its embeddings can also be used as input to clustering.
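UMAP is often used as a faster alternative to t-SNE.
Advantages over t-SNE:
- Usually much faster, especially on large datasets (the exact speedup depends on data size and implementation)
- Tends to preserve global structure better while still preserving local neighborhoods
- Can transform new data with an already fitted model
- Embeddings are often better suited as input for clustering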
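import umap  # provided by the umap-learn package

# X is assumed to be a numeric feature matrix of shape (n_samples, n_features)
reducer = umap.UMAP(n_components=2, n_neighbors=15)
X_2d = reducer.fit_transform(X)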
Comparison
| Method | Speed | Preserves local structure | Preserves global structure | Transforms new data |
|---|---|---|---|---|
| PCA | Fast | No | Yes | Yes |
| t-SNE | Slow | Yes | No | No |
| UMAP | Medium | Yes | Yes | Yes |
| LDA | Fast | No | No | Yes |
LDA vs PCA
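LDA (Linear Discriminant Analysis) is supervised: it uses class labels to find axes that best separate the classes, and it can produce at most (number of classes - 1) components. PCA is unsupervised: it uses only the features and finds axes of maximum variance, whether or not those axes separate the classes.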
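The difference shows up directly in the scikit-learn API: PCA is fitted on the features alone, while LDA also needs the labels. The sketch below uses the bundled iris dataset as an assumed example; with 3 classes, LDA can return at most 2 components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)             # 150 samples, 4 features, 3 classes
X_std = StandardScaler().fit_transform(X)

# PCA: unsupervised, the labels y are never seen
X_pca = PCA(n_components=2).fit_transform(X_std)

# LDA: supervised, the labels y are required to fit
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)

print(X_pca.shape, X_lda.shape)   # (150, 2) (150, 2)
```

Plotting both projections colored by class is a quick way to see whether the directions of maximum variance happen to coincide with the directions that separate the classes.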
Key Takeaways
Summary: Dimensionality Reduction
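- PCA is the standard linear baseline: fast, interpretable, widely used
- t-SNE is best for visualization: it preserves local structure
- UMAP is typically faster than t-SNE and tends to preserve more global structure
- Explained variance tells you what fraction of the total variance PCA preserves
- Standardize data before PCA
- Reduce to 2-3 dimensions for visualization
- 10-50 dimensions is a common starting range for model training; validate the choice against model performance
- Dimensionality reduction can improve model performance, but it can also discard useful signal, so compare against the unreduced baseline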
What to Learn Next
-> Autoencoders: Learn the neural network approach to nonlinear dimensionality reduction and representation learning.
-> Clustering: Group similar data points using K-Means, DBSCAN, and hierarchical methods.
-> Feature Engineering: Create and transform features to improve model performance before dimensionality reduction.
-> Model Evaluation: Evaluate whether dimensionality reduction improved or hurt your model's predictive power.
-> Neural Networks: Understand the deep learning foundations that autoencoders are built upon.
-> CNNs: Apply convolutional architectures to image data where spatial dimensionality matters.