Instance-Based Learning — Your Neighbors Tell the Story
KNN classifies new points by looking at the K closest training examples. It is simple, intuitive, and requires no training phase.
- Lazy Learner — No training phase, all computation at prediction time
- Distance Metrics — Euclidean, Manhattan, and cosine similarity
- Curse of Dimensionality — Why KNN struggles with too many features
"Tell me who your neighbors are, and I'll tell you who you are."
K-Nearest Neighbors — Complete Guide
KNN is the simplest ML algorithm — it classifies a point by looking at its K closest neighbors.
How KNN Works
DfK-Nearest Neighbors (KNN)
Given a query point and training set , KNN finds the nearest neighbors according to a distance metric , then predicts:
For regression:
Distance Metrics
Euclidean Distance (L2)
Here,
- =Two data points in ℝᵈ
Manhattan Distance (L1)
Here,
- =L1 norm (city block distance)
Minkowski Distance (General)
Here,
- =1=Manhattan, 2=Euclidean, ∞=Chebyshev
Choosing K
Choosing K
- : Very flexible, noisy, overfits (low bias, high variance)
- : Common heuristic (where = training set size)
- : Always predicts majority class (high bias, low variance)
Rules of thumb:
- Use odd K for binary classification (avoid ties)
- Use cross-validation to find optimal K
- Larger K → smoother decision boundary → less overfitting
Weighted KNN
DfWeighted KNN
Closer neighbors get more influence: where or .
Example: Weighted vs Standard KNN
Standard KNN (K=2): Neighbor 1 at distance 0.5 (Class A), Neighbor 2 at distance 2.0 (Class B) → Tie: A=1, B=1
Weighted KNN: A gets weight , B gets weight → A wins (2.0 vs 0.5)
Curse of Dimensionality
DfCurse of Dimensionality
As dimension increases, the volume of the space grows exponentially as . To maintain the same data density, you need samples — exponentially more data. In high dimensions, all points become approximately equidistant, making distance metrics meaningless.
# Demonstration: distances converge in high dimensions
import numpy as np
for d in [2, 5, 10, 50, 100, 500]:
pts = np.random.rand(100, d)
dists = np.sqrt(((pts[:,None] - pts[None,:])**2).sum(2))
np.fill_diagonal(dists, np.inf)
ratio = dists.max(axis=1).mean() / dists.min(axis=1).mean()
print(f"d={d:3d}: d_max/d_min = {ratio:.2f}")
# Output: d_max/d_min → 1 as d → ∞
Solutions for High Dimensions
- Dimensionality reduction (PCA, t-SNE) before applying KNN
- Feature selection — remove irrelevant features
- Use tree-based methods (Random Forest, KD-trees) that don't rely on distances
- Increase training data exponentially with dimensions
Key Takeaways
Summary: KNN
- KNN is a lazy learner — no training, prediction time
- Scale your features — KNN is distance-based, features must be comparable
- Choose K with cross-validation (typically 3-15); use odd K for binary
- Weighted KNN often outperforms uniform voting
- KNN suffers from the curse of dimensionality — distances become meaningless in high-
- KD-tree or Ball tree accelerate nearest neighbor search from to
- KNN is a great baseline and works well for small, low-dimensional datasets
- No model is stored — all computation happens at prediction time
What to Learn Next
-> Decision Trees If-then rules that learn — the most interpretable algorithm.
-> Clustering Grouping the ungrouped — finding hidden structure in data.
-> Dimensionality Reduction Reduce features while preserving information with PCA and t-SNE.