Supervised Learning

Instance-Based Learning — Your Neighbors Tell the Story

KNN classifies new points by looking at the K closest training examples. It is simple, intuitive, and requires no training phase.

Lazy Learner — No training phase, all computation at prediction time
Distance Metrics — Euclidean, Manhattan, and cosine similarity
Curse of Dimensionality — Why KNN struggles with too many features

"Tell me who your neighbors are, and I'll tell you who you are."

K-Nearest Neighbors — Complete Guide

KNN is one of the simplest ML algorithms: it classifies a point by a vote among its K closest training examples.

How KNN Works

Definition: K-Nearest Neighbors (KNN)

Given a query point $\mathbf{x}_q$ and a training set $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$, KNN finds the $K$ nearest neighbors $\mathcal{N}_K(\mathbf{x}_q)$ under a distance metric $d$ and, for classification, predicts the majority class among them:

\hat{y} = \arg\max_{c} \sum_{i \in \mathcal{N}_K} \mathbb{1}[y^{(i)} = c]

For regression, it predicts the average target of the neighbors: $\hat{y} = \frac{1}{K}\sum_{i \in \mathcal{N}_K} y^{(i)}$
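To make the two prediction rules concrete, here is a minimal brute-force sketch in plain Python. The helper names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) and the toy data are illustrative assumptions, not a library API. Ties in the classification vote go to whichever tied label appears first among the sorted neighbors, which is one arbitrary choice among several.

```python
import math
from collections import Counter

def euclidean(a, b):
    """L2 distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest(X, x_q, k):
    """Indices of the k training points closest to x_q (brute force over all N)."""
    order = sorted(range(len(X)), key=lambda i: euclidean(X[i], x_q))
    return order[:k]

def knn_classify(X, y, x_q, k):
    """Majority vote among the k nearest labels (ties: first label met among sorted neighbors)."""
    votes = Counter(y[i] for i in k_nearest(X, x_q, k))
    return votes.most_common(1)[0][0]

def knn_regress(X, y, x_q, k):
    """Mean of the k nearest target values."""
    idx = k_nearest(X, x_q, k)
    return sum(y[i] for i in idx) / k

# Hypothetical toy data: two small clusters in 2-D
X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (1.2, 0.8)]
labels = ["A", "A", "B", "B", "A"]
targets = [1.0, 2.0, 10.0, 12.0, 1.5]
print(knn_classify(X, labels, (1.1, 1.0), k=3))  # A
print(knn_regress(X, targets, (5.5, 5.2), k=2))  # 11.0
```

Sorting all $N$ distances costs $O(N \log N)$ per query on top of the $O(Nd)$ distance computations; `heapq.nsmallest` would trim the selection step, but the plain sort keeps the correspondence with the formulas obvious.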

Distance Metrics

Euclidean Distance (L2)

d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} = \|\mathbf{x} - \mathbf{y}\|_2

Here,

$\mathbf{x}, \mathbf{y}$ = two data points in $\mathbb{R}^n$

Manhattan Distance (L1)

d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n}|x_i - y_i| = \|\mathbf{x} - \mathbf{y}\|_1

Here,

$\|\cdot\|_1$ = L1 norm (also called city-block or taxicab distance)

Minkowski Distance (General)

d(\mathbf{x}, \mathbf{y}) = \left(\sum_{i=1}^{n}|x_i - y_i|^p\right)^{1/p}

Here,

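$p$ = order of the norm: $p = 1$ gives Manhattan, $p = 2$ gives Euclidean, and the limit $p \to \infty$ gives Chebyshev distance, $\max_i |x_i - y_i|$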
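Because the three metrics belong to one family, a single function can cover them. This sketch (the name `minkowski` is illustrative) treats `p = float("inf")` as the Chebyshev limit; the point pair $(0, 0)$ and $(3, 4)$ is chosen so the results are easy to check by hand.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p; p = float("inf") gives Chebyshev."""
    diffs = [abs(xi - yi) for xi, yi in zip(x, y)]
    if p == float("inf"):
        return max(diffs)  # limit p -> infinity: the largest coordinate gap
    return sum(diff ** p for diff in diffs) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))             # Manhattan: 3 + 4 = 7.0
print(minkowski(x, y, 2))             # Euclidean: sqrt(9 + 16) = 5.0
print(minkowski(x, y, float("inf")))  # Chebyshev: max(3, 4) = 4.0
```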

Choosing K

$K = 1$: Very flexible, noisy, overfits (low bias, high variance)
$K = \sqrt{N}$: Common heuristic (where $N$ = training set size)
$K = N$: Always predicts the majority class of the whole training set (high bias, low variance)

Rules of thumb:

Use odd K for binary classification (avoid ties)
Use cross-validation to find optimal K
Larger K → smoother decision boundary → less overfitting
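Choosing K by cross-validation can be sketched with leave-one-out cross-validation, one particular scheme picked here because it needs no random splits. Everything below is a hypothetical illustration: the helper names and the seven-point toy dataset (with a deliberately mislabeled point at 1.5) are assumptions. On this data, K = 1 is fooled by the mislabeled point, K = 5 starts pulling votes from the other cluster, and K = 3 scores best.

```python
import math
from collections import Counter

def knn_predict(X, y, x_q, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_q))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i in range(len(X)):
        X_rest = X[:i] + X[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        hits += knn_predict(X_rest, y_rest, X[i], k) == y[i]
    return hits / len(X)

def choose_k(X, y, candidates):
    """Candidate K with the best leave-one-out accuracy (ties keep the earliest)."""
    return max(candidates, key=lambda k: loo_accuracy(X, y, k))

# Hypothetical 1-D toy data: two clusters plus a mislabeled point at 1.5
X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (1.5,)]
y = ["A", "A", "A", "B", "B", "B", "B"]
for k in (1, 3, 5):
    print(f"K={k}: LOO accuracy = {loo_accuracy(X, y, k):.3f}")
print(choose_k(X, y, [1, 3, 5]))  # 3
```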

Weighted KNN

Definition: Weighted KNN

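Closer neighbors get more influence: $\hat{y} = \arg\max_c \sum_{i \in \mathcal{N}_K} w_i \cdot \mathbb{1}[y^{(i)} = c]$, where $w_i = \frac{1}{d(\mathbf{x}_q, \mathbf{x}^{(i)})}$ or $w_i = e^{-d(\mathbf{x}_q, \mathbf{x}^{(i)})}$. With inverse-distance weights, a query that coincides with a training point ($d = 0$) needs special handling, for example adding a small $\epsilon$ to the distance or returning that point's label directly.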

Example: Weighted vs Standard KNN

Standard KNN (K=2): Neighbor 1 at distance 0.5 (Class A), Neighbor 2 at distance 2.0 (Class B) → Tie: A=1, B=1

Weighted KNN: A gets weight $\frac{1}{0.5} = 2.0$ , B gets weight $\frac{1}{2.0} = 0.5$ → A wins (2.0 vs 0.5)
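A minimal sketch of distance-weighted voting that reproduces the two-neighbor example above. The function name `weighted_vote` and its `scheme` and `eps` parameters are assumptions made for illustration; `eps` floors the distance so the inverse weight stays finite when $d = 0$.

```python
import math
from collections import defaultdict

def weighted_vote(neighbors, scheme="inverse", eps=1e-12):
    """Distance-weighted vote over (distance, label) pairs.

    scheme="inverse": w = 1 / d, with d floored at eps so d = 0 cannot divide by zero
    scheme="exp":     w = exp(-d)
    Returns the winning label and the total weight per label.
    """
    scores = defaultdict(float)
    for dist, label in neighbors:
        if scheme == "inverse":
            w = 1.0 / max(dist, eps)
        elif scheme == "exp":
            w = math.exp(-dist)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        scores[label] += w
    return max(scores, key=scores.get), dict(scores)

# The example from the text: A at distance 0.5, B at distance 2.0 (K = 2)
neighbors = [(0.5, "A"), (2.0, "B")]
print(weighted_vote(neighbors))                # ('A', {'A': 2.0, 'B': 0.5})
print(weighted_vote(neighbors, scheme="exp"))  # A wins again: e^-0.5 > e^-2
```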

Curse of Dimensionality

Definition: Curse of Dimensionality

As the dimension $d$ increases, the volume of the space grows exponentially: a hypercube of side $r$ has volume $V = r^d$. Keeping the same data density therefore requires $N \propto r^d$ samples, exponentially more data. In high dimensions, pairwise distances also concentrate: the nearest and farthest neighbors of a point end up at nearly the same distance, so distance-based neighborhoods carry much less information.

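```python
# Demonstration: distances concentrate in high dimensions
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed for reproducibility
for d in [2, 5, 10, 50, 100, 500]:
    pts = rng.random((100, d))                 # 100 uniform points in [0, 1]^d
    diff = pts[:, None, :] - pts[None, :, :]   # pairwise differences, shape (100, 100, d)
    dists = np.sqrt((diff ** 2).sum(axis=2))   # pairwise Euclidean distances
    np.fill_diagonal(dists, np.nan)            # exclude each point's zero distance to itself
    ratio = np.nanmax(dists, axis=1).mean() / np.nanmin(dists, axis=1).mean()
    print(f"d={d:3d}: d_max/d_min = {ratio:.2f}")
# The ratio shrinks toward 1 as d grows: the farthest point is barely farther than the nearest.
```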
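The volume argument can be made concrete with a common back-of-the-envelope calculation. Assume data spread uniformly over the unit hypercube $[0, 1]^d$ and a cube-shaped neighborhood: to contain a fraction $f$ of the data, the neighborhood's volume $e^d$ must equal $f$, so its edge length is $e = f^{1/d}$. The function name `edge_length` is illustrative.

```python
def edge_length(f, d):
    """Edge of a sub-cube of [0, 1]^d holding a fraction f of uniform data: e**d = f."""
    return f ** (1.0 / d)

for d in [1, 2, 10, 100]:
    print(f"d={d:3d}: edge needed to cover 1% of the data = {edge_length(0.01, d):.3f}")
# d=  1: 0.010, d=  2: 0.100, d= 10: 0.631, d=100: 0.955
```

At $d = 100$, a neighborhood holding only 1% of the points must span about 95% of the range of every feature, so it is no longer local in any useful sense.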

Solutions for High Dimensions

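Dimensionality reduction before applying KNN, e.g. PCA; t-SNE is mainly a visualization tool and does not learn a mapping that can be applied to new query points
Feature selection: remove irrelevant features, which add noise to every distance
Switch to models that do not rely on distances, such as tree-based methods (Random Forest); note that KD-trees are only a search structure for KNN itself and also lose efficiency in high dimensions
Collect more training data, although the amount needed grows exponentially with the number of dimensions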
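The dimensionality-reduction route can be sketched with NumPy by computing PCA from the SVD of the centered training data and running KNN on the projected points. The function names, the choice of 2 components, K = 5, and the synthetic data (two high-variance informative features plus 48 noise features) are all assumptions for illustration; the printed accuracies depend on that data and are not a benchmark.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and a (d, n_components) projection matrix."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def pca_transform(X, mean, W):
    """Project rows of X onto the learned principal directions."""
    return (X - mean) @ W

def knn_predict(X_train, y_train, X_query, k=5):
    """Majority vote over the k nearest training rows, for 0/1 labels and odd k."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)

def make_data(n, d=50):
    # Hypothetical toy data: 2 informative, high-variance features plus d - 2 noise features
    X = rng.normal(size=(n, d))
    X[:, :2] *= 5.0
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

X_train, y_train = make_data(300)
X_test, y_test = make_data(200)

mean, W = pca_fit(X_train, n_components=2)  # fit on training data only
Z_train = pca_transform(X_train, mean, W)
Z_test = pca_transform(X_test, mean, W)     # reuse the training mean and W

acc_raw = (knn_predict(X_train, y_train, X_test) == y_test).mean()
acc_pca = (knn_predict(Z_train, y_train, Z_test) == y_test).mean()
print(f"raw 50-d accuracy: {acc_raw:.2f}, PCA 2-d accuracy: {acc_pca:.2f}")
```

The design point that matters is that the mean and projection matrix are learned from the training set only and then reused for the queries; refitting PCA on the test data would leak information and put the two sets in different coordinate systems.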

Key Takeaways

Summary: KNN

KNN is a lazy learner: no training beyond storing the data, and brute-force prediction costs $O(Nd)$ per query
Scale your features: KNN is distance-based, so features must be on comparable scales
Choose K with cross-validation (typically 3 to 15); use odd K for binary classification
Distance-weighted voting can outperform uniform voting, especially when neighbors lie at very different distances
KNN suffers from the curse of dimensionality: distances become less informative in high $d$
KD-trees and ball trees can speed up nearest-neighbor search from $O(N)$ toward $O(\log N)$ per query on average in low dimensions; the gain fades as $d$ grows
KNN is a great baseline and works well for small, low-dimensional datasets
No parameters are learned: the stored training set is the model, and all computation happens at prediction time
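The feature-scaling advice is easy to see on a small example. This sketch z-scores each feature using training-set statistics; the helper names and the (age, income) toy data are hypothetical. Without scaling, an income gap of a few hundred dollars outweighs an age gap of 38 years, so the nearest neighbor changes once both features are standardized.

```python
import math

def standardize(X_train, X_query):
    """Z-score every feature with the training set's mean and (population) std."""
    n, d = len(X_train), len(X_train[0])
    means = [sum(row[j] for row in X_train) / n for j in range(d)]
    # "or 1.0" keeps constant features from causing a division by zero
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X_train) / n) or 1.0
            for j in range(d)]

    def scale(row):
        return [(row[j] - means[j]) / stds[j] for j in range(d)]

    return [scale(r) for r in X_train], [scale(r) for r in X_query]

def nearest_index(X, q):
    """Index of the training row closest to q (Euclidean distance)."""
    return min(range(len(X)), key=lambda i: math.dist(X[i], q))

# Hypothetical toy data: (age in years, annual income in dollars)
X_train = [(25, 50_000), (65, 50_500), (45, 90_000)]
query = (27, 51_000)

print(nearest_index(X_train, query))    # 1: income differences dominate
Z_train, (z_query,) = standardize(X_train, [query])
print(nearest_index(Z_train, z_query))  # 0: age now counts too
```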

What to Learn Next

-> Decision Trees: if-then rules that learn from data, one of the most interpretable algorithms.

-> Clustering: grouping the ungrouped, finding hidden structure in data.

-> Dimensionality Reduction: reduce features while preserving information with PCA and t-SNE.
