🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

K-Nearest Neighbors — Complete Guide

ML FoundationsClassification🟢 Free Lesson

Advertisement

Supervised Learning

Instance-Based Learning — Your Neighbors Tell the Story

KNN classifies new points by looking at the K closest training examples. It is simple, intuitive, and requires no training phase.

  • Lazy Learner — No training phase, all computation at prediction time
  • Distance Metrics — Euclidean, Manhattan, and cosine similarity
  • Curse of Dimensionality — Why KNN struggles with too many features

"Tell me who your neighbors are, and I'll tell you who you are."

K-Nearest Neighbors — Complete Guide

KNN is the simplest ML algorithm — it classifies a point by looking at its K closest neighbors.


How KNN Works

DfK-Nearest Neighbors (KNN)

Given a query point xq\mathbf{x}_q and training set {(x(i),y(i))}i=1N\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}, KNN finds the KK nearest neighbors NK(xq)\mathcal{N}_K(\mathbf{x}_q) according to a distance metric dd, then predicts:

y^=argmaxciNK1[y(i)=c]\hat{y} = \arg\max_{c} \sum_{i \in \mathcal{N}_K} \mathbb{1}[y^{(i)} = c]

For regression: y^=1KiNKy(i)\hat{y} = \frac{1}{K}\sum_{i \in \mathcal{N}_K} y^{(i)}

KNN Classification: K=5 Neighbors VoteFeature x₁?x_qC₀C₀C₁C₀C₁Class 0 (3 votes)Class 1 (2 votes)Predicted: Class 0KNN AlgorithmStep 1: Store all training data{(x⁽ⁱ⁾, y⁽ⁱ⁾)} for i = 1..NStep 2: For query point x_q:a. Compute d(x_q, x⁽ⁱ⁾) for all ib. Sort by distancec. Select K nearest: N_K(x_q)Step 3: Majority voteŷ = argmax_c Σ 𝟙[y⁽ⁱ⁾=c]Complexity:O(Nd) per prediction — no training!

Distance Metrics

Euclidean Distance (L2)

d(x,y)=i=1n(xiyi)2=xy2d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} = \|\mathbf{x} - \mathbf{y}\|_2

Here,

  • x,y\mathbf{x}, \mathbf{y}=Two data points in ℝᵈ

Manhattan Distance (L1)

d(x,y)=i=1nxiyi=xy1d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n}|x_i - y_i| = \|\mathbf{x} - \mathbf{y}\|_1

Here,

  • 1\|\cdot\|_1=L1 norm (city block distance)

Minkowski Distance (General)

d(x,y)=(i=1nxiyip)1/pd(\mathbf{x}, \mathbf{y}) = \left(\sum_{i=1}^{n}|x_i - y_i|^p\right)^{1/p}

Here,

  • pp=1=Manhattan, 2=Euclidean, ∞=Chebyshev
Distance Metrics ComparisonManhattan (L1)d = |80−220| + |180−85| = 240Grid-aligned paths onlyEuclidean (L2)d = √(140² + 95²) = 169.3Straight-line distanceCosine Similarityθcos(θ) = x·y / (‖x‖‖y‖)Direction only, not magnitude

Choosing K

Choosing K

  • K=1K = 1: Very flexible, noisy, overfits (low bias, high variance)
  • K=NK = \sqrt{N}: Common heuristic (where NN = training set size)
  • K=NK = N: Always predicts majority class (high bias, low variance)

Rules of thumb:

  • Use odd K for binary classification (avoid ties)
  • Use cross-validation to find optimal K
  • Larger K → smoother decision boundary → less overfitting
Effect of K on Decision BoundaryK=1 (Overfitting)Complex, jagged boundaryK=5 (Good Fit)Smooth, reasonable boundaryK=50 (Underfitting)Straight line, too simple

Weighted KNN

DfWeighted KNN

Closer neighbors get more influence: y^=argmaxciNKwi1[y(i)=c]\hat{y} = \arg\max_c \sum_{i \in \mathcal{N}_K} w_i \cdot \mathbb{1}[y^{(i)} = c] where wi=1d(xq,x(i))w_i = \frac{1}{d(\mathbf{x}_q, \mathbf{x}^{(i)})} or wi=ed(xq,x(i))w_i = e^{-d(\mathbf{x}_q, \mathbf{x}^{(i)})}.

Example: Weighted vs Standard KNN

Standard KNN (K=2): Neighbor 1 at distance 0.5 (Class A), Neighbor 2 at distance 2.0 (Class B) → Tie: A=1, B=1

Weighted KNN: A gets weight 10.5=2.0\frac{1}{0.5} = 2.0, B gets weight 12.0=0.5\frac{1}{2.0} = 0.5A wins (2.0 vs 0.5)


Curse of Dimensionality

DfCurse of Dimensionality

As dimension dd increases, the volume of the space grows exponentially as VrdV \propto r^d. To maintain the same data density, you need NrdN \propto r^d samples — exponentially more data. In high dimensions, all points become approximately equidistant, making distance metrics meaningless.

# Demonstration: distances converge in high dimensions
import numpy as np
for d in [2, 5, 10, 50, 100, 500]:
    pts = np.random.rand(100, d)
    dists = np.sqrt(((pts[:,None] - pts[None,:])**2).sum(2))
    np.fill_diagonal(dists, np.inf)
    ratio = dists.max(axis=1).mean() / dists.min(axis=1).mean()
    print(f"d={d:3d}: d_max/d_min = {ratio:.2f}")
# Output: d_max/d_min → 1 as d → ∞

Solutions for High Dimensions

  1. Dimensionality reduction (PCA, t-SNE) before applying KNN
  2. Feature selection — remove irrelevant features
  3. Use tree-based methods (Random Forest, KD-trees) that don't rely on distances
  4. Increase training data exponentially with dimensions

Key Takeaways

Summary: KNN

  1. KNN is a lazy learner — no training, O(Nd)O(Nd) prediction time
  2. Scale your features — KNN is distance-based, features must be comparable
  3. Choose K with cross-validation (typically 3-15); use odd K for binary
  4. Weighted KNN often outperforms uniform voting
  5. KNN suffers from the curse of dimensionality — distances become meaningless in high-dd
  6. KD-tree or Ball tree accelerate nearest neighbor search from O(N)O(N) to O(logN)O(\log N)
  7. KNN is a great baseline and works well for small, low-dimensional datasets
  8. No model is stored — all computation happens at prediction time

What to Learn Next

-> Decision Trees If-then rules that learn — the most interpretable algorithm.

-> Clustering Grouping the ungrouped — finding hidden structure in data.

-> Dimensionality Reduction Reduce features while preserving information with PCA and t-SNE.

Premium Content

K-Nearest Neighbors — Complete Guide

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert Machine Learning Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement