Amazon & Stripe Interview

K-Nearest Neighbors: Distance Metrics & Curse of Dimensionality

The simplest yet most insightful instance-based learning algorithm

Interview Question

"Explain the KNN algorithm and its decision boundary. What is the curse of dimensionality and how does it affect KNN? Compare different distance metrics and their use cases."

Difficulty: Medium | Frequently asked at Amazon, Stripe, Google

Theoretical Foundation

KNN Algorithm

K-Nearest Neighbors is a lazy learning algorithm that makes predictions based on the majority class (classification) or average value (regression) of the $K$ closest training examples.

Algorithm:

Choose the number of neighbors $K$
Calculate the distance between the query point and all training points
Select the $K$ nearest neighbors
For classification: Return the majority class
For regression: Return the average value

Key Properties:

Instance-based: No explicit training phase
Non-parametric: No assumptions about data distribution
Lazy learning: All computation happens at prediction time
Local decision: Decisions based on local neighborhoods

Decision Boundary

The decision boundary of KNN is piecewise linear and depends on:

K value: Larger K → smoother boundary
Distance metric: Different metrics → different boundaries
Feature scaling: Essential for meaningful distances

Choosing K

Small K (e.g., K=1): Low bias, high variance (overfitting)
Large K (e.g., K=n): High bias, low variance (underfitting)
Optimal K: Typically $\sqrt{n}$ or determined by cross-validation

ℹ️

Key Insight: KNN is a "voting" classifier. With K=1, it memorizes the training data (overfitting). With K=n, it always predicts the majority class (underfitting). The optimal K balances bias and variance.

Distance Metrics

1. Euclidean Distance (L2)

d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}

Most commonly used
Sensitive to outliers
Assumes features are on the same scale
Best for continuous features

2. Manhattan Distance (L1)

d(x, y) = \sum_{i=1}^{p} |x_i - y_i|

More robust to outliers than Euclidean
Used when features are sparse
Best for high-dimensional data

3. Minkowski Distance (Generalized)

d(x, y) = \left(\sum_{i=1}^{p} |x_i - y_i|^q\right)^{1/q}

$q=1$ : Manhattan distance
$q=2$ : Euclidean distance
$q \to \infty$ : Chebyshev distance

4. Cosine Similarity

\cos(\theta) = \frac{x \cdot y}{\|x\| \|y\|}

Measures angle between vectors
Ignores magnitude, only direction
Best for text classification (TF-IDF)
Range: $[-1, 1]$ (1 = identical direction)

5. Hamming Distance

d(x, y) = \frac{1}{p} \sum_{i=1}^{p} \mathbb{1}[x_i \neq y_i]

For categorical features
Measures proportion of differing features
Best for binary/categorical data

Distance Metric Comparison

Metric	Range	Sensitive to Outliers	Best For	Time Complexity
Euclidean	$[0, \infty)$	Yes	Continuous, low-dim	$O(p)$
Manhattan	$[0, \infty)$	Less	Sparse, high-dim	$O(p)$
Cosine	$[-1, 1]$	No	Text, high-dim	$O(p)$
Hamming	$[0, 1]$	No	Categorical	$O(p)$

Curse of Dimensionality

As the number of features increases, several problems arise:

Distance Concentration: All points become equidistant

\frac{d_{\max} - d_{\min}}{d_{\min}} \to 0 \text{ as } p \to \infty

Data Sparsity: Points become increasingly isolated
- Required data grows exponentially with dimensions
- Volume of unit hypersphere decreases relative to hypercube
Spurious Correlations: In high dimensions, random correlations appear

⚠️

Critical Concept: In high dimensions, the concept of "nearest neighbor" becomes meaningless because all points are approximately equidistant. This is why KNN performs poorly without dimensionality reduction.

Mathematical Explanation:

For $p$ -dimensional uniform data in $[0,1]^p$ :

Expected distance between two random points: $\sqrt{p/6}$
Ratio of max to min distance: $\frac{\sqrt{p/3} - \sqrt{p/6}}{\sqrt{p/6}} = \sqrt{2} - 1 \approx 0.414$

This ratio doesn't depend on $n$ , meaning even with infinite data, the curse persists.

Optimizations for KNN

1. KD-Tree

Binary tree partitioning by feature values
Query time: $O(\log n)$ average, $O(n)$ worst case
Best for low to medium dimensions ( $p < 20$ )

2. Ball Tree

Hierarchical clustering of data points
Better for high dimensions than KD-Tree
Handles non-Euclidean distances

3. Approximate Nearest Neighbors (ANN)

Locality-Sensitive Hashing (LSH)
Annoy (Approximate Nearest Neighbors Oh Yeah)
HNSW (Hierarchical Navigable Small World)
Trade accuracy for speed

💡

Production Tip: For large datasets (>100K samples), use approximate nearest neighbor libraries like Faiss (Facebook) or Annoy (Spotify) instead of exact KNN.

Weighted KNN

Instead of equal votes, weight neighbors by distance:

\hat{y} = \arg\max_k \sum_{i \in N_k(x)} w_i \cdot \mathbb{1}[y_i = k]

where $w_i = 1/d(x, x_i)$ or $w_i = e^{-d(x, x_i)}$

Weighted KNN gives more influence to closer neighbors and often outperforms unweighted KNN.

Code Implementation

Explanation of Code

Effect of K: Shows how increasing K smooths the decision boundary and affects bias-variance tradeoff.
Distance Metrics: Compares Euclidean, Manhattan, Chebyshev, and Cosine distances.
Feature Scaling: Demonstrates that KNN is highly sensitive to feature scales.
Curse of Dimensionality: Shows how accuracy degrades as dimensions increase.
KNN Regression: Demonstrates KNN for regression tasks.
Weighted KNN: Compares uniform vs distance-weighted voting.
Algorithm Comparison: Benchmarks different KNN implementations.

Real-World Applications

Amazon: Recommendation Systems

Amazon uses KNN for:

Product Recommendations: Finding similar products based on purchase history
Collaborative Filtering: User-based and item-based KNN
Content-Based Filtering: Similar item features

Stripe: Fraud Detection

Stripe uses KNN variants for:

Anomaly Detection: Identifying unusual transaction patterns
Customer Segmentation: Grouping similar merchants
Risk Scoring: Similarity to known fraud cases

💡

Amazon Interview Tip: Be prepared to discuss the tradeoff between KNN and model-based methods. KNN is interpretable but slow; model-based methods are faster but less interpretable.

Common Follow-Up Questions

Q1: Why is feature scaling essential for KNN?

KNN uses distance metrics, which are sensitive to feature scales. Without scaling, features with larger ranges dominate the distance calculation. For example, income (in thousands) would dominate age (in years).

Q2: How do you handle categorical features in KNN?

Use Hamming distance for binary/categorical features
One-hot encode categorical features (increases dimensionality)
Use Gower distance (handles mixed feature types)

Q3: What is the computational complexity of KNN?

Training: $O(1)$ (lazy learning)
Prediction: $O(np)$ for brute force, $O(p \log n)$ for KD-Tree
This makes KNN slow for large datasets

Q4: How do you choose the optimal K?

Use cross-validation
Start with $K = \sqrt{n}$
Odd K avoids ties in binary classification
Larger K reduces variance but increases bias

Company-Specific Tips

Amazon Interview Tips

Discuss scalability challenges with large datasets
Be ready to explain approximate nearest neighbors
Mention hybrid approaches (KNN + deep learning)
Talk about real-time serving requirements

Stripe Interview Tips

Focus on anomaly detection applications
Discuss interpretability for fraud explanations
Be prepared to explain feature engineering for transactions
Mention online learning for concept drift

K-Nearest Neighbors: Distance Metrics & Curse of Dimensionality

K-Nearest Neighbors: Distance Metrics & Curse of Dimensionality

Interview Question

Theoretical Foundation

KNN Algorithm

Decision Boundary

Choosing K

Distance Metrics

1. Euclidean Distance (L2)

2. Manhattan Distance (L1)

3. Minkowski Distance (Generalized)

4. Cosine Similarity

5. Hamming Distance

Distance Metric Comparison

Curse of Dimensionality

Optimizations for KNN

1. KD-Tree

2. Ball Tree

3. Approximate Nearest Neighbors (ANN)

Weighted KNN

Code Implementation

Explanation of Code

Real-World Applications

Amazon: Recommendation Systems

Stripe: Fraud Detection

Common Follow-Up Questions

Company-Specific Tips

Amazon Interview Tips

Stripe Interview Tips

Related Topics