System Design Problems

Design a Recommendation System

A recommendation system predicts user preferences and suggests relevant items. Netflix saves $1B/year through its recommendation engine, which drives 80% of content watched. The system combines collaborative filtering, content-based filtering, and deep learning.

Collaborative Filtering — "Users like you also liked..."
Content-Based Filtering — "Items similar to ones you've liked..."
Hybrid Approach — Combine signals for better accuracy

The challenge is balancing relevance (what users want) with diversity (what users don't know they want) at massive scale with sub-second latency.

Requirements

Functional Requirements

"Because you watched X" recommendations (item-to-item)
"Users like you also liked" (user-to-user)
Trending and popular items
Personalized homepage feed
Real-time updates based on user actions
A/B testing framework for algorithms

Non-Functional Requirements

Latency: Recommendations in < 200ms
Scale: 200M users, 100M items
Freshness: User actions reflected within minutes
Accuracy: Click-through rate > 5%
Diversity: Avoid filter bubbles; introduce serendipity

Recommendation systems face the cold-start problem: new users have no history, and new items have no interactions. The system must handle both gracefully using content-based features.

Back-of-the-Envelope Estimation

Recommendation System Capacity

200M users × 100 interactions/day = 20B interactions/day
Interaction storage: 20B × 100 bytes = 2 TB/day
Item catalog: 100M items × 1 KB features = 100 GB
User profiles: 200M × 10 KB features = 2 TB
Recommendation cache: 200M × 50 recommendations × 100 bytes = 1 TB

Recommendation Algorithms

Collaborative Filtering

DfCollaborative Filtering

Collaborative filtering recommends items based on user behavior patterns. It assumes users who agreed in the past will agree in the future. The user-item interaction matrix is factorized into latent user and item embeddings.

Matrix Factorization

\hat{r}_{ui} = p_u^T \times q_i + b_u + b_i

Here,

$hat{r}_{ui}$ =Predicted rating for user u on item i
$p_u$ =User latent factor vector
$q_i$ =Item latent factor vector
$b_u$ =User bias
$b_i$ =Item bias

Matrix Factorization Intuition

User vector: [0.8, -0.2, 0.5, 0.1] (likes action, dislikes romance, likes sci-fi, neutral comedy) Item vector: [0.7, 0.1, 0.6, -0.3] (action-heavy, slight romance, sci-fi, anti-comedy)

Predicted rating = dot product = 0.56 - 0.02 + 0.30 - 0.03 = 0.81 (high match)

Content-Based Filtering

DfContent-Based Filtering

Content-based filtering recommends items similar to those a user has previously liked, based on item features (genre, director, keywords). It uses cosine similarity between item feature vectors.

Cosine Similarity

similarity = \frac{A \cdot B}{||A|| \times ||B||}

Here,

$A$ =Feature vector of item A
$B$ =Feature vector of item B

Content-Based Similarity

Movie A features: [action:0.9, sci-fi:0.8, comedy:0.1] Movie B features: [action:0.7, sci-fi:0.9, comedy:0.0]

similarity = (0.9×0.7 + 0.8×0.9 + 0.1×0.0) / (√(0.81+0.64+0.01) × √(0.49+0.81+0)) = (0.63 + 0.72) / (1.21 × 1.14) = 1.35 / 1.38 = 0.978 (very similar)

Two-Tower Deep Learning Model

DfTwo-Tower Model

A two-tower model encodes users and items into separate embedding spaces using neural networks. At serving time, pre-computed item embeddings are retrieved via approximate nearest neighbor (ANN) search, enabling sub-second recommendations.

High-Level Architecture

Three-Phase Pipeline

DfRecommendation Pipeline

The recommendation pipeline has three phases: candidate generation (retrieve thousands of candidates), ranking (score candidates with ML model), and re-ranking (apply business rules, diversity, and freshness).

Phase	Candidates	Latency	Algorithm
Candidate Generation	1000+	10ms	ANN search, collaborative filtering
Ranking	100	50ms	Deep learning scoring
Re-ranking	10	10ms	Business rules, diversity

Use approximate nearest neighbor (ANN) libraries like FAISS or ScaNN for sub-millisecond candidate retrieval from millions of item embeddings.

Handling Cold Start

For new users and items:

Cold Start Strategy

score = \alpha \cdot collaborative + \beta \cdot content\_based + \gamma \cdot popularity

Here,

$alpha$ =Weight for collaborative signal (0 if no history)
$eta$ =Weight for content-based signal (higher for cold start)
$gamma$ =Weight for popularity (fallback for new items)

Cold Start Weighting

New user (no history): α=0, β=0.7, γ=0.3 → content-based + popular Established user: α=0.5, β=0.3, γ=0.2 → collaborative + content + popular

Practice Exercises

Algorithm: Implement cosine similarity between two item feature vectors. What is the time complexity for finding the top-K most similar items?
Scale: If you have 100M items with 128-dimensional embeddings, estimate the memory and query time for an FAISS index.
Diversity: Design a re-ranking algorithm that ensures the top 10 recommendations don't all belong to the same genre. How do you balance relevance vs. diversity?
Evaluation: Design an A/B testing framework for recommendation algorithms. What metrics would you track and how long should experiments run?

Key Takeaways:

Collaborative filtering uses user behavior patterns; content-based uses item features
Two-tower deep learning models pre-compute embeddings for sub-second ANN retrieval
Three-phase pipeline: candidate generation → ranking → re-ranking
Cold start is handled by shifting weights toward content-based and popularity signals
FAISS/ScaNN enable sub-millisecond nearest neighbor search over millions of embeddings

What to Learn Next

-> Design Search Engine Inverted indices and relevance ranking.

-> Design News Feed Personalized content ranking and fan-out.

-> Design Realtime Analytics Real-time user interaction tracking.

-> Design Search Autocomplete Trie-based typeahead with popularity ranking.

-> Databases Storing user profiles and item features.

-> Caching Strategies Caching pre-computed recommendations.

Design a Recommendation System

Design a Recommendation System

Requirements

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimation

Recommendation System Capacity

Recommendation Algorithms

Collaborative Filtering

DfCollaborative Filtering

Matrix Factorization

Matrix Factorization Intuition

Content-Based Filtering

DfContent-Based Filtering

Cosine Similarity

Content-Based Similarity

Two-Tower Deep Learning Model

DfTwo-Tower Model

High-Level Architecture

Three-Phase Pipeline

DfRecommendation Pipeline

Handling Cold Start

Cold Start Strategy

Cold Start Weighting

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?