System Design Problems
Design a Recommendation System
A recommendation system predicts user preferences and suggests relevant items. Netflix saves $1B/year through its recommendation engine, which drives 80% of content watched. The system combines collaborative filtering, content-based filtering, and deep learning.
- Collaborative Filtering β "Users like you also liked..."
- Content-Based Filtering β "Items similar to ones you've liked..."
- Hybrid Approach β Combine signals for better accuracy
The challenge is balancing relevance (what users want) with diversity (what users don't know they want) at massive scale with sub-second latency.
Requirements
Functional Requirements
- "Because you watched X" recommendations (item-to-item)
- "Users like you also liked" (user-to-user)
- Trending and popular items
- Personalized homepage feed
- Real-time updates based on user actions
- A/B testing framework for algorithms
Non-Functional Requirements
- Latency: Recommendations in < 200ms
- Scale: 200M users, 100M items
- Freshness: User actions reflected within minutes
- Accuracy: Click-through rate > 5%
- Diversity: Avoid filter bubbles; introduce serendipity
Recommendation systems face the cold-start problem: new users have no history, and new items have no interactions. The system must handle both gracefully using content-based features.
Back-of-the-Envelope Estimation
Recommendation System Capacity
- 200M users Γ 100 interactions/day = 20B interactions/day
- Interaction storage: 20B Γ 100 bytes = 2 TB/day
- Item catalog: 100M items Γ 1 KB features = 100 GB
- User profiles: 200M Γ 10 KB features = 2 TB
- Recommendation cache: 200M Γ 50 recommendations Γ 100 bytes = 1 TB
Recommendation Algorithms
Collaborative Filtering
DfCollaborative Filtering
Collaborative filtering recommends items based on user behavior patterns. It assumes users who agreed in the past will agree in the future. The user-item interaction matrix is factorized into latent user and item embeddings.
Matrix Factorization
Here,
- =Predicted rating for user u on item i
- =User latent factor vector
- =Item latent factor vector
- =User bias
- =Item bias
Matrix Factorization Intuition
User vector: [0.8, -0.2, 0.5, 0.1] (likes action, dislikes romance, likes sci-fi, neutral comedy) Item vector: [0.7, 0.1, 0.6, -0.3] (action-heavy, slight romance, sci-fi, anti-comedy)
Predicted rating = dot product = 0.56 - 0.02 + 0.30 - 0.03 = 0.81 (high match)
Content-Based Filtering
DfContent-Based Filtering
Content-based filtering recommends items similar to those a user has previously liked, based on item features (genre, director, keywords). It uses cosine similarity between item feature vectors.
Cosine Similarity
Here,
- =Feature vector of item A
- =Feature vector of item B
Content-Based Similarity
Movie A features: [action:0.9, sci-fi:0.8, comedy:0.1] Movie B features: [action:0.7, sci-fi:0.9, comedy:0.0]
similarity = (0.9Γ0.7 + 0.8Γ0.9 + 0.1Γ0.0) / (β(0.81+0.64+0.01) Γ β(0.49+0.81+0)) = (0.63 + 0.72) / (1.21 Γ 1.14) = 1.35 / 1.38 = 0.978 (very similar)
Two-Tower Deep Learning Model
DfTwo-Tower Model
A two-tower model encodes users and items into separate embedding spaces using neural networks. At serving time, pre-computed item embeddings are retrieved via approximate nearest neighbor (ANN) search, enabling sub-second recommendations.
High-Level Architecture
Three-Phase Pipeline
DfRecommendation Pipeline
The recommendation pipeline has three phases: candidate generation (retrieve thousands of candidates), ranking (score candidates with ML model), and re-ranking (apply business rules, diversity, and freshness).
| Phase | Candidates | Latency | Algorithm |
|---|---|---|---|
| Candidate Generation | 1000+ | 10ms | ANN search, collaborative filtering |
| Ranking | 100 | 50ms | Deep learning scoring |
| Re-ranking | 10 | 10ms | Business rules, diversity |
Use approximate nearest neighbor (ANN) libraries like FAISS or ScaNN for sub-millisecond candidate retrieval from millions of item embeddings.
Handling Cold Start
For new users and items:
Cold Start Strategy
Here,
- =Weight for collaborative signal (0 if no history)
- eta=Weight for content-based signal (higher for cold start)
- =Weight for popularity (fallback for new items)
Cold Start Weighting
New user (no history): Ξ±=0, Ξ²=0.7, Ξ³=0.3 β content-based + popular Established user: Ξ±=0.5, Ξ²=0.3, Ξ³=0.2 β collaborative + content + popular
Practice Exercises
-
Algorithm: Implement cosine similarity between two item feature vectors. What is the time complexity for finding the top-K most similar items?
-
Scale: If you have 100M items with 128-dimensional embeddings, estimate the memory and query time for an FAISS index.
-
Diversity: Design a re-ranking algorithm that ensures the top 10 recommendations don't all belong to the same genre. How do you balance relevance vs. diversity?
-
Evaluation: Design an A/B testing framework for recommendation algorithms. What metrics would you track and how long should experiments run?
Key Takeaways:
- Collaborative filtering uses user behavior patterns; content-based uses item features
- Two-tower deep learning models pre-compute embeddings for sub-second ANN retrieval
- Three-phase pipeline: candidate generation β ranking β re-ranking
- Cold start is handled by shifting weights toward content-based and popularity signals
- FAISS/ScaNN enable sub-millisecond nearest neighbor search over millions of embeddings
What to Learn Next
-> Design Search Engine Inverted indices and relevance ranking.
-> Design News Feed Personalized content ranking and fan-out.
-> Design Realtime Analytics Real-time user interaction tracking.
-> Design Search Autocomplete Trie-based typeahead with popularity ranking.
-> Databases Storing user profiles and item features.
-> Caching Strategies Caching pre-computed recommendations.