System Design Problems
Design Facebook
Facebook serves 3B+ monthly active users with diverse features: news feed, groups, pages, marketplace, and messaging. This design focuses on the core social graph and feed generation.
- Scale — 3B MAU, 2B daily active users, 500M+ posts/day
- Social Graph — 500B+ friend connections
- Feed — Personalized ranking with ML models
Facebook is not just a social network—it's a platform ecosystem requiring microservice architecture at planetary scale.
Requirements Clarification
Functional Requirements
- Send friend requests and manage friendships
- Post status updates, photos, videos
- View personalized news feed
- Like, comment, share posts
- Create and join groups
- Create and follow pages
- Send messages (Messenger)
- Notifications
Non-Functional Requirements
- Availability: 99.99% uptime
- Latency: News feed < 500ms
- Consistency: Eventual consistency for feed, strong for relationships
- Scale: 3B users, 500M posts/day, 2B feed reads/day
Facebook's architecture is fundamentally different from Twitter because of the social graph complexity. Facebook has both strong and weak ties, groups with varying sizes, and pages with millions of followers.
Back-of-the-Envelope Estimation
Social Graph Size
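$$E = \frac{N \times F}{2}$$
Here,
- $N$ = Monthly active users
- $F$ = Average friends per user
- $E$ = Total friend connections
Each friendship is shared by two users, so it is counted once.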
Storage for Social Graph
At 450B edges, storing each edge as 8 bytes (two 32-bit user IDs): 450B × 8 bytes = 3.6 TB for the graph alone
With metadata (friendship date, status): 3.6 TB × 5 = 18 TB
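The arithmetic above is easy to get wrong by a factor of a thousand, so here is a minimal sketch that checks it. The constants come from the estimate in this section; the helper name `to_terabytes` is only illustrative, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Back-of-the-envelope storage estimate for the friendship graph.
EDGES = 450 * 10**9      # friend connections used in the estimate
BYTES_PER_EDGE = 8       # two 32-bit user IDs per edge
METADATA_FACTOR = 5      # multiplier for friendship date, status, etc.


def to_terabytes(num_bytes: float) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 1e12


raw_bytes = EDGES * BYTES_PER_EDGE
with_metadata_bytes = raw_bytes * METADATA_FACTOR

print(f"raw edges:     {to_terabytes(raw_bytes):.1f} TB")
print(f"with metadata: {to_terabytes(with_metadata_bytes):.1f} TB")
```

Even with metadata the edge list is in the tens of terabytes, so the harder problem is serving it at low latency, not storing it.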
High-Level Architecture
Social Graph: TAO Architecture
Definition: TAO (The Associations and Objects)
TAO is Facebook's distributed data store for the social graph. It models the graph as objects (users, posts, comments) and associations (friendships, likes, follows). TAO uses MySQL for storage with a write-through cache layer.
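To make "write-through" concrete, here is a toy sketch of a cache that persists every write to the backing store before updating its own copy. It is not TAO's implementation: the class name `WriteThroughCache` is hypothetical, and a plain dict stands in for the MySQL tier.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a backing store.

    `store` stands in for the database tier; here it is just a dict.
    """

    def __init__(self, store: dict):
        self.store = store
        self.cache: dict = {}

    def write(self, key, value) -> None:
        # Write-through: persist first, then update the cache, so the
        # cache never holds a value the store has not accepted.
        self.store[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:            # cache hit
            return self.cache[key]
        value = self.store.get(key)      # cache miss: fall back to the store
        if value is not None:
            self.cache[key] = value      # populate for later reads
        return value


db = {"user:1": "Alice"}                 # pretend database contents
cache = WriteThroughCache(db)
cache.write("user:2", "Bob")             # lands in both db and cache
first_read = cache.read("user:1")        # miss, then cached
```

The trade-off is that writes pay the latency of the store, in exchange for reads that never see data the store has not accepted.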
TAO Data Model
$$G = (O, A)$$
Here,
- $O$ = Set of objects (users, posts, etc.)
- $A$ = Set of associations (edges between objects)
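The object and association model can be sketched as a small in-memory store. The method names are borrowed loosely from the published TAO paper, but the signatures, the `GraphStore` class, and the field names are simplified assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GraphObject:
    id: int
    otype: str                       # e.g. "user", "post", "comment"
    data: dict = field(default_factory=dict)


@dataclass
class Association:
    id1: int                         # source object
    atype: str                       # e.g. "friend", "likes"
    id2: int                         # target object
    time: int                        # creation timestamp


class GraphStore:
    def __init__(self):
        self.objects = {}
        # Associations are grouped by (source, type) for fast listing.
        self.assocs = defaultdict(list)

    def obj_add(self, obj: GraphObject) -> None:
        self.objects[obj.id] = obj

    def assoc_add(self, id1: int, atype: str, id2: int, time: int) -> None:
        self.assocs[(id1, atype)].append(Association(id1, atype, id2, time))

    def assoc_get(self, id1: int, atype: str) -> list:
        # Newest associations first.
        edges = self.assocs.get((id1, atype), [])
        return sorted(edges, key=lambda a: a.time, reverse=True)

    def assoc_count(self, id1: int, atype: str) -> int:
        return len(self.assocs.get((id1, atype), []))


store = GraphStore()
store.obj_add(GraphObject(1, "user", {"name": "Alice"}))
store.obj_add(GraphObject(2, "user", {"name": "Bob"}))
store.obj_add(GraphObject(10, "post", {"text": "hello"}))
# A friendship is mutual, so it is stored as two directed associations.
store.assoc_add(1, "friend", 2, time=100)
store.assoc_add(2, "friend", 1, time=100)
store.assoc_add(1, "likes", 10, time=200)
```

Keying associations by `(id1, atype)` mirrors the idea that the common query is "all edges of one type leaving one object".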
Graph Partitioning
Definition: Vertex-Cut Partitioning
TAO uses vertex-cut partitioning: each edge is assigned to a partition based on the hash of the source vertex. This ensures all edges from a single user are on the same partition, enabling efficient fan-out queries.
Partition Assignment
$$\text{partition}(u, v) = \text{hash}(u) \bmod P$$
Here,
- $u$ = Source vertex (user)
- $v$ = Target vertex
- $P$ = Number of partitions
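Hashing on the source vertex can be sketched as follows. The function name `partition_for_edge` and the choice of SHA-256 are assumptions for illustration; the only property that matters is a hash that is stable across processes, which Python's built-in `hash` does not guarantee for all types.

```python
import hashlib


def partition_for_edge(source_id: int, target_id: int, num_partitions: int) -> int:
    """Assign an edge to a partition using only its source vertex.

    `target_id` is accepted to show that it plays no part in placement.
    """
    digest = hashlib.sha256(str(source_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Every edge leaving user 42 lands on the same partition.
placements = {partition_for_edge(42, friend, 1024) for friend in (7, 99, 12345)}
```

Because placement ignores the target, listing one user's friends touches a single partition, while the reverse lookup (who points at this user) may touch many.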
News Feed Generation
Feed Ranking Pipeline
Feed Ranking Score
The score combines four signals:
- ML-predicted engagement probability
- Time decay factor
- Relationship strength
- Content variety bonus
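One way the four signals might be combined is sketched below. The exact formula is not given here, so the functional form is an assumption: engagement probability, an exponential time decay, and relationship strength are multiplied, and the variety bonus is added. The function name `feed_score`, the parameter names, and the 6-hour half-life are all hypothetical.

```python
def feed_score(
    p_engage: float,
    age_hours: float,
    affinity: float,
    diversity_bonus: float = 0.0,
    half_life_hours: float = 6.0,
) -> float:
    """Score one candidate post (assumed form, not Facebook's formula).

    p_engage        -- ML-predicted engagement probability in [0, 1]
    age_hours       -- time since the post was created
    affinity        -- relationship strength between viewer and author
    diversity_bonus -- additive bonus for under-represented content types
    """
    # Exponential decay: the time factor halves every `half_life_hours`.
    time_decay = 0.5 ** (age_hours / half_life_hours)
    return p_engage * time_decay * affinity + diversity_bonus


candidates = [
    {"id": "old_close_friend", "p": 0.6, "age": 12.0, "aff": 1.0},
    {"id": "fresh_acquaintance", "p": 0.3, "age": 0.0, "aff": 0.8},
]
ranked = sorted(
    candidates,
    key=lambda c: feed_score(c["p"], c["age"], c["aff"]),
    reverse=True,
)
```

In a multi-stage pipeline a cheap score like this would typically prune candidates before a heavier model ranks the survivors.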
Group Feed
Definition: Group Feed Generation
Group feeds use a different strategy than personal feeds. For small groups (<500 members), fan-out on write is feasible. For large groups (>5000 members), fan-out on read with caching is preferred.
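The size-based choice can be sketched as a small dispatch function. The two cutoffs come from the text above; the function name and the handling of the band between 500 and 5000 members are assumptions, since the text does not say which strategy applies there.

```python
SMALL_GROUP_MAX = 500     # below this, fan-out on write is feasible
LARGE_GROUP_MIN = 5000    # above this, fan-out on read is preferred


def group_feed_strategy(member_count: int) -> str:
    """Pick a feed strategy for a group based on its size."""
    if member_count < SMALL_GROUP_MAX:
        # Few members: copy each post into every member's feed at write time.
        return "fan_out_on_write"
    if member_count > LARGE_GROUP_MIN:
        # Many members: store the post once and merge it in at read time.
        return "fan_out_on_read"
    # Assumption: the middle band defaults to read-time fan-out, the
    # cheaper choice on the write path if the group keeps growing.
    return "fan_out_on_read"


choices = {n: group_feed_strategy(n) for n in (50, 2000, 10_000_000)}
```

A production system would more likely decide from posting rate and read rate as well as member count.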
Data Model
User Schema
Key fields:
- Unique user identifier
- Denormalized friend count
Post Schema
Key fields:
- Group ID if posted in a group
- Page ID if posted by a page
- Visibility level: public, friends, private, or group
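The two schemas could look roughly like the dataclasses below. Only the fields described above are grounded in this section; every field name (`user_id`, `friend_count`, `group_id`, `page_id`, `visibility`) and the extra fields (`name`, `author_id`, `content`, `created_at`) are hypothetical, as is the default visibility of "friends".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PUBLIC = "public"
    FRIENDS = "friends"
    PRIVATE = "private"
    GROUP = "group"


@dataclass
class User:
    user_id: int                  # unique user identifier
    name: str
    friend_count: int = 0         # denormalized: avoids counting edges on read


@dataclass
class Post:
    post_id: int
    author_id: int
    content: str
    group_id: Optional[int] = None    # set only if posted in a group
    page_id: Optional[int] = None     # set only if posted by a page
    visibility: Visibility = Visibility.FRIENDS
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


alice = User(user_id=1, name="Alice", friend_count=2)
group_post = Post(
    post_id=10,
    author_id=1,
    content="Meetup on Friday",
    group_id=77,
    visibility=Visibility.GROUP,
)
```

Keeping `friend_count` on the user row trades a small consistency risk for not having to count graph edges on every profile view.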
Scaling Strategies
Write Amplification vs Read Amplification
Facebook's feed uses hybrid fan-out: fan-out on write for posts from most users, and fan-out on read for posts from celebrities or very active posters, whose follower counts make write-time fan-out too expensive. The threshold is dynamic based on system load.
Fan-out Cost
$$\text{Cost}_{\text{write}} = O(F)$$
Here,
- $F$ = Number of followers
- $O(F)$ = Write cost proportional to follower count
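A minimal sketch of hybrid fan-out follows. The `FeedService` class, its method names, and the fixed `celebrity_threshold` are assumptions for illustration; the text describes the real threshold as dynamic. `publish` returns the number of feed writes it performed, which makes the cost difference visible.

```python
from collections import defaultdict


class FeedService:
    def __init__(self, followers: dict, celebrity_threshold: int):
        self.followers = followers              # author id -> set of follower ids
        self.celebrity_threshold = celebrity_threshold
        self.inboxes = defaultdict(list)        # user id -> pushed post ids
        self.outboxes = defaultdict(list)       # author id -> posts pulled at read

    def publish(self, author: int, post_id: str) -> int:
        """Publish a post and return how many feed writes it cost."""
        fans = self.followers.get(author, set())
        if len(fans) >= self.celebrity_threshold:
            # Fan-out on read: one write, regardless of follower count.
            self.outboxes[author].append(post_id)
            return 0
        # Fan-out on write: one inbox write per follower, i.e. O(F).
        for fan in fans:
            self.inboxes[fan].append(post_id)
        return len(fans)

    def read_feed(self, user: int, following: list) -> list:
        """Merge pushed posts with posts pulled from followed celebrities."""
        posts = list(self.inboxes.get(user, []))
        for author in following:
            posts.extend(self.outboxes.get(author, []))
        return posts


followers = {1: {2, 3}, 9: {2, 3, 4, 5}}
service = FeedService(followers, celebrity_threshold=3)
normal_cost = service.publish(1, "p1")      # pushed to 2 inboxes
celebrity_cost = service.publish(9, "p2")   # stored once, pulled later
feed = service.read_feed(2, following=[1, 9])
```

The read path pays for the celebrity case instead: each feed read does one extra lookup per followed celebrity.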
Practice Exercises
- Graph Traversal: Design an algorithm to find "People You May Know" using friend-of-friend traversal. What's the time complexity?
- Feed Consistency: How do you handle the case where a user unfriends someone, but the unfriended person's posts still appear in the feed? Design a consistency mechanism.
- Group Scaling: Design a group with 10M members. How do you handle posts, notifications, and moderation?
- Privacy: How would you implement fine-grained privacy controls (e.g., "friends except coworkers") without impacting feed generation performance?
Key Takeaways:
- Facebook uses TAO for social graph storage with vertex-cut partitioning
- News feed uses ML-based ranking with multi-stage pipeline
- Hybrid fan-out (write for normal, read for celebrities) balances performance
- GraphQL gateway provides flexible query capabilities
- Group feeds require different strategies based on group size
What to Learn Next
-> Design Instagram: Photo sharing and media delivery at scale.
-> Design Twitter: Real-time feeds and fan-out architectures.
-> Design WhatsApp: Messaging systems and real-time delivery.
-> Design YouTube: Video streaming and content delivery.
-> CAP Theorem: Consistency vs availability trade-offs.
-> Caching Strategies: Distributed caching and invalidation.