
Design Facebook


Facebook serves 3B+ monthly active users with diverse features: news feed, groups, pages, marketplace, and messaging. This design focuses on the core social graph and feed generation.

  • Scale — 3B MAU, 2B daily active users, 500M+ posts/day
  • Social Graph — 500B+ friend connections
  • Feed — Personalized ranking with ML models

Facebook is not just a social network—it's a platform ecosystem requiring microservice architecture at planetary scale.

Requirements Clarification

Functional Requirements

  1. Send friend requests and manage friendships
  2. Post status updates, photos, videos
  3. View personalized news feed
  4. Like, comment, share posts
  5. Create and join groups
  6. Create and follow pages
  7. Send messages (Messenger)
  8. Notifications

Non-Functional Requirements

  1. Availability: 99.99% uptime
  2. Latency: News feed < 500ms
  3. Consistency: Eventual consistency for feed, strong for relationships
  4. Scale: 3B users, 500M posts/day, 2B feed reads/day

Facebook's architecture differs fundamentally from Twitter's because its social graph is more complex: it has both strong and weak ties, groups of widely varying sizes, and pages with millions of followers.

Back-of-the-Envelope Estimation

Social Graph Size

$$\text{Edges} \approx 3B \times 300 \text{ avg friends} \div 2 \approx 450B \text{ edges}$$

Here,

  • 3B = Monthly active users
  • 300 = Average friends per user
  • 450B = Total friend connections

Storage for Social Graph

At 450B edges, storing each edge as 8 bytes (two 32-bit user IDs): 450B × 8 bytes = 3.6 × 10^12 bytes = 3.6 TB for the graph alone

With metadata (friendship date, status): 3.6 TB × 5 = 18 TB
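
The arithmetic above can be checked with a few lines of Python. This is a rough sketch that uses only the lesson's stated figures (3B users, 300 average friends, 8 bytes per edge, a 5× metadata multiplier). The function names are illustrative, and decimal units (1 TB = 10^12 bytes) are an assumption.

```python
TB = 10**12  # decimal terabyte (assumption: decimal rather than binary units)


def estimate_edges(users: int, avg_friends: int) -> int:
    """Each friendship is counted once per endpoint, so divide by 2."""
    return users * avg_friends // 2


def estimate_storage_bytes(edges: int, bytes_per_edge: int = 8,
                           metadata_factor: int = 1) -> int:
    """Raw edge storage, optionally scaled by a metadata multiplier."""
    return edges * bytes_per_edge * metadata_factor


edges = estimate_edges(3_000_000_000, 300)
raw = estimate_storage_bytes(edges)
with_meta = estimate_storage_bytes(edges, metadata_factor=5)

print(f"edges: {edges / 1e9:.0f}B")
print(f"raw graph: {raw / TB:.1f} TB")
print(f"with metadata: {with_meta / TB:.1f} TB")
```

The script reports 450B edges, 3.6 × 10^12 bytes (3.6 TB) for the raw edge list, and 18 TB once the metadata multiplier is applied. Replication and indexes would add to this, but they are outside the estimate.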

High-Level Architecture

[Architecture diagram]

  • Entry — Clients, GraphQL Gateway
  • Services — ProfileService, FriendService, PostService, FeedService, GroupService, PageService, MessageService
  • Messaging — Message Queue (Kafka)
  • Storage — Graph Store (TAO/MySQL), Post DB (MySQL Sharded), Feed Cache (Redis/Tair), Object Store (Haystack/S3), Search Index (Elasticsearch)

TAO: The Associations and Objects Graph Store. Distributed graph database built on MySQL with caching layer.

Social Graph: TAO Architecture

Definition: TAO (The Associations and Objects)

TAO is Facebook's distributed data store for the social graph. It models the graph as objects (users, posts, comments) and associations (friendships, likes, follows). TAO uses MySQL for storage with a write-through cache layer.

TAO Data Model

$$\text{Graph} = (O, A) \text{ where } O = \text{objects}, A = \text{associations}$$

Here,

  • O = Set of objects (users, posts, etc.)
  • A = Set of associations (edges between objects)
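
To make the objects-and-associations model concrete, here is a minimal in-memory sketch. It is not TAO's real API or storage layout: the class and method names (`InMemoryGraph`, `obj_add`, `assoc_add`, `assoc_range`, `assoc_count`) are illustrative, and storing a friendship as two directed edges is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GraphObject:
    """A node in the graph: a user, post, comment, and so on."""
    id: int
    otype: str
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Association:
    """A typed, directed edge from object id1 to object id2."""
    id1: int
    atype: str
    id2: int
    time: int


class InMemoryGraph:
    def __init__(self) -> None:
        self.objects: Dict[int, GraphObject] = {}
        # (id1, atype) -> associations, kept newest first
        self.assocs: Dict[Tuple[int, str], List[Association]] = defaultdict(list)

    def obj_add(self, obj: GraphObject) -> None:
        self.objects[obj.id] = obj

    def assoc_add(self, id1: int, atype: str, id2: int, time: int) -> None:
        edges = self.assocs[(id1, atype)]
        edges.append(Association(id1, atype, id2, time))
        edges.sort(key=lambda a: a.time, reverse=True)

    def assoc_range(self, id1: int, atype: str, offset: int, limit: int) -> List[Association]:
        return self.assocs.get((id1, atype), [])[offset:offset + limit]

    def assoc_count(self, id1: int, atype: str) -> int:
        return len(self.assocs.get((id1, atype), []))


g = InMemoryGraph()
g.obj_add(GraphObject(1, "user", {"name": "alice"}))
g.obj_add(GraphObject(2, "user", {"name": "bob"}))
g.obj_add(GraphObject(10, "post", {"text": "hello"}))
# Assumption: a friendship is stored as two directed edges, one per direction.
g.assoc_add(1, "friend", 2, time=100)
g.assoc_add(2, "friend", 1, time=100)
g.assoc_add(2, "likes", 10, time=200)

print(g.assoc_count(1, "friend"))
print([a.id2 for a in g.assoc_range(2, "likes", 0, 10)])
```

Keying the edge lists by `(id1, atype)` means queries such as "friends of user 1" or "posts liked by user 2" read a single list, which is the access pattern the graph model is meant to serve.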

Graph Partitioning

Definition: Vertex-Cut Partitioning

TAO uses vertex-cut partitioning: each edge (u, v) is assigned to a partition by hashing its source vertex u. All edges that originate from a given user therefore live on the same partition, so a fan-out query such as listing a user's friends can be answered by a single partition.

Partition Assignment

$$\text{partition}(u, v) = \text{hash}(u) \bmod N$$

Here,

  • u = Source vertex (user)
  • v = Target vertex
  • N = Number of partitions
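
The assignment rule can be sketched in Python as follows. The choice of SHA-256 as the hash and the value N = 16 are assumptions for illustration; the lesson does not say which hash function or partition count is used.

```python
import hashlib


def partition(u: int, v: int, num_partitions: int) -> int:
    """Assign edge (u, v) to a partition using only the source vertex u.

    v is accepted to mirror partition(u, v) but does not affect the result.
    """
    digest = hashlib.sha256(str(u).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


N = 16  # hypothetical partition count
edges = [(42, 7), (42, 99), (42, 1000), (7, 42)]
placement = {edge: partition(edge[0], edge[1], N) for edge in edges}

# Every edge whose source is user 42 lands on one partition.
same_source = {placement[e] for e in edges if e[0] == 42}
print(placement)
print(len(same_source))
```

A stable hash is used instead of Python's built-in `hash` so that the placement is the same across processes and machines. Note that with plain modulo hashing, changing N moves most edges, which is one reason production systems often add a level of indirection between hash and server.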

News Feed Generation

Feed Ranking Pipeline

Candidate Generation → Feature Extraction → ML Model Scoring → Blending & Filtering → Feed Output

Feed Ranking Score

$$S = \alpha \cdot P(\text{engage}) + \beta \cdot \text{recency} + \gamma \cdot \text{affinity} + \delta \cdot \text{diversity}$$

Here,

  • P(engage) = ML-predicted engagement probability
  • recency = Time decay factor
  • affinity = Relationship strength
  • diversity = Content variety bonus
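
The scoring formula can be tried out with a small sketch. Everything numeric here is an assumption: the weights α, β, γ, δ, the 6-hour half-life used for the recency decay, and the candidate values are made up for illustration, and the engagement probability would come from a trained model in a real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    post_id: int
    p_engage: float   # model-predicted engagement probability in [0, 1]
    age_hours: float  # time since the post was created
    affinity: float   # relationship strength in [0, 1]
    diversity: float  # content variety bonus in [0, 1]


def recency(age_hours: float, half_life_hours: float = 6.0) -> float:
    """Exponential time decay: 1.0 for a new post, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def score(c: Candidate, alpha: float = 0.5, beta: float = 0.2,
          gamma: float = 0.2, delta: float = 0.1) -> float:
    """S = alpha*P(engage) + beta*recency + gamma*affinity + delta*diversity."""
    return (alpha * c.p_engage + beta * recency(c.age_hours)
            + gamma * c.affinity + delta * c.diversity)


def rank(candidates: List[Candidate]) -> List[Candidate]:
    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate(post_id=1, p_engage=0.9, age_hours=24, affinity=0.2, diversity=0.1),
    Candidate(post_id=2, p_engage=0.4, age_hours=0, affinity=0.9, diversity=0.5),
    Candidate(post_id=3, p_engage=0.1, age_hours=48, affinity=0.1, diversity=0.1),
]
print([c.post_id for c in rank(candidates)])
```

With these made-up numbers, the fresh post from a close friend (post 2) outranks the older post with higher predicted engagement (post 1), which shows how the recency and affinity terms can outweigh the model score.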

Group Feed

Definition: Group Feed Generation

Group feeds use a different strategy than personal feeds. For small groups (<500 members), fan-out on write is feasible. For large groups (>5000 members), fan-out on read with caching is preferred.
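
The size-based choice can be expressed as a small selection function. The two thresholds come from the text above; the behavior for groups between 500 and 5000 members is not specified there, so the `mid_size_default` parameter is an assumption that makes the gap explicit rather than hiding it.

```python
from enum import Enum


class FanOut(Enum):
    ON_WRITE = "fan-out on write"
    ON_READ = "fan-out on read with caching"


SMALL_GROUP_MAX = 500    # from the text: small groups have < 500 members
LARGE_GROUP_MIN = 5000   # from the text: large groups have > 5000 members


def choose_group_strategy(member_count: int,
                          mid_size_default: FanOut = FanOut.ON_READ) -> FanOut:
    """Pick a feed strategy for a group based on its member count."""
    if member_count < SMALL_GROUP_MAX:
        return FanOut.ON_WRITE
    if member_count > LARGE_GROUP_MIN:
        return FanOut.ON_READ
    # Assumption: the text does not cover mid-size groups; the caller decides.
    return mid_size_default


for size in (120, 2_000, 10_000_000):
    print(size, choose_group_strategy(size).value)
```

In practice the middle band could be tuned from measured write and read load instead of being a fixed default.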

Data Model

User Schema

$$\text{User} = (user\_id, name, email, profile\_photo, friends\_count, created\_at)$$

Here,

  • user_id = Unique user identifier
  • friends_count = Denormalized friend count

Post Schema

$$\text{Post} = (post\_id, user\_id, group\_id, page\_id, content, media[], visibility, timestamp)$$

Here,

  • group_id = Group ID if posted in a group
  • page_id = Page ID if posted by a page
  • visibility = public/friends/private/group
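
The two schemas map naturally onto typed records. The sketch below uses Python dataclasses with the field names from the schemas; the field types, the defaults, and the validation rule in `__post_init__` are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Visibility(Enum):
    PUBLIC = "public"
    FRIENDS = "friends"
    PRIVATE = "private"
    GROUP = "group"


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class User:
    user_id: int
    name: str
    email: str
    profile_photo: str          # assumed to be a URL or object-store key
    friends_count: int = 0      # denormalized; must be updated with the graph
    created_at: datetime = field(default_factory=_now)


@dataclass
class Post:
    post_id: int
    user_id: int
    content: str
    group_id: Optional[int] = None   # set only if posted in a group
    page_id: Optional[int] = None    # set only if posted by a page
    media: List[str] = field(default_factory=list)
    visibility: Visibility = Visibility.FRIENDS
    timestamp: datetime = field(default_factory=_now)

    def __post_init__(self) -> None:
        # Assumed rule: group visibility only makes sense with a group_id.
        if self.visibility is Visibility.GROUP and self.group_id is None:
            raise ValueError("group-visible posts need a group_id")


alice = User(user_id=1, name="Alice", email="alice@example.com",
             profile_photo="photos/1.jpg")
post = Post(post_id=100, user_id=alice.user_id, content="Hello",
            group_id=7, visibility=Visibility.GROUP)
print(post.visibility.value, post.group_id)
```

Because `friends_count` is denormalized, any code path that adds or removes a friendship also has to update it, otherwise the profile and the graph drift apart.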

Scaling Strategies

Write Amplification vs Read Amplification

Facebook's feed uses hybrid fan-out: fan-out on write for most users, pull-on-read for users following celebrities or very active posters. The threshold is dynamic based on system load.

Fan-out Cost

$$\text{Write cost} = O(f) \text{ per post}, \quad \text{Read cost} = O(1)$$

Here,

  • f = Number of followers
  • O(f) = Proportional to follower count
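
The costs above describe fan-out on write: each post is copied into f follower feeds so that reads are a cheap lookup. A hybrid scheme skips that copy for high-follower authors and merges their posts at read time instead. The sketch below is a toy in-memory version; the `HybridFeed` class, its method names, and the fixed `CELEBRITY_THRESHOLD` (set to a tiny value for the demo, whereas the text describes a dynamic threshold) are all assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set, Tuple

CELEBRITY_THRESHOLD = 3  # hypothetical and static; tiny so the demo stays small

Post = Tuple[int, int, str]  # (timestamp, author_id, text)


class HybridFeed:
    def __init__(self, feed_size: int = 100) -> None:
        self.followers: Dict[int, Set[int]] = defaultdict(set)
        self.following: Dict[int, Set[int]] = defaultdict(set)
        # Precomputed per-user feeds, filled by fan-out on write.
        self.feeds: Dict[int, Deque[Post]] = defaultdict(lambda: deque(maxlen=feed_size))
        self.posts_by_author: Dict[int, List[Post]] = defaultdict(list)

    def follow(self, follower: int, author: int) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author: int) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author: int, timestamp: int, text: str) -> int:
        """Store a post and return how many feed writes it caused."""
        post = (timestamp, author, text)
        self.posts_by_author[author].append(post)
        if self.is_celebrity(author):
            return 0  # no fan-out; followers pull at read time
        for follower in self.followers[author]:
            self.feeds[follower].append(post)
        return len(self.followers[author])  # O(f) writes

    def read_feed(self, user: int, limit: int = 10) -> List[Post]:
        merged = set(self.feeds[user])  # set removes duplicates
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.update(self.posts_by_author[author])
        return sorted(merged, reverse=True)[:limit]  # newest first


feed = HybridFeed()
for f in (1, 2):
    feed.follow(f, 100)      # author 100: 2 followers, normal user
for f in (1, 2, 3):
    feed.follow(f, 200)      # author 200: 3 followers, "celebrity"

print(feed.publish(100, 1, "hi from a friend"))
print(feed.publish(200, 2, "hi from a celebrity"))
print(feed.read_feed(1))
```

The normal author's post costs two feed writes, while the celebrity's post costs none and is merged in when a follower reads. The trade-off is that reads for users who follow many celebrities become more expensive, which is why the threshold matters.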

Practice Exercises

  1. Graph Traversal: Design an algorithm to find "People You May Know" using friend-of-friend traversal. What's the time complexity?

  2. Feed Consistency: How do you handle the case where a user unfriends someone, but the unfriended person's posts still appear in the feed? Design a consistency mechanism.

  3. Group Scaling: Design a group with 10M members. How do you handle posts, notifications, and moderation?

  4. Privacy: How would you implement fine-grained privacy controls (e.g., "friends except coworkers") without impacting feed generation performance?

Key Takeaways:

  • Facebook uses TAO for social graph storage with vertex-cut partitioning
  • News feed uses ML-based ranking with multi-stage pipeline
  • Hybrid fan-out (write for normal, read for celebrities) balances performance
  • GraphQL gateway provides flexible query capabilities
  • Group feeds require different strategies based on group size

What to Learn Next

-> Design Instagram: Photo sharing and media delivery at scale.

-> Design Twitter: Real-time feeds and fan-out architectures.

-> Design WhatsApp: Messaging systems and real-time delivery.

-> Design YouTube: Video streaming and content delivery.

-> CAP Theorem: Consistency vs availability trade-offs.

-> Caching Strategies: Distributed caching and invalidation.
