πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Design Twitter

System Design ProblemsSocial Media Systems🟒 Free Lesson

Advertisement

System Design Problems

Design Twitter

Twitter serves 400M+ monthly active users with 500M+ tweets daily. This design covers tweet distribution, timeline generation, trending topics, and real-time search.

  • Scale β€” 400M MAU, 500M tweets/day, 600K QPS reads
  • Timeline β€” Home timeline vs user timeline with fan-out
  • Trends β€” Real-time trending topics with decay algorithms

Twitter's core challenge is the celebrity problem: a single tweet from a popular account must reach millions instantly.

Requirements Clarification

Functional Requirements

  1. Post tweets (text, images, videos, polls)
  2. View home timeline (tweets from followed users)
  3. View user timeline (tweets from a specific user)
  4. Follow/unfollow users
  5. Like, retweet, reply to tweets
  6. View trending topics
  7. Search tweets by keywords/hashtags
  8. Real-time notifications

Non-Functional Requirements

  1. Availability: 99.99% uptime
  2. Latency: Timeline loads < 200ms
  3. Consistency: Eventual consistency acceptable
  4. Durability: Tweets must never be lost
  5. Scale: 500M tweets/day, 600K timeline reads QPS

Twitter's fundamental architectural decision: Fan-out on write for home timelines. When a user tweets, the tweet ID is immediately pushed to all followers' timeline caches. This trades write amplification for fast reads.

Back-of-the-Envelope Estimation

Tweet Write QPS

QPSwrite=500Γ—10686400β‰ˆ5,800\text{QPS}_{\text{write}} = \frac{500 \times 10^6}{86400} \approx 5,800

Here,

  • 500M500M=Tweets per day
  • 8640086400=Seconds in a day

Fan-out Storage Estimation

Average user: 200 followers Celebrity user: 10M followers

If 1% of users are celebrities (4M users) and each has 10M followers: Fan-out writes per celebrity tweet = 10M Total fan-out writes for all celebrity tweets per day = 4M Γ— 10M Γ— tweets_per_celebrity

This is why Twitter uses a hybrid approach: fan-out on write for normal users, pull on read for celebrities.

High-Level Architecture

ClientsAPI Gateway / Load BalancerTweet ServicePost/StoreTimelineFan-outSocial GraphFollow GraphSearchIndex/TrendingMediaUpload/CDNNotificationPush/EmailMessage Queue (Kafka)Tweet Store(Cassandra)Timeline Cache(Redis Cluster)Graph Store(MySQL/Sharded)Search Index(Elasticsearch)Object Store(S3/GCS)Trending Service (Count-Min Sketch + Decay)

Timeline Generation Deep Dive

Home Timeline Architecture

DfFan-out on Write Timeline

When a user tweets, the tweet ID is pushed to the timeline cache of all followers. The timeline cache stores the 800 most recent tweet IDs per user. When a user opens their home timeline, we simply read from this cache and hydrate with full tweet data.

User ATweetsFan-outServicePush tweet_idUser B CacheUser C CacheUser D CacheTimelineReadHydrate tweets

Celebrity Handling

For users with millions of followers (celebrities), fan-out on write is too expensive. Instead, Twitter uses a hybrid approach: the home timeline service checks if the author is a celebrity and falls back to pulling their recent tweets at read time.

Celebrity Threshold

IfΒ followers>Tcelebrity,Β useΒ pull-on-readΒ insteadΒ ofΒ push-on-write\text{If } followers > T_{\text{celebrity}} \text{, use pull-on-read instead of push-on-write}

Here,

  • TextcelebrityT_{ ext{celebrity}}=Celebrity threshold (typically 10K-100K)
  • followersfollowers=Number of followers

Trending Topics

Count-Min Sketch Algorithm

DfCount-Min Sketch

A probabilistic data structure used to estimate frequency counts of items in a stream. Uses multiple hash functions to reduce collision probability. Space-efficient: O(Ρ⁻¹ Γ— ln(δ⁻¹)) for error Ξ΅ and confidence 1-Ξ΄.

Count-Min Sketch Update

ForΒ eachΒ hashΒ functionΒ hi:count[i][hi(item)]+=1\text{For each hash function } h_i: \text{count}[i][h_i(item)] += 1

Here,

  • hih_i=ith hash function
  • count[i][j]count[i][j]=ith row, jth column of sketch

Trending Score with Time Decay

Trending Score

S(t)=C(t)eΞ»β‹…Ξ”tS(t) = \frac{C(t)}{e^{\lambda \cdot \Delta t}}

Here,

  • C(t)C(t)=Current mention count
  • lambdalambda=Decay constant
  • DeltatDelta t=Time since first mention

Trending Score Calculation

Topic A: 10,000 mentions in last hour Topic B: 50,000 mentions but started 6 hours ago

With Ξ» = 0.1: Score_A = 10000 / e^(0.1 Γ— 1) = 9048 Score_B = 50000 / e^(0.1 Γ— 6) = 50000 / 1.822 = 27,442

Topic B still trends because its total volume outweighs the decay.

Search Architecture

DfInverted Index

Twitter's search uses an inverted index mapping terms to tweet IDs. The index is partitioned by time (recent tweets) and by popularity (all-time tweets). Queries are evaluated using Boolean operations on posting lists.

Data Model

Tweet Schema

Tweet=(tweet_id,user_id,text,media_ids[],reply_to,retweet_of,timestamp,lang)\text{Tweet} = (tweet\_id, user\_id, text, media\_ids[], reply\_to, retweet\_of, timestamp, lang)

Here,

  • tweetidtweet_id=Unique tweet ID (Snowflake)
  • replytoreply_to=Parent tweet ID (null if top-level)
  • retweetofretweet_of=Original tweet ID (null if original)

Twitter uses Snowflake IDs: 64-bit IDs encoding timestamp, machine ID, and sequence number. This guarantees uniqueness without coordination and provides rough chronological ordering.

Practice Exercises

  1. Fan-out Design: How would you handle a celebrity with 100M followers posting a tweet? Estimate the time and resources needed for fan-out.

  2. Timeline Consistency: If a user follows someone who tweets, but the tweet doesn't appear in their timeline for 5 seconds, is this acceptable? Design a system to minimize this delay.

  3. Trending Accuracy: How would you detect and filter out artificially inflated trending topics (bot manipulation)?

  4. Search Performance: Design a search system that can return results for "machine learning" in under 100ms across 500M tweets.

Key Takeaways:

  • Twitter uses fan-out on write with celebrity exception (pull-on-read)
  • Snowflake IDs provide globally unique, roughly ordered identifiers
  • Trending uses Count-Min Sketch with time decay
  • Search relies on inverted indices with time-based partitioning
  • The celebrity problem is the central scaling challenge

What to Learn Next

-> Design Instagram Photo sharing and feed generation at scale.

-> Design Facebook Social graph and news feed architecture.

-> Design WhatsApp Messaging systems and real-time delivery.

-> Design Leaderboard Sorted sets and real-time ranking systems.

-> Back Pressure Managing load in message-driven architectures.

-> Caching Strategies Cache invalidation and distributed caching.

⭐

Premium Content

Design Twitter

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement