
Design WhatsApp

WhatsApp serves 2B+ users with 100B+ messages daily. This design explores building a globally distributed messaging platform with end-to-end encryption, media delivery, and real-time presence.

  • Scale: 2B users, 100B messages/day (about 1.16M messages/second on average, with peaks several times higher)
  • Latency: Message delivery under 100ms for 95% of messages
  • Encryption: End-to-end encryption using the Signal Protocol

Designing for WhatsApp means solving the hardest problems in distributed messaging at planetary scale.

Requirements Clarification

Functional Requirements

  1. One-to-one text messaging with delivery/read receipts
  2. Group messaging (up to 1024 members)
  3. Media sharing (images, videos, documents up to 2GB)
  4. Online/offline presence indicators
  5. Message history and synchronization across devices
  6. End-to-end encryption (E2EE)

Non-Functional Requirements

  1. Availability: 99.99% uptime
  2. Latency: < 100ms message delivery for 95% of messages (P95)
  3. Durability: Messages must not be lost once sent
  4. Consistency: Causal ordering within conversations
  5. Scale: 2B registered users, 500M daily active users

The critical insight: WhatsApp is a store-and-forward system, not a direct connection system. Messages are queued on servers and delivered when recipients come online.

Back-of-the-Envelope Estimation

Message Throughput

$$\text{QPS}_{\text{avg}} = \frac{100 \times 10^9 \text{ messages}}{86400 \text{ seconds}} \approx 1.16\text{M QPS}$$

Here,

  • 100B = Messages per day
  • 86400 = Seconds in a day
  • 1.16M = Average QPS

Storage Estimation

Average message size: 100 bytes (text), 100KB (media average)

Text storage per day: 100B × 100 bytes = 10 TB/day

Media storage per day (assuming 5% messages have media): 5B × 100KB = 500 TB/day

Total storage per year: (10 TB + 500 TB) × 365 ≈ 186 PB/year
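
The throughput and storage figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the lesson's stated assumptions (100B messages/day, 100-byte text, 100 KB media, 5% media share). The variable names are illustrative, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Back-of-the-envelope numbers from the lesson, in decimal units (1 TB = 10**12 bytes).
MESSAGES_PER_DAY = 100 * 10**9
SECONDS_PER_DAY = 86_400
TEXT_BYTES = 100            # average text message
MEDIA_BYTES = 100 * 10**3   # average media attachment (100 KB)
MEDIA_SHARE = 0.05          # assumed fraction of messages carrying media

TB = 10**12
PB = 10**15

avg_qps = MESSAGES_PER_DAY / SECONDS_PER_DAY
text_tb_per_day = MESSAGES_PER_DAY * TEXT_BYTES / TB
media_tb_per_day = MESSAGES_PER_DAY * MEDIA_SHARE * MEDIA_BYTES / TB
total_pb_per_year = (text_tb_per_day + media_tb_per_day) * 365 * TB / PB

print(f"avg QPS:        {avg_qps / 1e6:.2f}M")
print(f"text per day:   {text_tb_per_day:.0f} TB")
print(f"media per day:  {media_tb_per_day:.0f} TB")
print(f"total per year: {total_pb_per_year:.0f} PB")
```

Media dominates: at a 5% share it still accounts for roughly 98% of the bytes stored, which is why the architecture keeps media in a separate store from message text.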

Bandwidth Estimation

$$\text{Bandwidth} = \text{QPS} \times \text{avg message size}$$

Here,

  • QPS = Queries per second
  • avg message size = Average message size in bytes
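
Plugging the earlier estimates into this formula gives a rough ingress figure. The sketch below assumes a blended message size (95% text at 100 bytes, 5% media at 100 KB) and counts each payload once on the way in. It ignores protocol overhead, encryption padding, and the extra egress from group fan-out, so treat the result as a lower bound.

```python
# Blended average message size under the lesson's assumptions:
# 95% text at 100 bytes, 5% media at 100 KB (decimal units).
AVG_QPS = 100 * 10**9 / 86_400
avg_message_bytes = 0.95 * 100 + 0.05 * 100_000   # about 5,095 bytes

ingress_bytes_per_sec = AVG_QPS * avg_message_bytes
ingress_gbit_per_sec = ingress_bytes_per_sec * 8 / 10**9

print(f"avg message size: {avg_message_bytes:.0f} bytes")
print(f"ingress: {ingress_bytes_per_sec / 10**9:.1f} GB/s ({ingress_gbit_per_sec:.0f} Gbit/s)")
```

As a cross-check, 510 TB/day divided by 86,400 seconds is also about 5.9 GB/s, so the bandwidth and storage estimates agree.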

High-Level Architecture

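Architecture diagram (components by layer):

  • Clients: Mobile, Web, Desktop
  • Edge: Load Balancer (L7)
  • Services: Connection Manager, Message Router, Presence Service, Media Service, Group Service
  • Messaging backbone: Message Queue (Kafka)
  • Storage: Message Store (Cassandra), User DB (MySQL), Media Store (S3 + CDN), Presence Store (Redis Cluster)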

Core Components Deep Dive

1. Connection Manager

Maintains persistent WebSocket connections with clients:

Definition: Connection Mapping

Each user maintains exactly one active connection per device. The connection manager uses a mapping: user_id + device_id → server_id. When a message arrives, the router looks up this mapping to find which server holds the connection.

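from collections import defaultdict


class ConnectionManager:
    """Tracks which server holds each (user, device) connection."""

    def __init__(self):
        self.user_connections = defaultdict(dict)    # user_id -> {device_id: server_id}
        self.server_connections = defaultdict(dict)  # server_id -> {connection_id: user_id}

    def register(self, user_id, device_id, server_id, connection_id):
        # One active connection per device: re-registering overwrites the old server.
        self.user_connections[user_id][device_id] = server_id
        self.server_connections[server_id][connection_id] = user_id

    def get_servers(self, user_id):
        # .get avoids creating an empty entry when an unknown user is looked up.
        return set(self.user_connections.get(user_id, {}).values())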

2. Message Router

Routes messages between senders and receivers:

Message Routing Complexity

$$O(1) \text{ lookup for 1:1, } O(n) \text{ for group of size } n$$

Here,

  • $O(1)$ = Direct message lookup time
  • $O(n)$ = Group fan-out time
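
To make the two costs concrete, here is a minimal routing sketch. The `connections` and `groups` dictionaries are hypothetical in-memory stand-ins for the connection mapping and the group service, and the server names are made up for illustration.

```python
from typing import Dict, Set

# Hypothetical in-memory stand-ins for the connection mapping and group membership.
connections: Dict[str, Dict[str, str]] = {
    "alice": {"phone": "srv-1"},
    "bob": {"phone": "srv-2", "laptop": "srv-3"},
    "carol": {"phone": "srv-2"},
}
groups: Dict[str, Set[str]] = {"family": {"alice", "bob", "carol"}}


def servers_for_user(user_id: str) -> Set[str]:
    """O(1) expected: one hash lookup, then the user's handful of devices."""
    return set(connections.get(user_id, {}).values())


def servers_for_group(group_id: str, sender_id: str) -> Set[str]:
    """O(n) in group size: one lookup per member other than the sender."""
    targets: Set[str] = set()
    for member in groups.get(group_id, set()):
        if member != sender_id:
            targets |= servers_for_user(member)
    return targets
```

Returning a set of servers rather than a list of connections lets the router send one copy per server even when several recipients share it, as `bob` and `carol` do on `srv-2` here.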

3. Message Flow (1:1 Chat)

Message flow between a sender (online) and a receiver (online or offline):

  1. Send Message
  2. Store in Queue
  3. Persist to DB
  4. Deliver to Receiver
  5. Delivery Receipt
  6. Read Receipt
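
The six steps can be sketched as a small store-and-forward loop. This is an illustrative toy, not WhatsApp's implementation: `StoreAndForward`, its fields, and the in-memory `db` are hypothetical names, and a real server would wait for a client acknowledgment before marking a message delivered.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    sender_id: str
    receiver_id: str
    content: bytes          # ciphertext; the server never sees plaintext
    status: str = "sent"


class StoreAndForward:
    def __init__(self):
        self.db = {}                        # msg_id -> Message (stand-in for the message store)
        self.pending = defaultdict(deque)   # receiver_id -> queued msg_ids
        self.online = set()                 # receivers with a live connection

    def send(self, msg: Message) -> None:
        # Steps 1-3: accept, queue, and persist before attempting delivery.
        self.pending[msg.receiver_id].append(msg.msg_id)
        self.db[msg.msg_id] = msg
        if msg.receiver_id in self.online:
            self.flush(msg.receiver_id)

    def connect(self, user_id: str) -> None:
        # An offline receiver gets everything that was queued once it reconnects.
        self.online.add(user_id)
        self.flush(user_id)

    def flush(self, user_id: str) -> None:
        # Steps 4-5: deliver in queue order; delivery triggers the delivery receipt.
        queue = self.pending[user_id]
        while queue:
            self.db[queue.popleft()].status = "delivered"

    def mark_read(self, msg_id: str) -> None:
        # Step 6: a read receipt is only valid after delivery.
        if self.db[msg_id].status == "delivered":
            self.db[msg_id].status = "read"
```

Because the message is queued and persisted before delivery is attempted, the online and offline cases share one code path: an offline receiver simply means the queue is drained later.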

4. End-to-End Encryption

Definition: Signal Protocol

WhatsApp uses the Signal Protocol for E2EE. Each message is encrypted with a unique key derived by the Double Ratchet algorithm, so compromising one message key does not expose earlier or later messages. The initial shared secret is established with the X3DH (Extended Triple Diffie-Hellman) key agreement.

Key Derivation

$$K_n = \text{HKDF}(K_{ratchet}, \text{ChainKey}_n)$$

Here,

  • $K_n$ = Message key for message n
  • $K_{ratchet}$ = Root ratchet key
  • $\text{ChainKey}_n$ = Symmetric ratchet chain key
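
The derivation can be sketched with the standard library alone. The block below implements HKDF as specified in RFC 5869 and applies it the way the formula is written, with the ratchet key as salt and the chain key as input keying material. The `info` label and the one-byte constant used to advance the chain are illustrative assumptions, not the parameters of any production Signal implementation.

```python
import hashlib
import hmac


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes = b"", length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256, built from the standard-library hmac module."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()          # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                     # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def message_key(k_ratchet: bytes, chain_key: bytes) -> bytes:
    """K_n = HKDF(K_ratchet, ChainKey_n), following the lesson's formula."""
    return hkdf_sha256(salt=k_ratchet, ikm=chain_key, info=b"message-key")


def next_chain_key(chain_key: bytes) -> bytes:
    """Advance the symmetric ratchet one step; the step cannot be reversed."""
    return hmac.new(chain_key, b"\x02", hashlib.sha256).digest()


# Derive keys for three consecutive messages from placeholder secrets.
k_ratchet = bytes(32)
chain = hashlib.sha256(b"demo chain key").digest()
keys = []
for _ in range(3):
    keys.append(message_key(k_ratchet, chain))
    chain = next_chain_key(chain)
```

Each message key is used once and the chain key is overwritten after every step, which is what gives the ratchet its forward secrecy.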

E2EE means the server never sees plaintext messages. This limits server-side features like message search and spam detection. WhatsApp uses sender keys for group encryption to reduce overhead.

5. Group Messaging

Definition: Sender Key Distribution

For groups, each sender generates a sender key and distributes it to every other member over the existing pairwise encrypted sessions. The sender then encrypts each message once with this key, and the server fans the same ciphertext out to the group. Each member decrypts it with the stored sender key. When a member joins or leaves, a new sender key is distributed.

Group Encryption Cost

$$\text{Cost} = O(n) \text{ for key distribution, } O(1) \text{ per message}$$

Here,

  • $n$ = Group size
  • $O(n)$ = Key distribution cost
  • $O(1)$ = Per-message encryption cost
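
A quick count shows why this matters at the 1024-member limit. The sketch below compares the number of encryption operations a single sender performs with and without sender keys. It assumes no membership changes during the window, since each change would add another O(n) key distribution, and the function names are illustrative.

```python
def pairwise_encryptions(group_size: int, messages: int) -> int:
    """Without sender keys: every message is encrypted once per recipient."""
    return messages * (group_size - 1)


def sender_key_encryptions(group_size: int, messages: int) -> int:
    """With sender keys: one O(n) key distribution, then one encryption per message."""
    key_distribution = group_size - 1
    return key_distribution + messages


n, m = 1024, 500
print(pairwise_encryptions(n, m))    # 511500
print(sender_key_encryptions(n, m))  # 1523
```

The up-front distribution pays for itself after the second message, and the saving grows linearly with the number of messages sent.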

6. Presence System

Definition: Presence Protocol

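Presence is maintained using a heartbeat mechanism. Clients send a presence update every 30 seconds, and the presence service stores each one in Redis with a TTL of 60 seconds. If no heartbeat arrives before the TTL expires, the key lapses and the user is treated as offline. Setting the TTL to twice the heartbeat interval tolerates one missed heartbeat.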
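
The heartbeat-plus-TTL idea can be sketched without a Redis server. `PresenceStore` below is a hypothetical in-memory stand-in that records an expiry time per user. The clock is injectable so the expiry behavior can be exercised without waiting.

```python
import time


class PresenceStore:
    """In-memory stand-in for Redis keys with a TTL (hypothetical, for illustration)."""

    HEARTBEAT_INTERVAL = 30  # seconds between client heartbeats
    TTL = 60                 # seconds before a silent user is considered offline

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires_at = {}  # user_id -> expiry time

    def heartbeat(self, user_id: str) -> None:
        # Similar in spirit to `SET presence:<user_id> online EX 60` in Redis.
        self._expires_at[user_id] = self._clock() + self.TTL

    def is_online(self, user_id: str) -> bool:
        expiry = self._expires_at.get(user_id)
        return expiry is not None and self._clock() < expiry


# Usage with a fake clock: online at 59 s, offline once the TTL has passed.
now = [0.0]
store = PresenceStore(clock=lambda: now[0])
store.heartbeat("alice")
now[0] = 59.0
print(store.is_online("alice"))
now[0] = 61.0
print(store.is_online("alice"))
```

No explicit "mark offline" write is needed: absence of a fresh key is the offline signal, so a crashed client or dropped network cleans itself up.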

Data Model

Message Table Schema

$$\text{Messages} = (msg\_id, chat\_id, sender\_id, content, timestamp, status)$$

Here,

  • msg_id = Unique message ID (ULID)
  • chat_id = Conversation identifier
  • sender_id = Sender user ID
  • content = Encrypted message payload (the server stores ciphertext only)
  • timestamp = Unix timestamp (ms)
  • status = sent/delivered/read
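
As a sketch of how this row and its ULID key might look in code, the block below generates a 26-character ULID (a 48-bit millisecond timestamp followed by 80 random bits, in Crockford base32) and pairs it with a dataclass mirroring the schema. `new_ulid` and `MessageRow` are illustrative names, not part of any specific library.

```python
import os
import time
from dataclasses import dataclass
from typing import Optional

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """26-char ULID: 48-bit millisecond timestamp followed by 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    # Emit 26 base-32 digits, most significant first, so IDs sort by creation time.
    return "".join(CROCKFORD32[(value >> shift) & 31] for shift in range(125, -1, -5))


@dataclass
class MessageRow:
    msg_id: str
    chat_id: str
    sender_id: str
    content: bytes        # encrypted payload
    timestamp: int        # Unix time in milliseconds
    status: str = "sent"  # sent / delivered / read


ts = time.time_ns() // 1_000_000
row = MessageRow(new_ulid(ts), "chat-42", "alice", b"ciphertext", ts)
```

Because the timestamp occupies the leading characters, sorting by `msg_id` within a `chat_id` returns messages in approximate send order without a separate index. IDs generated in the same millisecond are ordered by their random bits, so this does not replace the causal ordering requirement.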

Scaling Strategies

Message Storage Partitioning

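Messages are partitioned by a hash of chat_id, so every message in a conversation lands on the same partition. The formula below is plain modulo hashing. Consistent hashing is the usual refinement, because it remaps far fewer keys when the partition count changes: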

Partition Assignment

$$\text{partition} = \text{hash}(chat\_id) \mod N_{\text{partitions}}$$

Here,

  • chat_id = Conversation identifier
  • $N_{\text{partitions}}$ = Total number of partitions
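
The difference between modulo hashing and a consistent-hash ring shows up when a partition is added. The sketch below implements both and counts how many of 2,000 sample chats change partition when growing from 8 to 9. `HashRing`, the virtual-node count, and the use of MD5 as a stable hash are illustrative choices, not details taken from WhatsApp.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process for str, so use a digest instead.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def mod_partition(chat_id: str, n_partitions: int) -> int:
    """partition = hash(chat_id) mod N_partitions."""
    return stable_hash(chat_id) % n_partitions


class HashRing:
    """Consistent-hash ring with virtual nodes (an illustrative alternative)."""

    def __init__(self, n_partitions: int, vnodes: int = 100):
        self._ring = sorted(
            (stable_hash(f"p{p}-v{v}"), p)
            for p in range(n_partitions)
            for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def partition(self, chat_id: str) -> int:
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect_right(self._points, stable_hash(chat_id)) % len(self._ring)
        return self._ring[i][1]


chats = [f"chat-{i}" for i in range(2000)]
moved_mod = sum(mod_partition(c, 8) != mod_partition(c, 9) for c in chats)
ring8, ring9 = HashRing(8), HashRing(9)
moved_ring = sum(ring8.partition(c) != ring9.partition(c) for c in chats)
print(f"modulo: {moved_mod / len(chats):.0%} moved, ring: {moved_ring / len(chats):.0%} moved")
```

With modulo hashing most chats move, because a key stays put only when its hash gives the same remainder for both 8 and 9. On the ring, only the keys claimed by the new partition move, which is roughly one ninth of them.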

Push vs Pull for Delivery

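This design uses a hybrid approach. Messages are pushed over the persistent connection while the recipient is online, long polling serves as a fallback for clients with unstable connections, and push notifications wake offline clients. The server maintains a delivery queue per user, which is drained when the client reconnects.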

Practice Exercises

  1. Design: How would you implement message synchronization across multiple devices for the same user? Consider ordering and conflict resolution.

  2. Scale: WhatsApp has 50M concurrent users. Estimate the number of WebSocket connections needed and the memory overhead per connection.

  3. Reliability: Design a mechanism to ensure messages are never lost, even if the sender's device crashes after sending but before receiving acknowledgment.

  4. Optimization: How would you reduce bandwidth usage for users on slow networks? Propose a compression strategy for text and media.

Key Takeaways:

  • WhatsApp uses a store-and-forward model with persistent connections
  • End-to-end encryption via Signal Protocol limits server-side processing
  • Group messaging uses sender keys to minimize encryption overhead
  • Partition by chat_id for message storage scalability
  • Hybrid push/pull for delivery optimization
