Design an Email System

An email system handles sending, receiving, storing, and searching email across billions of users. Gmail alone serves about 1.8 billion users with 15 GB of free storage per account, which requires distributed storage, spam filtering, and strong delivery guarantees.

  • Send/Receive: SMTP-based delivery with retry and bounce handling
  • Storage: Petabytes of email data with per-user organization
  • Search: Full-text search across billions of messages

Email is a distributed system by design: different mail servers communicate via SMTP, store messages locally, and handle delivery failures gracefully.

Requirements

Functional Requirements

  • Send and receive email (SMTP/IMAP/POP3)
  • Inbox, sent, drafts, spam, trash folders
  • Full-text search across email content
  • Attachment support (up to 25 MB)
  • Spam and phishing detection
  • Email forwarding and auto-reply rules
  • Read receipts and delivery status

Non-Functional Requirements

  • Delivery: At-least-once delivery, with Message-ID deduplication so each message appears only once to the recipient
  • Latency: Email delivered within 30 seconds
  • Storage: 15 GB per user, 1.8 billion users
  • Search: Full-text search in < 1 second
  • Spam: Filter > 99% of spam emails
  • Availability: 99.99%

Email uses the store-and-forward model: messages are stored on intermediate servers before reaching the recipient. This provides resilience but introduces delivery latency.

Back-of-the-Envelope Estimation

Email System Capacity

  • 300 billion emails/day worldwide
  • Average email size: 75 KB (with headers)
  • Daily storage: 300B × 75 KB = 22.5 PB/day
  • Spam: 85% of email = 255B spam/day
  • Legitimate email: 45B/day
  • QPS_send = 300B / 86400 ≈ 3.5M emails/sec
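
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and it assumes decimal units (1 KB = 1000 bytes, 1 PB = 10^15 bytes), which is what the 22.5 PB figure implies.

```python
# Back-of-the-envelope inputs taken from the list above.
EMAILS_PER_DAY = 300e9      # 300 billion emails/day worldwide
AVG_EMAIL_BYTES = 75e3      # 75 KB average size, decimal units assumed
SPAM_FRACTION = 0.85        # 85% of all email is spam
SECONDS_PER_DAY = 86_400


def capacity_estimate():
    """Return the derived capacity numbers as a dict."""
    daily_bytes = EMAILS_PER_DAY * AVG_EMAIL_BYTES
    spam_per_day = EMAILS_PER_DAY * SPAM_FRACTION
    return {
        "daily_storage_pb": daily_bytes / 1e15,
        "spam_per_day": spam_per_day,
        "legit_per_day": EMAILS_PER_DAY - spam_per_day,
        "send_qps": EMAILS_PER_DAY / SECONDS_PER_DAY,
    }
```

Calling `capacity_estimate()` returns the same four figures as the list, which makes it easy to rerun the estimate with a different average message size or spam fraction.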

API Design

POST /api/v1/messages/send
Request: {
  "to": ["user@example.com"],
  "subject": "Hello",
  "body": "Plain text or HTML",
  "attachments": ["file_id_123"]
}
Response: { "message_id": "msg_abc123", "status": "queued" }

GET /api/v1/messages?folder=inbox&limit=50
Response: { "messages": [...], "total": 1234 }

GET /api/v1/messages/msg_abc123
Response: { "message_id": "...", "from": "...", "body": "..." }

GET /api/v1/search?q=invoice+2026
Response: { "messages": [...], "count": 42 }
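
Before a send request is queued, the server would normally validate it. The sketch below shows one possible check for the `POST /api/v1/messages/send` payload. The function name, the `attachment_sizes` lookup, and the choice to apply the 25 MB limit to the combined size of all attachments are assumptions made for illustration, not part of the API above.

```python
import re

MAX_ATTACHMENT_BYTES = 25 * 1024 * 1024  # 25 MB cap from the functional requirements
# Deliberately loose address check; real address validation is far more involved.
_ADDRESS_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_send_request(payload, attachment_sizes):
    """Return a list of error strings; an empty list means the request is acceptable.

    attachment_sizes is a hypothetical lookup from file id to size in bytes.
    """
    errors = []
    recipients = payload.get("to") or []
    if not recipients:
        errors.append("'to' must contain at least one recipient")
    for address in recipients:
        if not _ADDRESS_RE.match(address):
            errors.append(f"invalid recipient address: {address}")
    total = 0
    for file_id in payload.get("attachments", []):
        if file_id not in attachment_sizes:
            errors.append(f"unknown attachment: {file_id}")
        else:
            total += attachment_sizes[file_id]
    if total > MAX_ATTACHMENT_BYTES:  # assumption: the cap applies to the combined size
        errors.append("attachments exceed the 25 MB limit")
    return errors
```

A request that passes validation would then be enqueued and answered with `"status": "queued"`, as in the response shown above.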

High-Level Architecture

[Figure: Email System Architecture. Components shown: Sender, SMTP Server, MTA (Send), MX Lookup, Retry Queue, Kafka, Processing (Spam Filter, Virus Scan, DLP Rules, Indexing, Rules Engine), Spam DB (Bloom filter), Email Store (Bigtable/S3)]

Detailed Design

Email Delivery Flow

Definition: Store-and-Forward Model

Email uses store-and-forward: the sender's MTA stores the message, attempts delivery to the recipient's MX server, retries on failure, and bounces after max retries.

[Figure: SMTP delivery flow with store-and-forward. Sender → Sender MTA → MX Lookup (DNS) → Recipient MTA (Store) → Inbox]
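
The store, forward, retry, and bounce steps described above can be sketched as a small in-memory simulation. The class name `SenderMTA`, the `deliver` callback (a stand-in for the real SMTP hand-off to the recipient's MX server), and the default of three attempts are all hypothetical; a real MTA persists its queue to disk and spaces retries out over time.

```python
from collections import deque


class SenderMTA:
    """Toy sender-side MTA: store first, then forward, retry, and finally bounce."""

    def __init__(self, deliver, max_attempts=3):
        self.deliver = deliver          # callable(message) -> bool, stands in for SMTP delivery
        self.max_attempts = max_attempts
        self.queue = deque()            # stored messages awaiting delivery: (message, attempts)
        self.delivered = []
        self.bounced = []

    def submit(self, message):
        # "Store": the message is accepted and queued before any delivery attempt.
        self.queue.append((message, 0))

    def run_once(self):
        """Make one delivery pass over everything currently queued."""
        for _ in range(len(self.queue)):
            message, attempts = self.queue.popleft()
            attempts += 1
            if self.deliver(message):
                self.delivered.append(message)
            elif attempts >= self.max_attempts:
                self.bounced.append(message)             # give up and report failure
            else:
                self.queue.append((message, attempts))   # keep it for the next pass
```

Each call to `run_once()` represents one retry round; messages that keep failing move to `bounced` once `max_attempts` is reached.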

SMTP Protocol

Definition: SMTP (Simple Mail Transfer Protocol)

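SMTP is the standard protocol for transferring email. Mail servers relay messages to each other on port 25, while mail clients submit outgoing mail on port 587 (STARTTLS) or 465 (implicit TLS). It uses a command-response model: HELO (or EHLO in modern ESMTP) → MAIL FROM → RCPT TO → DATA → message content → QUIT.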

S: 220 mail.example.com ESMTP
C: HELO sender.com
S: 250 Hello
C: MAIL FROM:<sender@sender.com>
S: 250 OK
C: RCPT TO:<recipient@example.com>
S: 250 OK
C: DATA
S: 354 Start mail input
C: Subject: Hello
C: From: sender@sender.com
C: To: recipient@example.com
C: 
C: This is the email body.
C: .
S: 250 OK: Message queued
C: QUIT
S: 221 Bye
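
The command ordering in this dialogue behaves like a small state machine. The sketch below checks only the client side of a session against that ordering. The state names and the function `check_dialogue` are invented for illustration, and only the five commands shown above are modeled.

```python
# Allowed transitions for the simplified dialogue shown above.
# Assumption: only HELO, MAIL, RCPT, DATA and QUIT are modeled; a real server
# also handles EHLO, RSET, NOOP, STARTTLS and more.
_NEXT_STATE = {
    ("start", "HELO"): "greeted",
    ("greeted", "MAIL"): "sender_set",
    ("sender_set", "RCPT"): "recipient_set",
    ("recipient_set", "RCPT"): "recipient_set",  # several recipients are allowed
    ("recipient_set", "DATA"): "data",
}


def check_dialogue(client_lines):
    """Return True if the client lines are correctly ordered and end with QUIT."""
    state = "start"
    for index, line in enumerate(client_lines):
        if state == "data":
            # Inside DATA every line is message content until a lone ".".
            if line == ".":
                state = "greeted"  # ready for another MAIL FROM or QUIT
            continue
        parts = line.split()
        if not parts:
            return False
        verb = parts[0].upper()
        if verb == "QUIT":
            return index == len(client_lines) - 1  # QUIT must be the last line
        state = _NEXT_STATE.get((state, verb))
        if state is None:
            return False  # command not allowed in the current state
    return False  # the session never sent QUIT
```

Passing the `C:` lines from the transcript (without the `C: ` prefix) returns `True`, while a session that sends `RCPT TO` before `MAIL FROM` is rejected.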

Spam Filtering

Spam Score

$$\text{spam\_score} = w_1 \cdot \text{bayesian} + w_2 \cdot \text{rules} + w_3 \cdot \text{reputation} + w_4 \cdot \text{ml\_model}$$

Here,

  • bayesian = Bayesian classifier score (keyword probabilities)
  • rules = SpamAssassin-style rules (header checks, content patterns)
  • reputation = Sender IP/domain reputation score
  • ml_model = Deep learning classification score

Use a multi-layer spam filter: fast rules first (block known spam), then ML classification for uncertain cases. This balances accuracy with processing latency.
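
A minimal sketch of the weighted score and the layered check might look like this. The weights, the 0.5 threshold, the blocklist lookup, and the `compute_signals` callback are all hypothetical values chosen for illustration; a real system would tune them on labeled mail.

```python
# Hypothetical weights and threshold; each signal is expected in the range [0, 1].
WEIGHTS = {"bayesian": 0.3, "rules": 0.2, "reputation": 0.2, "ml_model": 0.3}
SPAM_THRESHOLD = 0.5


def spam_score(signals, weights=WEIGHTS):
    """Weighted sum of the four per-signal scores."""
    return sum(weights[name] * signals[name] for name in weights)


def classify(sender_ip, blocklist, compute_signals):
    """Layered filter: cheap blocklist check first, full scoring only if needed.

    compute_signals is a hypothetical callable returning the four signal scores.
    """
    if sender_ip in blocklist:  # fast path: known spam source, skip the expensive models
        return "spam"
    score = spam_score(compute_signals())
    return "spam" if score >= SPAM_THRESHOLD else "ham"
```

Because `compute_signals` is only called when the fast path misses, the expensive classifiers run on a fraction of the traffic.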

Email Storage

Store emails in a distributed storage system:

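| Component       | Technology          | Reason                      |
|-----------------|---------------------|-----------------------------|
| Email bodies    | Object Storage (S3) | Large blobs, cost-effective |
| Metadata        | Bigtable/Cassandra  | Fast lookups, time-series   |
| Full-text index | Elasticsearch       | Search across content       |
| Spam database   | Redis/Bloom filter  | Fast membership checks      |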
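
To illustrate why a Bloom filter suits the spam database, here is a small self-contained sketch. The class, its default sizes, and the use of salted SHA-256 to derive bit positions are illustrative choices, not a description of any particular product. The key property is that a lookup can return a false positive but never a false negative.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

In this design a hit would typically be confirmed against the authoritative store, while a miss lets the message skip that lookup entirely.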

Delivery Guarantees

Definition: At-Least-Once Delivery

At-least-once delivery ensures every email is delivered, even if retries cause duplicates. The recipient's MTA uses the Message-ID header to detect and deduplicate messages.
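
Deduplication by Message-ID can be sketched with Python's standard `email` parser. The `Mailbox` class and its in-memory set are hypothetical; a production system would keep the seen-ID index in persistent, per-user storage, and this sketch simply accepts messages that carry no Message-ID header.

```python
from email import message_from_string


class Mailbox:
    """Toy recipient-side store that drops redelivered copies of a message."""

    def __init__(self):
        self.seen_ids = set()   # assumption: in-memory only, for illustration
        self.messages = []

    def accept(self, raw_message):
        """Store the message unless its Message-ID was already accepted."""
        message_id = message_from_string(raw_message)["Message-ID"]
        if message_id is not None and message_id in self.seen_ids:
            return False        # duplicate caused by a sender retry
        if message_id is not None:
            self.seen_ids.add(message_id)
        self.messages.append(raw_message)
        return True
```

If the sender retries after a lost acknowledgment, the second `accept` call returns `False` and the mailbox still holds a single copy.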

Retry Strategy

$$\text{delay} = \min(\text{base} \times 2^{\text{attempt}}, \text{max\_delay})$$

Here,

  • base = Base retry delay (e.g., 5 minutes)
  • attempt = Current retry attempt
  • max_delay = Maximum delay cap (e.g., 4 hours)

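This design retries delivery for up to 48 hours before bouncing (many production MTAs are configured with a longer window of several days). With base = 5 minutes and max_delay = 4 hours, and counting attempts from 0, the formula gives the schedule: 5 min, 10 min, 20 min, 40 min, 80 min, 160 min, then 4 h between each later attempt.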
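
The retry formula translates directly into code. In this sketch the function names are invented, `attempt` is assumed to start at 0, and the defaults use the example values from the definitions above (5-minute base, 4-hour cap, 48-hour retry window).

```python
from datetime import timedelta


def retry_delay(attempt, base=timedelta(minutes=5), max_delay=timedelta(hours=4)):
    """delay = min(base * 2**attempt, max_delay), with attempt starting at 0."""
    return min(base * (2 ** attempt), max_delay)


def retry_schedule(window=timedelta(hours=48)):
    """List the delays used until the cumulative wait would exceed the window."""
    delays = []
    elapsed = timedelta(0)
    attempt = 0
    while elapsed + retry_delay(attempt) <= window:
        delay = retry_delay(attempt)
        delays.append(delay)
        elapsed += delay
        attempt += 1
    return delays
```

With these defaults the delay doubles from 5 minutes until it reaches the 4-hour cap, after which every retry waits 4 hours until the 48-hour window is used up and the message bounces.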

Practice Exercises

  1. Design: How would you implement email forwarding rules (e.g., "forward emails from boss to phone")? Design the rules engine.

  2. Scale: If Gmail stores 15 GB per user for 1.8 billion users, estimate the total storage needed and the distributed storage architecture.

  3. Reliability: Design a system to guarantee exactly-once email delivery. What challenges arise from SMTP retries and network partitions?

  4. Search: How would you implement full-text search across 100 billion emails? What indexing strategy would you use?

Key Takeaways:

  • Email uses store-and-forward (SMTP) with retry and bounce handling for reliable delivery
  • Multi-layer spam filtering: rules → Bayesian → ML classification for accuracy and speed
  • At-least-once delivery with Message-ID deduplication prevents duplicate emails
  • Distributed storage: S3 for bodies, Bigtable for metadata, Elasticsearch for search
  • Retry with exponential backoff up to 48 hours before bouncing

What to Learn Next

  • Design Notification System: Multi-channel notification delivery patterns.
  • Message Queues: Kafka for async email processing and delivery.
  • Design Object Storage: Storing email attachments at scale.
  • Databases: Bigtable/Cassandra for email metadata storage.
  • Rate Limiting: Preventing email abuse and spam.
  • Security Patterns: Email encryption (PGP/S/MIME) and authentication (DKIM/SPF).
