Design a Notification System
A notification system delivers messages to users across multiple channels: push notifications (iOS/Android), SMS, and email. Systems like Firebase Cloud Messaging (FCM), Apple Push Notification Service (APNs), and Twilio handle billions of notifications daily.
- Multi-Channel — Push, SMS, email, and in-app notifications
- High Throughput — Process millions of notifications per minute
- Reliability — Guaranteed delivery with retry logic and deduplication
The challenge is not sending a notification—it's sending the right notification, to the right user, through the right channel, at the right time, without duplicates.
Requirements
Functional Requirements
- Send push notifications (iOS/Android), SMS, and email
- Support different notification types (transactional, marketing, alerts)
- Users can opt-in/opt-out of notification channels
- Rate limiting per user to prevent notification fatigue
- Template-based notifications with personalization
- Support scheduled notifications
Non-Functional Requirements
- Latency: Push notifications delivered within 1 second
- Throughput: 10M notifications per minute peak
- Delivery: At-least-once delivery with deduplication
- Availability: 99.9% uptime
- Scalability: Support 500M registered devices
Notification systems are write-heavy with significant fan-out. A single event (e.g., "new follower") may trigger notifications to thousands of users across multiple channels.
Back-of-the-Envelope Estimation
Notification Volume Estimation
- 500M registered devices
- Average 5 notifications/user/day (assuming one device per user) = 2.5B notifications/day
- Peak QPS (3x average): 2.5B / 86400 × 3 ≈ 87K QPS
- Push notifications: 80% of total = 70K QPS
- SMS: 5% = 4.3K QPS
- Email: 15% = 13K QPS
Storage:
- Notification log: 2.5B × 500 bytes × 90 days = ~112 TB
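The arithmetic above can be reproduced with a short script. This is only a sketch of the estimate: the variable names are illustrative, and the storage figure uses decimal terabytes (10^12 bytes).

```python
# Back-of-the-envelope figures from the estimates above.
SECONDS_PER_DAY = 86_400

devices = 500_000_000      # registered devices (treated as one per user)
per_user_per_day = 5       # average notifications per user per day
peak_factor = 3            # peak traffic assumed to be 3x the average
record_bytes = 500         # size of one notification log record
retention_days = 90        # how long the log is kept

daily_volume = devices * per_user_per_day        # 2.5 billion per day
average_qps = daily_volume / SECONDS_PER_DAY     # about 28.9K
peak_qps = average_qps * peak_factor             # about 86.8K

# Channel mix as a fraction of total traffic.
channel_share = {"push": 0.80, "sms": 0.05, "email": 0.15}
peak_qps_by_channel = {
    name: peak_qps * share for name, share in channel_share.items()
}

# Log storage in decimal terabytes.
log_storage_tb = daily_volume * record_bytes * retention_days / 1e12

print(f"daily volume: {daily_volume:,}")
print(f"peak QPS: {peak_qps:,.0f}")
for name, qps in peak_qps_by_channel.items():
    print(f"  {name}: {qps:,.0f} QPS")
print(f"log storage: {log_storage_tb:.1f} TB")
```

Changing `peak_factor` or the channel mix shows how sensitive worker sizing is to those two assumptions.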
API Design
POST /api/v1/notifications/send
Request: {
"user_id": "user_123",
"type": "transactional",
"template": "new_follower",
"data": { "follower_name": "Alice" },
"channels": ["push", "email"]
}
Response: { "notification_id": "n_456", "status": "queued" }
POST /api/v1/notifications/bulk
Request: {
"user_ids": ["u1", "u2", ...],
"type": "marketing",
"template": "weekly_digest",
"scheduled_at": "2026-06-21T09:00:00Z"
}
Response: { "batch_id": "b_789", "total": 100000 }
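A server-side handler would typically validate the request body before queuing it. The sketch below shows one possible check for the single-send endpoint. `SendRequest`, `parse_send_request`, and the `ALLOWED_CHANNELS` set are hypothetical names introduced for illustration, and the required-field rules are assumptions rather than part of the API above.

```python
from dataclasses import dataclass, field

# Assumption: the channels accepted by the send endpoint.
ALLOWED_CHANNELS = {"push", "sms", "email"}


@dataclass
class SendRequest:
    user_id: str
    type: str
    template: str
    data: dict = field(default_factory=dict)
    channels: list = field(default_factory=list)


def parse_send_request(body: dict) -> SendRequest:
    """Validate a decoded JSON body and turn it into a SendRequest."""
    # Assumption: these three fields are mandatory.
    missing = [k for k in ("user_id", "type", "template") if not body.get(k)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    channels = list(body.get("channels", []))
    unknown = sorted(set(channels) - ALLOWED_CHANNELS)
    if unknown:
        raise ValueError(f"unknown channels: {', '.join(unknown)}")

    return SendRequest(
        user_id=body["user_id"],
        type=body["type"],
        template=body["template"],
        data=dict(body.get("data", {})),
        channels=channels,
    )
```

Rejecting malformed requests at the API boundary keeps bad payloads out of the queue, where they would otherwise fail later and consume retries.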
High-Level Architecture
Detailed Design
Notification Processing Pipeline
Definition: Notification Pipeline
The notification pipeline is a multi-stage processing system: event ingestion → validation → rate limiting → template rendering → deduplication → channel routing → delivery → tracking.
| Stage | Responsibility |
|---|---|
| Ingestion | Accept notification requests via API |
| Validation | Check user exists, channel preferences |
| Rate Limiting | Enforce per-user and per-type limits |
| Template Rendering | Fill templates with user data |
| Deduplication | Prevent duplicate notifications |
| Channel Routing | Route to correct delivery worker |
| Delivery | Call external provider APIs |
| Tracking | Record delivery status, opens, clicks |
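One way to picture the stages in the table is as a chain of functions, where each stage either passes the notification on or drops it. The sketch below is a simplified in-process model with only two toy stages. `run_pipeline`, `validate`, `render`, and the `template_text` field are hypothetical names, and a production system would usually put a message queue between stages rather than call them directly.

```python
from typing import Callable, Optional

Notification = dict
# A stage returns the (possibly updated) notification, or None to drop it.
Stage = Callable[[Notification], Optional[Notification]]


def run_pipeline(notification: Notification, stages: list) -> dict:
    """Run the notification through each stage in order.

    Stops at the first stage that returns None and reports where it stopped.
    """
    for name, stage in stages:
        result = stage(notification)
        if result is None:
            return {"status": "dropped", "stage": name}
        notification = result
    return {"status": "completed", "notification": notification}


def validate(n: Notification) -> Optional[Notification]:
    # Drop requests that have no target user.
    return n if n.get("user_id") else None


def render(n: Notification) -> Optional[Notification]:
    # Fill the template text with the request's data.
    return {**n, "body": n["template_text"].format(**n["data"])}


STAGES = [("validation", validate), ("template_rendering", render)]

outcome = run_pipeline(
    {
        "user_id": "u_123",
        "template_text": "{follower_name} started following you",
        "data": {"follower_name": "Alice"},
    },
    STAGES,
)
print(outcome["notification"]["body"])
```

Rate limiting, deduplication, and routing fit the same shape: each is a stage that returns `None` when the notification should not go further.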
Rate Limiting
Prevent notification fatigue with per-user rate limiting:
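Sliding Window Rate Limit
$$\text{allow} \iff N < L, \qquad N = \left|\{\, t_i : t - W < t_i \le t \,\}\right|$$
Here, $t$ is the current time and $t_i$ are the timestamps of the user's earlier notifications, and
- $N$ = Number of notifications in current window
- $W$ = Time window duration (e.g., 1 hour)
- $L$ = Maximum notifications per window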
Rate Limit Configuration
- Push notifications: 10 per hour per user
- SMS: 3 per day per user
- Email: 20 per day per user
- Marketing: 5 per week per user (separate from transactional)
Use Redis with sorted sets for sliding window rate limiting. The key is rate_limit:{user_id}:{channel}, and the score is the notification timestamp. Remove entries outside the window before checking the count.
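The three steps (remove expired entries, count, then record) can be sketched without a Redis server. The class below is an in-memory stand-in that keeps a sorted list of timestamps per key, which plays the role of the sorted set. The comments name the Redis commands each step would correspond to. `SlidingWindowLimiter` and its method names are hypothetical.

```python
import bisect
from collections import defaultdict


class SlidingWindowLimiter:
    """In-memory stand-in for a Redis sorted-set sliding window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(list)  # key -> sorted list of timestamps

    def allow(self, user_id: str, channel: str, now: float) -> bool:
        key = f"rate_limit:{user_id}:{channel}"
        events = self._events[key]

        # 1. Remove timestamps that fell out of the window
        #    (ZREMRANGEBYSCORE in Redis).
        cutoff = now - self.window
        del events[: bisect.bisect_right(events, cutoff)]

        # 2. Count what is left (ZCARD) and reject if the window is full.
        if len(events) >= self.limit:
            return False

        # 3. Record this notification with its timestamp as the score (ZADD).
        bisect.insort(events, now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=3600)
decisions = [limiter.allow("u_123", "push", t) for t in (0, 10, 20, 30, 3601)]
print(decisions)
```

Against a real Redis instance the three steps should run atomically, for example in a transaction or a Lua script, so that two concurrent workers cannot both pass the count check.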
Deduplication
Prevent duplicate notifications using idempotency keys:
Deduplication Check
$$\text{duplicate}(E) \iff \text{EXISTS}(K), \qquad K = \text{key}(E)$$
Here,
- $E$ = Unique ID for the notification event
- $K$ = Redis key with TTL matching dedup window
Set a reasonable TTL for deduplication keys (e.g., 24 hours). Without TTL, the dedup store grows unbounded. With too short a TTL, duplicates may slip through during retries.
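The check-and-set behaviour can be sketched with an in-memory store that expires keys after a TTL. In Redis, the same effect is commonly achieved with a single `SET key value NX EX ttl` command, which only succeeds if the key does not already exist. `DedupStore`, `first_time`, and the `dedup:` key prefix are hypothetical names used for illustration.

```python
class DedupStore:
    """In-memory stand-in for Redis keys that expire after a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._expires_at = {}  # dedup key -> expiry timestamp

    def first_time(self, event_id: str, now: float) -> bool:
        """Return True if this event has not been seen within the TTL."""
        key = f"dedup:{event_id}"
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # key still live: this is a duplicate
        # Key absent or expired: claim it for the next TTL period.
        self._expires_at[key] = now + self.ttl
        return True


store = DedupStore(ttl_seconds=24 * 3600)
print(store.first_time("evt_1", now=0))      # first sighting
print(store.first_time("evt_1", now=100))    # retry inside the window
print(store.first_time("evt_1", now=86400))  # after the key expired
```

A delivery worker would call `first_time` before sending and skip the send when it returns `False`.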
Multi-Channel Delivery
Route notifications to the appropriate channel based on user preferences:
User Preferences:
{
user_id: "u_123",
channels: {
push: { enabled: true, device_tokens: ["tok1", "tok2"] },
sms: { enabled: true, phone: "+1234567890" },
email: { enabled: true, address: "user@example.com" }
},
quiet_hours: { start: "22:00", end: "08:00", timezone: "America/New_York" }
}
Respect user preferences and quiet hours. Check timezone-aware quiet hours before sending push notifications. For SMS and email, comply with regulations (GDPR, CAN-SPAM, TCPA).
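A quiet-hours check has one subtle case: a window such as 22:00 to 08:00 wraps past midnight. The sketch below handles that case and converts the current UTC time into the user's timezone using the standard-library `zoneinfo` module (Python 3.9+). `in_quiet_hours` and `is_quiet_now` are hypothetical helper names, and the `prefs` argument is assumed to have the shape of the preferences object above.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def in_quiet_hours(local: time, start: time, end: time) -> bool:
    """Return True if the local time falls inside [start, end)."""
    if start <= end:
        # Window within a single day, e.g. 13:00 to 15:00.
        return start <= local < end
    # Window wraps past midnight, e.g. 22:00 to 08:00.
    return local >= start or local < end


def is_quiet_now(utc_now: datetime, prefs: dict) -> bool:
    """Check the user's quiet hours against a timezone-aware UTC time."""
    quiet = prefs["quiet_hours"]
    # Requires the IANA timezone database to be available on the host.
    local_now = utc_now.astimezone(ZoneInfo(quiet["timezone"]))
    start = time.fromisoformat(quiet["start"])
    end = time.fromisoformat(quiet["end"])
    return in_quiet_hours(local_now.time(), start, end)
```

For example, `is_quiet_now(datetime.now(timezone.utc), prefs)` could gate the push delivery worker. A notification that arrives during quiet hours could be delayed until the window ends rather than dropped, depending on its type.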
Retry and Dead Letter Queue
For failed notifications, implement retry logic:
- Automatic retry: Retry up to 3 times with exponential backoff
- Dead Letter Queue (DLQ): Notifications that fail after all retries
- DLQ processing: Manual review, alerting, or bulk retry
- Fallback channels: If push fails, try email as fallback
Exponential Backoff
$$d = \min\left(b \cdot 2^{n},\; d_{\max}\right)$$
Here,
- $b$ = Base delay (e.g., 1 second)
- $n$ = Current retry attempt number
- $d_{\max}$ = Maximum delay cap (e.g., 60 seconds)
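The retry policy and the dead letter queue can be sketched together. The code below assumes retry attempts are counted from 0, so the first retry waits the base delay, and it uses a plain list as the dead letter queue. `backoff_delay`, `deliver_with_retry`, and the `send` callable are hypothetical names. A real worker would catch only the provider's transient errors rather than every exception.

```python
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(base * 2 ** attempt, cap)


def deliver_with_retry(send, notification, dead_letter_queue,
                       max_retries=3, sleep=time.sleep):
    """Try to send; retry up to max_retries times, then dead-letter.

    Returns True on success, False if the notification went to the DLQ.
    """
    for attempt in range(max_retries + 1):  # first try plus the retries
        try:
            send(notification)
            return True
        except Exception:
            if attempt == max_retries:
                break  # retries exhausted
            sleep(backoff_delay(attempt))
    dead_letter_queue.append(notification)
    return False


print([backoff_delay(n) for n in range(8)])
```

With a 1-second base and a 60-second cap, the delay doubles until the seventh retry, after which every retry waits 60 seconds. Adding random jitter to each delay is a common refinement that stops many failed notifications from retrying at the same instant.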
Practice Exercises
- Design: How would you implement a notification preference center where users can configure channel preferences, quiet hours, and notification types?
- Scale: If the system needs to send 1M push notifications in 1 minute, estimate the number of worker instances needed assuming each worker can process 500 notifications/second.
- Reliability: Design a system to guarantee exactly-once notification delivery. What challenges arise from network partitions and provider retries?
- Analytics: How would you track notification delivery, open rates, and click-through rates? Design the analytics pipeline.
Key Takeaways:
- Notification systems are write-heavy with significant fan-out; use message queues for decoupling
- Rate limiting prevents notification fatigue; implement per-user, per-channel limits
- Deduplication with idempotency keys prevents duplicate notifications during retries
- Multi-channel delivery requires user preference management and quiet hours support
- Dead letter queues and fallback channels ensure reliability
What to Learn Next
-> Message Queues: Kafka, RabbitMQ, and async processing patterns.
-> Event-Driven Architecture: Event sourcing, CQRS, and asynchronous communication.
-> Rate Limiting: Token bucket, sliding window, and distributed rate limiting.
-> Design Email System: Email infrastructure, deliverability, and compliance.
-> Design Chat System: Real-time messaging with WebSocket and presence.
-> Microservices: Service decomposition, communication, and deployment.