System Design Problems
Design a Real-time Analytics System
A real-time analytics system ingests high-volume event streams and provides sub-second query responses for dashboards, A/B testing, and business intelligence. Systems like Apache Kafka, Flink, and ClickPower enable real-time data analysis at massive scale.
- Stream Processing β Process millions of events per second in real-time
- Sub-second Queries β Dashboard responses in < 1 second
- Time-series Optimization β Data partitioned by time for efficient range queries
The challenge is balancing write throughput (ingest millions of events/sec) with read latency (query results in < 1 second) across time-partitioned data.
Requirements
Functional Requirements
- Ingest events from multiple sources (clickstream, IoT, logs)
- Store events with timestamps for time-range queries
- Support real-time aggregations (counts, sums, averages, percentiles)
- Dashboard queries with < 1 second latency
- Support for ad-hoc queries on recent data (last 24 hours)
- Historical data available for batch queries (last 90 days)
Non-Functional Requirements
- Write Throughput: 1M events/second
- Query Latency: < 1 second for dashboard queries
- Data Retention: 90 days hot, 1 year cold
- Accuracy: Exact for counts, approximate for percentiles (HyperLogLog)
- Availability: 99.99%
Real-time analytics systems are write-heavy with relatively few reads. The key insight is to pre-aggregate data at write time (materialized views) rather than computing aggregations at query time.
Back-of-the-Envelope Estimation
Analytics System Capacity
- 1M events/sec Γ 86400 sec = 86.4B events/day
- Average event size: 500 bytes
- Daily storage: 86.4B Γ 500 bytes = 43.2 TB/day
- 90-day hot storage: ~3.9 PB
- Query QPS: 10K (dashboard + ad-hoc)
- Dashboard refresh rate: 10 sec intervals
High-Level Architecture
Detailed Design
Lambda Architecture
The Lambda architecture combines batch and stream processing:
DfLambda Architecture
The Lambda architecture processes data through two paths: a batch layer that processes all historical data for completeness, and a speed layer that processes recent data for real-time results. A serving layer merges both outputs.
| Layer | Purpose | Technology |
|---|---|---|
| Batch Layer | Complete, accurate results over all data | Spark, Hive |
| Speed Layer | Low-latency results over recent data | Flink, Kafka Streams |
| Serving Layer | Merged view for queries | ClickHouse, Druid |
The Lambda architecture is being replaced by the Kappa architecture in many modern systems. Kappa uses only stream processing (no batch), simplifying the architecture.
Pre-aggregation Strategy
Pre-aggregate data at write time to enable fast queries:
DfPre-aggregation
Instead of storing raw events and computing aggregations at query time, compute aggregations in real-time as events arrive. Store pre-computed aggregates (counts, sums, averages) in materialized views.
Pre-aggregation Storage
Here,
- =Number of unique dimension combinations
- =Number of time buckets (e.g., 1440 minutes/day)
- =Number of metrics per aggregation
Pre-aggregation Savings
Raw events: 1M events/sec Γ 500 bytes Γ 86400 = 43.2 TB/day
Pre-aggregated (minute-level, 100 dimension combinations): 100 Γ 1440 minutes Γ 5 metrics Γ 8 bytes Γ 86400 = ~5 GB/day
Pre-aggregation reduces storage by 99.99% for aggregate queries.
OLAP Database (ClickHouse)
DfColumn-oriented Storage
Column-oriented databases store data by column rather than by row. This enables efficient aggregation queries (SUM, COUNT, AVG) because only the relevant columns are read from disk.
ClickHouse uses LZ4 compression on column data, achieving 10:1 compression ratios. A 43 TB/day raw event stream becomes ~4 TB compressed in ClickHouse.
Time-series Partitioning
Partition data by time for efficient range queries:
Time Partitioning
Here,
- =Event timestamp
- =Partition granularity
Partition by day for recent data (hot) and by month for historical data (cold). Drop old partitions for data retention management.
Approximate Algorithms
For high-cardinality analytics, use approximate algorithms:
| Algorithm | Purpose | Error | Memory |
|---|---|---|---|
| HyperLogLog | Distinct count | ~0.8% | 12 KB |
| Count-Min Sketch | Frequency estimation | Varies | Fixed |
| T-Digest | Percentile estimation | < 1% | 100 KB |
HyperLogLog Distinct Count
Here,
- =Number of registers (buckets)
- =Maximum leading zeros in register j
- =Bias correction constant
HyperLogLog Efficiency
To count 1 billion distinct user IDs:
- Exact counting: 1 billion Γ 8 bytes = 8 GB
- HyperLogLog: 12 KB with ~0.8% error
HyperLogLog reduces memory by 99.998% with negligible accuracy loss.
Practice Exercises
-
Design: How would you implement a real-time dashboard that shows the number of active users in the last 5 minutes? What data structure would you use?
-
Scale: If the system ingests 1M events/second, estimate the Kafka partition count needed and the ClickHouse cluster size for 90-day retention.
-
Consistency: In a Lambda architecture, how do you handle the transition from batch-computed results to stream-computed results? Design a merge strategy.
-
Optimization: How would you optimize ClickHouse queries for a dashboard that queries 10 different time ranges and 5 different dimensions simultaneously?
Key Takeaways:
- Lambda architecture combines batch (completeness) and stream (speed) processing
- Pre-aggregation at write time reduces query storage by 99%+
- Column-oriented databases (ClickHouse) are 10-100x faster for aggregation queries
- HyperLogLog provides distinct counts with 0.8% error at 12 KB memory
- Time-series partitioning enables efficient range queries and data lifecycle management
What to Learn Next
-> Design Metrics Monitoring Collecting and querying system metrics at scale.
-> Message Queues Kafka for event streaming and partitioning strategies.
-> Event-Driven Architecture Event sourcing and CQRS patterns.
-> Databases Column-oriented vs row-oriented storage trade-offs.
-> Caching Strategies Caching pre-aggregated results for dashboard queries.
-> Design Realtime Analytics This article (stream processing deep dive).