System Design Problems

Design a Real-time Analytics System

A real-time analytics system ingests high-volume event streams and provides sub-second query responses for dashboards, A/B testing, and business intelligence. Systems like Apache Kafka, Flink, and ClickPower enable real-time data analysis at massive scale.

Stream Processing — Process millions of events per second in real-time
Sub-second Queries — Dashboard responses in < 1 second
Time-series Optimization — Data partitioned by time for efficient range queries

The challenge is balancing write throughput (ingest millions of events/sec) with read latency (query results in < 1 second) across time-partitioned data.

Requirements

Functional Requirements

Ingest events from multiple sources (clickstream, IoT, logs)
Store events with timestamps for time-range queries
Support real-time aggregations (counts, sums, averages, percentiles)
Dashboard queries with < 1 second latency
Support for ad-hoc queries on recent data (last 24 hours)
Historical data available for batch queries (last 90 days)

Non-Functional Requirements

Write Throughput: 1M events/second
Query Latency: < 1 second for dashboard queries
Data Retention: 90 days hot, 1 year cold
Accuracy: Exact for counts, approximate for percentiles (HyperLogLog)
Availability: 99.99%

Real-time analytics systems are write-heavy with relatively few reads. The key insight is to pre-aggregate data at write time (materialized views) rather than computing aggregations at query time.

Back-of-the-Envelope Estimation

Analytics System Capacity

1M events/sec × 86400 sec = 86.4B events/day
Average event size: 500 bytes
Daily storage: 86.4B × 500 bytes = 43.2 TB/day
90-day hot storage: ~3.9 PB
Query QPS: 10K (dashboard + ad-hoc)
Dashboard refresh rate: 10 sec intervals

High-Level Architecture

Detailed Design

Lambda Architecture

The Lambda architecture combines batch and stream processing:

DfLambda Architecture

The Lambda architecture processes data through two paths: a batch layer that processes all historical data for completeness, and a speed layer that processes recent data for real-time results. A serving layer merges both outputs.

Layer	Purpose	Technology
Batch Layer	Complete, accurate results over all data	Spark, Hive
Speed Layer	Low-latency results over recent data	Flink, Kafka Streams
Serving Layer	Merged view for queries	ClickHouse, Druid

The Lambda architecture is being replaced by the Kappa architecture in many modern systems. Kappa uses only stream processing (no batch), simplifying the architecture.

Pre-aggregation Strategy

Pre-aggregate data at write time to enable fast queries:

DfPre-aggregation

Instead of storing raw events and computing aggregations at query time, compute aggregations in real-time as events arrive. Store pre-computed aggregates (counts, sums, averages) in materialized views.

Pre-aggregation Storage

storage_{pre} = N_{dimensions} \times N_{time\_buckets} \times N_{metrics} \times 8\,bytes

Here,

$N_{dimensions}$ =Number of unique dimension combinations
$N_{time_buckets}$ =Number of time buckets (e.g., 1440 minutes/day)
$N_{metrics}$ =Number of metrics per aggregation

Pre-aggregation Savings

Raw events: 1M events/sec × 500 bytes × 86400 = 43.2 TB/day

Pre-aggregated (minute-level, 100 dimension combinations): 100 × 1440 minutes × 5 metrics × 8 bytes × 86400 = ~5 GB/day

Pre-aggregation reduces storage by 99.99% for aggregate queries.

OLAP Database (ClickHouse)

DfColumn-oriented Storage

Column-oriented databases store data by column rather than by row. This enables efficient aggregation queries (SUM, COUNT, AVG) because only the relevant columns are read from disk.

ClickHouse uses LZ4 compression on column data, achieving 10:1 compression ratios. A 43 TB/day raw event stream becomes ~4 TB compressed in ClickHouse.

Time-series Partitioning

Partition data by time for efficient range queries:

Time Partitioning

partition = \text{date\_trunc}(timestamp, \text{day})

Here,

$timestamp$ =Event timestamp
$day$ =Partition granularity

Partition by day for recent data (hot) and by month for historical data (cold). Drop old partitions for data retention management.

Approximate Algorithms

For high-cardinality analytics, use approximate algorithms:

Algorithm	Purpose	Error	Memory
HyperLogLog	Distinct count	~0.8%	12 KB
Count-Min Sketch	Frequency estimation	Varies	Fixed
T-Digest	Percentile estimation	< 1%	100 KB

HyperLogLog Distinct Count

E = \alpha_m \times m^2 \times (\sum_{j=1}^{m} 2^{-M_j})^{-1}

Here,

$m$ =Number of registers (buckets)
$M_j$ =Maximum leading zeros in register j
$alpha_m$ =Bias correction constant

HyperLogLog Efficiency

To count 1 billion distinct user IDs:

Exact counting: 1 billion × 8 bytes = 8 GB
HyperLogLog: 12 KB with ~0.8% error

HyperLogLog reduces memory by 99.998% with negligible accuracy loss.

Practice Exercises

Design: How would you implement a real-time dashboard that shows the number of active users in the last 5 minutes? What data structure would you use?
Scale: If the system ingests 1M events/second, estimate the Kafka partition count needed and the ClickHouse cluster size for 90-day retention.
Consistency: In a Lambda architecture, how do you handle the transition from batch-computed results to stream-computed results? Design a merge strategy.
Optimization: How would you optimize ClickHouse queries for a dashboard that queries 10 different time ranges and 5 different dimensions simultaneously?

Key Takeaways:

Lambda architecture combines batch (completeness) and stream (speed) processing
Pre-aggregation at write time reduces query storage by 99%+
Column-oriented databases (ClickHouse) are 10-100x faster for aggregation queries
HyperLogLog provides distinct counts with 0.8% error at 12 KB memory
Time-series partitioning enables efficient range queries and data lifecycle management

What to Learn Next

-> Design Metrics Monitoring Collecting and querying system metrics at scale.

-> Message Queues Kafka for event streaming and partitioning strategies.

-> Event-Driven Architecture Event sourcing and CQRS patterns.

-> Databases Column-oriented vs row-oriented storage trade-offs.

-> Caching Strategies Caching pre-aggregated results for dashboard queries.

-> Design Realtime Analytics This article (stream processing deep dive).

Design a Real-time Analytics System

Design a Real-time Analytics System

Requirements

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimation

Analytics System Capacity

High-Level Architecture

Detailed Design

Lambda Architecture

DfLambda Architecture

Pre-aggregation Strategy

DfPre-aggregation

Pre-aggregation Storage

Pre-aggregation Savings

OLAP Database (ClickHouse)

DfColumn-oriented Storage

Time-series Partitioning

Time Partitioning

Approximate Algorithms

HyperLogLog Distinct Count

HyperLogLog Efficiency

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?