System Design Problems
Design a Key-Value Store
A distributed key-value store provides a simple interface: put(key, value) and get(key). Systems like Redis, DynamoDB, and Memcached form the backbone of modern applications, enabling sub-millisecond data access at massive scale.
- Simple Interface —
put(key, value)andget(key)operations - Horizontal Scalability — Distribute data across thousands of nodes
- Tunable Consistency — Choose between strong and eventual consistency per operation
The elegance of a key-value store lies in its simplicity—but achieving distributed consistency, availability, and performance requires sophisticated engineering.
Requirements
Functional Requirements
put(key, value): Insert or update a key-value pairget(key): Retrieve the value for a given keydelete(key): Remove a key-value pair- Support for different value types (strings, lists, sets, hashes)
- TTL (Time-to-Live) support for key expiration
Non-Functional Requirements
- Latency: Read/write in < 1ms (p99)
- Availability: 99.99% uptime
- Scalability: Billions of keys, millions of QPS
- Durability: Data must survive node failures
- Consistency: Tunable (strong or eventual)
The CAP theorem directly applies to distributed KV stores. DynamoDB-style stores favor AP (availability + partition tolerance) with eventual consistency. Redis Cluster favors CP (consistency + partition tolerance) with single-threaded consistency per shard.
Back-of-the-Envelope Estimation
Capacity Planning
- 1 billion keys × 1 KB average value = 1 TB total data
- 1 million QPS read + 100K QPS write
- At 100K writes/sec, write bandwidth = 100 MB/s
- At 1M reads/sec, read bandwidth = 1 GB/s
- Replication factor of 3: total storage = 3 TB
- 20 nodes: 50 GB data + 50 MB/s write per node
Core Design Concepts
Data Partitioning
DfConsistent Hashing
Consistent hashing maps both keys and nodes to positions on a hash ring. A key is assigned to the first node found by moving clockwise from its position. When a node is added or removed, only the keys between the adjacent nodes need to be redistributed.
Consistent Hashing
Here,
- =The data key to look up
- =The hash ring with node positions
- =First node found moving clockwise
To avoid uneven data distribution, each physical node is mapped to multiple virtual nodes (vnodes) on the ring. More vnodes = more even distribution.
Replication
For durability and availability, each key is replicated across multiple nodes:
DfReplication Factor
The replication factor (RF) specifies how many copies of each data item are stored. With RF=3, every key exists on 3 different nodes. Read/write quorums ensure consistency.
Quorum Consistency
Here,
- =Write quorum (number of nodes that must acknowledge write)
- =Read quorum (number of nodes contacted for read)
- =Replication factor (total copies)
Quorum Configurations
- Strong consistency: W=2, R=2, N=3 → W+R=4 > 3 ✓
- High write availability: W=1, R=3, N=3 → W+R=4 > 3 ✓
- High read availability: W=3, R=1, N=3 → W+R=4 > 3 ✓
- AP system: W=1, R=1, N=3 → W+R=2 < 3 ✗ (eventual consistency)
Conflict Resolution
When concurrent writes reach different replicas, conflicts arise:
| Strategy | Description | Use Case |
|---|---|---|
| Last-Write-Wins (LWW) | Timestamp determines winner | Simple, low storage |
| Vector Clocks | Track causal ordering | Complex, precise |
| CRDTs | Conflict-free data types | Counter, set operations |
| Application-level | Merge in application code | Custom logic needed |
DynamoDB uses vector clocks for conflict detection and last-write-wins for resolution. Redis Cluster uses single-threaded slots, avoiding conflicts entirely within a slot.
Storage Engine
Each node uses a local storage engine:
DfLSM-Tree
A Log-Structured Merge-Tree (LSM-Tree) buffers writes in an in-memory MemTable, then periodically flushes sorted runs to disk as SSTables. Reads check MemTable first, then SSTables via Bloom filters to avoid unnecessary disk I/O.
Gossip Protocol
For cluster membership and failure detection, use a gossip protocol:
DfGossip Protocol
A gossip protocol is a communication protocol where nodes periodically exchange state information with random peers. Each round, a node picks a random peer and shares its view of the cluster. After O(log N) rounds, all nodes converge on the same view.
Gossip Convergence
Here,
- =Number of nodes in the cluster
Practice Exercises
-
Design: How would you implement TTL-based key expiration? Compare lazy expiration vs active expiration approaches.
-
Trade-offs: Under what conditions would you choose W=1, R=1 (AP) over W=2, R=2 (CP) for a distributed KV store?
-
Scale: If the KV store needs to handle 10 million keys with 99.9% cache hit ratio, estimate the memory needed assuming 1 KB average value.
-
Consistency: Explain how vector clocks detect conflicts in a distributed KV store. What happens when two concurrent writes occur on different replicas?
Key Takeaways:
- Consistent hashing distributes data evenly across nodes with minimal redistribution on scaling
- Quorum (W+R > N) provides tunable consistency between strong and eventual
- LSM-Tree storage engines optimize write performance with sequential I/O
- Gossip protocols enable decentralized cluster membership and failure detection
- Virtual nodes on the hash ring improve load distribution
What to Learn Next
-> Consistent Hashing Deep dive into hash rings, virtual nodes, and load distribution.
-> Databases SQL vs NoSQL, indexing strategies, and storage engines.
-> CAP Theorem Consistency, availability, and partition tolerance trade-offs.
-> Caching Strategies Cache-aside, write-through, and distributed caching patterns.
-> Data Replication Single-leader, multi-leader, and leaderless replication.
-> Design Distributed Cache Building Redis/Memcached at scale with consistent hashing.