System Design Problems

Design a Key-Value Store

A distributed key-value store provides a simple interface: put(key, value) and get(key). Systems like Redis, DynamoDB, and Memcached form the backbone of modern applications, enabling sub-millisecond data access at massive scale.

Simple Interface — put(key, value) and get(key) operations
Horizontal Scalability — Distribute data across thousands of nodes
Tunable Consistency — Choose between strong and eventual consistency per operation

The elegance of a key-value store lies in its simplicity—but achieving distributed consistency, availability, and performance requires sophisticated engineering.

Requirements

Functional Requirements

put(key, value): Insert or update a key-value pair
get(key): Retrieve the value for a given key
delete(key): Remove a key-value pair
Support for different value types (strings, lists, sets, hashes)
TTL (Time-to-Live) support for key expiration

Non-Functional Requirements

Latency: Read/write in < 1ms (p99)
Availability: 99.99% uptime
Scalability: Billions of keys, millions of QPS
Durability: Data must survive node failures
Consistency: Tunable (strong or eventual)

The CAP theorem directly applies to distributed KV stores. DynamoDB-style stores favor AP (availability + partition tolerance) with eventual consistency. Redis Cluster favors CP (consistency + partition tolerance) with single-threaded consistency per shard.

Back-of-the-Envelope Estimation

Capacity Planning

1 billion keys × 1 KB average value = 1 TB total data
1 million QPS read + 100K QPS write
At 100K writes/sec, write bandwidth = 100 MB/s
At 1M reads/sec, read bandwidth = 1 GB/s
Replication factor of 3: total storage = 3 TB
20 nodes: 50 GB data + 50 MB/s write per node

Core Design Concepts

Data Partitioning

DfConsistent Hashing

Consistent hashing maps both keys and nodes to positions on a hash ring. A key is assigned to the first node found by moving clockwise from its position. When a node is added or removed, only the keys between the adjacent nodes need to be redistributed.

Consistent Hashing

node = \text{clockwise\_nearest}(hash(key), ring)

Here,

$key$ =The data key to look up
$ring$ =The hash ring with node positions
$clockwise_nearest$ =First node found moving clockwise

To avoid uneven data distribution, each physical node is mapped to multiple virtual nodes (vnodes) on the ring. More vnodes = more even distribution.

Replication

For durability and availability, each key is replicated across multiple nodes:

DfReplication Factor

The replication factor (RF) specifies how many copies of each data item are stored. With RF=3, every key exists on 3 different nodes. Read/write quorums ensure consistency.

Quorum Consistency

W + R > N

Here,

$W$ =Write quorum (number of nodes that must acknowledge write)
$R$ =Read quorum (number of nodes contacted for read)
$N$ =Replication factor (total copies)

Quorum Configurations

Strong consistency: W=2, R=2, N=3 → W+R=4 > 3 ✓
High write availability: W=1, R=3, N=3 → W+R=4 > 3 ✓
High read availability: W=3, R=1, N=3 → W+R=4 > 3 ✓
AP system: W=1, R=1, N=3 → W+R=2 < 3 ✗ (eventual consistency)

Conflict Resolution

When concurrent writes reach different replicas, conflicts arise:

Strategy	Description	Use Case
Last-Write-Wins (LWW)	Timestamp determines winner	Simple, low storage
Vector Clocks	Track causal ordering	Complex, precise
CRDTs	Conflict-free data types	Counter, set operations
Application-level	Merge in application code	Custom logic needed

DynamoDB uses vector clocks for conflict detection and last-write-wins for resolution. Redis Cluster uses single-threaded slots, avoiding conflicts entirely within a slot.

Storage Engine

Each node uses a local storage engine:

DfLSM-Tree

A Log-Structured Merge-Tree (LSM-Tree) buffers writes in an in-memory MemTable, then periodically flushes sorted runs to disk as SSTables. Reads check MemTable first, then SSTables via Bloom filters to avoid unnecessary disk I/O.

Gossip Protocol

For cluster membership and failure detection, use a gossip protocol:

DfGossip Protocol

A gossip protocol is a communication protocol where nodes periodically exchange state information with random peers. Each round, a node picks a random peer and shares its view of the cluster. After O(log N) rounds, all nodes converge on the same view.

Gossip Convergence

rounds = O(\log N)

Here,

$N$ =Number of nodes in the cluster

Practice Exercises

Design: How would you implement TTL-based key expiration? Compare lazy expiration vs active expiration approaches.
Trade-offs: Under what conditions would you choose W=1, R=1 (AP) over W=2, R=2 (CP) for a distributed KV store?
Scale: If the KV store needs to handle 10 million keys with 99.9% cache hit ratio, estimate the memory needed assuming 1 KB average value.
Consistency: Explain how vector clocks detect conflicts in a distributed KV store. What happens when two concurrent writes occur on different replicas?

Key Takeaways:

Consistent hashing distributes data evenly across nodes with minimal redistribution on scaling
Quorum (W+R > N) provides tunable consistency between strong and eventual
LSM-Tree storage engines optimize write performance with sequential I/O
Gossip protocols enable decentralized cluster membership and failure detection
Virtual nodes on the hash ring improve load distribution

What to Learn Next

-> Consistent Hashing Deep dive into hash rings, virtual nodes, and load distribution.

-> Databases SQL vs NoSQL, indexing strategies, and storage engines.

-> CAP Theorem Consistency, availability, and partition tolerance trade-offs.

-> Caching Strategies Cache-aside, write-through, and distributed caching patterns.

-> Data Replication Single-leader, multi-leader, and leaderless replication.

-> Design Distributed Cache Building Redis/Memcached at scale with consistent hashing.

Design a Key-Value Store

Design a Key-Value Store

Requirements

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimation

Capacity Planning

Core Design Concepts

Data Partitioning

DfConsistent Hashing

Consistent Hashing

Replication

DfReplication Factor

Quorum Consistency

Quorum Configurations

Conflict Resolution

Storage Engine

DfLSM-Tree

Gossip Protocol

DfGossip Protocol

Gossip Convergence

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?