🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Design a Key-Value Store

System Design ProblemsDistributed KV Store🟢 Free Lesson

Advertisement

System Design Problems

Design a Key-Value Store

A distributed key-value store provides a simple interface: put(key, value) and get(key). Systems like Redis, DynamoDB, and Memcached form the backbone of modern applications, enabling sub-millisecond data access at massive scale.

  • Simple Interfaceput(key, value) and get(key) operations
  • Horizontal Scalability — Distribute data across thousands of nodes
  • Tunable Consistency — Choose between strong and eventual consistency per operation

The elegance of a key-value store lies in its simplicity—but achieving distributed consistency, availability, and performance requires sophisticated engineering.

Requirements

Functional Requirements

  • put(key, value): Insert or update a key-value pair
  • get(key): Retrieve the value for a given key
  • delete(key): Remove a key-value pair
  • Support for different value types (strings, lists, sets, hashes)
  • TTL (Time-to-Live) support for key expiration

Non-Functional Requirements

  • Latency: Read/write in < 1ms (p99)
  • Availability: 99.99% uptime
  • Scalability: Billions of keys, millions of QPS
  • Durability: Data must survive node failures
  • Consistency: Tunable (strong or eventual)

The CAP theorem directly applies to distributed KV stores. DynamoDB-style stores favor AP (availability + partition tolerance) with eventual consistency. Redis Cluster favors CP (consistency + partition tolerance) with single-threaded consistency per shard.

Back-of-the-Envelope Estimation

Capacity Planning

  • 1 billion keys × 1 KB average value = 1 TB total data
  • 1 million QPS read + 100K QPS write
  • At 100K writes/sec, write bandwidth = 100 MB/s
  • At 1M reads/sec, read bandwidth = 1 GB/s
  • Replication factor of 3: total storage = 3 TB
  • 20 nodes: 50 GB data + 50 MB/s write per node

Core Design Concepts

Data Partitioning

DfConsistent Hashing

Consistent hashing maps both keys and nodes to positions on a hash ring. A key is assigned to the first node found by moving clockwise from its position. When a node is added or removed, only the keys between the adjacent nodes need to be redistributed.

N1N2N3N4N5N6K1K2K3Consistent Hashing Ring with 6 Nodes

Consistent Hashing

node=clockwise_nearest(hash(key),ring)node = \text{clockwise\_nearest}(hash(key), ring)

Here,

  • keykey=The data key to look up
  • ringring=The hash ring with node positions
  • clockwisenearestclockwise_nearest=First node found moving clockwise

To avoid uneven data distribution, each physical node is mapped to multiple virtual nodes (vnodes) on the ring. More vnodes = more even distribution.

Replication

For durability and availability, each key is replicated across multiple nodes:

DfReplication Factor

The replication factor (RF) specifies how many copies of each data item are stored. With RF=3, every key exists on 3 different nodes. Read/write quorums ensure consistency.

Quorum Consistency

W+R>NW + R > N

Here,

  • WW=Write quorum (number of nodes that must acknowledge write)
  • RR=Read quorum (number of nodes contacted for read)
  • NN=Replication factor (total copies)

Quorum Configurations

  • Strong consistency: W=2, R=2, N=3 → W+R=4 > 3 ✓
  • High write availability: W=1, R=3, N=3 → W+R=4 > 3 ✓
  • High read availability: W=3, R=1, N=3 → W+R=4 > 3 ✓
  • AP system: W=1, R=1, N=3 → W+R=2 < 3 ✗ (eventual consistency)

Conflict Resolution

When concurrent writes reach different replicas, conflicts arise:

StrategyDescriptionUse Case
Last-Write-Wins (LWW)Timestamp determines winnerSimple, low storage
Vector ClocksTrack causal orderingComplex, precise
CRDTsConflict-free data typesCounter, set operations
Application-levelMerge in application codeCustom logic needed

DynamoDB uses vector clocks for conflict detection and last-write-wins for resolution. Redis Cluster uses single-threaded slots, avoiding conflicts entirely within a slot.

Storage Engine

Each node uses a local storage engine:

Write PathMemTable (RAM)WAL (Disk)SSTable (Disk)Read PathBloom FilterSSTable IndexBlock CacheLSM-Tree Storage Engine (Write-Optimized)

DfLSM-Tree

A Log-Structured Merge-Tree (LSM-Tree) buffers writes in an in-memory MemTable, then periodically flushes sorted runs to disk as SSTables. Reads check MemTable first, then SSTables via Bloom filters to avoid unnecessary disk I/O.

Gossip Protocol

For cluster membership and failure detection, use a gossip protocol:

DfGossip Protocol

A gossip protocol is a communication protocol where nodes periodically exchange state information with random peers. Each round, a node picks a random peer and shares its view of the cluster. After O(log N) rounds, all nodes converge on the same view.

Gossip Convergence

rounds=O(logN)rounds = O(\log N)

Here,

  • NN=Number of nodes in the cluster

Practice Exercises

  1. Design: How would you implement TTL-based key expiration? Compare lazy expiration vs active expiration approaches.

  2. Trade-offs: Under what conditions would you choose W=1, R=1 (AP) over W=2, R=2 (CP) for a distributed KV store?

  3. Scale: If the KV store needs to handle 10 million keys with 99.9% cache hit ratio, estimate the memory needed assuming 1 KB average value.

  4. Consistency: Explain how vector clocks detect conflicts in a distributed KV store. What happens when two concurrent writes occur on different replicas?

Key Takeaways:

  • Consistent hashing distributes data evenly across nodes with minimal redistribution on scaling
  • Quorum (W+R > N) provides tunable consistency between strong and eventual
  • LSM-Tree storage engines optimize write performance with sequential I/O
  • Gossip protocols enable decentralized cluster membership and failure detection
  • Virtual nodes on the hash ring improve load distribution

What to Learn Next

-> Consistent Hashing Deep dive into hash rings, virtual nodes, and load distribution.

-> Databases SQL vs NoSQL, indexing strategies, and storage engines.

-> CAP Theorem Consistency, availability, and partition tolerance trade-offs.

-> Caching Strategies Cache-aside, write-through, and distributed caching patterns.

-> Data Replication Single-leader, multi-leader, and leaderless replication.

-> Design Distributed Cache Building Redis/Memcached at scale with consistent hashing.

Premium Content

Design a Key-Value Store

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement