System Design - Theory
CAP Theorem
The CAP theorem is one of the most fundamental results in distributed systems theory. It defines the inherent trade-offs every distributed system must make, and understanding it is essential for making informed architectural decisions.
- Consistency - Every read returns the most recent write
- Availability - Every request receives a response
- Partition Tolerance - System continues operating despite network failures
Distributed systems are not about choosing the right answer - they are about choosing the right trade-off.
The CAP Theorem
DfCAP Theorem
The CAP Theorem (Brewer, 2000; Gilbert and Lynch, 2002) states that a distributed data store can provide at most two of the following three guarantees simultaneously:
- Consistency (C): Every read receives the most recent write or an error
- Availability (A): Every request receives a non-error response (without guaranteeing it is the most recent write)
- Partition Tolerance (P): The system continues to operate despite network partitions (message loss or delay between nodes)
Since network partitions are inevitable in distributed systems, the practical choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance).
What Each Guarantee Means
Consistency
DfLinearizability (Strong Consistency)
A system is linearizable if every operation appears to take effect atomically at some point between its invocation and response. Once a write completes, all subsequent reads (from any node) must return that value or a later value. This is the strongest single-object consistency model.
Availability
Every non-failing node must return a response. The system cannot refuse to serve requests, even if some nodes are down or unreachable.
Partition Tolerance
The system must continue to function when network partitions occur. Since networks are inherently unreliable, partition tolerance is not optional in practice.
Why You Cannot Have All Three
Consider two nodes (N1, N2) with a network partition between them. A client writes to N1. When a client reads from N2:
- If N2 responds (A), it must return stale data (not C)
- If N2 waits for N1 to sync (C), it cannot respond (not A)
This is the fundamental tension. During normal operation (no partition), you can have both C and A. The theorem applies during partitions.
Real-World Systems by CAP Choice
CP Systems (Consistency + Partition Tolerance)
| System | Description |
|---|---|
| HBase | Distributed column-store, strong consistency |
| MongoDB (default) | Document database with majority write concern |
| ZooKeeper | Distributed coordination, consensus-based |
| etcd | Key-value store using Raft consensus |
| Google Spanner | Globally distributed, externally consistent |
AP Systems (Availability + Partition Tolerance)
| System | Description |
|---|---|
| Cassandra | Distributed wide-column, eventual consistency |
| DynamoDB | AWS key-value store, eventually consistent reads |
| CouchDB | Document database with multi-master replication |
| Riak | Distributed key-value, eventually consistent |
| DNS | Eventually consistent, highly available |
CA Systems (Consistency + Availability)
True CA systems only exist in single-node deployments or when partitions cannot occur. In distributed systems, partitions are inevitable, so CA is not a practical choice. Traditional RDBMS (PostgreSQL, MySQL) are CA when running on a single node, but become CP or AP when distributed.
Consistency Models
DfConsistency Model
A consistency model defines the rules for when and how reads return values written by other operations. It exists on a spectrum from strong to weak consistency.
Consistency Spectrum
| Model | Description | Systems |
|---|---|---|
| Linearizable | Strongest; reads return latest write | Spanner, ZooKeeper |
| Sequential | Operations appear in some sequential order | Raft, Paxos |
| Causal | Preserves causal relationships | MongoDB (causal sessions) |
| Eventual | All replicas converge eventually | Cassandra, DynamoDB |
| Weak | No ordering guarantees | DNS, some NoSQL defaults |
PACELC Theorem
The PACELC theorem extends CAP by considering what happens when there is no partition.
DfPACELC Theorem
PACELC (Abadi, 2012) states: if there is a Partition, choose between Availability and Consistency; Else (when running normally), choose between Latency and Consistency. This captures the full spectrum of distributed system trade-offs.
PACELC Classification
| System | Partition | Else | Classification |
|---|---|---|---|
| Cassandra | A | L (latency) | EL |
| DynamoDB | A | L | EL |
| MongoDB | C | C (strong consistency) | PC/EC |
| HBase | C | C | PC/EC |
| CockroachDB | C | C | PC/EC |
| Cosmos DB | Configurable | Configurable | Configurable |
PACELC is more practical than CAP for real-world decisions. Most systems are not in a partition state, so the "else" trade-off (latency vs consistency) matters more day-to-day. A system that always chooses consistency pays the latency cost even when the network is healthy.
Eventual Consistency
DfEventual Consistency
Eventual consistency is a consistency model where, if no new updates are made to a given data item, all replicas will eventually return the last written value. It provides high availability and low latency at the cost of temporary inconsistency.
Convergence Time
Eventual Convergence
Here,
- =Time of the last write
- =Current time (t > t_0)
- =Read from replica i at time t
Conflict Resolution
When replicas diverge, conflicts must be resolved:
| Strategy | Description | Trade-off |
|---|---|---|
| Last-Writer-Wins (LWW) | Timestamp-based, latest wins | Simple, may lose writes |
| Vector Clocks | Track causal ordering | Complex, requires metadata |
| CRDTs | Conflict-free replicated data types | Automatically convergent |
| Application logic | Business rules resolve conflicts | Flexible, but complex |
DfCRDTs
Conflict-free Replicated Data Types (CRDTs) are data structures that can be replicated across multiple nodes and merged without coordination. They guarantee eventual consistency regardless of the order in which operations are applied. Examples include G-Counters, PN-Counters, G-Sets, OR-Sets, and LWW-Registers.
Practice Exercises
-
Analysis: For each of the following systems, classify as CP, AP, or CA and explain your reasoning: (a) a banking system, (b) a social media feed, (c) a DNS system, (d) a stock trading platform.
-
Trade-offs: You are designing a global user profile service. Users can update their profile from any region. Which CAP choice would you make? How does PACELC influence your decision?
-
Design: Design a distributed counter that supports concurrent increments from multiple regions. What consistency model would you use? How do you handle conflicts?
-
Theory: Prove that during a network partition, a system cannot simultaneously guarantee both consistency and availability. Use a two-node example.
Key Takeaways:
- The CAP theorem proves that during network partitions, you must choose between consistency and availability
- Network partitions are inevitable, so true CA systems cannot exist in distributed environments
- PACELC extends CAP: even without partitions, you must trade latency for consistency
- Consistency models range from linearizable (strong) to eventual (weak) with corresponding performance trade-offs
- CRDTs provide mathematically guaranteed eventual consistency without coordination
- The right CAP choice depends on the specific requirements of each component in your system
What to Learn Next
-> Introduction to System Design Core principles, the design process, and trade-offs.
-> Databases SQL vs NoSQL, indexing, replication, and sharding.
-> Caching Strategies Redis, Memcached, cache invalidation, and write strategies.
-> Microservices Service decomposition, discovery, and API gateways.
-> Scalability Fundamentals Vertical vs horizontal scaling and capacity planning.
-> Load Balancing Algorithms, health checks, and L4 vs L7.