System Design - Theory

CAP Theorem

The CAP theorem is one of the most fundamental results in distributed systems theory. It defines the inherent trade-offs every distributed system must make, and understanding it is essential for making informed architectural decisions.

Consistency - Every read returns the most recent write
Availability - Every request receives a response
Partition Tolerance - System continues operating despite network failures

Distributed systems are not about choosing the right answer - they are about choosing the right trade-off.

The CAP Theorem

DfCAP Theorem

The CAP Theorem (Brewer, 2000; Gilbert and Lynch, 2002) states that a distributed data store can provide at most two of the following three guarantees simultaneously:

Consistency (C): Every read receives the most recent write or an error
Availability (A): Every request receives a non-error response (without guaranteeing it is the most recent write)
Partition Tolerance (P): The system continues to operate despite network partitions (message loss or delay between nodes)

Since network partitions are inevitable in distributed systems, the practical choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance).

What Each Guarantee Means

Consistency

DfLinearizability (Strong Consistency)

A system is linearizable if every operation appears to take effect atomically at some point between its invocation and response. Once a write completes, all subsequent reads (from any node) must return that value or a later value. This is the strongest single-object consistency model.

Availability

Every non-failing node must return a response. The system cannot refuse to serve requests, even if some nodes are down or unreachable.

Partition Tolerance

The system must continue to function when network partitions occur. Since networks are inherently unreliable, partition tolerance is not optional in practice.

Why You Cannot Have All Three

Consider two nodes (N1, N2) with a network partition between them. A client writes to N1. When a client reads from N2:

If N2 responds (A), it must return stale data (not C)
If N2 waits for N1 to sync (C), it cannot respond (not A)

This is the fundamental tension. During normal operation (no partition), you can have both C and A. The theorem applies during partitions.

Real-World Systems by CAP Choice

CP Systems (Consistency + Partition Tolerance)

System	Description
HBase	Distributed column-store, strong consistency
MongoDB (default)	Document database with majority write concern
ZooKeeper	Distributed coordination, consensus-based
etcd	Key-value store using Raft consensus
Google Spanner	Globally distributed, externally consistent

AP Systems (Availability + Partition Tolerance)

System	Description
Cassandra	Distributed wide-column, eventual consistency
DynamoDB	AWS key-value store, eventually consistent reads
CouchDB	Document database with multi-master replication
Riak	Distributed key-value, eventually consistent
DNS	Eventually consistent, highly available

CA Systems (Consistency + Availability)

True CA systems only exist in single-node deployments or when partitions cannot occur. In distributed systems, partitions are inevitable, so CA is not a practical choice. Traditional RDBMS (PostgreSQL, MySQL) are CA when running on a single node, but become CP or AP when distributed.

Consistency Models

DfConsistency Model

A consistency model defines the rules for when and how reads return values written by other operations. It exists on a spectrum from strong to weak consistency.

Consistency Spectrum

Model	Description	Systems
Linearizable	Strongest; reads return latest write	Spanner, ZooKeeper
Sequential	Operations appear in some sequential order	Raft, Paxos
Causal	Preserves causal relationships	MongoDB (causal sessions)
Eventual	All replicas converge eventually	Cassandra, DynamoDB
Weak	No ordering guarantees	DNS, some NoSQL defaults

PACELC Theorem

The PACELC theorem extends CAP by considering what happens when there is no partition.

DfPACELC Theorem

PACELC (Abadi, 2012) states: if there is a Partition, choose between Availability and Consistency; Else (when running normally), choose between Latency and Consistency. This captures the full spectrum of distributed system trade-offs.

PACELC Classification

System	Partition	Else	Classification
Cassandra	A	L (latency)	EL
DynamoDB	A	L	EL
MongoDB	C	C (strong consistency)	PC/EC
HBase	C	C	PC/EC
CockroachDB	C	C	PC/EC
Cosmos DB	Configurable	Configurable	Configurable

PACELC is more practical than CAP for real-world decisions. Most systems are not in a partition state, so the "else" trade-off (latency vs consistency) matters more day-to-day. A system that always chooses consistency pays the latency cost even when the network is healthy.

Eventual Consistency

DfEventual Consistency

Eventual consistency is a consistency model where, if no new updates are made to a given data item, all replicas will eventually return the last written value. It provides high availability and low latency at the cost of temporary inconsistency.

Convergence Time

Eventual Convergence

\lim_{t \\to \infty} P(\text{read}_i(t) = \text{write}(t_0)) = 1

Here,

$t_0$ =Time of the last write
$t$ =Current time (t > t_0)
$read_i(t)$ =Read from replica i at time t

Conflict Resolution

When replicas diverge, conflicts must be resolved:

Strategy	Description	Trade-off
Last-Writer-Wins (LWW)	Timestamp-based, latest wins	Simple, may lose writes
Vector Clocks	Track causal ordering	Complex, requires metadata
CRDTs	Conflict-free replicated data types	Automatically convergent
Application logic	Business rules resolve conflicts	Flexible, but complex

DfCRDTs

Conflict-free Replicated Data Types (CRDTs) are data structures that can be replicated across multiple nodes and merged without coordination. They guarantee eventual consistency regardless of the order in which operations are applied. Examples include G-Counters, PN-Counters, G-Sets, OR-Sets, and LWW-Registers.

Practice Exercises

Analysis: For each of the following systems, classify as CP, AP, or CA and explain your reasoning: (a) a banking system, (b) a social media feed, (c) a DNS system, (d) a stock trading platform.
Trade-offs: You are designing a global user profile service. Users can update their profile from any region. Which CAP choice would you make? How does PACELC influence your decision?
Design: Design a distributed counter that supports concurrent increments from multiple regions. What consistency model would you use? How do you handle conflicts?
Theory: Prove that during a network partition, a system cannot simultaneously guarantee both consistency and availability. Use a two-node example.

Key Takeaways:

The CAP theorem proves that during network partitions, you must choose between consistency and availability
Network partitions are inevitable, so true CA systems cannot exist in distributed environments
PACELC extends CAP: even without partitions, you must trade latency for consistency
Consistency models range from linearizable (strong) to eventual (weak) with corresponding performance trade-offs
CRDTs provide mathematically guaranteed eventual consistency without coordination
The right CAP choice depends on the specific requirements of each component in your system

What to Learn Next

-> Introduction to System Design Core principles, the design process, and trade-offs.

-> Databases SQL vs NoSQL, indexing, replication, and sharding.

-> Caching Strategies Redis, Memcached, cache invalidation, and write strategies.

-> Microservices Service decomposition, discovery, and API gateways.

-> Scalability Fundamentals Vertical vs horizontal scaling and capacity planning.

-> Load Balancing Algorithms, health checks, and L4 vs L7.

CAP Theorem

CAP Theorem

The CAP Theorem

DfCAP Theorem

What Each Guarantee Means

Consistency

DfLinearizability (Strong Consistency)

Availability

Partition Tolerance

Why You Cannot Have All Three

Real-World Systems by CAP Choice

CP Systems (Consistency + Partition Tolerance)

AP Systems (Availability + Partition Tolerance)

CA Systems (Consistency + Availability)

Consistency Models

DfConsistency Model

Consistency Spectrum

PACELC Theorem

DfPACELC Theorem

PACELC Classification

Eventual Consistency

DfEventual Consistency

Convergence Time

Eventual Convergence

Conflict Resolution

DfCRDTs

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?