πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Distributed Consensus

TheoryDistributed Systems🟒 Free Lesson

Advertisement

Theory

Distributed Consensus

For a distributed system to function correctly, nodes must agree on a single value or sequence of values. Consensus algorithms ensure this agreement despite node failures and network partitions.

  • Agreement β€” All non-faulty nodes decide the same value
  • Validity β€” Decided values were proposed by some node
  • Termination β€” Algorithm eventually terminates

Consensus is the fundamental building block for replicated state machines.

The Consensus Problem

Multiple nodes must agree on a single value, even when some nodes fail or messages are delayed.

DfByzantine Fault Tolerance

A Byzantine fault is any fault where a node can behave arbitrarily β€” sending different values to different nodes, refusing to respond, or actively colluding. A Byzantine Fault Tolerant (BFT) system can reach consensus even if up to f out of 3f+1 nodes are Byzantine faulty.

DfCrash Fault Tolerance

A crash fault is when a node stops responding entirely. A Crash Fault Tolerant (CFT) system can reach consensus if a majority of nodes are alive. Raft and Paxos are crash fault tolerant.

Most distributed systems use crash fault tolerance (CFT) because Byzantine faults are rare in controlled environments. BFT is used in blockchain and adversarial settings where nodes cannot be trusted.

FLP Impossibility

DfFLP Impossibility

The FLP Impossibility result (Fischer, Lynch, Paterson, 1985) proves that in an asynchronous system where even one node can crash, no deterministic consensus protocol can guarantee termination. Practical algorithms circumvent this by using randomization or partial synchrony assumptions.

Paxos

The foundational consensus algorithm, introduced by Leslie Lamport.

DfPaxos

Paxos is a family of consensus protocols where nodes take roles (proposer, acceptor, learner) to agree on a value. The basic version (Single-Decree Paxos) agrees on a single value. Multi-Paxos agrees on a sequence of values by electing a stable leader. Paxos is provably correct but notoriously difficult to implement.

Paxos Phases

Paxos Two-Phase CommitPhase 1: PreparePProposerprepare(n)Acceptorspromise(n)Phase 2: AcceptPaccept(n,v)Acceptorsaccepted(n,v)Majority of acceptors must agree for consensus

Majority Quorum

quorum=⌊n2βŒ‹+1quorum = \lfloor \frac{n}{2} \rfloor + 1

Here,

  • nn=Total number of nodes
  • quorumquorum=Minimum nodes needed for agreement

Paxos Quorum Calculation

With 5 nodes, quorum = floor(5/2) + 1 = 3. Any two quorums overlap by at least 1 node, ensuring that once a value is accepted by a quorum, any subsequent quorum will see it.

Raft

A consensus algorithm designed for understandability.

DfRaft

Raft achieves consensus by electing a leader that manages replication. The leader receives client requests, appends them to its log, and replicates entries to followers. Once a majority confirms an entry, it is committed. Raft decomposes consensus into leader election, log replication, and safety.

Raft Leader Election

Raft Leader ElectionFollowerReceives heartbeatsReplicates log entriesVotes for candidatesCandidateTimeout triggers electionRequests votesWins majority β†’ LeaderLeaderHandles all client requestsSends heartbeatsManages log replicationtimeoutmajorityState transitions: Follower β†’ Candidate β†’ Leader

Raft Properties

PropertyDescription
Election SafetyAt most one leader per term
Leader Append-OnlyLeader never overwrites/deletes entries
Log MatchingIf two logs contain entry with same index and term, all prior entries are identical
Leader CompletenessIf entry committed in term T, present in leader's log for all higher terms
State Machine SafetyIf server applied entry at index i, no other server applies different entry at i

Raft vs Paxos

AspectRaftPaxos
Design goalUnderstandabilityTheoretical optimality
LeaderStrong leader requiredCan work without stable leader
Log structureStrict orderingFlexible
ComplexityModerateHigh
Implementationsetcd, CockroachDBGoogle Chubby, Spanner

ZAB (ZooKeeper Atomic Broadcast)

Used by Apache ZooKeeper for coordination.

DfZAB Protocol

ZAB is a crash fault tolerant consensus protocol used by ZooKeeper. It uses a leader-based approach where the leader processes write requests and broadcasts changes via a two-phase commit. ZAB provides a total ordering of operations, enabling consistent distributed coordination.

ZAB differs from Raft in that it focuses on atomic broadcast (total order of operations) rather than log replication. ZooKeeper uses ZAB to provide primitives like distributed locks, leader election, and configuration management.

Leader Election

A critical subproblem in distributed systems.

DfLeader Election

Leader election is the process of selecting one node from a group to act as coordinator. A leader manages operations, resolves conflicts, and provides a single point of truth. If the leader fails, a new election is triggered. Common algorithms include Bully, Raft, and ZooKeeper-based election.

Election Timeout

timeout=random(150ms,300ms)timeout = random(150ms, 300ms)

Here,

  • timeouttimeout=Random election timeout to avoid split votes

Practice Exercises

  1. Conceptual: Explain why consensus requires a majority quorum. What happens with a 3-node cluster during a network partition?

  2. Raft: Trace through a Raft leader election with 5 nodes. Node 3's election timer fires first. Walk through the message exchanges.

  3. Comparison: Compare Raft and Paxos for a distributed key-value store. When would you choose one over the other?

  4. FLP Impossibility: The FLP result says deterministic consensus is impossible in asynchronous systems with crash failures. How do practical algorithms like Raft circumvent this?

Key Takeaways:

  • Distributed consensus ensures all nodes agree on a single value despite failures
  • Paxos is the foundational algorithm; Raft prioritizes understandability
  • Leader election centralizes coordination, simplifying replication
  • FLP Impossibility proves deterministic consensus cannot guarantee termination in async systems
  • Practical algorithms use randomization or partial synchrony assumptions
  • Used by etcd, CockroachDB, ZooKeeper, and distributed databases

What to Learn Next

-> CAP Theorem Consistency models, availability, and partition tolerance.

-> Data Replication Leader-follower, multi-leader, and conflict resolution.

-> Consistent Hashing Hash rings, virtual nodes, and load distribution.

-> Databases SQL vs NoSQL, indexing, replication, and sharding.

-> Service Mesh Envoy, Istio, and sidecar proxy patterns.

-> Event-Driven Architecture Event sourcing, CQRS, and saga patterns.

⭐

Premium Content

Distributed Consensus

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement