System Design — Infrastructure

Load Balancing

Load balancing distributes incoming traffic across multiple servers to ensure high availability, reliability, and optimal resource utilization. It's one of the most critical components in any scalable system.

Algorithms — Round-robin, least connections, consistent hashing
L4 vs L7 — Transport vs application layer balancing
Health Checks — Detecting and removing unhealthy servers

If you have more than one server, you need a load balancer.

What Is Load Balancing?

Definition: Load Balancing

Load balancing is the process of distributing network traffic across multiple backend servers (targets) to ensure no single server bears a disproportionate share of the load. A load balancer acts as a reverse proxy, routing requests to healthy servers based on a configurable distribution algorithm.

Load Balancer Placement

Load Balancing Algorithms

Round Robin

Distributes requests sequentially across servers.

Round Robin

$$\text{server}_i = \text{servers}[\, i \bmod N \,]$$

Here,

$i$ = Request counter (incrementing)
$N$ = Number of servers

Pros: Simple, even distribution for uniform workloads
Cons: Ignores server capacity and current load
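As a minimal sketch of the formula above, the selector below keeps an incrementing request counter and indexes into the server list modulo its length. The class name and the server names are hypothetical, and a real balancer would also need thread safety and health awareness.

```python
import itertools
from typing import Sequence


class RoundRobinBalancer:
    """Send request i to servers[i mod N]."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._counter = itertools.count()  # the request counter i

    def next_server(self) -> str:
        i = next(self._counter)
        return self._servers[i % len(self._servers)]


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.next_server() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```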

Weighted Round Robin

Assigns weights proportional to server capacity.

Weighted Round Robin

$$\text{weight}_i = \frac{\text{capacity}_i}{\sum_{j=1}^{N} \text{capacity}_j}$$

Here,

$\text{capacity}_i$ = Capacity of server $i$ (CPU, memory, or custom metric)
$N$ = Total number of servers
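The weight formula says how large each server's share should be, but not in what order to hand out requests. The sketch below first normalizes capacities as in the formula, then uses one possible scheduling scheme (often called smooth weighted round robin) to interleave picks so that a heavy server is not hit in bursts. The scheduling scheme, class name, and server names are assumptions for illustration, not something the formula prescribes.

```python
from collections import Counter
from typing import Dict


def normalized_weights(capacity: Dict[str, float]) -> Dict[str, float]:
    """weight_i = capacity_i / sum of all capacities."""
    total = sum(capacity.values())
    return {name: c / total for name, c in capacity.items()}


class SmoothWeightedRoundRobin:
    """Pick servers in proportion to their weights, spreading picks evenly."""

    def __init__(self, weights: Dict[str, float]) -> None:
        self._weights = dict(weights)
        self._current = {name: 0.0 for name in weights}
        self._total = sum(weights.values())

    def next_server(self) -> str:
        # Every server earns credit equal to its weight on each request.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # The server with the most credit is chosen and pays back the total.
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


# Hypothetical capacities: one server twice as large as the other two.
capacity = {"big": 2.0, "mid-1": 1.0, "mid-2": 1.0}
weights = normalized_weights(capacity)
print(weights)  # {'big': 0.5, 'mid-1': 0.25, 'mid-2': 0.25}

balancer = SmoothWeightedRoundRobin(capacity)
counts = Counter(balancer.next_server() for _ in range(8))
print(counts["big"], counts["mid-1"], counts["mid-2"])  # 4 2 2
```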

Least Connections

Routes to the server with fewest active connections.

Least Connections

$$\text{server} = \arg\min_{i} \{\text{active\_connections}_i\}$$

Here,

$\text{active\_connections}_i$ = Number of active connections to server $i$

Pros: Adapts to varying request durations
Cons: Requires tracking connection state; treats every connection as equally expensive, regardless of the work it carries
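The state tracking mentioned above can be sketched as a per-server counter that is incremented when a connection opens and decremented when it closes. This is an illustrative, single-threaded sketch; the class, method, and server names are hypothetical.

```python
from typing import Dict, Iterable


class LeastConnectionsBalancer:
    """Route each new connection to the server with the fewest active ones."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._active: Dict[str, int] = {server: 0 for server in servers}
        if not self._active:
            raise ValueError("at least one server is required")

    def acquire(self) -> str:
        # argmin over active connection counts; ties go to the first server.
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the connection closes so the count stays accurate.
        if self._active[server] > 0:
            self._active[server] -= 1


lb = LeastConnectionsBalancer(["app-1", "app-2"])  # hypothetical names
first = lb.acquire()    # app-1 (tie, first wins)
second = lb.acquire()   # app-2 (app-1 already has one connection)
lb.release(first)       # app-1 drops back to zero
third = lb.acquire()    # app-1 again
print(first, second, third)  # app-1 app-2 app-1
```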

Consistent Hashing

Maps servers and requests to a hash ring, minimizing redistribution when servers are added or removed.

Definition: Consistent Hashing

Consistent hashing maps both servers and keys to positions on a ring (for example, 0 to $2^{32}-1$). A key is assigned to the first server encountered walking clockwise from the key's position. When a server is added or removed, only the keys in the arc between that server and its predecessor on the ring are reassigned, so data movement is minimal.

Pros: Minimal redistribution on scaling, works well with caches
Cons: May be uneven without virtual nodes
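The ring lookup and the role of virtual nodes can be sketched as follows. This is an illustrative sketch under several assumptions: SHA-256 truncated to 32 bits as the hash, 100 virtual nodes per server, and hypothetical names (`HashRing`, `cache-1`, and so on). It shows that adding a server only moves keys onto the new server.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def ring_position(key: str) -> int:
    """Map a string to a ring position in [0, 2**32 - 1]."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, servers: Iterable[str] = (), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._positions: List[int] = []   # sorted ring positions
        self._owner: Dict[int, str] = {}  # position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server appears at many positions (virtual nodes) to even out load.
        for v in range(self._vnodes):
            pos = ring_position(f"{server}#{v}")
            if pos in self._owner:
                continue  # rare collision: keep the existing owner
            bisect.insort(self._positions, pos)
            self._owner[pos] = server

    def remove(self, server: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First position clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-4")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Every key that moved now lives on the new server; the rest stayed put.
print(all(after[k] == "cache-4" for k in moved))
```

With a plain `hash(key) % N` scheme, by contrast, changing `N` would reassign most keys.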

Least Response Time

Routes to the server with the lowest average response time and fewest active connections.

Least Response Time

$$\text{server} = \arg\min_{i} \{\text{avg\_response\_time}_i + \alpha \times \text{active\_connections}_i\}$$

Here,

$\text{avg\_response\_time}_i$ = Average response time for server $i$
$\alpha$ = Weight factor for connection count
$\text{active\_connections}_i$ = Active connections to server $i$
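To make the scoring formula concrete, the sketch below computes the score for each server and picks the minimum. The statistics, the millisecond unit, and the values of $\alpha$ are made-up illustrative numbers; in practice $\alpha$ would need tuning because it converts a connection count into time units.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServerStats:
    avg_response_time_ms: float
    active_connections: int


def score(stats: ServerStats, alpha: float) -> float:
    """avg_response_time + alpha * active_connections (lower is better)."""
    return stats.avg_response_time_ms + alpha * stats.active_connections


def pick_server(servers: Dict[str, ServerStats], alpha: float) -> str:
    return min(servers, key=lambda name: score(servers[name], alpha))


# Hypothetical snapshot of three servers.
snapshot = {
    "app-1": ServerStats(avg_response_time_ms=40.0, active_connections=10),
    "app-2": ServerStats(avg_response_time_ms=55.0, active_connections=2),
    "app-3": ServerStats(avg_response_time_ms=45.0, active_connections=20),
}

# alpha = 2.0: scores are 60, 59, 85, so the lightly loaded app-2 wins.
print(pick_server(snapshot, alpha=2.0))  # app-2
# alpha = 0.0: connections are ignored and the fastest server wins.
print(pick_server(snapshot, alpha=0.0))  # app-1
```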

L4 vs L7 Load Balancing

Definition: L4 Load Balancer

A Layer 4 (L4) load balancer operates at the transport layer (TCP/UDP). It makes routing decisions based on IP address and port numbers without inspecting the content of the request. L4 is faster (no payload inspection) but less flexible.

Definition: L7 Load Balancer

A Layer 7 (L7) load balancer operates at the application layer (HTTP/HTTPS). It can inspect the full request—URL, headers, cookies, body—and make routing decisions based on application-level information. L7 is more flexible but introduces more latency.

| Feature | L4 | L7 |
| --- | --- | --- |
| OSI Layer | Transport (TCP/UDP) | Application (HTTP) |
| Routing basis | IP, port | URL, headers, cookies |
| Throughput | Higher (no payload inspection) | Lower (full inspection) |
| SSL termination | Can terminate | Typically terminates |
| Content-based routing | No | Yes (URL, header) |
| WebSocket support | Yes (pass-through) | Yes (with upgrade) |
| Use case | High-throughput, simple routing | Content routing, SSL offload |

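Load balancers such as Envoy and HAProxy support both L4 and L7 modes, while AWS splits the roles between the Network Load Balancer (L4) and the Application Load Balancer (L7). Choose L7 when you need content-based routing (e.g., /api → service A, /static → CDN), routing decisions that depend on headers or cookies, or request-level health checks.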
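The content-based routing example (/api → service A, /static → CDN) can be sketched as a longest-prefix match on the request path, which is something only an L7 balancer can do because it requires reading the HTTP request. The route table, the default backend, and the function name are hypothetical.

```python
from typing import Dict

# Hypothetical route table mirroring the example in the text.
ROUTES: Dict[str, str] = {
    "/api": "service-a",
    "/static": "cdn",
}
DEFAULT_BACKEND = "web-frontend"  # assumed fallback, not from the text


def route(path: str) -> str:
    """Return the backend for a request path using longest-prefix match."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match whole path segments so "/apiary" does not hit "/api".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    return DEFAULT_BACKEND


print(route("/api/users/42"))   # service-a
print(route("/static/app.js"))  # cdn
print(route("/apiary"))         # web-frontend
```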

Health Checks

Load balancers must detect and remove unhealthy servers.

Health Check Types

| Type | Description | Granularity |
| --- | --- | --- |
| TCP | Can establish TCP connection | Port-level |
| HTTP | HTTP 200 response from health endpoint | Application-level |
| gRPC | gRPC health check protocol | Service-level |
| Custom | Application-specific check | Business-level |

Health Check Configuration

Health Check Parameters:
- Interval: 10 seconds (how often to check)
- Timeout: 5 seconds (max wait per check)
- Healthy threshold: 2 (consecutive successes to mark healthy)
- Unhealthy threshold: 3 (consecutive failures to mark unhealthy)
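The two thresholds act as hysteresis: a server changes state only after enough consecutive results contradict its current state, so a single dropped check does not eject it. With the values above, a failing server is removed after roughly 3 checks × 10 seconds = 30 seconds, plus any timeout wait. The sketch below models only this state logic; the class name is hypothetical and the actual probing is left out.

```python
class HealthTracker:
    """Track one server's health using consecutive-result thresholds."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3,
                 initially_healthy: bool = True) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = initially_healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed in one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state: reset streak
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
results = [False, False, True, False, False, False, True, True]
states = [tracker.record(r) for r in results]
# Two failures are forgiven by the pass; three in a row mark it unhealthy;
# two passes in a row bring it back.
print(states)  # [True, True, True, True, True, False, False, True]
```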

Health checks should verify the full dependency chain—not just "is the process running?" but "can the process serve requests?" A health endpoint should check database connectivity, cache availability, and disk space.
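One way to structure such an endpoint is to run a set of named dependency checks and report 200 only if all of them pass. The sketch below is framework-agnostic and returns a status code plus per-check results; the database and cache checks are stand-in stubs, and all names and the status-code choice (503 on failure) are assumptions for illustration.

```python
import shutil
from typing import Callable, Dict, Tuple


def disk_has_space(path: str = ".", min_free_bytes: int = 1 << 30) -> bool:
    """True if the filesystem holding `path` has at least `min_free_bytes` free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run every check; any failure or exception makes the endpoint unhealthy."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False  # an unreachable dependency counts as a failure
        results[name] = "ok" if passed else "fail"
    status = 200 if all(v == "ok" for v in results.values()) else 503
    return status, results


def ping_cache_stub() -> bool:
    # Stand-in for a real cache ping; simulates an unreachable cache.
    raise ConnectionError("cache unreachable")


status, report = run_health_checks({
    "database": lambda: True,  # stand-in for a real connectivity probe
    "cache": ping_cache_stub,
    "disk": lambda: disk_has_space(min_free_bytes=0),
})
print(status, report)
```

Each check should be cheap and time-bounded so that the endpoint answers well within the load balancer's timeout.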

Global Server Load Balancing (GSLB)

For multi-region deployments, GSLB distributes traffic across data centers.

Definition: GSLB

Global Server Load Balancing distributes traffic across geographically dispersed data centers. It uses DNS-based or anycast-based routing to direct users to the optimal data center based on proximity, load, and health.

GSLB Strategies

| Strategy | How It Works |
| --- | --- |
| GeoDNS | Resolves to nearest data center by IP geolocation |
| Anycast | Multiple data centers share same IP; BGP routes to nearest |
| Latency-based | Routes to data center with lowest measured latency |
| Weighted | Distributes traffic by configurable weights |
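As a rough sketch of the latency-based strategy combined with health-based failover, the function below picks the healthy data center with the lowest measured latency for a given client. The region names and latency figures are invented for illustration; a real GSLB would obtain them from probes or resolver measurements.

```python
from typing import Dict, Set


def pick_data_center(latency_ms: Dict[str, float], healthy: Set[str]) -> str:
    """Return the healthy data center with the lowest measured latency."""
    candidates = {dc: ms for dc, ms in latency_ms.items() if dc in healthy}
    if not candidates:
        raise LookupError("no healthy data center available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies measured from one client's vantage point.
measured = {"eu-west": 18.0, "us-east": 95.0, "ap-south": 160.0}

print(pick_data_center(measured, healthy={"eu-west", "us-east", "ap-south"}))  # eu-west
# If the nearest region fails its health checks, traffic fails over.
print(pick_data_center(measured, healthy={"us-east", "ap-south"}))  # us-east
```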

Practice Exercises

Design: Design a load balancing strategy for a real-time gaming platform with 100M concurrent users. Consider: sticky sessions, geographic distribution, failover, and latency requirements.
Algorithm Selection: You have 5 servers with different capacities (2x, 1x, 1x, 1x, 0.5x). Which load balancing algorithm would you use? How would you configure it?
Architecture: Compare L4 and L7 load balancing for: (a) a TCP-based database proxy, (b) an API gateway with content-based routing, (c) a WebSocket chat service.
Troubleshooting: A load balancer is distributing traffic unevenly—server 1 gets 40% while others get 20% each. What could cause this? How would you diagnose and fix it?

Key Takeaways:

Load balancing distributes traffic for high availability and optimal resource utilization
Round-robin is simple but ignores load; least connections adapts but requires state tracking
Consistent hashing minimizes redistribution when servers are added or removed
L4 balancing is faster (transport layer); L7 balancing is smarter (application layer)
Health checks must verify the full dependency chain, not just process availability
GSLB distributes traffic across regions using DNS, anycast, or latency-based routing

What to Learn Next

-> Message Queues Kafka, RabbitMQ, event-driven architecture.

-> Microservices Service decomposition, discovery, and API gateways.

-> CAP Theorem Consistency models, availability, and partition tolerance.

-> Scalability Fundamentals Vertical vs horizontal scaling and capacity planning.

-> Databases SQL vs NoSQL, indexing, replication, and sharding.

-> API Design REST, GraphQL, gRPC, versioning, and rate limiting.
