System Design — Infrastructure
Load Balancing
Load balancing distributes incoming traffic across multiple servers to ensure high availability, reliability, and optimal resource utilization. It's one of the most critical components in any scalable system.
- Algorithms — Round-robin, least connections, consistent hashing
- L4 vs L7 — Transport vs application layer balancing
- Health Checks — Detecting and removing unhealthy servers
If you have more than one server, you need a load balancer.
What Is Load Balancing?
Definition: Load Balancing
Load balancing is the process of distributing network traffic across multiple backend servers (targets) to ensure no single server bears a disproportionate share of the load. A load balancer acts as a reverse proxy, routing requests to healthy servers based on a configurable distribution algorithm.
Figure: Load Balancer Placement
Load Balancing Algorithms
Round Robin
Distributes requests sequentially across servers.
Round Robin
$$\text{server} = i \bmod N$$
Here,
- $i$ = Request counter (incrementing)
- $N$ = Number of servers
Pros: Simple, even distribution for uniform workloads
Cons: Ignores server capacity and current load
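A minimal sketch of round robin, assuming the usual rule that request number i goes to server index i mod N. The class name `RoundRobinBalancer` and the server names are hypothetical, chosen only for illustration.

```python
from itertools import count


class RoundRobinBalancer:
    """Pick servers in a fixed rotation: index = counter mod N."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self._counter = count()  # request counter, starts at 0

    def next_server(self):
        i = next(self._counter)
        return self.servers[i % len(self.servers)]


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.next_server() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

The rotation wraps after the third server regardless of how busy each one is, which is the weakness noted in the cons.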
Weighted Round Robin
Assigns weights proportional to server capacity.
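Weighted Round Robin
$$w_i = \frac{c_i}{\sum_{j=1}^{N} c_j}$$
Here,
- $w_i$ = Share of requests sent to server $i$
- $c_i$ = Capacity of server $i$ (CPU, memory, or custom metric)
- $N$ = Total number of servers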
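One way to turn capacity-proportional weights into a request schedule is the "smooth" weighted round robin variant sketched below. It is one possible scheduling choice, not necessarily the one this article has in mind, and the class and server names are hypothetical. Each server's share is its capacity divided by the sum of all capacities.

```python
from collections import Counter


class SmoothWeightedRoundRobin:
    """Smooth weighted round robin: spreads picks in proportion to capacity."""

    def __init__(self, capacities):
        self.weights = dict(capacities)          # server -> capacity
        self.total = sum(self.weights.values())  # sum of all capacities
        self.current = {name: 0 for name in self.weights}

    def share(self, name):
        # Fraction of traffic this server should receive.
        return self.weights[name] / self.total

    def next_server(self):
        # Raise every server's running score by its weight, pick the
        # highest score, then lower the winner by the total weight.
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


# Hypothetical capacities: "big" is twice as powerful as the others.
wrr = SmoothWeightedRoundRobin({"big": 2, "small-1": 1, "small-2": 1})
picks = [wrr.next_server() for _ in range(8)]
counts = Counter(picks)
print(counts["big"], counts["small-1"], counts["small-2"])  # 4 2 2
```

Over any full cycle of four requests, "big" is picked twice and each small server once, without sending the two "big" picks back to back at the start of the cycle.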
Least Connections
Routes to the server with fewest active connections.
Least Connections
$$\text{server} = \arg\min_{i} \, \text{conn}_i$$
Here,
- $\text{conn}_i$ = Number of active connections to server $i$
Pros: Adapts to varying request durations
Cons: Requires tracking connection state, doesn't account for connection weight
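A small sketch of least-connections selection, assuming the balancer itself counts connections as they open and close. `LeastConnectionsBalancer`, `acquire`, and `release` are hypothetical names used only for this example.

```python
class LeastConnectionsBalancer:
    """Route each new connection to the server with the fewest active ones."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}  # server -> open connections

    def acquire(self):
        # Ties are broken by insertion order (first server wins).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes so the count stays accurate.
        if self.active[server] > 0:
            self.active[server] -= 1


lb = LeastConnectionsBalancer(["app-1", "app-2", "app-3"])
first = lb.acquire()    # all idle, so the first server is chosen
second = lb.acquire()
third = lb.acquire()
lb.release(second)      # app-2 finishes early
fourth = lb.acquire()   # app-2 now has the fewest connections
print(first, second, third, fourth)
```

The `active` dictionary is the connection state that the cons line refers to: it must be updated on every open and close.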
Consistent Hashing
Maps servers and requests to a hash ring, minimizing redistribution when servers are added or removed.
Definition: Consistent Hashing
Consistent hashing maps both servers and keys to positions on a ring (0 to 2^32-1). A key is assigned to the first server encountered walking clockwise from the key's position. When a server is added or removed, only the keys between that server and its predecessor on the ring change owner; every other key stays where it was, which minimizes data movement.
Pros: Minimal redistribution on scaling, works well with caches
Cons: Load may be uneven unless each server is placed on the ring many times (virtual nodes)
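The ring can be sketched with a sorted list of positions and a binary search. This is an illustrative sketch under stated assumptions: MD5 truncated to 32 bits stands in for the hash function, 100 virtual nodes per server is an arbitrary choice, and `HashRing` and the cache names are hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the 0..2^32-1 ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server (assumed value)
        self._positions = []   # sorted ring positions
        self._owner = {}       # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for v in range(self.vnodes):
            pos = ring_position(f"{server}#{v}")
            if pos not in self._owner:  # skip the rare position collision
                self._owner[pos] = server
                bisect.insort(self._positions, pos)

    def remove(self, server):
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # First position at or after the key, wrapping around the ring.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("cache-b")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only keys that were owned by the removed server change owner; keys on the other two servers are untouched.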
Least Response Time
Routes to the server with the lowest average response time and fewest active connections.
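Least Response Time
$$\text{server} = \arg\min_{i} \left( \bar{t}_i + \alpha \cdot \text{conn}_i \right)$$
Here,
- $\bar{t}_i$ = Average response time for server $i$
- $\alpha$ = Weight factor for connection count
- $\text{conn}_i$ = Active connections to server $i$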
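A sketch of how such a score might be computed. The additive form (average response time plus a weight factor times active connections) is an assumption made for this example; real load balancers combine the two signals in different ways. `ServerStats`, `score`, `pick_server`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    avg_response_ms: float   # average response time
    active_connections: int  # currently open connections


def score(stats: ServerStats, alpha: float) -> float:
    # Assumed scoring: lower is better.
    return stats.avg_response_ms + alpha * stats.active_connections


def pick_server(servers, alpha=5.0):
    return min(servers, key=lambda s: score(s, alpha))


servers = [
    ServerStats("app-1", avg_response_ms=40.0, active_connections=12),  # 40 + 60 = 100
    ServerStats("app-2", avg_response_ms=70.0, active_connections=2),   # 70 + 10 = 80
    ServerStats("app-3", avg_response_ms=55.0, active_connections=8),   # 55 + 40 = 95
]
chosen = pick_server(servers)
fastest_only = pick_server(servers, alpha=0.0)
print(chosen.name, fastest_only.name)  # app-2 app-1
```

With the weight factor set to zero the choice falls back to pure response time; raising it shifts traffic away from servers that are fast but already crowded.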
L4 vs L7 Load Balancing
Definition: L4 Load Balancer
A Layer 4 (L4) load balancer operates at the transport layer (TCP/UDP). It makes routing decisions based on IP address and port numbers without inspecting the content of the request. L4 is faster (no payload inspection) but less flexible.
Definition: L7 Load Balancer
A Layer 7 (L7) load balancer operates at the application layer (HTTP/HTTPS). It can inspect the full request—URL, headers, cookies, body—and make routing decisions based on application-level information. L7 is more flexible but introduces more latency.
| Feature | L4 | L7 |
|---|---|---|
| OSI Layer | Transport (TCP/UDP) | Application (HTTP) |
| Routing basis | IP, port | URL, headers, cookies |
| Throughput | Higher (no payload inspection) | Lower (full inspection) |
| SSL termination | Can terminate | Typically terminates |
| Content-based routing | No | Yes (URL, header) |
| WebSocket support | Yes (pass-through) | Yes (with upgrade) |
| Use case | High-throughput, simple routing | Content routing, SSL offload |
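Modern proxies such as Envoy and HAProxy support both L4 and L7 modes, while AWS splits the two across separate products: Network Load Balancer (L4) and Application Load Balancer (L7). Choose L7 when you need content-based routing (e.g., /api → service A, /static → CDN), SSL offload, or request-level health checks. WebSocket traffic works at either layer.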
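The difference in available information can be sketched as two routing functions. The L4-style function sees only connection metadata, while the L7-style function sees the request path. All names (`l4_pick`, `l7_pick`, `ROUTES`, the pool names) are hypothetical, and hashing the source address is just one possible L4 policy.

```python
import hashlib


def l4_pick(src_ip: str, src_port: int, backends):
    """L4-style: choose from connection metadata only, never the payload."""
    key = f"{src_ip}:{src_port}".encode("utf-8")
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[h % len(backends)]


# Hypothetical path-prefix routing table for the L7 case.
ROUTES = [("/api", "service-a"), ("/static", "cdn")]


def l7_pick(path: str, default: str = "web"):
    """L7-style: choose a pool by inspecting the HTTP request path."""
    for prefix, pool in ROUTES:
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return default


backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(l4_pick("203.0.113.7", 51234, backends))
print(l7_pick("/api/users"), l7_pick("/static/app.css"), l7_pick("/checkout"))
```

The L4 function could not route /api and /static differently even in principle, because the path is not part of its input.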
Health Checks
Load balancers must detect and remove unhealthy servers.
Health Check Types
| Type | Description | Granularity |
|---|---|---|
| TCP | Can establish TCP connection | Port-level |
| HTTP | HTTP 200 response from health endpoint | Application-level |
| gRPC | gRPC health check protocol | Service-level |
| Custom | Application-specific check | Business-level |
Health Check Configuration
Health Check Parameters:
- Interval: 10 seconds (how often to check)
- Timeout: 5 seconds (max wait per check)
- Healthy threshold: 2 (consecutive successes to mark healthy)
- Unhealthy threshold: 3 (consecutive failures to mark unhealthy)
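The two thresholds act as hysteresis: a server changes state only after enough consecutive results contradict its current state. A minimal sketch, using the threshold values listed above and a hypothetical `HealthTracker` class (the interval and timeout would live in whatever loop calls `record`):

```python
class HealthTracker:
    """Track one server's health using consecutive-result thresholds."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3, healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, success: bool) -> bool:
        """Record one check result and return the resulting state."""
        if success == self.healthy:
            self._streak = 0  # result agrees with current state
        else:
            self._streak += 1
            needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= needed:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy


tracker = HealthTracker()
states = [tracker.record(ok) for ok in [False, False, False, True, True]]
print(states)  # [True, True, False, False, True]
```

Three failures in a row are needed to remove the server, and two successes in a row to bring it back, so a single dropped check does not cause flapping.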
Health checks should verify the full dependency chain—not just "is the process running?" but "can the process serve requests?" A health endpoint should check database connectivity, cache availability, and disk space.
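A sketch of a health endpoint's core logic that aggregates several dependency checks into one status code. The database and cache probes here are stand-in functions, since the real ones depend on the application's clients; `run_health_checks`, `disk_has_space`, and the 5% free-space threshold are assumptions for illustration.

```python
import shutil


def disk_has_space(path="/", min_free_ratio=0.05):
    """True if at least min_free_ratio of the disk is free (assumed threshold)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_ratio


def run_health_checks(checks):
    """Run named checks; return (HTTP status, per-check results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as a failure
    status = 200 if all(results.values()) else 503
    return status, results


# Stand-ins for real dependency probes (hypothetical).
def fake_database_ping():
    return True


def fake_cache_ping():
    raise ConnectionError("cache unreachable")


status, results = run_health_checks({
    "database": fake_database_ping,
    "cache": fake_cache_ping,
    "disk": disk_has_space,
})
print(status, results)
```

Returning the per-check results alongside the status makes it easier to see which dependency caused a 503.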
Global Server Load Balancing (GSLB)
For multi-region deployments, GSLB distributes traffic across data centers.
Definition: GSLB
Global Server Load Balancing distributes traffic across geographically dispersed data centers. It uses DNS-based or anycast-based routing to direct users to the optimal data center based on proximity, load, and health.
GSLB Strategies
| Strategy | How It Works |
|---|---|
| GeoDNS | Resolves to nearest data center by IP geolocation |
| Anycast | Multiple data centers share same IP; BGP routes to nearest |
| Latency-based | Routes to data center with lowest measured latency |
| Weighted | Distributes traffic by configurable weights |
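As an illustration of the latency-based strategy combined with health-aware failover, the sketch below picks the healthy data center with the lowest measured latency. The region names, latency figures, and the `pick_region` function are hypothetical; a real GSLB would make this decision in its DNS or routing layer.

```python
def pick_region(regions):
    """Return the healthy region with the lowest measured latency."""
    healthy = {name: info for name, info in regions.items() if info["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy data center available")
    return min(healthy, key=lambda name: healthy[name]["latency_ms"])


# Hypothetical measurements for one client.
regions = {
    "us-east": {"latency_ms": 20, "healthy": False},  # closest, but down
    "eu-west": {"latency_ms": 35, "healthy": True},
    "ap-south": {"latency_ms": 120, "healthy": True},
}
print(pick_region(regions))  # eu-west
```

Because the closest region is marked unhealthy, traffic fails over to the next-lowest latency instead of being sent to a data center that cannot serve it.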
Practice Exercises
- Design: Design a load balancing strategy for a real-time gaming platform with 100M concurrent users. Consider sticky sessions, geographic distribution, failover, and latency requirements.
- Algorithm Selection: You have 5 servers with different capacities (2x, 1x, 1x, 1x, 0.5x). Which load balancing algorithm would you use? How would you configure it?
- Architecture: Compare L4 and L7 load balancing for: (a) a TCP-based database proxy, (b) an API gateway with content-based routing, (c) a WebSocket chat service.
- Troubleshooting: A load balancer is distributing traffic unevenly: server 1 gets 40% while the others get 20% each. What could cause this? How would you diagnose and fix it?
Key Takeaways:
- Load balancing distributes traffic for high availability and optimal resource utilization
- Round-robin is simple but ignores load; least connections adapts but requires state tracking
- Consistent hashing minimizes redistribution when servers are added or removed
- L4 balancing is faster (transport layer); L7 balancing is smarter (application layer)
- Health checks must verify the full dependency chain, not just process availability
- GSLB distributes traffic across regions using DNS, anycast, or latency-based routing
What to Learn Next
-> Message Queues: Kafka, RabbitMQ, event-driven architecture.
-> Microservices: Service decomposition, discovery, and API gateways.
-> CAP Theorem: Consistency models, availability, and partition tolerance.
-> Scalability Fundamentals: Vertical vs horizontal scaling and capacity planning.
-> Databases: SQL vs NoSQL, indexing, replication, and sharding.
-> API Design: REST, GraphQL, gRPC, versioning, and rate limiting.