
Load Balancing


Load balancing distributes incoming traffic across multiple servers to ensure high availability, reliability, and optimal resource utilization. It's one of the most critical components in any scalable system.

  • Algorithms — Round-robin, least connections, consistent hashing
  • L4 vs L7 — Transport vs application layer balancing
  • Health Checks — Detecting and removing unhealthy servers

If you have more than one server, you need a load balancer.

What Is Load Balancing?

Definition: Load Balancing

Load balancing is the process of distributing network traffic across multiple backend servers (targets) to ensure no single server bears a disproportionate share of the load. A load balancer acts as a reverse proxy, routing requests to healthy servers based on a configurable distribution algorithm.

Load Balancer Placement

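Figure: Multi-tier load balancing: DNS → Global LB → Regional L7 LB → Servers. Clients resolve through a DNS load balancer to a global load balancer (GSLB), which forwards to regional L7 load balancers (US-East, EU-West) in front of servers Srv 1 to Srv 4.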

Load Balancing Algorithms

Round Robin

Distributes requests sequentially across servers.

$$\text{server}_i = \text{servers}[i \bmod N]$$

Here,

  • $i$ = Request counter (incrementing)
  • $N$ = Number of servers

Pros: Simple, even distribution for uniform workloads
Cons: Ignores server capacity and current load
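A minimal sketch of the formula above, assuming a single-threaded balancer and a static server list; the server names are hypothetical, and a real multi-threaded balancer would need an atomic counter.

```python
from typing import Sequence


class RoundRobinBalancer:
    """Round robin: server_i = servers[i mod N], where i counts requests."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self.counter = 0  # i: incremented once per request

    def pick(self) -> str:
        server = self.servers[self.counter % len(self.servers)]
        self.counter += 1
        return server


lb = RoundRobinBalancer(["srv1", "srv2", "srv3"])
print([lb.pick() for _ in range(5)])  # ['srv1', 'srv2', 'srv3', 'srv1', 'srv2']
```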

Weighted Round Robin

Assigns each server a weight proportional to its capacity, so over a full cycle each server receives that fraction of the requests.

$$\text{weight}_i = \frac{\text{capacity}_i}{\sum_{j=1}^{N} \text{capacity}_j}$$

Here,

  • $\text{capacity}_i$ = Capacity of server $i$ (CPU, memory, or custom metric)
  • $N$ = Total number of servers
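One way to turn these weights into an actual request order is the "smooth" weighted round robin variant sketched below (one choice among several; a naive version would send all of a heavy server's share back to back). It assumes positive integer weights; fractional capacities such as 0.5x can be scaled up to integers first. Server names are hypothetical.

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Over each cycle of sum(weights) picks, server i is chosen weight_i times,
    with its picks interleaved rather than bunched together."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive integers")
        self.weights = dict(weights)
        self.total = sum(self.weights.values())
        self.current = {name: 0 for name in self.weights}

    def pick(self) -> str:
        # Every server gains its weight; the current leader is chosen and
        # "pays back" the total, which spreads its turns across the cycle.
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.__getitem__)  # ties: first listed
        self.current[chosen] -= self.total
        return chosen

    def share(self, name: str) -> float:
        # Normalized weight from the formula: capacity_i / sum_j capacity_j
        return self.weights[name] / self.total


wrr = SmoothWeightedRoundRobin({"big": 5, "small1": 1, "small2": 1})
print([wrr.pick() for _ in range(7)])
# ['big', 'big', 'small1', 'big', 'small2', 'big', 'big']
```

After every 7 picks the internal counters return to zero, so the same interleaved pattern repeats.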

Least Connections

Routes to the server with fewest active connections.

$$\text{server} = \arg\min_{i} \{\text{active\_connections}_i\}$$

Here,

  • $\text{active\_connections}_i$ = Number of active connections to server $i$

Pros: Adapts to varying request durations
Cons: Requires tracking connection state; treats every connection as equally expensive
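A minimal single-process sketch of least connections, assuming the caller reports when each request finishes; server names are hypothetical. The `release` bookkeeping is the connection-state tracking the algorithm depends on.

```python
from typing import Dict, Iterable


class LeastConnectionsBalancer:
    """Route each new request to argmin_i active_connections_i."""

    def __init__(self, servers: Iterable[str]) -> None:
        self.active: Dict[str, int] = {s: 0 for s in servers}
        if not self.active:
            raise ValueError("need at least one server")

    def acquire(self) -> str:
        # Ties go to the server listed first (dict insertion order).
        server = min(self.active, key=self.active.__getitem__)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Must be called when the request/connection ends, or counts drift upward.
        if self.active[server] <= 0:
            raise ValueError(f"{server} has no active connections")
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["srv1", "srv2"])
first = lb.acquire()   # srv1 (both idle; tie goes to the first listed)
second = lb.acquire()  # srv2
lb.release(first)      # srv1's request finished early
print(lb.acquire())    # srv1
```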

Consistent Hashing

Maps servers and requests to a hash ring, minimizing redistribution when servers are added or removed.

Figure: Hash ring with servers S1 (0°), S2 (90°), S3 (180°), and S4 (270°). Each request walks clockwise to the first server node.

Definition: Consistent Hashing

Consistent hashing maps both servers and keys to positions on a ring ($0$ to $2^{32}-1$). A key is assigned to the first server encountered walking clockwise from the key's position. When a server is added or removed, only the keys in the arc between that server and its counter-clockwise neighbor change owner (roughly $K/N$ of $K$ keys across $N$ servers), minimizing data movement.

Pros: Minimal redistribution on scaling; works well with caches
Cons: Load can be uneven unless each server is placed at many ring positions (virtual nodes)
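A compact hash ring with virtual nodes, as a sketch under stated assumptions: MD5 is used only for its even spread (not for security), positions are its first 4 bytes (giving the $0$ to $2^{32}-1$ ring), and the `"server#v"` virtual-node naming and the default of 100 virtual nodes per server are arbitrary choices.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(key: str) -> int:
    """Map a string to a position on the ring, 0 .. 2**32 - 1."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, vnodes_per_server: int = 100) -> None:
        self.vnodes = vnodes_per_server
        self.positions: List[int] = []   # sorted ring positions
        self.owner: Dict[int, str] = {}  # ring position -> server

    def add(self, server: str) -> None:
        # Each server occupies many virtual positions to even out the arcs.
        for v in range(self.vnodes):
            pos = ring_position(f"{server}#{v}")
            if pos in self.owner:  # rare collision: keep the existing owner
                continue
            self.owner[pos] = server
            bisect.insort(self.positions, pos)

    def remove(self, server: str) -> None:
        self.positions = [p for p in self.positions if self.owner[p] != server]
        self.owner = {p: s for p, s in self.owner.items() if s != server}

    def lookup(self, key: str) -> str:
        if not self.positions:
            raise LookupError("ring is empty")
        # First server position clockwise from the key, wrapping past 2**32 - 1.
        idx = bisect.bisect_left(self.positions, ring_position(key))
        if idx == len(self.positions):
            idx = 0
        return self.owner[self.positions[idx]]


ring = HashRing()
for name in ["s1", "s2", "s3"]:
    ring.add(name)
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("s4")
moved = sum(before[k] != ring.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")  # roughly a quarter, all to s4
```

Every key that moves goes to the new server; keys owned by s1 to s3 elsewhere on the ring stay put, which is what makes this friendly to caches.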

Least Response Time

Routes to the server with the lowest combined score of average response time and active connections.

$$\text{server} = \arg\min_{i} \{\text{avg\_response\_time}_i + \alpha \times \text{active\_connections}_i\}$$

Here,

  • $\text{avg\_response\_time}_i$ = Average response time for server $i$
  • $\alpha$ = Weight factor for connection count
  • $\text{active\_connections}_i$ = Active connections to server $i$
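A sketch of the scoring rule, assuming response times are already being averaged somewhere (how they are measured is left out). Note that $\alpha$ must carry time units per connection, here milliseconds, so the two terms can be added; the server names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerStats:
    name: str
    avg_response_time_ms: float  # avg_response_time_i
    active_connections: int      # active_connections_i


def pick_least_response_time(servers: List[ServerStats], alpha_ms: float) -> str:
    """Return argmin_i (avg_response_time_i + alpha * active_connections_i)."""
    if not servers:
        raise ValueError("no servers")
    best = min(
        servers,
        key=lambda s: s.avg_response_time_ms + alpha_ms * s.active_connections,
    )
    return best.name


pool = [ServerStats("A", 20.0, 10), ServerStats("B", 50.0, 1)]
print(pick_least_response_time(pool, alpha_ms=5.0))  # B: 55 beats A's 70
print(pick_least_response_time(pool, alpha_ms=1.0))  # A: 30 beats B's 51
```

The choice of $\alpha$ decides whether a fast-but-busy server or a slow-but-idle one wins.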

L4 vs L7 Load Balancing

Definition: L4 Load Balancer

A Layer 4 (L4) load balancer operates at the transport layer (TCP/UDP). It makes routing decisions based on IP address and port numbers without inspecting the content of the request. L4 is faster (no payload inspection) but less flexible.

Definition: L7 Load Balancer

A Layer 7 (L7) load balancer operates at the application layer (HTTP/HTTPS). It can inspect the full request—URL, headers, cookies, body—and make routing decisions based on application-level information. L7 is more flexible but introduces more latency.

| Feature | L4 | L7 |
|---|---|---|
| OSI layer | Transport (TCP/UDP) | Application (HTTP) |
| Routing basis | IP, port | URL, headers, cookies |
| Throughput | Higher (no payload inspection) | Lower (full inspection) |
| SSL termination | Optional (often passes TLS through) | Typically terminates |
| Content-based routing | No | Yes (URL, header) |
| WebSocket support | Yes (pass-through) | Yes (with upgrade) |
| Use case | High-throughput, simple routing | Content routing, SSL offload |

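Proxies such as Envoy and HAProxy support both L4 and L7 modes; on AWS the split is across products, with the Network Load Balancer operating at L4 and the Application Load Balancer (ALB) at L7. Choose L7 when you need content-based routing (e.g., /api → service A, /static → CDN), routing of WebSocket upgrade requests by path or header, or request-level health checks.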
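Content-based routing is exactly what an L4 balancer cannot do, since it never sees the URL. Below is a minimal longest-prefix path router as an L7 balancer might apply it; the route table and pool names are hypothetical, and real proxies layer header, host, and method matching on top.

```python
from typing import Dict


def route_by_path(path: str, routes: Dict[str, str], default: str) -> str:
    """Longest-prefix match of the request path against a routing table.

    routes maps a path prefix (e.g. "/api") to a backend pool name.
    A prefix matches the path itself or anything under it ("/api/..."),
    but not a sibling like "/apix".
    """
    best_prefix, best_pool = "", default
    for prefix, pool in routes.items():
        matches = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if matches and len(prefix) > len(best_prefix):
            best_prefix, best_pool = prefix, pool
    return best_pool


ROUTES = {"/api": "service-a", "/api/v2": "service-a-v2", "/static": "static-assets"}
print(route_by_path("/api/v2/items", ROUTES, default="web"))  # service-a-v2
print(route_by_path("/checkout", ROUTES, default="web"))      # web
```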

Health Checks

Load balancers must detect unhealthy servers, stop routing traffic to them, and return them to rotation once they recover.

Health Check Types

| Type | Description | Granularity |
|---|---|---|
| TCP | Can establish a TCP connection | Port-level |
| HTTP | HTTP 200 response from health endpoint | Application-level |
| gRPC | gRPC health check protocol | Service-level |
| Custom | Application-specific check | Business-level |

Health Check Configuration

Health Check Parameters:
- Interval: 10 seconds (how often to check)
- Timeout: 5 seconds (max wait per check)
- Healthy threshold: 2 (consecutive successes to mark healthy)
- Unhealthy threshold: 3 (consecutive failures to mark unhealthy)
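The thresholds above act as a small per-target state machine that debounces flapping probes. A sketch, assuming a timed-out probe is recorded as a failure and that a new target starts as healthy (some balancers instead start targets as unhealthy until they pass).

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Tracks one target's status using consecutive-result thresholds."""
    healthy_threshold: int = 2    # consecutive successes to mark healthy
    unhealthy_threshold: int = 3  # consecutive failures to mark unhealthy
    healthy: bool = True          # assumption: targets start in service
    successes: int = 0
    failures: int = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result (a timeout counts as ok=False); return status."""
        if ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


state = HealthState()
print([state.record(r) for r in [False, False, False, True, True]])
# [True, True, False, False, True]
```

With a 10-second interval and an unhealthy threshold of 3, a dead server keeps receiving traffic for roughly three intervals (about 30 seconds, plus up to one 5-second timeout) before it is pulled.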

Health checks should verify the full dependency chain—not just "is the process running?" but "can the process serve requests?" A health endpoint should check database connectivity, cache availability, and disk space.
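A sketch of such a dependency-aware health endpoint's core logic, framework-agnostic: it runs each check and maps the result to an HTTP status the load balancer's HTTP health check can read (200 if every check passes, 503 otherwise). `check_database` is a hypothetical placeholder; `check_disk` uses the standard library's `shutil.disk_usage`, with a 1 GiB floor chosen arbitrarily.

```python
import shutil
from typing import Callable, Dict, Tuple


def check_disk(min_free_bytes: int = 1 << 30) -> bool:
    # Assumed threshold: at least 1 GiB free on the root filesystem.
    return shutil.disk_usage("/").free >= min_free_bytes


def check_database() -> bool:
    # Hypothetical placeholder: replace with a cheap query such as SELECT 1.
    return True


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run each dependency check; return (HTTP status, per-check report)."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "fail"
        except Exception as exc:  # a check that raises counts as a failure
            report[name] = f"error: {exc}"
    status = 200 if all(v == "ok" for v in report.values()) else 503
    return status, report


status, report = run_health_checks({"database": check_database, "disk": check_disk})
print(status, report)
```

Keep each check cheap and well under the probe timeout, since the balancer calls this endpoint on every interval.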

Global Server Load Balancing (GSLB)

For multi-region deployments, GSLB distributes traffic across data centers.

Definition: GSLB

Global Server Load Balancing distributes traffic across geographically dispersed data centers. It uses DNS-based or anycast-based routing to direct users to the optimal data center based on proximity, load, and health.

GSLB Strategies

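| Strategy | How It Works |
|---|---|
| GeoDNS | Resolves to the nearest data center by geolocating the client's (or its DNS resolver's) IP address |
| Anycast | Multiple data centers announce the same IP; BGP routes each client to the topologically nearest one |
| Latency-based | Routes to the data center with the lowest measured latency |
| Weighted | Distributes traffic by configurable weights |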
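Two of these strategies reduce to a simple selection over healthy data centers, sketched below as the decision a GSLB might make before returning a DNS answer. Anycast has no equivalent per-request code, since BGP does the routing. The data center names, latencies, and weights are hypothetical, and the latency figures are assumed to come from some external measurement.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    name: str
    healthy: bool
    latency_ms: float  # measured latency from the client's region (assumed input)
    weight: int = 1    # used by the weighted strategy


def latency_based(dcs: List[DataCenter]) -> str:
    """Lowest measured latency among healthy data centers."""
    candidates = [dc for dc in dcs if dc.healthy]
    if not candidates:
        raise LookupError("no healthy data center")
    return min(candidates, key=lambda dc: dc.latency_ms).name


def weighted(dcs: List[DataCenter], rng: Optional[random.Random] = None) -> str:
    """Healthy data center chosen with probability proportional to its weight."""
    rng = rng or random.Random()
    candidates = [dc for dc in dcs if dc.healthy and dc.weight > 0]
    if not candidates:
        raise LookupError("no healthy data center")
    return rng.choices(candidates, weights=[dc.weight for dc in candidates], k=1)[0].name


dcs = [
    DataCenter("us-east", True, 20.0, weight=3),
    DataCenter("eu-west", True, 90.0, weight=1),
    DataCenter("ap-south", False, 5.0, weight=5),  # failing health checks
]
print(latency_based(dcs))  # us-east: ap-south is closer but unhealthy
print(weighted(dcs))       # us-east about 75% of the time, eu-west about 25%
```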

Practice Exercises

  1. Design: Design a load balancing strategy for a real-time gaming platform with 100M concurrent users. Consider: sticky sessions, geographic distribution, failover, and latency requirements.

  2. Algorithm Selection: You have 5 servers with different capacities (2x, 1x, 1x, 1x, 0.5x). Which load balancing algorithm would you use? How would you configure it?

  3. Architecture: Compare L4 and L7 load balancing for: (a) a TCP-based database proxy, (b) an API gateway with content-based routing, (c) a WebSocket chat service.

  4. Troubleshooting: A load balancer is distributing traffic unevenly—server 1 gets 40% while others get 20% each. What could cause this? How would you diagnose and fix it?

Key Takeaways:

  • Load balancing distributes traffic for high availability and optimal resource utilization
  • Round-robin is simple but ignores load; least connections adapts but requires state tracking
  • Consistent hashing minimizes redistribution when servers are added or removed
  • L4 balancing is faster (transport layer); L7 balancing is smarter (application layer)
  • Health checks must verify the full dependency chain, not just process availability
  • GSLB distributes traffic across regions using DNS, anycast, or latency-based routing
