Scalability Fundamentals

Scalability is the ability of a system to handle increased load by adding resources. This guide covers the mathematical foundations, strategies, and trade-offs for building systems that grow gracefully with demand.

  • Vertical Scaling — Upgrade the machine (bigger CPU, more RAM)
  • Horizontal Scaling — Add more machines to distribute load
  • Capacity Planning — Predict and prepare for future growth

Scale is not about handling today's traffic—it's about handling tomorrow's.

What Is Scalability?

Scalability is a system's ability to maintain or improve performance under growing load as resources (compute, memory, storage, network) are added.

Definition: Scalability

Scalability is the capability of a system to handle a growing amount of work by adding resources. A system scales linearly if doubling the resources doubles the throughput; most real systems fall somewhat short of that because of coordination overhead. Scalability can be achieved vertically (scaling up) or horizontally (scaling out).

Vertical vs Horizontal Scaling

[Figure: Vertical (Scale Up): one machine grows from Small to Medium to Large to X-Large. Horizontal (Scale Out): each node handles a portion of the load.]

Definition: Vertical Scaling (Scale Up)

Vertical scaling increases the capacity of a single machine by adding more CPU, RAM, or storage. It is simpler to implement (no code changes needed) but has a hard ceiling: the largest machine available.

Definition: Horizontal Scaling (Scale Out)

Horizontal scaling distributes load across multiple machines. It requires the application to be stateless or to externalize state. The theoretical limit is much higher than vertical scaling, but introduces distributed systems complexity.
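
To make "externalize state" concrete, here is a minimal sketch in which two application servers share one session store. The names `SessionStore` and `AppServer` are hypothetical, and the in-process dictionary only stands in for a real shared service such as a database or cache.

```python
class SessionStore:
    """Stand-in for a shared external store (hypothetical; a real one is a networked service)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """A stateless application server: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Read the session from the shared store, update it, and write it back.
        session = self.store.get(session_id)
        cart = session.get("cart", [])
        session = {**session, "cart": cart + [item]}
        self.store.put(session_id, session)
        return session["cart"]


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

# Two requests from the same user land on different servers.
server_a.add_to_cart("user-42", "book")
cart = server_b.add_to_cart("user-42", "pen")
print(cart)  # ['book', 'pen']
```

Because neither server holds the cart in its own memory, a load balancer is free to send each request to any server.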

Dimension | Vertical | Horizontal
--- | --- | ---
Complexity | Low (no code changes) | High (distributed systems)
Ceiling | Limited by largest machine | Virtually unlimited
Cost | Exponential growth | Linear growth
Fault Tolerance | Single point of failure | Redundant
Latency | No inter-node communication | Network overhead

The Math of Scaling

Amdahl's Law and Scaling

When you add N machines, the theoretical maximum speedup is:

Amdahl's Law for N Machines

$$S(N) = \frac{1}{(1 - p) + \frac{p}{N}}$$

Here,

  • $S(N)$ = Speedup with N machines
  • $p$ = Fraction of workload that is parallelizable
  • $1 - p$ = Sequential fraction (fixed overhead)

Diminishing Returns of Horizontal Scaling

If 90% of your workload is parallelizable and you scale from 1 to 100 machines:

S(100) = 1 / (0.1 + 0.9/100) = 1 / 0.109 ≈ 9.17x

Going from 1 to 100 machines yields only about a 9x speedup, because the 10% sequential portion dominates. Even with unlimited machines, the speedup cannot exceed 1 / (1 - p) = 10x.
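
The same calculation can be sketched as a small function to see how quickly the curve flattens. The function name `amdahl_speedup` is an illustrative choice, not part of any library.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Theoretical speedup with n machines when a fraction p of the work is parallelizable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


for n in (1, 10, 100, 1000):
    print(f"N={n:>4}: {amdahl_speedup(0.9, n):.2f}x")
# N=   1: 1.00x
# N=  10: 5.26x
# N= 100: 9.17x
# N=1000: 9.91x
```

With p = 0.9, going from 100 to 1000 machines adds less than one extra unit of speedup.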

Gustafson's Law

Gustafson's Law offers a more optimistic view—when you add resources, you can solve larger problems:

Gustafson's Law

$$S(N) = N - (1 - p)(N - 1)$$

Here,

  • $S(N)$ = Scaled speedup with N machines
  • $p$ = Parallel fraction of the workload
  • $N$ = Number of processors
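
To compare the two views, the sketch below evaluates both formulas for the same parallel fraction. The function names are illustrative. The comparison assumes the problem size grows with N for Gustafson's Law and stays fixed for Amdahl's Law.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Fixed problem size: speedup is capped by the sequential fraction."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: int) -> float:
    """Problem size grows with n: scaled speedup is close to linear in n."""
    return n - (1.0 - p) * (n - 1)


p = 0.9
for n in (10, 100):
    print(f"N={n:>3}: Amdahl {amdahl_speedup(p, n):.2f}x, Gustafson {gustafson_speedup(p, n):.2f}x")
# N= 10: Amdahl 5.26x, Gustafson 9.10x
# N=100: Amdahl 9.17x, Gustafson 90.10x
```

The gap comes from the question being asked: Amdahl's Law measures how much faster a fixed job finishes, while Gustafson's Law measures how much more work fits in the same time.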

Load Balancing

Load balancing distributes incoming requests across multiple servers to ensure no single server becomes a bottleneck.

Definition: Load Balancing

Load balancing is the process of distributing network traffic across multiple servers to ensure high availability and reliability. A load balancer sits between clients and servers, routing each request to the optimal server based on a distribution algorithm.

Common Load Balancing Algorithms

Algorithm | How It Works | Best For
--- | --- | ---
Round Robin | Distributes requests sequentially | Servers with equal capacity
Weighted Round Robin | Distributes proportionally to weight | Servers with different capacities
Least Connections | Routes to server with fewest active connections | Long-lived connections
IP Hash | Maps client IP to a specific server | Session persistence needs
Least Response Time | Routes to server with lowest latency | Latency-sensitive workloads
Consistent Hashing | Maps requests to servers using hash ring | Minimizes redistribution on scaling
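
Two of the algorithms in the table can be sketched in a few lines. This is a simplified, single-process illustration: the class names are hypothetical, and a real load balancer would also need health checks and thread safety.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server that currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to this server closes.
        self.active[server] -= 1


rr = RoundRobinBalancer(["s1", "s2", "s3"])
rr_picks = [rr.pick() for _ in range(5)]
print(rr_picks)  # ['s1', 's2', 's3', 's1', 's2']

lc = LeastConnectionsBalancer(["s1", "s2", "s3"])
first = [lc.pick() for _ in range(3)]  # one open connection on each server
lc.release("s2")                       # the connection to s2 finishes early
lc_next = lc.pick()
print(lc_next)  # s2
```

Round robin ignores how busy each server is, whereas least connections reacts to connections that finish at different times.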

For stateless services, round robin is often sufficient. When connection durations vary significantly, least connections spreads load more evenly; for stateful services that need requests from the same client to reach the same server, IP hash or consistent hashing is preferred.
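
Consistent hashing is the least obvious entry in the table, so here is a minimal sketch of a hash ring. The `HashRing` class, the choice of SHA-256, and the 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable hash that gives the same value in every process."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server, to even out the spread
        self._points = []     # sorted hash positions on the ring
        self._owner = {}      # hash position -> server
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user-{i}" for i in range(1000)]
ring = HashRing(["s1", "s2", "s3"])
before = {k: ring.lookup(k) for k in keys}

ring.add("s4")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to s4: "
      f"{all(after[k] == 's4' for k in moved)}")
```

Adding a fourth server should move only about a quarter of the keys, and every moved key lands on the new server. Keys that stay keep their original owner.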

Capacity Planning

Capacity planning ensures your system can handle future load without over-provisioning.

Capacity Planning Formula

$$\text{Required Capacity} = \text{Peak QPS} \times (1 + \text{Safety Margin})$$

Here,

  • Peak QPS = Maximum queries per second at peak
  • Safety Margin = Buffer for unexpected spikes (typically 20-50%)
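
The formula translates directly into code. In the sketch below, the traffic figures and the per-server capacity of 400 QPS are hypothetical numbers chosen for illustration; `servers_needed` is an added helper that rounds up, since a fraction of a server cannot be provisioned.

```python
import math


def required_capacity(peak_qps: float, safety_margin: float) -> float:
    """Required Capacity = Peak QPS x (1 + Safety Margin)."""
    return peak_qps * (1.0 + safety_margin)


def servers_needed(capacity_qps: float, per_server_qps: float) -> int:
    """Round up: a fraction of a server is not an option."""
    return math.ceil(capacity_qps / per_server_qps)


# Hypothetical inputs: 1,200 QPS at peak, 30% safety margin, 400 QPS per server.
capacity = required_capacity(peak_qps=1200, safety_margin=0.30)
count = servers_needed(capacity, per_server_qps=400)
print(f"{capacity:.0f} QPS -> {count} servers")  # 1560 QPS -> 4 servers
```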

Step-by-Step Capacity Planning

  1. Forecast demand: Estimate future traffic based on growth rates
  2. Profile current system: Measure resource utilization at current load
  3. Identify bottlenecks: Which resource (CPU, memory, I/O, network) is the limiting factor?
  4. Calculate headroom: How much capacity is needed for the forecast?
  5. Plan scaling triggers: At what utilization threshold should you scale?
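
Step 5 can be sketched as a simple proportional rule: when utilization drifts away from a target, resize the fleet so that the same load would bring average utilization back to the target. The 60% target and the function name are assumptions for illustration, and the rule assumes load spreads evenly across servers.

```python
import math


def desired_servers(current_servers: int, utilization: float, target: float = 0.60) -> int:
    """Size the fleet so average utilization returns to the target (never below 1 server)."""
    if current_servers < 1:
        raise ValueError("current_servers must be at least 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    return max(1, math.ceil(current_servers * utilization / target))


scale_out = desired_servers(current_servers=4, utilization=0.85)
scale_in = desired_servers(current_servers=4, utilization=0.25)
print(scale_out)  # 6
print(scale_in)   # 2
```

In practice, a trigger like this is usually combined with a cooldown period so that short spikes do not cause the fleet to grow and shrink repeatedly.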

Capacity Planning for Growth

Current state:

  • 10M daily active users
  • 5 QPS average, 15 QPS peak
  • 80% CPU utilization at peak

Projected (next year, 3x growth):

  • 30M daily active users
  • 15 QPS average, 45 QPS peak
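  • Required: 45 QPS peak × 1.3 safety = 58.5 QPS capacity

The current single server reaches 80% CPU at 15 QPS, so, assuming CPU use grows linearly with QPS, it saturates at about 15 / 0.8 = 18.75 QPS. Serving 58.5 QPS therefore needs 58.5 / 18.75 ≈ 3.12, rounded up to at least 4 servers.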

Scaling Strategies

The Database Bottleneck

The most common scaling bottleneck is the database. Strategies include:

  • Read replicas: Offload read traffic to replica databases
  • Sharding: Partition data across multiple database instances
  • Caching: Store frequently accessed data in memory
  • Denormalization: Trade write complexity for read performance
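
Two of these strategies, sharding and caching, can be sketched briefly. The `shard_for` function and `CachedReader` class are hypothetical names, the dictionary stands in for a real database, and the cache has no expiry or invalidation, which a production cache would need.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class CachedReader:
    """Cache-aside reads: check memory first, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._cache = {}
        self.db_reads = 0

    def get(self, key):
        if key not in self._cache:
            self.db_reads += 1
            self._cache[key] = self._load_from_db(key)
        return self._cache[key]


fake_db = {"user:1": "Ada", "user:2": "Grace"}  # hypothetical data
reader = CachedReader(fake_db.get)
reader.get("user:1")
reader.get("user:1")  # served from memory, no second database read
print(reader.db_reads)  # 1

same_shard = shard_for("user:1", 4) == shard_for("user:1", 4)
print(same_shard)  # True
```

Note that modulo-based sharding remaps most keys when the number of shards changes, which is the problem consistent hashing is designed to reduce.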

The Three-Layer Scaling Model

[Figure: Clients send requests to a Load Balancer, which routes them to Server 1, Server 2, Server 3, ..., Server N; the servers share a Database (with read replicas).]

Practice Exercises

  1. Estimation: A system currently handles 500 QPS with a single server at 70% CPU. If traffic grows to 2000 QPS, how many servers are needed? What if Amdahl's law applies with 5% sequential overhead?

  2. Design: Design a horizontally scalable web application architecture. Consider: how do you handle session state? How do you handle file uploads? How do you deploy new versions without downtime?

  3. Trade-offs: Compare vertical and horizontal scaling for a relational database. When is each approach appropriate? What are the cost implications at 10x, 100x, and 1000x scale?

  4. Analysis: Draw a decision tree for choosing a load balancing algorithm based on: server heterogeneity, connection duration variability, and session persistence requirements.

Key Takeaways:

  • Vertical scaling is simple but has a ceiling; horizontal scaling is complex but virtually unlimited
  • Amdahl's Law shows diminishing returns from adding more machines due to sequential overhead
  • Load balancing distributes traffic; the algorithm depends on server heterogeneity and workload characteristics
  • Capacity planning requires forecasting, profiling, and planning scaling triggers
  • The database is typically the first scaling bottleneck—use read replicas, sharding, and caching

What to Learn Next

  • Networking Fundamentals: TCP/IP, HTTP, DNS, CDNs, and network latency.
  • API Design: REST, GraphQL, gRPC, versioning, and rate limiting.
  • Databases: SQL vs NoSQL, indexing, replication, and sharding.
  • Caching Strategies: Redis, Memcached, cache invalidation, and write strategies.
  • Load Balancing: Algorithms, health checks, and L4 vs L7.
  • CAP Theorem: Consistency models, availability, and partition tolerance.
