πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Rate Limiting

InfrastructureTraffic Management🟒 Free Lesson

Advertisement

Infrastructure

Rate Limiting

Rate limiting controls the rate of requests a client can send to a service. It protects against abuse, ensures fair resource usage, and maintains system stability under load.

  • Protection β€” Prevent overload and denial-of-service attacks
  • Fairness β€” Ensure equitable resource allocation
  • Stability β€” Maintain predictable performance under load

Rate limiting is the first line of defense for any production service.

Why Rate Limit?

Uncontrolled traffic can cascade failures through a system. Rate limiting provides a controlled degradation path.

DfRate Limiting

Rate limiting is a technique that controls the number of requests a client can make within a given time window. When the limit is exceeded, requests are rejected (HTTP 429), queued, or degraded. Rate limiting protects services from traffic spikes, ensures fair usage, and prevents cascading failures.

Types of Limits

Limit TypeDescriptionExample
Per-clientLimit per individual user/IP100 requests/minute per user
GlobalSystem-wide limit10,000 requests/second total
Per-endpointLimit per API endpoint10 writes/second per user
TieredDifferent limits per planFree: 100/min, Pro: 1000/min

Token Bucket

The most widely used rate limiting algorithm.

DfToken Bucket

The token bucket algorithm adds tokens at a fixed rate (r tokens/second) to a bucket with capacity C. Each request consumes one token. If the bucket is empty, the request is rejected. Bursts up to C are allowed, but sustained rate is limited to r.

Token Bucket

tokens(t)=min⁑(C,tokens(tlast)+rΓ—(tβˆ’tlast))tokens(t) = \min(C, tokens(t_{last}) + r \times (t - t_{last}))

Here,

  • CC=Bucket capacity (max burst size)
  • rr=Token refill rate (tokens per second)
  • tokens(tlast)tokens(t_{last})=Tokens remaining at last check
  • tβˆ’tlastt - t_{last}=Time elapsed since last check

Token Bucket Calculation

Bucket capacity C = 10 tokens, refill rate r = 2 tokens/second.

  • t=0: Bucket has 10 tokens
  • Request at t=0: 1 token consumed, 9 remaining
  • Request at t=0.5: 1 token consumed, 8 + 1 = 9 (refilled 1)
  • No requests until t=5: Bucket refills to 10 (capped)
  • 12 requests at t=5: 10 accepted, 2 rejected (bucket empty)
Token Bucket AlgorithmBucket (C=10)7/10 tokens used+r/secrequestAllowRejectHTTP 429

Sliding Window Log

Tracks exact timestamps of each request in a sorted set.

DfSliding Window Log

The sliding window log stores timestamps of each request in a sorted set (e.g., Redis sorted set). On each new request, expired entries are removed, and if the count is below the limit, the request is allowed. Provides exact counting but requires O(n) memory per client.

Sliding Window Check

allowed=(count(requestsΒ inΒ window)<limit)allowed = (count(requests \text{ in window}) < limit)

Here,

  • requestsrequests=Set of request timestamps in current window
  • limitlimit=Maximum allowed requests in window

The sliding window log provides exact rate limiting but is memory-intensive. For high-traffic systems, consider the sliding window counter (hybrid approach) which approximates using fixed windows with interpolation.

Sliding Window Counter

A hybrid approach combining fixed windows with interpolation.

DfSliding Window Counter

The sliding window counter divides time into fixed windows (e.g., 1 minute) and counts requests in the current and previous window. The count is interpolated based on how far into the current window we are. This provides smooth limiting with O(1) memory per window.

Sliding Window Counter

count=prev_countΓ—(1βˆ’elapsedwindow)+current_countcount = prev\_count \times (1 - \frac{elapsed}{window}) + current\_count

Here,

  • prevcountprev_count=Request count in previous window
  • currentcountcurrent_count=Request count in current window
  • elapsedelapsed=Time elapsed in current window
  • windowwindow=Window duration

Sliding Window Calculation

Window = 60 seconds, limit = 100 requests.

  • Previous window (0-60s): 80 requests
  • Current window (60-90s): 25 requests (30 seconds elapsed)

Count = 80 Γ— (1 - 30/60) + 25 = 80 Γ— 0.5 + 25 = 65

Since 65 < 100, the request is allowed.

Fixed Window

The simplest approach: count requests in fixed time periods.

DfFixed Window

The fixed window algorithm divides time into fixed periods (e.g., 1 minute). Each request increments a counter for the current window. When the window expires, the counter resets. Simple but prone to burst issues at window boundaries.

Fixed windows have a boundary problem: a client can make limit requests at the end of one window and limit at the start of the next, achieving 2Γ— the intended rate in a short period. Use sliding window or token bucket for production systems.

Distributed Rate Limiting

Rate limiting across multiple service instances.

DfDistributed Rate Limiting

Distributed rate limiting coordinates limits across multiple service instances. Each instance checks a centralized store (e.g., Redis) before allowing a request. This ensures global limits are enforced regardless of which instance handles the request.

Distributed Rate Limiting with RedisInstance 1Instance 2Instance NRedisINCR user:123EXPIRE 60sAtomic opsallow/denyHTTP 200 / 429

Distributed Challenges

ChallengeSolution
Race conditionsUse atomic operations (Redis INCR with EXPIRE)
Network latencyLocal rate limiting with periodic sync
Redis failureFallback to local limiting or fail open
Clock skewUse logical timestamps or server-side time

Use Redis Lua scripts for atomic distributed rate limiting. The script checks and increments the counter in a single atomic operation, preventing race conditions between instances.

Algorithm Comparison

AlgorithmAccuracyMemoryBurst HandlingComplexity
Token BucketHighO(1)Allows bursts up to CLow
Sliding Window LogExactO(n) per clientNo burstsMedium
Sliding Window CounterApproximateO(1)SmoothLow
Fixed WindowLowO(1)Boundary burstsVery Low
Leaky BucketHighO(1)Smooth outputLow

Practice Exercises

  1. Design: Design a rate limiting system for an API gateway handling 1M requests/second with per-user limits of 100 requests/minute. Include Redis architecture and failover.

  2. Analysis: Compare token bucket and sliding window counter for a video streaming service that allows 10-second bursts of high bitrate.

  3. Distributed: Implement distributed rate limiting using Redis with a Lua script. Handle the case where Redis is unavailable.

  4. Edge Case: A user has a rate limit of 10 requests/minute. They make 10 requests at 12:00:59 and 10 more at 12:01:01. How does each algorithm handle this?

Key Takeaways:

  • Rate limiting protects services from abuse and maintains stability
  • Token bucket allows bursts while enforcing sustained rate limits
  • Sliding window algorithms provide smoother limiting than fixed windows
  • Distributed rate limiting requires atomic operations and fault tolerance
  • Choose algorithms based on accuracy requirements, memory constraints, and burst tolerance
  • Always return HTTP 429 with Retry-After header when rate limiting

What to Learn Next

-> Proxy and Reverse Proxy Forward proxy, Nginx, HAProxy, and SSL termination.

-> API Design REST, GraphQL, gRPC, and API gateway patterns.

-> CDN Edge caching, DNS routing, and content distribution.

-> Load Balancing Distribution algorithms and L4 vs L7 load balancing.

-> Security Patterns Authentication, authorization, encryption, and mTLS.

-> Observability Logging, metrics, tracing, and monitoring.

⭐

Premium Content

Rate Limiting

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement