System Design Problems

Design a Rate Limiter

A rate limiter controls the rate of requests a client can send to an API. It protects services from abuse, ensures fair resource usage, and prevents cascading failures. Systems like Cloudflare, AWS API Gateway, and Kong implement distributed rate limiting at massive scale.

Protection — Prevent DoS attacks and API abuse
Fairness — Ensure equitable resource allocation across clients
Throttling — Gracefully degrade service under overload

A rate limiter sits between the client and the server, deciding whether to allow or reject each request based on the client's recent request history.

Requirements

Functional Requirements

Limit requests per client per time window
Support different rate limits per API endpoint
Support different limits per client tier (free, pro, enterprise)
Return HTTP 429 (Too Many Requests) when limit exceeded
Provide rate limit headers in responses
Support distributed rate limiting across multiple servers

Non-Functional Requirements

Latency: Rate limit check in < 1ms
Accuracy: Allow slight over-limit (within 1%) rather than blocking valid requests
Availability: Rate limiter failure should fail open (allow requests)
Scalability: Handle millions of clients and thousands of API endpoints

Rate limiters can be deployed as a middleware in the API gateway, as a sidecar proxy, or as a separate service. The choice depends on the deployment architecture.

API Design

Architecture Diagram

GET /api/v1/rate-limit/status
Response: {
  "client_id": "user_123",
  "limit": 1000,
  "remaining": 742,
  "reset_at": "2026-06-20T11:00:00Z",
  "retry_after": 0
}

// Response headers
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1687267200
Retry-After: 30  // Seconds until next request allowed

Rate Limiting Algorithms

Token Bucket

DfToken Bucket Algorithm

A token bucket has a fixed capacity of tokens. Tokens are added at a constant rate. Each request consumes one token. If the bucket is empty, the request is rejected. Allows bursts up to the bucket size.

Token Bucket

tokens = \min(capacity, tokens + rate \times \Delta t) - cost

Here,

$capacity$ =Maximum tokens in the bucket
$rate$ =Token refill rate (tokens/second)
$Delta t$ =Time since last request
$cost$ =Tokens consumed (usually 1)

Token Bucket in Action

Bucket capacity: 10 tokens, refill rate: 2 tokens/second

t=0: 10 tokens, request → 9 tokens (allowed)
t=0.1: 9 tokens, request → 8 tokens (allowed)
... 8 rapid requests ...
t=0.8: 2 tokens, request → 1 token (allowed)
t=0.9: 1 token, request → 0 tokens (allowed)
t=1.0: 0 tokens, request → REJECTED (429)
t=1.5: 1 token, request → 0 tokens (allowed, refilled 1)

Sliding Window Counter

DfSliding Window Counter

The sliding window counter combines the current window count with a weighted portion of the previous window count to smooth boundary effects. This provides a more accurate count than fixed windows.

Sliding Window Count

count = prev\_count \times overlap\_ratio + current\_count

Here,

$prev_count$ =Request count in previous window
$current_count$ =Request count in current window
$overlap_ratio$ =Fraction of previous window still in the sliding window

Sliding Window Calculation

Window size: 1 minute, current time: 10:45:30

Previous window (10:44:00-10:45:00): 80 requests
Current window (10:45:00-10:46:00): 30 requests
Overlap ratio: 30/60 = 0.5

Sliding count = 80 × 0.5 + 30 = 70 requests

With a limit of 100, the request is allowed.

Algorithm Comparison

Algorithm	Memory	Accuracy	Burst Handling	Complexity
Token Bucket	O(1)	Good	Allows bursts	Simple
Sliding Window Log	O(n)	Exact	No bursts	Complex
Fixed Window	O(1)	Boundary issues	Allows bursts	Simplest
Sliding Window Counter	O(1)	Good	Smooth	Moderate

For most use cases, the sliding window counter offers the best balance of accuracy, memory efficiency, and simplicity. Use token bucket when burst handling is important.

Distributed Rate Limiting

Single vs Distributed

For single-server rate limiting, an in-memory counter suffices. For distributed systems, use Redis:

Redis Rate Limit Check (Lua Script)

current = INCR(key);\nif current == 1:\n EXPIRE(key, window);\nif current > limit:\n REJECT;\nelse:\n ALLOW

Here,

$key$ =Redis key: rate:{client}:{window}
$limit$ =Maximum requests per window
$window$ =Window duration in seconds

Use Redis Lua scripts for atomic rate limit checks. The INCR + EXPIRE pattern is atomic within a Lua script, preventing race conditions between multiple API servers.

Race Condition Handling

In distributed systems, race conditions can allow more requests than the limit:

Race Condition Window

over\_limit = \frac{N_{servers} - 1}{window\_size} \times limit

Here,

$N_{servers}$ =Number of API servers
$window_size$ =Rate limit window
$limit$ =Request limit

Accept a small margin of over-limit (1-2%) rather than adding distributed locking. The simplicity gain outweighs the slight inaccuracy.

Practice Exercises

Design: How would you implement rate limiting that accounts for different API endpoint costs (e.g., a search query costs 5 units, a simple GET costs 1 unit)?
Distributed: If you have 10 API servers and a rate limit of 1000 requests/minute per client, what is the worst-case over-limit due to race conditions? How would you mitigate it?
Trade-offs: Compare token bucket and sliding window counter for an API that needs to support occasional bursts (10x normal rate for 5 seconds).
Edge Case: How would you handle rate limiting for a client that uses multiple IP addresses (e.g., behind a NAT)? Design a client identification strategy.

Key Takeaways:

Token bucket is best for burst-friendly rate limiting; sliding window counter is best for smooth limiting
Redis Lua scripts provide atomic rate limit checks in distributed systems
Sliding window counter smooths boundary effects of fixed windows with minimal memory
Accept slight over-limit (1-2%) rather than adding distributed locking complexity
Rate limiters should fail open (allow requests) when the limiter itself fails

What to Learn Next

-> Rate Limiting Deep dive into rate limiting algorithms and distributed implementation.

-> Load Balancing Distributing requests across multiple API servers.

-> API Design REST conventions, error handling, and HTTP status codes.

-> Caching Strategies Redis caching patterns for distributed counters.

-> Proxy and Reverse Proxy Rate limiting at the proxy layer (nginx, Envoy).

-> Microservices Service mesh rate limiting and circuit breaking.

Design a Rate Limiter

Design a Rate Limiter

Requirements

Functional Requirements

Non-Functional Requirements

API Design

Rate Limiting Algorithms

Token Bucket

DfToken Bucket Algorithm

Token Bucket

Token Bucket in Action

Sliding Window Counter

DfSliding Window Counter

Sliding Window Count

Sliding Window Calculation

Algorithm Comparison

Distributed Rate Limiting

Single vs Distributed

Redis Rate Limit Check (Lua Script)

Race Condition Handling

Race Condition Window

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?