System Design Problems
Design a Rate Limiter
A rate limiter controls the rate of requests a client can send to an API. It protects services from abuse, ensures fair resource usage, and prevents cascading failures. Systems like Cloudflare, AWS API Gateway, and Kong implement distributed rate limiting at massive scale.
- Protection β Prevent DoS attacks and API abuse
- Fairness β Ensure equitable resource allocation across clients
- Throttling β Gracefully degrade service under overload
A rate limiter sits between the client and the server, deciding whether to allow or reject each request based on the client's recent request history.
Requirements
Functional Requirements
- Limit requests per client per time window
- Support different rate limits per API endpoint
- Support different limits per client tier (free, pro, enterprise)
- Return HTTP 429 (Too Many Requests) when limit exceeded
- Provide rate limit headers in responses
- Support distributed rate limiting across multiple servers
Non-Functional Requirements
- Latency: Rate limit check in < 1ms
- Accuracy: Allow slight over-limit (within 1%) rather than blocking valid requests
- Availability: Rate limiter failure should fail open (allow requests)
- Scalability: Handle millions of clients and thousands of API endpoints
Rate limiters can be deployed as a middleware in the API gateway, as a sidecar proxy, or as a separate service. The choice depends on the deployment architecture.
API Design
GET /api/v1/rate-limit/status
Response: {
"client_id": "user_123",
"limit": 1000,
"remaining": 742,
"reset_at": "2026-06-20T11:00:00Z",
"retry_after": 0
}
// Response headers
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1687267200
Retry-After: 30 // Seconds until next request allowed
Rate Limiting Algorithms
Token Bucket
DfToken Bucket Algorithm
A token bucket has a fixed capacity of tokens. Tokens are added at a constant rate. Each request consumes one token. If the bucket is empty, the request is rejected. Allows bursts up to the bucket size.
Token Bucket
Here,
- =Maximum tokens in the bucket
- =Token refill rate (tokens/second)
- =Time since last request
- =Tokens consumed (usually 1)
Token Bucket in Action
Bucket capacity: 10 tokens, refill rate: 2 tokens/second
- t=0: 10 tokens, request β 9 tokens (allowed)
- t=0.1: 9 tokens, request β 8 tokens (allowed)
- ... 8 rapid requests ...
- t=0.8: 2 tokens, request β 1 token (allowed)
- t=0.9: 1 token, request β 0 tokens (allowed)
- t=1.0: 0 tokens, request β REJECTED (429)
- t=1.5: 1 token, request β 0 tokens (allowed, refilled 1)
Sliding Window Counter
DfSliding Window Counter
The sliding window counter combines the current window count with a weighted portion of the previous window count to smooth boundary effects. This provides a more accurate count than fixed windows.
Sliding Window Count
Here,
- =Request count in previous window
- =Request count in current window
- =Fraction of previous window still in the sliding window
Sliding Window Calculation
Window size: 1 minute, current time: 10:45:30
- Previous window (10:44:00-10:45:00): 80 requests
- Current window (10:45:00-10:46:00): 30 requests
- Overlap ratio: 30/60 = 0.5
Sliding count = 80 Γ 0.5 + 30 = 70 requests
With a limit of 100, the request is allowed.
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|
| Token Bucket | O(1) | Good | Allows bursts | Simple |
| Sliding Window Log | O(n) | Exact | No bursts | Complex |
| Fixed Window | O(1) | Boundary issues | Allows bursts | Simplest |
| Sliding Window Counter | O(1) | Good | Smooth | Moderate |
For most use cases, the sliding window counter offers the best balance of accuracy, memory efficiency, and simplicity. Use token bucket when burst handling is important.
Distributed Rate Limiting
Single vs Distributed
For single-server rate limiting, an in-memory counter suffices. For distributed systems, use Redis:
Redis Rate Limit Check (Lua Script)
Here,
- =Redis key: rate:{client}:{window}
- =Maximum requests per window
- =Window duration in seconds
Use Redis Lua scripts for atomic rate limit checks. The INCR + EXPIRE pattern is atomic within a Lua script, preventing race conditions between multiple API servers.
Race Condition Handling
In distributed systems, race conditions can allow more requests than the limit:
Race Condition Window
Here,
- =Number of API servers
- =Rate limit window
- =Request limit
Accept a small margin of over-limit (1-2%) rather than adding distributed locking. The simplicity gain outweighs the slight inaccuracy.
Practice Exercises
-
Design: How would you implement rate limiting that accounts for different API endpoint costs (e.g., a search query costs 5 units, a simple GET costs 1 unit)?
-
Distributed: If you have 10 API servers and a rate limit of 1000 requests/minute per client, what is the worst-case over-limit due to race conditions? How would you mitigate it?
-
Trade-offs: Compare token bucket and sliding window counter for an API that needs to support occasional bursts (10x normal rate for 5 seconds).
-
Edge Case: How would you handle rate limiting for a client that uses multiple IP addresses (e.g., behind a NAT)? Design a client identification strategy.
Key Takeaways:
- Token bucket is best for burst-friendly rate limiting; sliding window counter is best for smooth limiting
- Redis Lua scripts provide atomic rate limit checks in distributed systems
- Sliding window counter smooths boundary effects of fixed windows with minimal memory
- Accept slight over-limit (1-2%) rather than adding distributed locking complexity
- Rate limiters should fail open (allow requests) when the limiter itself fails
What to Learn Next
-> Rate Limiting Deep dive into rate limiting algorithms and distributed implementation.
-> Load Balancing Distributing requests across multiple API servers.
-> API Design REST conventions, error handling, and HTTP status codes.
-> Caching Strategies Redis caching patterns for distributed counters.
-> Proxy and Reverse Proxy Rate limiting at the proxy layer (nginx, Envoy).
-> Microservices Service mesh rate limiting and circuit breaking.