System Design Problems
Design Search Autocomplete
Search autocomplete (typeahead) suggests search queries in real-time as users type. Google processes billions of autocomplete queries daily with < 100ms latency, using a trie-based data structure ranked by query popularity.
- Real-time Suggestions β Results returned within 50ms of each keystroke
- Popularity-ranked β Most popular queries appear first
- Context-aware β Suggestions personalized by user location and history
The challenge is balancing freshness (trending queries), relevance (popular queries), and latency (sub-50ms response times) across billions of possible prefixes.
Requirements
Functional Requirements
- As user types, show top 5-10 search suggestions
- Suggestions ranked by popularity (frequency)
- Support for multiple languages
- Suggestions update in near real-time as trends change
- Personalized suggestions based on user history
Non-Functional Requirements
- Latency: Response in < 50ms (p99)
- Throughput: 100K QPS per region
- Freshness: Trending queries visible within 1 hour
- Availability: 99.99% uptime (autocomplete is critical for UX)
Autocomplete must be faster than the user's typing speed. At 200ms between keystrokes, the response must arrive within 50ms to feel instantaneous.
Back-of-the-Envelope Estimation
Autocomplete Capacity Estimation
- Average query length: 25 characters
- Average keystrokes per query: 15
- Queries per day: 5B
- Keystroke events: 5B Γ 15 = 75B/day
- QPS_write = 75B / 86400 β 870K QPS (aggregated per prefix)
- QPS_read = 100K QPS per region
- Total unique prefixes: ~1B (with deduplication)
API Design
GET /api/v1/autocomplete?q=how+to+ma&limit=5
Response: {
"suggestions": [
"how to make pasta",
"how to make money",
"how to make friends",
"how to make candles",
"how to make soap"
],
"latency_ms": 12
}
High-Level Architecture
Detailed Design
Trie Data Structure
DfTrie (Prefix Tree)
A trie is a tree data structure where each node represents a character. Paths from root to leaf nodes represent complete strings. Tries enable O(L) prefix lookup where L is the prefix length.
Trie Lookup Complexity
Here,
- =Length of the search prefix
Aggregation and Ranking
Aggregate query frequencies and rank by popularity:
DfQuery Frequency Aggregation
Track the frequency of each search query using a counter. The autocomplete suggestions are the top-K most frequent completions for a given prefix.
Ranking Score
Here,
- =Total number of times query q was searched
- =Time-weighted frequency (recent searches weighted more)
- =User's historical search similarity
Frequency-based Ranking
For prefix "how to ma":
- "how to make pasta": 500K searches/week
- "how to make money": 300K searches/week
- "how to make friends": 100K searches/week
- "how to make candles": 50K searches/week
- "how to make soap": 20K searches/week
Top 5 returned in ranked order.
Two-Layer Architecture
Use a two-layer approach for efficiency:
- Offline Aggregation Layer: Collects search logs, aggregates frequencies, builds trie
- Online Serving Layer: Serves trie from memory, returns top-K suggestions
The trie is rebuilt periodically (e.g., every 1 hour) from aggregated frequency data. During rebuild, use a snapshot of the old trie to serve requests without downtime.
CDN Caching
Popular prefixes can be cached at CDN edge locations:
- Cache prefix "how to" at all edge locations
- Cache prefix "best rest" at regional edges
- Cache invalidation: TTL-based (1 hour) or explicit purge
CDN Cache Hit Rate
Here,
- =Queries served from CDN cache
- =Total autocomplete queries
CDN Efficiency
With 100K QPS and 80% cache hit ratio:
- CDN serves: 80K QPS
- Origin serves: 20K QPS
This reduces origin load by 80%, critical for handling peak traffic.
Practice Exercises
-
Design: How would you handle multi-language autocomplete? What changes are needed for languages with different character sets (Chinese, Arabic)?
-
Algorithm: Implement a trie that supports top-K retrieval for a prefix. What is the time and space complexity?
-
Scale: If the system needs to handle 100K QPS with < 10ms latency, estimate the memory needed for a trie with 1 billion unique prefixes at 50 bytes per node.
-
Freshness: Design a system to update trending autocomplete suggestions in real-time without rebuilding the entire trie.
Key Takeaways:
- Trie data structure enables O(L) prefix lookup with L = prefix length
- Offline aggregation collects frequencies; online serving returns top-K from in-memory trie
- CDN caching absorbs 80%+ of traffic for popular prefixes
- Ranking combines frequency, recency, and personalization signals
- Trie rebuild is periodic (hourly) with snapshot-based serving during rebuild
What to Learn Next
-> Design Search Engine Full-text indexing, inverted indices, and relevance ranking.
-> CDNs Edge caching and global content distribution.
-> Design Typesense Autocomplete Typo-tolerant fuzzy search with instant results.
-> Database Indexing B-tree, LSM-tree, and index structures for fast lookups.
-> Caching Strategies CDN caching, application caching, and cache invalidation.
-> Rate Limiting Protecting autocomplete from abuse and excessive load.