System Design Problems

Design Search Autocomplete

Search autocomplete (typeahead) suggests search queries in real-time as users type. Google processes billions of autocomplete queries daily with < 100ms latency, using a trie-based data structure ranked by query popularity.

Real-time Suggestions — Results returned within 50ms of each keystroke
Popularity-ranked — Most popular queries appear first
Context-aware — Suggestions personalized by user location and history

The challenge is balancing freshness (trending queries), relevance (popular queries), and latency (sub-50ms response times) across billions of possible prefixes.

Requirements

Functional Requirements

As user types, show top 5-10 search suggestions
Suggestions ranked by popularity (frequency)
Support for multiple languages
Suggestions update in near real-time as trends change
Personalized suggestions based on user history

Non-Functional Requirements

Latency: Response in < 50ms (p99)
Throughput: 100K QPS per region
Freshness: Trending queries visible within 1 hour
Availability: 99.99% uptime (autocomplete is critical for UX)

Autocomplete must be faster than the user's typing speed. At 200ms between keystrokes, the response must arrive within 50ms to feel instantaneous.

Back-of-the-Envelope Estimation

Autocomplete Capacity Estimation

Average query length: 25 characters
Average keystrokes per query: 15
Queries per day: 5B
Keystroke events: 5B × 15 = 75B/day
QPS_write = 75B / 86400 ≈ 870K QPS (aggregated per prefix)
QPS_read = 100K QPS per region
Total unique prefixes: ~1B (with deduplication)

API Design

Architecture Diagram

GET /api/v1/autocomplete?q=how+to+ma&limit=5
Response: {
  "suggestions": [
    "how to make pasta",
    "how to make money",
    "how to make friends",
    "how to make candles",
    "how to make soap"
  ],
  "latency_ms": 12
}

High-Level Architecture

Detailed Design

Trie Data Structure

DfTrie (Prefix Tree)

A trie is a tree data structure where each node represents a character. Paths from root to leaf nodes represent complete strings. Tries enable O(L) prefix lookup where L is the prefix length.

Trie Lookup Complexity

O(L) \text{ where } L = \text{prefix length}

Here,

$L$ =Length of the search prefix

Aggregation and Ranking

Aggregate query frequencies and rank by popularity:

DfQuery Frequency Aggregation

Track the frequency of each search query using a counter. The autocomplete suggestions are the top-K most frequent completions for a given prefix.

Ranking Score

score(q) = \alpha \cdot frequency(q) + \beta \cdot recency(q) + \gamma \cdot personalization(q, user)

Here,

$frequency(q)$ =Total number of times query q was searched
$recency(q)$ =Time-weighted frequency (recent searches weighted more)
$personalization(q, user)$ =User's historical search similarity

Frequency-based Ranking

For prefix "how to ma":

"how to make pasta": 500K searches/week
"how to make money": 300K searches/week
"how to make friends": 100K searches/week
"how to make candles": 50K searches/week
"how to make soap": 20K searches/week

Top 5 returned in ranked order.

Two-Layer Architecture

Use a two-layer approach for efficiency:

Offline Aggregation Layer: Collects search logs, aggregates frequencies, builds trie
Online Serving Layer: Serves trie from memory, returns top-K suggestions

The trie is rebuilt periodically (e.g., every 1 hour) from aggregated frequency data. During rebuild, use a snapshot of the old trie to serve requests without downtime.

CDN Caching

Popular prefixes can be cached at CDN edge locations:

Cache prefix "how to" at all edge locations
Cache prefix "best rest" at regional edges
Cache invalidation: TTL-based (1 hour) or explicit purge

CDN Cache Hit Rate

hit\_rate = \frac{cached\_queries}{total\_queries}

Here,

$cached_queries$ =Queries served from CDN cache
$total_queries$ =Total autocomplete queries

CDN Efficiency

With 100K QPS and 80% cache hit ratio:

CDN serves: 80K QPS
Origin serves: 20K QPS

This reduces origin load by 80%, critical for handling peak traffic.

Practice Exercises

Design: How would you handle multi-language autocomplete? What changes are needed for languages with different character sets (Chinese, Arabic)?
Algorithm: Implement a trie that supports top-K retrieval for a prefix. What is the time and space complexity?
Scale: If the system needs to handle 100K QPS with < 10ms latency, estimate the memory needed for a trie with 1 billion unique prefixes at 50 bytes per node.
Freshness: Design a system to update trending autocomplete suggestions in real-time without rebuilding the entire trie.

Key Takeaways:

Trie data structure enables O(L) prefix lookup with L = prefix length
Offline aggregation collects frequencies; online serving returns top-K from in-memory trie
CDN caching absorbs 80%+ of traffic for popular prefixes
Ranking combines frequency, recency, and personalization signals
Trie rebuild is periodic (hourly) with snapshot-based serving during rebuild

What to Learn Next

-> Design Search Engine Full-text indexing, inverted indices, and relevance ranking.

-> CDNs Edge caching and global content distribution.

-> Design Typesense Autocomplete Typo-tolerant fuzzy search with instant results.

-> Database Indexing B-tree, LSM-tree, and index structures for fast lookups.

-> Caching Strategies CDN caching, application caching, and cache invalidation.

-> Rate Limiting Protecting autocomplete from abuse and excessive load.

Design Search Autocomplete

Design Search Autocomplete

Requirements

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimation

Autocomplete Capacity Estimation

API Design

High-Level Architecture

Detailed Design

Trie Data Structure

DfTrie (Prefix Tree)

Trie Lookup Complexity

Aggregation and Ranking

DfQuery Frequency Aggregation

Ranking Score

Frequency-based Ranking

Two-Layer Architecture

CDN Caching

CDN Cache Hit Rate

CDN Efficiency

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?