πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Design Search Autocomplete

System Design ProblemsTypeahead Suggestions🟒 Free Lesson

Advertisement

System Design Problems

Design Search Autocomplete

Search autocomplete (typeahead) suggests search queries in real-time as users type. Google processes billions of autocomplete queries daily with < 100ms latency, using a trie-based data structure ranked by query popularity.

  • Real-time Suggestions β€” Results returned within 50ms of each keystroke
  • Popularity-ranked β€” Most popular queries appear first
  • Context-aware β€” Suggestions personalized by user location and history

The challenge is balancing freshness (trending queries), relevance (popular queries), and latency (sub-50ms response times) across billions of possible prefixes.

Requirements

Functional Requirements

  • As user types, show top 5-10 search suggestions
  • Suggestions ranked by popularity (frequency)
  • Support for multiple languages
  • Suggestions update in near real-time as trends change
  • Personalized suggestions based on user history

Non-Functional Requirements

  • Latency: Response in < 50ms (p99)
  • Throughput: 100K QPS per region
  • Freshness: Trending queries visible within 1 hour
  • Availability: 99.99% uptime (autocomplete is critical for UX)

Autocomplete must be faster than the user's typing speed. At 200ms between keystrokes, the response must arrive within 50ms to feel instantaneous.

Back-of-the-Envelope Estimation

Autocomplete Capacity Estimation

  • Average query length: 25 characters
  • Average keystrokes per query: 15
  • Queries per day: 5B
  • Keystroke events: 5B Γ— 15 = 75B/day
  • QPS_write = 75B / 86400 β‰ˆ 870K QPS (aggregated per prefix)
  • QPS_read = 100K QPS per region
  • Total unique prefixes: ~1B (with deduplication)

API Design

Architecture Diagram
GET /api/v1/autocomplete?q=how+to+ma&limit=5
Response: {
  "suggestions": [
    "how to make pasta",
    "how to make money",
    "how to make friends",
    "how to make candles",
    "how to make soap"
  ],
  "latency_ms": 12
}

High-Level Architecture

ClientCDNAutocomplete ServiceTrie LookupRanking EnginePersonalizationTrie Store(In-memory)Analytics DB(Query Counts)Search Autocomplete Architecture

Detailed Design

Trie Data Structure

DfTrie (Prefix Tree)

A trie is a tree data structure where each node represents a character. Paths from root to leaf nodes represent complete strings. Tries enable O(L) prefix lookup where L is the prefix length.

roothcoiaowwtt*m*o*"how""how""hit""cat""cam""coo"Trie with frequency counts at leaf nodes

Trie Lookup Complexity

O(L)Β whereΒ L=prefixΒ lengthO(L) \text{ where } L = \text{prefix length}

Here,

  • LL=Length of the search prefix

Aggregation and Ranking

Aggregate query frequencies and rank by popularity:

DfQuery Frequency Aggregation

Track the frequency of each search query using a counter. The autocomplete suggestions are the top-K most frequent completions for a given prefix.

Ranking Score

score(q)=Ξ±β‹…frequency(q)+Ξ²β‹…recency(q)+Ξ³β‹…personalization(q,user)score(q) = \alpha \cdot frequency(q) + \beta \cdot recency(q) + \gamma \cdot personalization(q, user)

Here,

  • frequency(q)frequency(q)=Total number of times query q was searched
  • recency(q)recency(q)=Time-weighted frequency (recent searches weighted more)
  • personalization(q,user)personalization(q, user)=User's historical search similarity

Frequency-based Ranking

For prefix "how to ma":

  • "how to make pasta": 500K searches/week
  • "how to make money": 300K searches/week
  • "how to make friends": 100K searches/week
  • "how to make candles": 50K searches/week
  • "how to make soap": 20K searches/week

Top 5 returned in ranked order.

Two-Layer Architecture

Use a two-layer approach for efficiency:

  1. Offline Aggregation Layer: Collects search logs, aggregates frequencies, builds trie
  2. Online Serving Layer: Serves trie from memory, returns top-K suggestions

The trie is rebuilt periodically (e.g., every 1 hour) from aggregated frequency data. During rebuild, use a snapshot of the old trie to serve requests without downtime.

CDN Caching

Popular prefixes can be cached at CDN edge locations:

  • Cache prefix "how to" at all edge locations
  • Cache prefix "best rest" at regional edges
  • Cache invalidation: TTL-based (1 hour) or explicit purge

CDN Cache Hit Rate

hit_rate=cached_queriestotal_querieshit\_rate = \frac{cached\_queries}{total\_queries}

Here,

  • cachedqueriescached_queries=Queries served from CDN cache
  • totalqueriestotal_queries=Total autocomplete queries

CDN Efficiency

With 100K QPS and 80% cache hit ratio:

  • CDN serves: 80K QPS
  • Origin serves: 20K QPS

This reduces origin load by 80%, critical for handling peak traffic.

Practice Exercises

  1. Design: How would you handle multi-language autocomplete? What changes are needed for languages with different character sets (Chinese, Arabic)?

  2. Algorithm: Implement a trie that supports top-K retrieval for a prefix. What is the time and space complexity?

  3. Scale: If the system needs to handle 100K QPS with < 10ms latency, estimate the memory needed for a trie with 1 billion unique prefixes at 50 bytes per node.

  4. Freshness: Design a system to update trending autocomplete suggestions in real-time without rebuilding the entire trie.

Key Takeaways:

  • Trie data structure enables O(L) prefix lookup with L = prefix length
  • Offline aggregation collects frequencies; online serving returns top-K from in-memory trie
  • CDN caching absorbs 80%+ of traffic for popular prefixes
  • Ranking combines frequency, recency, and personalization signals
  • Trie rebuild is periodic (hourly) with snapshot-based serving during rebuild

What to Learn Next

-> Design Search Engine Full-text indexing, inverted indices, and relevance ranking.

-> CDNs Edge caching and global content distribution.

-> Design Typesense Autocomplete Typo-tolerant fuzzy search with instant results.

-> Database Indexing B-tree, LSM-tree, and index structures for fast lookups.

-> Caching Strategies CDN caching, application caching, and cache invalidation.

-> Rate Limiting Protecting autocomplete from abuse and excessive load.

⭐

Premium Content

Design Search Autocomplete

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement