πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Design Search Autocomplete / Typeahead System

ML System DesignReal-time NLP and Search⭐ Premium

Advertisement

Google, Bing, Amazon, Elasticsearch

Design Search Autocomplete / Typeahead System

Building real-time query suggestions with sub-50ms latency for billions of users

Interview Question

"Design a search autocomplete system like Google Search that can provide real-time query suggestions as users type, handling billions of queries with sub-50ms latency, while personalizing suggestions based on user history and trending topics."

Difficulty: Hard | Frequently asked at Google, Microsoft/Bing, Amazon, Uber


1. Requirements Gathering

Functional Requirements

  1. Prefix Matching: Return suggestions matching typed prefix
  2. Real-time Updates: Suggestions update as user types each character
  3. Personalization: Suggestions based on user search history
  4. Trending Topics: Include currently trending queries
  5. Spell Correction: Handle typos and misspellings
  6. Multi-language: Support multiple languages and scripts
  7. Safe Search: Filter inappropriate suggestions

Non-Functional Requirements

  1. Latency: < 50ms for suggestion generation (critical for UX)
  2. Throughput: 100,000+ queries per second, billions daily
  3. Freshness: Trending topics update within minutes
  4. Relevance: Top suggestion should be clicked > 40% of the time
  5. Availability: 99.99% uptime
  6. Scale: 8.5B+ daily searches, 1.8B+ users
  7. Privacy: User history must be private

ℹ️

Scale Perspective: Google processes over 8.5 billion searches daily. Their autocomplete system provides suggestions after just 3-4 characters, with responses in under 50ms. The system must handle massive concurrent users while providing personalized, relevant suggestions.


2. High-Level Architecture Overview

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                           CLIENT LAYER                                      β”‚
β”‚  Chrome Browser β”‚ Mobile App β”‚ Google Search β”‚ Voice Assistant              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                    β”‚
                                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         EDGE / CDN LAYER                                    β”‚
β”‚  Request Routing β”‚ Response Caching β”‚ Rate Limiting β”‚ DDoS Protection       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                    β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β–Ό               β–Ό               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  SUGGESTION SERVICE    β”‚ β”‚ PERSONALIZATIONβ”‚ β”‚ TRENDING SERVICE     β”‚
β”‚  (Trie + ML Ranking)   β”‚ β”‚ SERVICE        β”‚ β”‚ (Real-time)          β”‚
β”‚  (< 20ms)              β”‚ β”‚ (< 10ms)       β”‚ β”‚ (< 15ms)             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                    β”‚
                                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        DATA LAYER                                           β”‚
β”‚  Trie Index β”‚ Query Logs β”‚ User History β”‚ Trending Data β”‚ Feature Store    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ’‘

Key Insight: Autocomplete requires extremely fast prefix matching. The trie data structure is ideal for this, but must be distributed across servers for scale. ML ranking adds relevance but must be optimized for latency.


3. Data Pipeline Design

3.1 Query Data Model

from dataclasses import dataclass
from typing import List, Dict, Optional
from datetime import datetime

@dataclass
class QuerySuggestion:
    query: str
    frequency: int
    last_updated: datetime
    category: Optional[str]
    language: str
    is_trending: bool
    trending_score: Optional[float]
    personalization_score: Optional[float]

@dataclass
class QueryLog:
    query: str
    user_id: str
    timestamp: datetime
    position_clicked: Optional[int]
    session_id: str
    device_type: str
    country: str

3.2 Trie-Based Index

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False
        self.frequency = 0
        self.query = None
        self.last_updated = None

class DistributedTrie:
    def __init__(self, num_shards=1000):
        self.num_shards = num_shards
        self.shards = [TrieShard(i) for i in range(num_shards)]
    
    def get_shard(self, prefix):
        shard_id = hash(prefix) % self.num_shards
        return self.shards[shard_id]
    
    def insert(self, query, frequency=1):
        shard = self.get_shard(query[:3])
        shard.insert(query, frequency)
    
    def search(self, prefix, k=10):
        shard = self.get_shard(prefix[:3])
        return shard.search(prefix, k)

class TrieShard:
    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.root = TrieNode()
    
    def insert(self, query, frequency=1):
        node = self.root
        for char in query:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        node.frequency += frequency
        node.query = query
        node.last_updated = datetime.now()
    
    def search(self, prefix, k=10):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        
        # Collect all queries with this prefix
        results = []
        self._collect_queries(node, results, k)
        
        # Sort by frequency
        results.sort(key=lambda x: x.frequency, reverse=True)
        return results[:k]
    
    def _collect_queries(self, node, results, k):
        if len(results) >= k:
            return
        if node.is_end:
            results.append((node.query, node.frequency))
        for child in node.children.values():
            self._collect_queries(child, results, k)

3.3 Real-time Query Processing

class QueryProcessor:
    def __init__(self):
        self.trie = DistributedTrie()
        self.user_history = RedisClient()
        self.trending_service = TrendingService()
    
    async def get_suggestions(self, prefix, user_id=None, k=10):
        # Get trie-based suggestions
        trie_suggestions = self.trie.search(prefix, k * 2)
        
        # Get personalized suggestions
        if user_id:
            personalized = await self.get_personalized_suggestions(prefix, user_id)
        else:
            personalized = []
        
        # Get trending suggestions
        trending = await self.trending_service.get_trending(prefix)
        
        # Merge and rank
        all_suggestions = self.merge_suggestions(
            trie_suggestions, personalized, trending
        )
        
        # Apply ML ranking
        ranked = await self.rank_suggestions(all_suggestions, user_id)
        
        return ranked[:k]
    
    def merge_suggestions(self, trie_suggestions, personalized, trending):
        merged = {}
        
        # Add trie suggestions
        for query, freq in trie_suggestions:
            merged[query] = {'query': query, 'frequency': freq, 'source': 'trie'}
        
        # Add personalized suggestions (higher weight)
        for query, score in personalized:
            if query in merged:
                merged[query]['personalization_score'] = score
            else:
                merged[query] = {'query': query, 'frequency': 0, 'source': 'personalized'}
        
        # Add trending suggestions (boost score)
        for query, trend_score in trending:
            if query in merged:
                merged[query]['trending_score'] = trend_score
            else:
                merged[query] = {'query': query, 'frequency': 0, 'source': 'trending'}
        
        return list(merged.values())

⚠️

Critical Design Considerations:

  1. Memory efficiency: Tries can be memory-intensive - use compressed tries
  2. Distribution: Shard by prefix for even distribution
  3. Consistency: Handle concurrent updates gracefully
  4. Privacy: Don't store user search history in plain text

4. Model Selection and Training

4.1 ML-Based Ranking

class AutocompleteRanker:
    def __init__(self):
        self.model = self.build_model()
    
    def build_model(self):
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        return model
    
    def extract_features(self, query, user_context):
        return {
            'query_length': len(query),
            'query_frequency': self.get_query_frequency(query),
            'recency_score': self.get_recency_score(query),
            'personalization_score': self.get_personalization_score(query, user_context),
            'trending_score': self.get_trending_score(query),
            'category_match': self.get_category_match(query, user_context),
            'language_match': self.get_language_match(query, user_context)
        }
    
    async def rank_suggestions(self, suggestions, user_context):
        features = []
        for suggestion in suggestions:
            feat = self.extract_features(suggestion['query'], user_context)
            features.append(feat)
        
        # Predict click probability
        X = np.array([list(f.values()) for f in features])
        scores = self.model.predict(X)
        
        # Combine with original frequency
        for i, suggestion in enumerate(suggestions):
            suggestion['ml_score'] = scores[i][0]
            suggestion['final_score'] = (
                0.4 * suggestion.get('frequency', 0) / 1000000 +
                0.3 * suggestion.get('personalization_score', 0) +
                0.2 * suggestion.get('trending_score', 0) +
                0.1 * suggestion['ml_score']
            )
        
        # Sort by final score
        suggestions.sort(key=lambda x: x['final_score'], reverse=True)
        return suggestions

4.2 Personalization Model

class PersonalizationModel:
    def __init__(self):
        self.user_embeddings = None
        self.query_embeddings = None
    
    async def get_personalized_suggestions(self, prefix, user_id):
        # Get user's search history
        history = await self.get_user_history(user_id)
        
        # Get user embedding
        user_emb = await self.get_user_embedding(user_id)
        
        # Find similar queries from history
        similar_queries = []
        for query in history:
            if query.startswith(prefix):
                query_emb = await self.get_query_embedding(query)
                similarity = np.dot(user_emb, query_emb)
                similar_queries.append((query, similarity))
        
        # Sort by similarity
        similar_queries.sort(key=lambda x: x[1], reverse=True)
        return similar_queries[:10]

ℹ️

Personalization Best Practices:

  1. Use embedding-based similarity for better matching
  2. Balance personalization with diversity
  3. Handle cold start for new users
  4. Respect user privacy preferences

5. Serving Architecture

5.1 Real-time Serving Pipeline

Architecture Diagram
User Input β†’ Load Balancer β†’ Suggestion Service β†’ Response (< 50ms)
                                β”‚
                                β”œβ”€β”€ Trie Lookup (< 10ms)
                                β”œβ”€β”€ ML Ranking (< 15ms)
                                β”œβ”€β”€ Personalization (< 10ms)
                                └── Response Assembly (< 5ms)

5.2 Caching Strategy

class AutocompleteCache:
    def __init__(self):
        self.l1_cache = LRUCache(maxsize=100000)  # In-memory
        self.l2_cache = RedisCluster()  # Distributed
    
    async def get_suggestions(self, prefix, user_id=None):
        cache_key = f"{prefix}:{user_id or 'global'}"
        
        # Check L1 cache
        result = self.l1_cache.get(cache_key)
        if result:
            return result
        
        # Check L2 cache
        result = await self.l2_cache.get(cache_key)
        if result:
            self.l1_cache.set(cache_key, result, ttl=60)
            return result
        
        # Compute fresh results
        result = await self.compute_suggestions(prefix, user_id)
        
        # Cache results
        self.l1_cache.set(cache_key, result, ttl=60)
        await self.l2_cache.set(cache_key, result, ttl=300)
        
        return result

πŸ’‘

Caching Tips:

  1. Cache prefix-based results aggressively
  2. Use shorter TTL for personalized results
  3. Cache trending topics separately
  4. Implement cache warming for popular prefixes

6. Monitoring and Observability

6.1 Key Metrics

class AutocompleteMetrics:
    QUALITY_METRICS = ['click_through_rate', 'top_suggestion_ctr', 'suggestion_relevance']
    OPERATIONAL_METRICS = ['latency_p50', 'latency_p95', 'latency_p99', 'qps', 'error_rate']
    BUSINESS_METRICS = ['search_completion_rate', 'query_refinement_rate', 'user_satisfaction']

7. Scale Considerations and Trade-offs

7.1 Horizontal Scaling

Architecture Diagram
Query Volume: Shard trie by prefix hash
ML Models: Horizontal scaling with load balancing
Feature Store: Redis cluster with read replicas
Cache: Multi-level caching with L1 (in-memory) and L2 (distributed)

7.2 Cost vs Performance Trade-offs

DimensionOption A (Cost Optimized)Option B (Performance Optimized)
Data StructureHash-based prefix matchingTrie with compression
RankingFrequency-based onlyML-based ranking
PersonalizationMinimal personalizationDeep personalization
CachingAggressive cachingMinimal caching

8. Advanced Topics

8.1 Spell Correction

class SpellCorrector:
    def __init__(self):
        self.edit_distance_cache = {}
    
    def get_suggestions(self, query, max_distance=2):
        suggestions = []
        for known_query in self.known_queries:
            distance = self.edit_distance(query, known_query)
            if distance <= max_distance:
                suggestions.append((known_query, distance))
        
        suggestions.sort(key=lambda x: x[1])
        return suggestions[:5]

8.2 Trending Detection

class TrendingDetector:
    def __init__(self):
        self.query_counts = {}
        self.baseline_counts = {}
    
    async def detect_trending(self, time_window_minutes=5):
        trending = []
        current_counts = await self.get_query_counts(time_window_minutes)
        
        for query, count in current_counts.items():
            baseline = self.baseline_counts.get(query, 0)
            if baseline > 0:
                trend_score = count / baseline
                if trend_score > 2.0:  # 2x increase
                    trending.append((query, trend_score))
        
        trending.sort(key=lambda x: x[1], reverse=True)
        return trending[:100]

9. Implementation Roadmap

Phase 1: Basic Trie (Weeks 1-3)

  • Implement distributed trie
  • Basic prefix matching
  • Frequency-based ranking

Phase 2: ML Ranking (Weeks 4-6)

  • Feature engineering pipeline
  • Train ranking model
  • A/B testing framework

Phase 3: Advanced Features (Weeks 7-10)

  • Personalization
  • Trending detection
  • Spell correction

Phase 4: Optimization (Weeks 11-12)

  • Latency optimization
  • Caching strategy
  • Cost optimization

10. Summary and Key Takeaways

Architecture Recap

  1. Distributed trie: For fast prefix matching at scale
  2. ML ranking: For relevance optimization
  3. Personalization: For user-specific suggestions
  4. Multi-level caching: For low latency

Key Metrics

  • Latency: < 50ms for suggestion generation
  • CTR: Top suggestion should be clicked > 40%
  • Freshness: Trending topics update within minutes

Common Interview Mistakes

  1. Not discussing distributed trie design
  2. Ignoring personalization requirements
  3. Forgetting about trending topics
  4. Not considering privacy requirements

ℹ️

Final Interview Tip: Start with the trie data structure, then discuss how to distribute it. Show understanding of both algorithmic efficiency and ML-based relevance. Emphasize the latency requirements and caching strategy.


Further Reading

  • "Trie-based Autocomplete Systems" (Google Research)
  • "Real-time Query Suggestion" (Microsoft Research)
  • "Personalized Search Autocomplete" (Amazon)
  • "Distributed Data Structures" (ACM Computing Surveys)

Advertisement