Google, Bing, Baidu, Elasticsearch

Design Google Search Ranking System

Building intelligent search ranking for billions of web pages with sub-second latency

Interview Question

"Design a search ranking system like Google that can rank billions of web pages for millions of queries per second, with sub-200ms latency and continuously improving relevance through machine learning."

Difficulty: Hard | Frequently asked at Google, Microsoft/Bing, Amazon, Elasticsearch

1. Requirements Gathering

Functional Requirements

Query Understanding: Parse, correct, and understand user search intent
Document Retrieval: Efficiently retrieve candidate documents from billions of pages
Relevance Ranking: ML-powered ranking that considers hundreds of signals
Result Diversification: Multiple result types (web, images, videos, news, knowledge panel)
Personalization: Results customized based on user history and preferences
Auto-complete and Suggestions: Real-time query suggestions as user types
Freshness: Recent content should rank appropriately for time-sensitive queries

Non-Functional Requirements

Latency: < 200ms for full page results (including network)
Throughput: 100,000+ queries per second globally
Availability: 99.99% uptime (5 minutes downtime per year)
Scale: Index 100+ billion web pages, handle 8.5 billion searches/day
Freshness: New content indexed within hours, breaking news within minutes
Consistency: Same query should return consistent results within a session

ℹ️

Scale Perspective: Google processes over 8.5 billion searches per day. The search index contains hundreds of billions of web pages. The ranking system evaluates over 200 signals per query in under 100ms. At peak, Google serves over 100,000 queries per second.

2. High-Level Architecture Overview

The search ranking system follows a multi-stage architecture: Query Understanding → Document Retrieval → Initial Ranking → Re-ranking → Result Assembly.

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                           CLIENT LAYER                                      │
│  Chrome Browser │ Mobile App │ Google Search │ Voice Assistant │ IoT        │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         GLOBAL LOAD BALANCER                                │
│  Geographic Routing │ Rate Limiting │ DDoS Protection │ SSL Termination     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌────────────────────────┐ ┌───────────────┐ ┌──────────────────────┐
│  QUERY UNDERSTANDING   │ │ SPELL CHECK & │ │ AUTO-COMPLETE        │
│  SERVICE               │ │ CORRECTION    │ │ SERVICE              │
└────────────────────────┘ └───────────────┘ └──────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        DOCUMENT RETRIEVAL                                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Web Index    │  │ Image Index  │  │ Video Index  │  │ News Index   │   │
│  │ (Inverted)   │  │ (Visual)     │  │ (Metadata)   │  │ (Temporal)   │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        RANKING PIPELINE                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Initial      │  │ ML Ranking   │  │ Re-ranking   │  │ Diversity    │   │
│  │ Retrieval    │  │ (GBDT+NN)    │  │ (Context)    │  │ & Filtering  │   │
│  │ (~1000 docs) │  │ (~100 docs)  │  │ (~20 docs)   │  │ (10 docs)    │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        RESULT ASSEMBLY                                       │
│  Knowledge Panel │ Featured Snippets │ People Also Ask │ Related Searches  │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        DATA & ML INFRASTRUCTURE                             │
│  Feature Store │ Click Logs │ Crawl Data │ Knowledge Graph │ Training Pipeline │
└─────────────────────────────────────────────────────────────────────────────┘

💡

Key Insight: The funnel architecture is essential for performance. You can't run expensive ML models on billions of documents. Instead, you use efficient retrieval to get ~1000 candidates, then apply increasingly sophisticated (and expensive) models to narrow down to ~100, then ~20, and finally ~10 results.

3. Data Pipeline Design

3.1 Data Sources

Web Crawl Data:

Page content (HTML, text, structured data)
Links (anchor text, link graph)
Metadata (title, description, schema.org markup)
Page quality signals (spam score, authority)

User Interaction Data:

Search queries (with timestamps and location)
Click-through data (what was clicked, what was skipped)
Dwell time (how long users stayed on results)
Reformulation patterns (query changes after initial search)
Bounce rate (quick return to search results)

Query Logs:

Query frequency and trends
Query-document click pairs
Session context (sequence of queries)
Personalization signals

# Example: Search event schema
{
    "event_id": "uuid-v4",
    "session_id": "session_12345",
    "user_id": "user_67890",  # Optional, may be anonymous
    "query": "machine learning tutorial",
    "query_tokens": ["machine", "learning", "tutorial"],
    "corrected_query": None,
    "timestamp": "2026-06-29T14:30:00Z",
    "location": {
        "country": "US",
        "region": "California",
        "city": "San Francisco"
    },
    "device": "desktop",
    "results_shown": [
        {
            "position": 1,
            "url": "https://example.com/tutorial1",
            "title": "ML Tutorial",
            "clicked": true,
            "dwell_time_seconds": 180
        }
    ],
    "session_queries": ["machine learning", "machine learning tutorial"]
}

3.2 Web Crawl and Index Pipeline

Architecture Diagram

Seed URLs → Crawl Scheduler → Web Crawler → Content Processing → Index Builder
     │            │              │              │                  │
     │            │              │              ▼                  ▼
     │            │              │         Content Quality    Inverted Index
     │            │              │         & Spam Detection   + Forward Index
     │            │              │              │                  │
     │            │              │              ▼                  ▼
     │            │              │         Link Analysis       Partitioned
     │            │              │         (PageRank)          Index Shards
     │            │              │              │                  │
     │            │              │              └──────────────────┘
     │            │              │                       │
     │            │              │                       ▼
     │            │              │              Index Serving Layer
     │            │              │              (Distributed Retrieval)

3.3 Inverted Index Design

# Inverted index structure for efficient document retrieval
class InvertedIndex:
    def __init__(self):
        self.index = {}  # term -> [(doc_id, tf_idf_score, positions)]
        
    def add_document(self, doc_id, tokens):
        """Add document to inverted index"""
        term_freq = {}
        for token in tokens:
            term_freq[token] = term_freq.get(token, 0) + 1
        
        for term, freq in term_freq.items():
            if term not in self.index:
                self.index[term] = []
            
            # Calculate TF-IDF score
            tf = freq / len(tokens)
            idf = math.log(self.total_docs / (1 + len(self.index[term])))
            score = tf * idf
            
            self.index[term].append((doc_id, score, []))
    
    def search(self, query_tokens, k=1000):
        """Retrieve top-k documents matching query"""
        # Get posting lists for query terms
        posting_lists = []
        for token in query_tokens:
            if token in self.index:
                posting_lists.append(self.index[token])
        
        # Intersect posting lists (simplified - real implementation uses skip lists)
        candidates = {}
        for posting_list in posting_lists:
            for doc_id, score, _ in posting_list:
                if doc_id in candidates:
                    candidates[doc_id] += score
                else:
                    candidates[doc_id] = score
        
        # Sort by score and return top-k
        sorted_candidates = sorted(
            candidates.items(), 
            key=lambda x: x[1], 
            reverse=True
        )
        return sorted_candidates[:k]

3.4 Feature Engineering Pipeline

class SearchFeatureExtractor:
    """Extract features for search ranking model"""
    
    def extract_query_features(self, query):
        return {
            'query_length': len(query.split()),
            'query_term_count': len(query.split()),
            'has_quotes': '"' in query,
            'has_site_operator': 'site:' in query,
            'is_navigational': self.is_navigational_query(query),
            'is_informational': self.is_informational_query(query),
            'is_transactional': self.is_transactional_query(query),
            'query_embedding': self.encode_query(query)
        }
    
    def extract_document_features(self, doc):
        return {
            # Content features
            'page_rank': doc.page_rank,
            'page_authority': doc.authority_score,
            'content_length': len(doc.text),
            'freshness_days': (datetime.now() - doc.last_modified).days,
            
            # Content quality features
            'grammar_score': doc.grammar_score,
            'readability_score': doc.readability_score,
            'spam_score': doc.spam_score,
            
            # Link features
            'inbound_links': doc.inbound_link_count,
            'outbound_links': doc.outbound_link_count,
            'domain_authority': doc.domain_authority,
            
            # Technical features
            'page_load_time': doc.page_load_time_ms,
            'mobile_friendly': doc.is_mobile_friendly,
            'https': doc.is_https,
            
            # Structured data
            'has_schema_markup': doc.has_schema_markup,
            'has_open_graph': doc.has_open_graph
        }
    
    def extract_query_document_features(self, query, doc):
        return {
            # Text matching features
            'bm25_score': self.compute_bm25(query, doc),
            'title_match_score': self.title_match_score(query, doc.title),
            'url_match_score': self.url_match_score(query, doc.url),
            'anchor_text_match': self.anchor_text_match(query, doc),
            
            # Semantic similarity
            'query_doc_similarity': self.semantic_similarity(query, doc),
            
            # Click-through features
            'historical_ctr': self.get_ctr(query, doc),
            'position_bias': self.get_position_bias(query, doc),
            
            # Freshness features
            'freshness_boost': self.freshness_score(query, doc)
        }

⚠️

Feature Engineering Pitfall: Don't just throw all features into a model. Feature selection and understanding feature interactions is critical. For example, BM25 score and page rank interact non-linearly - a high BM25 score from a low-authority page means something different than from a high-authority page.

4. Model Selection and Training Approach

4.1 Learning-to-Rank Models

Search ranking is a classic learning-to-rank problem. The main approaches are:

Pointwise Approach:

class PointwiseRanker:
    """Predict relevance score for each query-document pair independently"""
    
    def __init__(self):
        self.model = GradientBoostingClassifier()
        
    def train(self, X, y):
        """
        X: feature matrix [n_samples, n_features]
        y: relevance labels [0, 1, 2, 3] (not relevant, partial, good, perfect)
        """
        self.model.fit(X, y)
        
    def predict(self, X):
        return self.model.predict_proba(X)[:, 1]

Pairwise Approach (LambdaMART):

import lightgbm as lgb

class LambdaMARTRanker:
    """Rank documents by optimizing pairwise preferences"""
    
    def __init__(self):
        self.model = lgb.LGBMRanker(
            objective='lambdarank',
            metric='ndcg',
            eval_at=[5, 10, 20],
            num_leaves=64,
            learning_rate=0.05,
            n_estimators=1000
        )
        
    def train(self, X_train, y_train, group_train):
        """
        group_train: number of documents per query
        """
        self.model.fit(
            X_train, y_train,
            group=group_train,
            eval_set=[(X_val, y_val)],
            eval_group=[group_val],
            callbacks=[lgb.early_stopping(50)]
        )

Listwise Approach (ListNet):

class ListNetRanker:
    """Optimize list-level ranking metrics directly"""
    
    def __init__(self):
        self.model = self.build_model()
        
    def build_model(self):
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        return model
    
    def listnet_loss(self, y_true, y_pred):
        """ListNet cross-entropy loss"""
        # Convert to probability distribution
        y_true_prob = tf.nn.softmax(y_true)
        y_pred_log_prob = tf.math.log(tf.nn.softmax(y_pred))
        
        # KL divergence
        return tf.reduce_sum(
            y_true_prob * (tf.math.log(y_true_prob) - y_pred_log_prob),
            axis=-1
        )

4.2 Two-Stage Ranking Architecture

class TwoStageRanker:
    """Two-stage ranking: initial scoring + re-ranking"""
    
    def __init__(self):
        # Stage 1: Fast initial scoring (GBDT)
        self.initial_scorer = lgb.LGBMRanker(
            objective='lambdarank',
            num_leaves=32,
            n_estimators=500
        )
        
        # Stage 2: Precise re-ranking (Neural Network)
        self.reranker = self.build_reranker()
        
    def build_reranker(self):
        """Deep cross network for re-ranking"""
        input_layer = tf.keras.Input(shape=(n_features,))
        
        # Cross network for feature interactions
        cross1 = CrossLayer()(input_layer)
        cross2 = CrossLayer()(cross1)
        cross3 = CrossLayer()(cross2)
        
        # Deep network for high-order interactions
        deep1 = Dense(512, activation='relu')(input_layer)
        deep2 = Dense(256, activation='relu')(deep1)
        deep3 = Dense(128, activation='relu')(deep2)
        
        # Combine cross and deep
        combined = Concatenate()([cross3, deep3])
        output = Dense(1, activation='sigmoid')(combined)
        
        return tf.keras.Model(inputs=input_layer, outputs=output)
    
    def rank(self, query, candidates):
        # Stage 1: Initial scoring
        initial_scores = self.initial_scorer.predict(candidates.features)
        
        # Take top-100 for re-ranking
        top_candidates = candidates[np.argsort(initial_scores)[-100:]]
        
        # Stage 2: Re-ranking with more features
        rerank_scores = self.reranker.predict(top_candidates.rerank_features)
        
        # Return top-10
        final_ranking = np.argsort(rerank_scores)[-10:][::-1]
        return top_candidates[final_ranking]

4.3 Multi-Signal Optimization

class MultiSignalRanker:
    """Optimize for multiple relevance signals"""
    
    def __init__(self):
        self.click_model = ClickPredictionModel()
        self.dwell_time_model = DwellTimeModel()
        self.satisfaction_model = SatisfactionModel()
        
    def compute_composite_score(self, query, doc):
        # Predict individual signals
        p_click = self.click_model.predict(query, doc)
        expected_dwell = self.dwell_time_model.predict(query, doc)
        p_satisfied = self.satisfaction_model.predict(query, doc)
        
        # Weighted combination (weights learned via grid search)
        composite_score = (
            0.4 * p_click +
            0.3 * min(expected_dwell / 60, 1.0) +  # Normalize to 0-1
            0.3 * p_satisfied
        )
        
        return composite_score

ℹ️

Production Ranking System: In production, Google uses a combination of:

GBDT for fast initial scoring (handles sparse features well)
Deep neural networks for re-ranking (captures complex interactions)
Multiple objective optimization (relevance, freshness, authority)
Position-aware training (accounts for position bias in click data)

5. Serving Architecture

5.1 Real-time Search Serving

Architecture Diagram

User Query → Load Balancer → Query Understanding Service
                                    │
                                    ▼
                              ┌─────────────────┐
                              │ Spell Check &    │
                              │ Query Correction │
                              │ (< 20ms)         │
                              └─────────────────┘
                                    │
                                    ▼
                              ┌─────────────────┐
                              │ Document         │
                              │ Retrieval        │
                              │ (Inverted Index) │
                              │ (< 30ms)         │
                              └─────────────────┘
                                    │
                                    ▼
                              ┌─────────────────┐
                              │ Initial Ranking  │
                              │ (GBDT Model)     │
                              │ (< 20ms)         │
                              └─────────────────┘
                                    │
                                    ▼
                              ┌─────────────────┐
                              │ Re-ranking       │
                              │ (Neural Network) │
                              │ (< 50ms)         │
                              └─────────────────┘
                                    │
                                    ▼
                              ┌─────────────────┐
                              │ Result Assembly  │
                              │ & Snippets       │
                              │ (< 30ms)         │
                              └─────────────────┘
                                    │
                                    ▼
                              Response (< 150ms)

5.2 Distributed Index Serving

# Distributed index serving architecture
class DistributedSearchIndex:
    def __init__(self, n_shards=1000):
        self.n_shards = n_shards
        self.shards = [SearchShard(i) for i in range(n_shards)]
        
    def search(self, query, k=100):
        """Distributed search across shards"""
        # Route query to relevant shards based on document partitioning
        relevant_shards = self.route_query(query)
        
        # Parallel search across shards
        with ThreadPoolExecutor(max_workers=len(relevant_shards)) as executor:
            futures = []
            for shard in relevant_shards:
                future = executor.submit(shard.search, query, k * 2)
                futures.append(future)
            
            # Merge results from all shards
            all_results = []
            for future in futures:
                all_results.extend(future.result())
        
        # Global re-ranking and top-k selection
        global_ranking = self.global_rerank(query, all_results)
        return global_ranking[:k]
    
    def route_query(self, query):
        """Route query to relevant shards"""
        # Simple approach: hash-based routing
        query_hash = hash(query) % self.n_shards
        
        # For search, we need to query multiple shards
        # because any document could match any query
        return self.shards  # Query all shards in practice
        
class SearchShard:
    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.index = InvertedIndex()
        self.cache = LRUCache(maxsize=10000)
        
    def search(self, query, k):
        # Check cache first
        cache_key = f"{query}:{k}"
        if cache_key in self.cache:
            return self.cache[cache_key]
        
        # Search local index
        results = self.index.search(query, k)
        
        # Cache results
        self.cache[cache_key] = results
        return results

5.3 Caching Strategy

class MultiLevelCache:
    """Multi-level caching for search results"""
    
    def __init__(self):
        # L1: In-memory cache (per server)
        self.l1_cache = LRUCache(maxsize=10000)  # ~100MB
        
        # L2: Distributed cache (Redis cluster)
        self.l2_cache = RedisCluster(
            startup_nodes=[
                {"host": "redis-1", "port": 6379},
                {"host": "redis-2", "port": 6379}
            ]
        )
        
        # L3: Persistent cache (for long-tail queries)
        self.l3_cache = DiskCache(directory='/cache/search')
        
    def get(self, query, k):
        """Get cached results with fallback"""
        cache_key = self.generate_key(query, k)
        
        # Try L1
        result = self.l1_cache.get(cache_key)
        if result:
            return result, "l1"
        
        # Try L2
        result = self.l2_cache.get(cache_key)
        if result:
            self.l1_cache.set(cache_key, result, ttl=300)
            return result, "l2"
        
        # Try L3
        result = self.l3_cache.get(cache_key)
        if result:
            self.l1_cache.set(cache_key, result, ttl=300)
            self.l2_cache.set(cache_key, result, ttl=3600)
            return result, "l3"
        
        return None, "miss"
    
    def set(self, query, k, results, ttl=3600):
        """Set results in all cache levels"""
        cache_key = self.generate_key(query, k)
        
        self.l1_cache.set(cache_key, results, ttl=300)
        self.l2_cache.set(cache_key, results, ttl=ttl)
        self.l3_cache.set(cache_key, results, ttl=ttl * 24)  # 24x longer in L3

💡

Caching Insights: Google caches ~30% of search results. The most common queries (weather, time, calculator) are almost always cached. For personalized queries, use user-specific cache keys. Consider cache warming for trending topics.

6. Monitoring and Observability

6.1 Search Quality Metrics

class SearchQualityMetrics:
    """Comprehensive search quality metrics"""
    
    # Relevance metrics
    RELEVANCE_METRICS = [
        'nDCG@5',           # Normalized Discounted Cumulative Gain
        'nDCG@10',
        'nDCG@20',
        'MRR',              # Mean Reciprocal Rank
        'MAP',              # Mean Average Precision
        'precision@1',      # Precision at k
        'recall@10'
    ]
    
    # User satisfaction metrics
    SATISFACTION_METRICS = [
        'click_through_rate',
        'abandonment_rate',   # Sessions with no clicks
        'reformulation_rate', # Queries that get rephrased
        'session_success_rate'
    ]
    
    # Business metrics
    BUSINESS_METRICS = [
        'ad_revenue_per_search',
        'organic_click_rate',
        'search_market_share'
    ]
    
    def compute_ndcg(self, relevance_scores, k=10):
        """Compute nDCG@k"""
        # Ideal ranking (sorted by relevance)
        ideal_scores = sorted(relevance_scores, reverse=True)[:k]
        
        # Actual ranking (as returned by system)
        actual_scores = relevance_scores[:k]
        
        # DCG
        dcg = sum(
            (2**score - 1) / math.log2(i + 2) 
            for i, score in enumerate(actual_scores)
        )
        
        # Ideal DCG
        idcg = sum(
            (2**score - 1) / math.log2(i + 2)
            for i, score in enumerate(ideal_scores)
        )
        
        return dcg / idcg if idcg > 0 else 0

6.2 Online Experimentation

class SearchExperimentFramework:
    """A/B testing for search ranking"""
    
    def __init__(self):
        self.experiment_config = self.load_experiment_config()
        
    def get_search_results(self, query, user_id):
        """Get results with experiment assignment"""
        # Determine experiment variant
        variant = self.assign_variant(user_id, query)
        
        # Get results based on variant
        if variant == 'control':
            results = self.control_ranker.rank(query)
        elif variant == 'treatment_a':
            results = self.new_model_a.rank(query)
        elif variant == 'treatment_b':
            results = self.new_model_b.rank(query)
        
        # Log experiment impression
        self.log_impression(query, user_id, variant, results)
        
        return results
    
    def analyze_experiment(self, experiment_id):
        """Analyze experiment results"""
        data = self.get_experiment_data(experiment_id)
        
        results = {
            'control': self.compute_metrics(data[data.variant == 'control']),
            'treatment_a': self.compute_metrics(data[data.variant == 'treatment_a']),
            'treatment_b': self.compute_metrics(data[data.variant == 'treatment_b'])
        }
        
        # Statistical significance testing
        significance = {}
        for variant in ['treatment_a', 'treatment_b']:
            p_value = self.mannwhitney_u_test(
                results['control']['nDCG@10'],
                results[variant]['nDCG@10']
            )
            significance[variant] = p_value < 0.05
        
        return results, significance

⚠️

Experiment Pitfalls: Search experiments have unique challenges:

Network effects: Users may share results across variants
Position bias: Users tend to click top results regardless of relevance
Learning effects: Users adapt to new ranking over time
Query distribution: Long-tail queries may have insufficient data

7. Scale Considerations and Trade-offs

7.1 Global Distribution

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                        GLOBAL SEARCH INFRASTRUCTURE                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ North America│  │ Europe       │  │ Asia Pacific │  │ South America│   │
│  │ Data Center  │  │ Data Center  │  │ Data Center  │  │ Data Center  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘   │
│         │                │                │                │                │
│         └────────────────┼────────────────┼────────────────┘                │
│                          │                │                                 │
│                          ▼                ▼                                 │
│                    ┌─────────────────────────────┐                         │
│                    │    Global Load Balancer      │                         │
│                    │    (Anycast DNS + GSLB)      │                         │
│                    └─────────────────────────────┘                         │
│                          │                                                  │
│                          ▼                                                  │
│                    ┌─────────────────────────────┐                         │
│                    │    Cross-Region Replication  │                         │
│                    │    (Index + Query Logs)      │                         │
│                    └─────────────────────────────┘                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

7.2 Cost vs Performance Trade-offs

Dimension	Option A (Cost Optimized)	Option B (Performance Optimized)
Index Storage	Compressed inverted index on disk	Full index in memory
Ranking Model	GBDT on CPU	Deep neural network on GPU
Caching	Aggressive caching, stale results	Minimal caching, fresh results
Freshness	Daily index updates	Real-time index updates
Personalization	Minimal personalization	Deep personalization

7.3 Latency Budget Breakdown

class LatencyBudget:
    """Track and optimize latency budget"""
    
    BUDGET = {
        'network_roundtrip': 30,      # ms
        'query_parsing': 10,          # ms
        'spell_correction': 15,       # ms
        'document_retrieval': 40,     # ms
        'initial_ranking': 25,        # ms
        're_ranking': 50,             # ms
        'result_assembly': 20,        # ms
        'serialization': 10,          # ms
        'total': 200                  # ms
    }
    
    def measure_latency_breakdown(self, query):
        """Measure actual latency breakdown"""
        timings = {}
        
        # Query parsing
        start = time.time()
        parsed_query = self.parse_query(query)
        timings['query_parsing'] = (time.time() - start) * 1000
        
        # Spell correction
        start = time.time()
        corrected = self.spell_correct(parsed_query)
        timings['spell_correction'] = (time.time() - start) * 1000
        
        # Document retrieval
        start = time.time()
        candidates = self.retrieve_documents(corrected)
        timings['document_retrieval'] = (time.time() - start) * 1000
        
        # Initial ranking
        start = time.time()
        ranked = self.initial_rank(candidates)
        timings['initial_ranking'] = (time.time() - start) * 1000
        
        # Re-ranking
        start = time.time()
        reranked = self.re_rank(ranked)
        timings['re_ranking'] = (time.time() - start) * 1000
        
        return timings

ℹ️

Latency Discussion Points: When discussing latency, cover:

Tail latency (p99) matters more than average
Use async I/O for network calls
Consider speculative execution for slow shards
Cache aggressively for repeated queries
Pre-compute features where possible

8. Advanced Topics

8.1 Query Understanding

class QueryUnderstanding:
    """Advanced query understanding pipeline"""
    
    def __init__(self):
        self.entity_linker = EntityLinker()
        self.intent_classifier = IntentClassifier()
        self.query_expander = QueryExpander()
        
    def understand_query(self, query):
        # Tokenization and normalization
        tokens = self.tokenize(query)
        
        # Spell correction
        corrected = self.spell_correct(query)
        
        # Entity recognition
        entities = self.entity_linker.extract(query)
        
        # Intent classification
        intent = self.intent_classifier.classify(query)
        
        # Query expansion (synonyms, related terms)
        expanded = self.query_expander.expand(query, intent)
        
        # Personalization context
        user_context = self.get_user_context()
        
        return {
            'original': query,
            'corrected': corrected,
            'entities': entities,
            'intent': intent,
            'expanded': expanded,
            'user_context': user_context
        }

8.2 Freshness-Aware Ranking

class FreshnessAwareRanker:
    """Ranking that considers content freshness"""
    
    def __init__(self):
        self.freshness_model = FreshnessPredictionModel()
        
    def compute_freshness_score(self, query, doc):
        # Query-dependent freshness needs
        freshness_need = self.assess_freshness_need(query)
        
        # Document age
        doc_age_days = (datetime.now() - doc.last_modified).days
        
        # Freshness decay function
        if freshness_need == 'breaking_news':
            # Exponential decay for breaking news
            freshness_score = math.exp(-doc_age_days / 1)  # Half-life of 1 day
        elif freshness_need == 'recent_events':
            # Linear decay for recent events
            freshness_score = max(0, 1 - doc_age_days / 30)
        elif freshness_need == 'evergreen':
            # No freshness penalty
            freshness_score = 1.0
        else:
            # Default moderate decay
            freshness_score = math.exp(-doc_age_days / 90)
        
        return freshness_score
    
    def assess_freshness_need(self, query):
        """Determine if query needs fresh results"""
        # Heuristics for freshness need
        if any(term in query.lower() for term in ['news', 'today', 'latest']):
            return 'breaking_news'
        elif any(term in query.lower() for term in ['2026', 'this year']):
            return 'recent_events'
        elif any(term in query.lower() for term in ['how to', 'what is']):
            return 'evergreen'
        else:
            return 'moderate'

8.3 Knowledge Graph Integration

class KnowledgeGraphIntegrator:
    """Integrate knowledge graph into search results"""
    
    def __init__(self):
        self.kg_client = KnowledgeGraphClient()
        
    def get_knowledge_panel(self, query, entities):
        """Get knowledge panel for entity queries"""
        panels = []
        
        for entity in entities:
            # Query knowledge graph
            entity_data = self.kg_client.get_entity(entity)
            
            if entity_data:
                panel = {
                    'title': entity_data.name,
                    'description': entity_data.description,
                    'image': entity_data.image_url,
                    'facts': self.extract_key_facts(entity_data),
                    'related_entities': self.get_related(entity_data),
                    'source': entity_data.source_url
                }
                panels.append(panel)
        
        return panels
    
    def extract_key_facts(self, entity_data):
        """Extract key facts for knowledge panel"""
        facts = []
        
        # Extract based on entity type
        if entity_data.type == 'Person':
            facts.extend([
                ('Born', entity_data.birth_date),
                ('Known for', entity_data.notable_works),
                ('Nationality', entity_data.nationality)
            ])
        elif entity_data.type == 'Organization':
            facts.extend([
                ('Founded', entity_data.founded_date),
                ('CEO', entity_data.ceo),
                ('Headquarters', entity_data.headquarters)
            ])
        
        return facts

ℹ️

Knowledge Graph Best Practices:

Always show sources for knowledge panel information
Handle ambiguous entities with disambiguation
Update knowledge graph continuously from trusted sources
Combine knowledge graph with web results for comprehensive answers

9. Implementation Roadmap

Phase 1: Core Search (Weeks 1-4)

Basic inverted index implementation
BM25 ranking baseline
Simple query parsing
Basic monitoring

Phase 2: ML Ranking (Weeks 5-8)

Feature engineering pipeline
LambdaMART ranking model
Online evaluation framework
A/B testing infrastructure

Phase 3: Advanced Features (Weeks 9-12)

Query understanding (NER, intent classification)
Knowledge graph integration
Personalization
Multi-result types (images, videos, news)

Phase 4: Optimization (Weeks 13-16)

Two-stage ranking with neural re-ranker
Real-time feature updates
Advanced caching strategies
Global distribution optimization

10. Summary and Key Takeaways

Architecture Recap

Multi-stage funnel: Query understanding → Retrieval → Ranking → Re-ranking
Distributed index: Shard by document, replicate for availability
Feature-rich ranking: 200+ signals per query-document pair
Continuous learning: Online learning from click feedback

Key Metrics

Relevance: nDCG, MRR, MAP
User satisfaction: CTR, abandonment rate, session success
Business: Ad revenue, search market share

Common Interview Mistakes

Ignoring the retrieval stage (jumping straight to ranking)
Not discussing cold start for new queries
Forgetting about position bias in click data
Not considering global distribution and latency
Ignoring result diversity and freshness

ℹ️

Final Interview Tip: Emphasize the trade-offs between relevance, freshness, and personalization. Discuss how you'd measure success beyond click-through rate (user satisfaction, task completion). Show understanding of both information retrieval fundamentals and modern ML approaches.

Design Google Search Ranking System

Design Google Search Ranking System

Interview Question

1. Requirements Gathering

Functional Requirements

Non-Functional Requirements

2. High-Level Architecture Overview

3. Data Pipeline Design

3.1 Data Sources

3.2 Web Crawl and Index Pipeline

3.3 Inverted Index Design

3.4 Feature Engineering Pipeline

4. Model Selection and Training Approach

4.1 Learning-to-Rank Models

4.2 Two-Stage Ranking Architecture

4.3 Multi-Signal Optimization

5. Serving Architecture

5.1 Real-time Search Serving

5.2 Distributed Index Serving

5.3 Caching Strategy

6. Monitoring and Observability

6.1 Search Quality Metrics

6.2 Online Experimentation

7. Scale Considerations and Trade-offs

7.1 Global Distribution

7.2 Cost vs Performance Trade-offs

7.3 Latency Budget Breakdown

8. Advanced Topics

8.1 Query Understanding

8.2 Freshness-Aware Ranking

8.3 Knowledge Graph Integration

9. Implementation Roadmap

Phase 1: Core Search (Weeks 1-4)

Phase 2: ML Ranking (Weeks 5-8)

Phase 3: Advanced Features (Weeks 9-12)

Phase 4: Optimization (Weeks 13-16)

10. Summary and Key Takeaways

Architecture Recap

Key Metrics

Common Interview Mistakes

Further Reading