Design Google Search Ranking System
Building intelligent search ranking for billions of web pages with sub-second latency
Interview Question
"Design a search ranking system like Google that can rank billions of web pages for millions of queries per second, with sub-200ms latency and continuously improving relevance through machine learning."
Difficulty: Hard | Frequently asked at Google, Microsoft/Bing, Amazon, Elasticsearch
1. Requirements Gathering
Functional Requirements
- Query Understanding: Parse, correct, and understand user search intent
- Document Retrieval: Efficiently retrieve candidate documents from billions of pages
- Relevance Ranking: ML-powered ranking that considers hundreds of signals
- Result Diversification: Multiple result types (web, images, videos, news, knowledge panel)
- Personalization: Results customized based on user history and preferences
- Auto-complete and Suggestions: Real-time query suggestions as user types
- Freshness: Recent content should rank appropriately for time-sensitive queries
Non-Functional Requirements
- Latency: < 200ms for full page results (including network)
- Throughput: 100,000+ queries per second globally
- Availability: 99.99% uptime (5 minutes downtime per year)
- Scale: Index 100+ billion web pages, handle 8.5 billion searches/day
- Freshness: New content indexed within hours, breaking news within minutes
- Consistency: Same query should return consistent results within a session
βΉοΈ
Scale Perspective: Google processes over 8.5 billion searches per day. The search index contains hundreds of billions of web pages. The ranking system evaluates over 200 signals per query in under 100ms. At peak, Google serves over 100,000 queries per second.
2. High-Level Architecture Overview
The search ranking system follows a multi-stage architecture: Query Understanding β Document Retrieval β Initial Ranking β Re-ranking β Result Assembly.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CLIENT LAYER β
β Chrome Browser β Mobile App β Google Search β Voice Assistant β IoT β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β GLOBAL LOAD BALANCER β
β Geographic Routing β Rate Limiting β DDoS Protection β SSL Termination β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββ
βΌ βΌ βΌ
ββββββββββββββββββββββββββ βββββββββββββββββ ββββββββββββββββββββββββ
β QUERY UNDERSTANDING β β SPELL CHECK & β β AUTO-COMPLETE β
β SERVICE β β CORRECTION β β SERVICE β
ββββββββββββββββββββββββββ βββββββββββββββββ ββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DOCUMENT RETRIEVAL β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Web Index β β Image Index β β Video Index β β News Index β β
β β (Inverted) β β (Visual) β β (Metadata) β β (Temporal) β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RANKING PIPELINE β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Initial β β ML Ranking β β Re-ranking β β Diversity β β
β β Retrieval β β (GBDT+NN) β β (Context) β β & Filtering β β
β β (~1000 docs) β β (~100 docs) β β (~20 docs) β β (10 docs) β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RESULT ASSEMBLY β
β Knowledge Panel β Featured Snippets β People Also Ask β Related Searches β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATA & ML INFRASTRUCTURE β
β Feature Store β Click Logs β Crawl Data β Knowledge Graph β Training Pipeline β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π‘
Key Insight: The funnel architecture is essential for performance. You can't run expensive ML models on billions of documents. Instead, you use efficient retrieval to get ~1000 candidates, then apply increasingly sophisticated (and expensive) models to narrow down to ~100, then ~20, and finally ~10 results.
3. Data Pipeline Design
3.1 Data Sources
Web Crawl Data:
- Page content (HTML, text, structured data)
- Links (anchor text, link graph)
- Metadata (title, description, schema.org markup)
- Page quality signals (spam score, authority)
User Interaction Data:
- Search queries (with timestamps and location)
- Click-through data (what was clicked, what was skipped)
- Dwell time (how long users stayed on results)
- Reformulation patterns (query changes after initial search)
- Bounce rate (quick return to search results)
Query Logs:
- Query frequency and trends
- Query-document click pairs
- Session context (sequence of queries)
- Personalization signals
# Example: Search event schema
{
"event_id": "uuid-v4",
"session_id": "session_12345",
"user_id": "user_67890", # Optional, may be anonymous
"query": "machine learning tutorial",
"query_tokens": ["machine", "learning", "tutorial"],
"corrected_query": None,
"timestamp": "2026-06-29T14:30:00Z",
"location": {
"country": "US",
"region": "California",
"city": "San Francisco"
},
"device": "desktop",
"results_shown": [
{
"position": 1,
"url": "https://example.com/tutorial1",
"title": "ML Tutorial",
"clicked": true,
"dwell_time_seconds": 180
}
],
"session_queries": ["machine learning", "machine learning tutorial"]
}
3.2 Web Crawl and Index Pipeline
Seed URLs β Crawl Scheduler β Web Crawler β Content Processing β Index Builder
β β β β β
β β β βΌ βΌ
β β β Content Quality Inverted Index
β β β & Spam Detection + Forward Index
β β β β β
β β β βΌ βΌ
β β β Link Analysis Partitioned
β β β (PageRank) Index Shards
β β β β β
β β β ββββββββββββββββββββ
β β β β
β β β βΌ
β β β Index Serving Layer
β β β (Distributed Retrieval)
3.3 Inverted Index Design
# Inverted index structure for efficient document retrieval
class InvertedIndex:
def __init__(self):
self.index = {} # term -> [(doc_id, tf_idf_score, positions)]
def add_document(self, doc_id, tokens):
"""Add document to inverted index"""
term_freq = {}
for token in tokens:
term_freq[token] = term_freq.get(token, 0) + 1
for term, freq in term_freq.items():
if term not in self.index:
self.index[term] = []
# Calculate TF-IDF score
tf = freq / len(tokens)
idf = math.log(self.total_docs / (1 + len(self.index[term])))
score = tf * idf
self.index[term].append((doc_id, score, []))
def search(self, query_tokens, k=1000):
"""Retrieve top-k documents matching query"""
# Get posting lists for query terms
posting_lists = []
for token in query_tokens:
if token in self.index:
posting_lists.append(self.index[token])
# Intersect posting lists (simplified - real implementation uses skip lists)
candidates = {}
for posting_list in posting_lists:
for doc_id, score, _ in posting_list:
if doc_id in candidates:
candidates[doc_id] += score
else:
candidates[doc_id] = score
# Sort by score and return top-k
sorted_candidates = sorted(
candidates.items(),
key=lambda x: x[1],
reverse=True
)
return sorted_candidates[:k]
3.4 Feature Engineering Pipeline
class SearchFeatureExtractor:
"""Extract features for search ranking model"""
def extract_query_features(self, query):
return {
'query_length': len(query.split()),
'query_term_count': len(query.split()),
'has_quotes': '"' in query,
'has_site_operator': 'site:' in query,
'is_navigational': self.is_navigational_query(query),
'is_informational': self.is_informational_query(query),
'is_transactional': self.is_transactional_query(query),
'query_embedding': self.encode_query(query)
}
def extract_document_features(self, doc):
return {
# Content features
'page_rank': doc.page_rank,
'page_authority': doc.authority_score,
'content_length': len(doc.text),
'freshness_days': (datetime.now() - doc.last_modified).days,
# Content quality features
'grammar_score': doc.grammar_score,
'readability_score': doc.readability_score,
'spam_score': doc.spam_score,
# Link features
'inbound_links': doc.inbound_link_count,
'outbound_links': doc.outbound_link_count,
'domain_authority': doc.domain_authority,
# Technical features
'page_load_time': doc.page_load_time_ms,
'mobile_friendly': doc.is_mobile_friendly,
'https': doc.is_https,
# Structured data
'has_schema_markup': doc.has_schema_markup,
'has_open_graph': doc.has_open_graph
}
def extract_query_document_features(self, query, doc):
return {
# Text matching features
'bm25_score': self.compute_bm25(query, doc),
'title_match_score': self.title_match_score(query, doc.title),
'url_match_score': self.url_match_score(query, doc.url),
'anchor_text_match': self.anchor_text_match(query, doc),
# Semantic similarity
'query_doc_similarity': self.semantic_similarity(query, doc),
# Click-through features
'historical_ctr': self.get_ctr(query, doc),
'position_bias': self.get_position_bias(query, doc),
# Freshness features
'freshness_boost': self.freshness_score(query, doc)
}
β οΈ
Feature Engineering Pitfall: Don't just throw all features into a model. Feature selection and understanding feature interactions is critical. For example, BM25 score and page rank interact non-linearly - a high BM25 score from a low-authority page means something different than from a high-authority page.
4. Model Selection and Training Approach
4.1 Learning-to-Rank Models
Search ranking is a classic learning-to-rank problem. The main approaches are:
Pointwise Approach:
class PointwiseRanker:
"""Predict relevance score for each query-document pair independently"""
def __init__(self):
self.model = GradientBoostingClassifier()
def train(self, X, y):
"""
X: feature matrix [n_samples, n_features]
y: relevance labels [0, 1, 2, 3] (not relevant, partial, good, perfect)
"""
self.model.fit(X, y)
def predict(self, X):
return self.model.predict_proba(X)[:, 1]
Pairwise Approach (LambdaMART):
import lightgbm as lgb
class LambdaMARTRanker:
"""Rank documents by optimizing pairwise preferences"""
def __init__(self):
self.model = lgb.LGBMRanker(
objective='lambdarank',
metric='ndcg',
eval_at=[5, 10, 20],
num_leaves=64,
learning_rate=0.05,
n_estimators=1000
)
def train(self, X_train, y_train, group_train):
"""
group_train: number of documents per query
"""
self.model.fit(
X_train, y_train,
group=group_train,
eval_set=[(X_val, y_val)],
eval_group=[group_val],
callbacks=[lgb.early_stopping(50)]
)
Listwise Approach (ListNet):
class ListNetRanker:
"""Optimize list-level ranking metrics directly"""
def __init__(self):
self.model = self.build_model()
def build_model(self):
model = tf.keras.Sequential([
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
return model
def listnet_loss(self, y_true, y_pred):
"""ListNet cross-entropy loss"""
# Convert to probability distribution
y_true_prob = tf.nn.softmax(y_true)
y_pred_log_prob = tf.math.log(tf.nn.softmax(y_pred))
# KL divergence
return tf.reduce_sum(
y_true_prob * (tf.math.log(y_true_prob) - y_pred_log_prob),
axis=-1
)
4.2 Two-Stage Ranking Architecture
class TwoStageRanker:
"""Two-stage ranking: initial scoring + re-ranking"""
def __init__(self):
# Stage 1: Fast initial scoring (GBDT)
self.initial_scorer = lgb.LGBMRanker(
objective='lambdarank',
num_leaves=32,
n_estimators=500
)
# Stage 2: Precise re-ranking (Neural Network)
self.reranker = self.build_reranker()
def build_reranker(self):
"""Deep cross network for re-ranking"""
input_layer = tf.keras.Input(shape=(n_features,))
# Cross network for feature interactions
cross1 = CrossLayer()(input_layer)
cross2 = CrossLayer()(cross1)
cross3 = CrossLayer()(cross2)
# Deep network for high-order interactions
deep1 = Dense(512, activation='relu')(input_layer)
deep2 = Dense(256, activation='relu')(deep1)
deep3 = Dense(128, activation='relu')(deep2)
# Combine cross and deep
combined = Concatenate()([cross3, deep3])
output = Dense(1, activation='sigmoid')(combined)
return tf.keras.Model(inputs=input_layer, outputs=output)
def rank(self, query, candidates):
# Stage 1: Initial scoring
initial_scores = self.initial_scorer.predict(candidates.features)
# Take top-100 for re-ranking
top_candidates = candidates[np.argsort(initial_scores)[-100:]]
# Stage 2: Re-ranking with more features
rerank_scores = self.reranker.predict(top_candidates.rerank_features)
# Return top-10
final_ranking = np.argsort(rerank_scores)[-10:][::-1]
return top_candidates[final_ranking]
4.3 Multi-Signal Optimization
class MultiSignalRanker:
"""Optimize for multiple relevance signals"""
def __init__(self):
self.click_model = ClickPredictionModel()
self.dwell_time_model = DwellTimeModel()
self.satisfaction_model = SatisfactionModel()
def compute_composite_score(self, query, doc):
# Predict individual signals
p_click = self.click_model.predict(query, doc)
expected_dwell = self.dwell_time_model.predict(query, doc)
p_satisfied = self.satisfaction_model.predict(query, doc)
# Weighted combination (weights learned via grid search)
composite_score = (
0.4 * p_click +
0.3 * min(expected_dwell / 60, 1.0) + # Normalize to 0-1
0.3 * p_satisfied
)
return composite_score
βΉοΈ
Production Ranking System: In production, Google uses a combination of:
- GBDT for fast initial scoring (handles sparse features well)
- Deep neural networks for re-ranking (captures complex interactions)
- Multiple objective optimization (relevance, freshness, authority)
- Position-aware training (accounts for position bias in click data)
5. Serving Architecture
5.1 Real-time Search Serving
User Query β Load Balancer β Query Understanding Service
β
βΌ
βββββββββββββββββββ
β Spell Check & β
β Query Correction β
β (< 20ms) β
βββββββββββββββββββ
β
βΌ
βββββββββββββββββββ
β Document β
β Retrieval β
β (Inverted Index) β
β (< 30ms) β
βββββββββββββββββββ
β
βΌ
βββββββββββββββββββ
β Initial Ranking β
β (GBDT Model) β
β (< 20ms) β
βββββββββββββββββββ
β
βΌ
βββββββββββββββββββ
β Re-ranking β
β (Neural Network) β
β (< 50ms) β
βββββββββββββββββββ
β
βΌ
βββββββββββββββββββ
β Result Assembly β
β & Snippets β
β (< 30ms) β
βββββββββββββββββββ
β
βΌ
Response (< 150ms)
5.2 Distributed Index Serving
# Distributed index serving architecture
class DistributedSearchIndex:
def __init__(self, n_shards=1000):
self.n_shards = n_shards
self.shards = [SearchShard(i) for i in range(n_shards)]
def search(self, query, k=100):
"""Distributed search across shards"""
# Route query to relevant shards based on document partitioning
relevant_shards = self.route_query(query)
# Parallel search across shards
with ThreadPoolExecutor(max_workers=len(relevant_shards)) as executor:
futures = []
for shard in relevant_shards:
future = executor.submit(shard.search, query, k * 2)
futures.append(future)
# Merge results from all shards
all_results = []
for future in futures:
all_results.extend(future.result())
# Global re-ranking and top-k selection
global_ranking = self.global_rerank(query, all_results)
return global_ranking[:k]
def route_query(self, query):
"""Route query to relevant shards"""
# Simple approach: hash-based routing
query_hash = hash(query) % self.n_shards
# For search, we need to query multiple shards
# because any document could match any query
return self.shards # Query all shards in practice
class SearchShard:
def __init__(self, shard_id):
self.shard_id = shard_id
self.index = InvertedIndex()
self.cache = LRUCache(maxsize=10000)
def search(self, query, k):
# Check cache first
cache_key = f"{query}:{k}"
if cache_key in self.cache:
return self.cache[cache_key]
# Search local index
results = self.index.search(query, k)
# Cache results
self.cache[cache_key] = results
return results
5.3 Caching Strategy
class MultiLevelCache:
"""Multi-level caching for search results"""
def __init__(self):
# L1: In-memory cache (per server)
self.l1_cache = LRUCache(maxsize=10000) # ~100MB
# L2: Distributed cache (Redis cluster)
self.l2_cache = RedisCluster(
startup_nodes=[
{"host": "redis-1", "port": 6379},
{"host": "redis-2", "port": 6379}
]
)
# L3: Persistent cache (for long-tail queries)
self.l3_cache = DiskCache(directory='/cache/search')
def get(self, query, k):
"""Get cached results with fallback"""
cache_key = self.generate_key(query, k)
# Try L1
result = self.l1_cache.get(cache_key)
if result:
return result, "l1"
# Try L2
result = self.l2_cache.get(cache_key)
if result:
self.l1_cache.set(cache_key, result, ttl=300)
return result, "l2"
# Try L3
result = self.l3_cache.get(cache_key)
if result:
self.l1_cache.set(cache_key, result, ttl=300)
self.l2_cache.set(cache_key, result, ttl=3600)
return result, "l3"
return None, "miss"
def set(self, query, k, results, ttl=3600):
"""Set results in all cache levels"""
cache_key = self.generate_key(query, k)
self.l1_cache.set(cache_key, results, ttl=300)
self.l2_cache.set(cache_key, results, ttl=ttl)
self.l3_cache.set(cache_key, results, ttl=ttl * 24) # 24x longer in L3
π‘
Caching Insights: Google caches ~30% of search results. The most common queries (weather, time, calculator) are almost always cached. For personalized queries, use user-specific cache keys. Consider cache warming for trending topics.
6. Monitoring and Observability
6.1 Search Quality Metrics
class SearchQualityMetrics:
"""Comprehensive search quality metrics"""
# Relevance metrics
RELEVANCE_METRICS = [
'nDCG@5', # Normalized Discounted Cumulative Gain
'nDCG@10',
'nDCG@20',
'MRR', # Mean Reciprocal Rank
'MAP', # Mean Average Precision
'precision@1', # Precision at k
'recall@10'
]
# User satisfaction metrics
SATISFACTION_METRICS = [
'click_through_rate',
'abandonment_rate', # Sessions with no clicks
'reformulation_rate', # Queries that get rephrased
'session_success_rate'
]
# Business metrics
BUSINESS_METRICS = [
'ad_revenue_per_search',
'organic_click_rate',
'search_market_share'
]
def compute_ndcg(self, relevance_scores, k=10):
"""Compute nDCG@k"""
# Ideal ranking (sorted by relevance)
ideal_scores = sorted(relevance_scores, reverse=True)[:k]
# Actual ranking (as returned by system)
actual_scores = relevance_scores[:k]
# DCG
dcg = sum(
(2**score - 1) / math.log2(i + 2)
for i, score in enumerate(actual_scores)
)
# Ideal DCG
idcg = sum(
(2**score - 1) / math.log2(i + 2)
for i, score in enumerate(ideal_scores)
)
return dcg / idcg if idcg > 0 else 0
6.2 Online Experimentation
class SearchExperimentFramework:
"""A/B testing for search ranking"""
def __init__(self):
self.experiment_config = self.load_experiment_config()
def get_search_results(self, query, user_id):
"""Get results with experiment assignment"""
# Determine experiment variant
variant = self.assign_variant(user_id, query)
# Get results based on variant
if variant == 'control':
results = self.control_ranker.rank(query)
elif variant == 'treatment_a':
results = self.new_model_a.rank(query)
elif variant == 'treatment_b':
results = self.new_model_b.rank(query)
# Log experiment impression
self.log_impression(query, user_id, variant, results)
return results
def analyze_experiment(self, experiment_id):
"""Analyze experiment results"""
data = self.get_experiment_data(experiment_id)
results = {
'control': self.compute_metrics(data[data.variant == 'control']),
'treatment_a': self.compute_metrics(data[data.variant == 'treatment_a']),
'treatment_b': self.compute_metrics(data[data.variant == 'treatment_b'])
}
# Statistical significance testing
significance = {}
for variant in ['treatment_a', 'treatment_b']:
p_value = self.mannwhitney_u_test(
results['control']['nDCG@10'],
results[variant]['nDCG@10']
)
significance[variant] = p_value < 0.05
return results, significance
β οΈ
Experiment Pitfalls: Search experiments have unique challenges:
- Network effects: Users may share results across variants
- Position bias: Users tend to click top results regardless of relevance
- Learning effects: Users adapt to new ranking over time
- Query distribution: Long-tail queries may have insufficient data
7. Scale Considerations and Trade-offs
7.1 Global Distribution
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β GLOBAL SEARCH INFRASTRUCTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β North Americaβ β Europe β β Asia Pacific β β South Americaβ β
β β Data Center β β Data Center β β Data Center β β Data Center β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β β β β β
β ββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββ β
β β β β
β βΌ βΌ β
β βββββββββββββββββββββββββββββββ β
β β Global Load Balancer β β
β β (Anycast DNS + GSLB) β β
β βββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββ β
β β Cross-Region Replication β β
β β (Index + Query Logs) β β
β βββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7.2 Cost vs Performance Trade-offs
| Dimension | Option A (Cost Optimized) | Option B (Performance Optimized) |
|---|---|---|
| Index Storage | Compressed inverted index on disk | Full index in memory |
| Ranking Model | GBDT on CPU | Deep neural network on GPU |
| Caching | Aggressive caching, stale results | Minimal caching, fresh results |
| Freshness | Daily index updates | Real-time index updates |
| Personalization | Minimal personalization | Deep personalization |
7.3 Latency Budget Breakdown
class LatencyBudget:
"""Track and optimize latency budget"""
BUDGET = {
'network_roundtrip': 30, # ms
'query_parsing': 10, # ms
'spell_correction': 15, # ms
'document_retrieval': 40, # ms
'initial_ranking': 25, # ms
're_ranking': 50, # ms
'result_assembly': 20, # ms
'serialization': 10, # ms
'total': 200 # ms
}
def measure_latency_breakdown(self, query):
"""Measure actual latency breakdown"""
timings = {}
# Query parsing
start = time.time()
parsed_query = self.parse_query(query)
timings['query_parsing'] = (time.time() - start) * 1000
# Spell correction
start = time.time()
corrected = self.spell_correct(parsed_query)
timings['spell_correction'] = (time.time() - start) * 1000
# Document retrieval
start = time.time()
candidates = self.retrieve_documents(corrected)
timings['document_retrieval'] = (time.time() - start) * 1000
# Initial ranking
start = time.time()
ranked = self.initial_rank(candidates)
timings['initial_ranking'] = (time.time() - start) * 1000
# Re-ranking
start = time.time()
reranked = self.re_rank(ranked)
timings['re_ranking'] = (time.time() - start) * 1000
return timings
βΉοΈ
Latency Discussion Points: When discussing latency, cover:
- Tail latency (p99) matters more than average
- Use async I/O for network calls
- Consider speculative execution for slow shards
- Cache aggressively for repeated queries
- Pre-compute features where possible
8. Advanced Topics
8.1 Query Understanding
class QueryUnderstanding:
"""Advanced query understanding pipeline"""
def __init__(self):
self.entity_linker = EntityLinker()
self.intent_classifier = IntentClassifier()
self.query_expander = QueryExpander()
def understand_query(self, query):
# Tokenization and normalization
tokens = self.tokenize(query)
# Spell correction
corrected = self.spell_correct(query)
# Entity recognition
entities = self.entity_linker.extract(query)
# Intent classification
intent = self.intent_classifier.classify(query)
# Query expansion (synonyms, related terms)
expanded = self.query_expander.expand(query, intent)
# Personalization context
user_context = self.get_user_context()
return {
'original': query,
'corrected': corrected,
'entities': entities,
'intent': intent,
'expanded': expanded,
'user_context': user_context
}
8.2 Freshness-Aware Ranking
class FreshnessAwareRanker:
"""Ranking that considers content freshness"""
def __init__(self):
self.freshness_model = FreshnessPredictionModel()
def compute_freshness_score(self, query, doc):
# Query-dependent freshness needs
freshness_need = self.assess_freshness_need(query)
# Document age
doc_age_days = (datetime.now() - doc.last_modified).days
# Freshness decay function
if freshness_need == 'breaking_news':
# Exponential decay for breaking news
freshness_score = math.exp(-doc_age_days / 1) # Half-life of 1 day
elif freshness_need == 'recent_events':
# Linear decay for recent events
freshness_score = max(0, 1 - doc_age_days / 30)
elif freshness_need == 'evergreen':
# No freshness penalty
freshness_score = 1.0
else:
# Default moderate decay
freshness_score = math.exp(-doc_age_days / 90)
return freshness_score
def assess_freshness_need(self, query):
"""Determine if query needs fresh results"""
# Heuristics for freshness need
if any(term in query.lower() for term in ['news', 'today', 'latest']):
return 'breaking_news'
elif any(term in query.lower() for term in ['2026', 'this year']):
return 'recent_events'
elif any(term in query.lower() for term in ['how to', 'what is']):
return 'evergreen'
else:
return 'moderate'
8.3 Knowledge Graph Integration
class KnowledgeGraphIntegrator:
"""Integrate knowledge graph into search results"""
def __init__(self):
self.kg_client = KnowledgeGraphClient()
def get_knowledge_panel(self, query, entities):
"""Get knowledge panel for entity queries"""
panels = []
for entity in entities:
# Query knowledge graph
entity_data = self.kg_client.get_entity(entity)
if entity_data:
panel = {
'title': entity_data.name,
'description': entity_data.description,
'image': entity_data.image_url,
'facts': self.extract_key_facts(entity_data),
'related_entities': self.get_related(entity_data),
'source': entity_data.source_url
}
panels.append(panel)
return panels
def extract_key_facts(self, entity_data):
"""Extract key facts for knowledge panel"""
facts = []
# Extract based on entity type
if entity_data.type == 'Person':
facts.extend([
('Born', entity_data.birth_date),
('Known for', entity_data.notable_works),
('Nationality', entity_data.nationality)
])
elif entity_data.type == 'Organization':
facts.extend([
('Founded', entity_data.founded_date),
('CEO', entity_data.ceo),
('Headquarters', entity_data.headquarters)
])
return facts
βΉοΈ
Knowledge Graph Best Practices:
- Always show sources for knowledge panel information
- Handle ambiguous entities with disambiguation
- Update knowledge graph continuously from trusted sources
- Combine knowledge graph with web results for comprehensive answers
9. Implementation Roadmap
Phase 1: Core Search (Weeks 1-4)
- Basic inverted index implementation
- BM25 ranking baseline
- Simple query parsing
- Basic monitoring
Phase 2: ML Ranking (Weeks 5-8)
- Feature engineering pipeline
- LambdaMART ranking model
- Online evaluation framework
- A/B testing infrastructure
Phase 3: Advanced Features (Weeks 9-12)
- Query understanding (NER, intent classification)
- Knowledge graph integration
- Personalization
- Multi-result types (images, videos, news)
Phase 4: Optimization (Weeks 13-16)
- Two-stage ranking with neural re-ranker
- Real-time feature updates
- Advanced caching strategies
- Global distribution optimization
10. Summary and Key Takeaways
Architecture Recap
- Multi-stage funnel: Query understanding β Retrieval β Ranking β Re-ranking
- Distributed index: Shard by document, replicate for availability
- Feature-rich ranking: 200+ signals per query-document pair
- Continuous learning: Online learning from click feedback
Key Metrics
- Relevance: nDCG, MRR, MAP
- User satisfaction: CTR, abandonment rate, session success
- Business: Ad revenue, search market share
Common Interview Mistakes
- Ignoring the retrieval stage (jumping straight to ranking)
- Not discussing cold start for new queries
- Forgetting about position bias in click data
- Not considering global distribution and latency
- Ignoring result diversity and freshness
βΉοΈ
Final Interview Tip: Emphasize the trade-offs between relevance, freshness, and personalization. Discuss how you'd measure success beyond click-through rate (user satisfaction, task completion). Show understanding of both information retrieval fundamentals and modern ML approaches.
Further Reading
- "Learning to Rank for Information Retrieval" (Liu, 2009)
- "The PageRank Citation Ranking: Bringing Order to the Web" (Page et al., 1999)
- "Modern Information Retrieval: A Book Overview" (Baeza-Yates et al.)
- "Google Search Ranking Systems" (Google Research)
- "Deep Learning for Web Search: An Overview" (Microsoft Research)