Design Ad Click-Through Rate Prediction System
Building high-throughput CTR prediction for billions of ad impressions with sub-10ms latency
Interview Question
"Design a click-through rate (CTR) prediction system for a large-scale advertising platform like Google Ads or Facebook that can predict the probability of a user clicking on an ad given 100M+ daily active users and millions of advertisers, with sub-10ms latency for real-time bidding."
Difficulty: Hard | Frequently asked at Google, Meta/Facebook, Amazon, Microsoft/Bing Ads
1. Requirements Gathering
Functional Requirements
- CTR Prediction: Predict probability of ad click given user, ad, and context
- Real-time Bidding: Support real-time bidding with CTR predictions
- Ad Auction: Rank ads by predicted value (CTR Γ bid)
- Multiple Ad Formats: Support search, display, video, and social ads
- Targeting: Support demographic, behavioral, and interest-based targeting
- Budget Pacing: Ensure advertisers don't overspend their budgets
- Reporting: Real-time campaign performance metrics
Non-Functional Requirements
- Latency: < 10ms for CTR prediction (critical for real-time bidding)
- Throughput: 500,000+ predictions per second, billions daily
- Accuracy: AUC > 0.8, log loss < 0.3
- Freshness: Model updates within hours, feature updates within minutes
- Scale: 100M+ users, millions of ads, billions of impressions
- Availability: 99.99% uptime (revenue impact)
- Privacy: GDPR, CCPA compliance
βΉοΈ
Scale Perspective: Google Ads serves billions of ad impressions daily. Facebook's ad system processes over 10 million ad auctions per second. The CTR prediction model must evaluate hundreds of features in under 10ms while maintaining high accuracy for revenue optimization.
2. High-Level Architecture Overview
The CTR prediction system follows a pipeline architecture: Feature Engineering β Model Scoring β Ad Auction β Budget Pacing.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATA SOURCES β
β User Activity β Advertiser Data β Publisher Inventory β Context Signals β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATA INGESTION LAYER β
β Apache Kafka β Stream Processing β Feature Computation β Feature Store β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββ
βΌ βΌ βΌ
ββββββββββββββββββββββββββ βββββββββββββββββ ββββββββββββββββββββββββ
β USER CONTEXT β β AD CATALOG β β REAL-TIME FEATURES β
β SERVICE β β SERVICE β β SERVICE β
β (Profile, History) β β (Ad Metadata) β β (Context, Session) β
ββββββββββββββββββββββββββ βββββββββββββββββ ββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CTR PREDICTION ENGINE β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Feature β β Model β β Calibration β β Explanation β β
β β Extraction β β Serving β β Layer β β Service β β
β β (< 2ms) β β (< 5ms) β β (< 1ms) β β (< 2ms) β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AD AUCTION ENGINE β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Candidate β β Bid β β Ranking β β Winner β β
β β Selection β β Optimization β β β β Selection β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββ
βΌ βΌ βΌ
ββββββββββββββββββββββββββ βββββββββββββββββ ββββββββββββββββββββββββ
β AD SERVING β β BILLING β β REPORTING β
β (Impression/Click) β β SERVICE β β SERVICE β
ββββββββββββββββββββββββββ βββββββββββββββββ ββββββββββββββββββββββββ
π‘
Key Insight: The CTR prediction system must be extremely fast because it's on the critical path for every ad impression. Even a few milliseconds of additional latency can significantly impact revenue. The system must balance model complexity (accuracy) with inference speed.
3. Data Pipeline Design
3.1 Data Sources and Features
from dataclasses import dataclass
from typing import List, Dict, Optional
from datetime import datetime
@dataclass
class AdImpressionContext:
# User features
user_id: str
user_age: int
user_gender: str
user_location: str
user_device: str
user_browser: str
user_os: str
# User behavioral features
user_interests: List[str]
user_purchase_history: List[Dict]
user_browse_history: List[str]
user_search_queries: List[str]
# Ad features
ad_id: str
advertiser_id: str
campaign_id: str
ad_type: str # search, display, video, social
ad_category: str
ad_keywords: List[str]
ad_budget_daily: float
ad_bid_amount: float
# Context features
publisher_id: str
page_category: str
time_of_day: int
day_of_week: int
device_type: str
connection_type: str # wifi, 4g, 3g
# Historical features
historical_ctr: float
historical_impressions: int
historical_clicks: int
# Real-time features
session_length: int
pages_viewed: int
time_on_site: int
recent_searches: List[str]
3.2 Feature Engineering Pipeline
class CTRFeatureEngine:
"""Feature engineering for CTR prediction"""
def __init__(self, feature_store):
self.feature_store = feature_store
async def extract_features(self, context: AdImpressionContext) -> Dict:
"""Extract all features for CTR prediction"""
features = {}
# User features
user_features = await self.extract_user_features(context.user_id)
features.update(user_features)
# Ad features
ad_features = await self.extract_ad_features(context.ad_id)
features.update(ad_features)
# Cross features (user-ad interactions)
cross_features = await self.extract_cross_features(context)
features.update(cross_features)
# Context features
context_features = self.extract_context_features(context)
features.update(context_features)
# Historical features
historical_features = await self.extract_historical_features(context)
features.update(historical_features)
return features
async def extract_user_features(self, user_id):
"""Extract user-specific features"""
user_data = await self.feature_store.get_user_features(user_id)
return {
# Demographic features
'user_age': user_data.get('age', 0),
'user_gender_encoded': self.encode_gender(user_data.get('gender')),
'user_country': self.encode_country(user_data.get('country')),
# Interest features
'user_interest_count': len(user_data.get('interests', [])),
'user_interest_embedding': self.get_interest_embedding(user_data.get('interests')),
# Behavioral features
'user_avg_session_length': user_data.get('avg_session_length', 0),
'user_pages_per_session': user_data.get('pages_per_session', 0),
'user_purchase_frequency': user_data.get('purchase_frequency', 0),
# Engagement features
'user_click_rate_7d': user_data.get('click_rate_7d', 0),
'user_click_rate_30d': user_data.get('click_rate_30d', 0),
'user_impressions_7d': user_data.get('impressions_7d', 0),
# Device features
'user_primary_device': self.encode_device(user_data.get('primary_device')),
'user_mobile_ratio': user_data.get('mobile_ratio', 0.5)
}
async def extract_cross_features(self, context):
"""Extract user-ad interaction features"""
# Category affinity
category_affinity = await self.compute_category_affinity(
context.user_id,
context.ad_category
)
# Advertiser affinity
advertiser_affinity = await self.compute_advertiser_affinity(
context.user_id,
context.advertiser_id
)
# Keyword matching
keyword_overlap = self.compute_keyword_overlap(
context.user_search_queries,
context.ad_keywords
)
return {
'category_affinity': category_affinity,
'advertiser_affinity': advertiser_affinity,
'keyword_overlap_score': keyword_overlap,
'user_ad_category_clicks': await self.get_user_category_clicks(
context.user_id,
context.ad_category
),
'user_advertiser_impressions': await self.get_user_advertiser_impressions(
context.user_id,
context.advertiser_id
)
}
3.3 Real-time Feature Store
class RealTimeFeatureStore:
"""Real-time feature store for CTR prediction"""
def __init__(self):
# Online store for real-time serving
self.online_store = RedisCluster(
startup_nodes=[
{"host": "redis-1", "port": 6379},
{"host": "redis-2", "port": 6379}
]
)
# Offline store for training
self.offline_store = BigQueryClient()
# Streaming features
self.stream_processor = FlinkStreamingProcessor()
async def get_features(self, user_id: str, ad_id: str, context: Dict) -> Dict:
"""Get all features for CTR prediction"""
pipeline = self.online_store.pipeline()
# User features
pipeline.hgetall(f"user:{user_id}:features")
# Ad features
pipeline.hgetall(f"ad:{ad_id}:features")
# Cross features
pipeline.hgetall(f"cross:{user_id}:{ad_id}")
# Context features
pipeline.hget(f"context:features", f"{context['device']}:{context['time_of_day']}")
results = await pipeline.execute()
return {
'user_features': self.deserialize(results[0]),
'ad_features': self.deserialize(results[1]),
'cross_features': self.deserialize(results[2]),
'context_features': self.deserialize(results[3])
}
async def update_real_time_features(self, event_type: str, event_data: Dict):
"""Update features in real-time based on events"""
if event_type == 'impression':
await self.update_impression_features(event_data)
elif event_type == 'click':
await self.update_click_features(event_data)
elif event_type == 'conversion':
await self.update_conversion_features(event_data)
async def update_impression_features(self, event_data):
"""Update features after impression"""
user_id = event_data['user_id']
ad_id = event_data['ad_id']
# Update user impression count
key = f"user:{user_id}:impressions"
await self.online_store.zincrby(key, 1, ad_id)
# Update ad impression count
key = f"ad:{ad_id}:impressions"
await self.online_store.incr(key)
# Update cross features
key = f"cross:{user_id}:{ad_id}"
await self.online_store.hincrby(key, 'impressions', 1)
β οΈ
Feature Engineering Pitfall: Be careful about data leakage in CTR prediction. Features must be computed using only information available at the time of prediction. For example, don't use click-through rate from today's impressions to predict clicks on those same impressions.
4. Model Selection and Training Approach
4.1 Model Architecture Options
Option 1: Logistic Regression with Feature Crossings (Google AdWords-style)
import tensorflow as tf
class LogisticRegressionCTR(tf.keras.Model):
"""Logistic regression with feature crossings (FTRL-style)"""
def __init__(self, feature_dims, embedding_dim=16):
super().__init__()
# Linear part
self.linear = tf.keras.layers.Dense(1)
# Feature embeddings for crossings
self.embeddings = {}
for feature_name, dim in feature_dims.items():
self.embeddings[feature_name] = tf.keras.layers.Embedding(
dim, embedding_dim
)
# Cross network
self.cross_layers = CrossNetwork(num_layers=3)
def call(self, inputs):
# Linear part
linear_output = self.linear(inputs['numerical_features'])
# Embedding part
embedding_outputs = []
for feature_name, embedding_layer in self.embeddings.items():
if feature_name in inputs:
emb = embedding_layer(inputs[feature_name])
embedding_outputs.append(emb)
# Concatenate embeddings
if embedding_outputs:
embedding_concat = tf.concat(embedding_outputs, axis=1)
else:
embedding_concat = None
# Cross network for feature interactions
if embedding_concat is not None:
cross_output = self.cross_layers(embedding_concat)
else:
cross_output = None
# Combine
if cross_output is not None:
combined = tf.concat([linear_output, cross_output], axis=1)
else:
combined = linear_output
return tf.sigmoid(combined)
Option 2: Deep & Cross Network (DCN-v2)
class DCNv2CTR(tf.keras.Model):
"""Deep & Cross Network v2 for CTR prediction"""
def __init__(self, feature_dims, embedding_dim=16, num_cross_layers=3):
super().__init__()
# Feature embeddings
self.embeddings = {}
for feature_name, dim in feature_dims.items():
self.embeddings[feature_name] = tf.keras.layers.Embedding(
dim, embedding_dim
)
# Cross network
self.cross_layers = [
CrossLayerV2(embedding_dim * len(feature_dims))
for _ in range(num_cross_layers)
]
# Deep network
self.deep_network = tf.keras.Sequential([
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(128, activation='relu')
])
# Output layer
self.output_layer = tf.keras.layers.Dense(1, activation='sigmoid')
def call(self, inputs, training=False):
# Get embeddings
embedding_outputs = []
for feature_name, embedding_layer in self.embeddings.items():
emb = embedding_layer(inputs[feature_name])
embedding_outputs.append(emb)
# Concatenate all embeddings
all_embeddings = tf.concat(embedding_outputs, axis=1)
# Cross network
cross_output = all_embeddings
for cross_layer in self.cross_layers:
cross_output = cross_layer(cross_output)
# Deep network
deep_output = self.deep_network(all_embeddings, training=training)
# Combine cross and deep
combined = tf.concat([cross_output, deep_output], axis=1)
return self.output_layer(combined)
class CrossLayerV2(tf.keras.layers.Layer):
"""Cross layer v2 with matrix decomposition"""
def __init__(self, input_dim):
super().__init__()
self.weight = self.add_weight(
shape=(input_dim, input_dim),
initializer='glorot_uniform',
trainable=True
)
self.bias = self.add_weight(
shape=(input_dim,),
initializer='zeros',
trainable=True
)
def call(self, x):
# x_{l+1} = x_0 β (W_l * x_l + b_l) + x_l
x0 = x # Save original input
xl = x
# Matrix multiplication
xl_w = tf.matmul(xl, self.weight)
# Add bias
xl_w_b = xl_w + self.bias
# Element-wise product with x0
cross = x0 * xl_w_b
# Add residual connection
return cross + xl
Option 3: Deep Interest Network (DIN)
class DINCTR(tf.keras.Model):
"""Deep Interest Network for behavior-based CTR prediction"""
def __init__(self, feature_dims, embedding_dim=16):
super().__init__()
# User behavior sequence embedding
self.behavior_embedding = tf.keras.layers.Embedding(
feature_dims['item_count'], embedding_dim
)
# Attention mechanism for behavior interests
self.attention = tf.keras.layers.MultiHeadAttention(
num_heads=4, key_dim=embedding_dim
)
# Target item embedding
self.target_embedding = tf.keras.layers.Embedding(
feature_dims['item_count'], embedding_dim
)
# Deep network
self.deep_network = tf.keras.Sequential([
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(64, activation='relu')
])
# Output layer
self.output_layer = tf.keras.layers.Dense(1, activation='sigmoid')
def call(self, inputs):
# Get behavior sequence embeddings
behavior_seq = self.behavior_embedding(inputs['user_behavior_seq'])
# Get target item embedding
target_emb = self.target_embedding(inputs['target_item'])
# Apply attention mechanism
# Query: target item, Key/Value: behavior sequence
attention_output = self.attention(
query=target_emb,
key=behavior_seq,
value=behavior_seq
)
# Pool attention output
user_interest = tf.reduce_sum(attention_output, axis=1)
# Combine with other features
other_features = inputs['other_features']
combined = tf.concat([user_interest, other_features], axis=1)
# Deep network
deep_output = self.deep_network(combined)
return self.output_layer(deep_output)
4.2 Training Strategy
class CTRTrainingPipeline:
"""Training pipeline for CTR prediction models"""
def __init__(self, model, config):
self.model = model
self.config = config
def train(self, train_data, val_data):
"""Train model with early stopping"""
# Compile model
self.model.compile(
optimizer=tf.keras.optimizers.Adam(
learning_rate=self.config.learning_rate
),
loss=self.focal_loss, # Handle class imbalance
metrics=[
tf.keras.metrics.AUC(name='auc'),
tf.keras.metrics.Precision(name='precision'),
tf.keras.metrics.Recall(name='recall')
]
)
# Callbacks
callbacks = [
tf.keras.callbacks.EarlyStopping(
monitor='val_auc',
patience=10,
mode='max',
restore_best_weights=True
),
tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_auc',
factor=0.5,
patience=5,
mode='max'
),
tf.keras.callbacks.TensorBoard(
log_dir=self.config.tensorboard_dir
)
]
# Train
history = self.model.fit(
train_data,
validation_data=val_data,
epochs=self.config.epochs,
batch_size=self.config.batch_size,
callbacks=callbacks
)
return history
def focal_loss(self, y_true, y_pred, alpha=0.25, gamma=2.0):
"""Focal loss for handling class imbalance in CTR"""
y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
# Binary cross entropy
bce = -y_true * tf.math.log(y_pred) - (1 - y_true) * tf.math.log(1 - y_pred)
# Focal modulating factor
p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)
alpha_t = y_true * alpha + (1 - y_true) * (1 - alpha)
focal_weight = alpha_t * tf.pow(1 - p_t, gamma)
return focal_weight * bce
def calibrate_predictions(self, y_true, y_pred):
"""Calibrate model predictions using Platt scaling"""
from sklearn.calibration import CalibratedClassifierCV
# Fit calibration model
calibrator = CalibratedClassifierCV(
self.model,
method='sigmoid',
cv='prefit'
)
calibrator.fit(y_true, y_pred)
return calibrator
βΉοΈ
Model Selection Strategy: Start with simpler models (logistic regression) for baseline, then iterate with more complex models (DCN-v2, DIN). In production, ensemble multiple models for better performance. Always calibrate predictions for accurate bidding.
5. Serving Architecture
5.1 Real-time CTR Prediction Serving
Ad Request β Load Balancer β Feature Retrieval β Model Scoring β Calibration β Response
β β β β
βΌ βΌ βΌ βΌ
βββββββββββ βββββββββββ βββββββββββ βββββββββββ
β Request β β Feature β β Model β β Calib- β
β Router β β Store β β Serving β β ration β
β β β (Redis) β β (GPU) β β Layer β
βββββββββββ βββββββββββ βββββββββββ βββββββββββ
5.2 Model Serving Options
class CTRModelServing:
"""Model serving for CTR prediction"""
def __init__(self):
# Option 1: TensorFlow Serving (for TensorFlow models)
self.tf_serving = TFServingClient(
host='tensorflow-serving:8501'
)
# Option 2: ONNX Runtime (for ONNX models)
self.onnx_session = ort.InferenceSession(
"ctr_model.onnx",
providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
)
# Option 3: Triton Inference Server (multi-framework)
self.triton_client = tritonclient.http.InferenceServerClient(
url='triton-server:8000'
)
# Model ensemble
self.models = {
'dcn_v2': DCNv2Model(),
'din': DINModel(),
'logistic': LogisticRegressionModel()
}
# Model weights for ensemble
self.model_weights = {
'dcn_v2': 0.4,
'din': 0.3,
'logistic': 0.3
}
async def predict(self, features: Dict) -> Dict:
"""Get CTR prediction"""
predictions = {}
# Get predictions from all models
for model_name, model in self.models.items():
prediction = await model.predict(features)
predictions[model_name] = prediction
# Weighted ensemble
ensemble_prediction = sum(
predictions[name] * self.model_weights[name]
for name in predictions
)
# Calibrate prediction
calibrated_prediction = self.calibrator.predict(ensemble_prediction)
return {
'ctr_prediction': float(calibrated_prediction[0]),
'individual_predictions': predictions,
'model_version': self.get_model_version(),
'inference_time_ms': self.measure_latency()
}
5.3 Latency Optimization
class LatencyOptimizer:
"""Optimize latency for CTR prediction"""
def __init__(self):
self.feature_cache = LRUCache(maxsize=1000000)
self.model_cache = ModelCache()
async def optimize_prediction(self, request):
"""Optimize CTR prediction for low latency"""
# Step 1: Check feature cache
cache_key = self.generate_cache_key(request)
cached_features = self.feature_cache.get(cache_key)
if cached_features:
features = cached_features
else:
# Step 2: Compute features in parallel
features = await self.compute_features_parallel(request)
self.feature_cache.set(cache_key, features, ttl=30)
# Step 3: Use cached model predictions for similar requests
prediction_cache_key = self.generate_prediction_cache_key(features)
cached_prediction = self.model_cache.get(prediction_cache_key)
if cached_prediction:
return cached_prediction
# Step 4: Batch prediction for similar requests
if self.can_batch(request):
return await self.batch_predict(features)
else:
return await self.individual_predict(features)
async def compute_features_parallel(self, request):
"""Compute features in parallel for lower latency"""
tasks = [
self.get_user_features(request.user_id),
self.get_ad_features(request.ad_id),
self.get_cross_features(request.user_id, request.ad_id),
self.get_context_features(request.context)
]
results = await asyncio.gather(*tasks)
return {
'user_features': results[0],
'ad_features': results[1],
'cross_features': results[2],
'context_features': results[3]
}
π‘
Latency Optimization Tips:
- Cache features aggressively (user features change slowly)
- Use model distillation for faster inference
- Consider quantization for neural network models
- Use async I/O for database calls
- Implement request batching for GPU efficiency
6. Monitoring and Observability
6.1 Key Metrics
class CTRMonitoringMetrics:
"""Comprehensive monitoring for CTR prediction"""
# Model performance metrics
MODEL_METRICS = [
'auc_roc',
'log_loss',
'calibration_error',
'precision_at_k',
'recall_at_k'
]
# Business metrics
BUSINESS_METRICS = [
'click_through_rate',
'conversion_rate',
'revenue_per_impression',
'cost_per_click',
'advertiser_roi'
]
# Operational metrics
OPERATIONAL_METRICS = [
'prediction_latency_p50',
'prediction_latency_p95',
'prediction_latency_p99',
'throughput_qps',
'error_rate',
'feature_freshness'
]
# Drift metrics
DRIFT_METRICS = [
'feature_distribution_drift',
'prediction_distribution_drift',
'concept_drift_score',
'model_performance_degradation'
]
6.2 Real-time Monitoring Dashboard
class CTRMonitoringDashboard:
"""Real-time monitoring dashboard for CTR prediction"""
def __init__(self):
self.metrics_collector = MetricsCollector()
self.alert_manager = AlertManager()
async def update_dashboard(self):
"""Update monitoring dashboard"""
metrics = await self.collect_metrics()
# Check for anomalies
anomalies = self.detect_anomalies(metrics)
# Send alerts if needed
for anomaly in anomalies:
await self.alert_manager.send_alert(anomaly)
# Update dashboard
await self.update_visualizations(metrics, anomalies)
async def collect_metrics(self):
"""Collect all monitoring metrics"""
return {
'real_time': {
'ctr_prediction_avg': await self.get_avg_ctr_prediction(),
'actual_ctr': await self.get_actual_ctr(),
'calibration_error': await self.get_calibration_error(),
'latency_avg_ms': await self.get_avg_latency()
},
'hourly': {
'total_impressions': await self.get_hourly_impressions(),
'total_clicks': await self.get_hourly_clicks(),
'total_revenue': await self.get_hourly_revenue(),
'model_drift_score': await self.get_model_drift()
},
'daily': {
'auc_roc': await self.get_daily_auc(),
'log_loss': await self.get_daily_log_loss(),
'advertiser_satisfaction': await self.get_advertiser_satisfaction()
}
}
def detect_anomalies(self, metrics):
"""Detect anomalies in metrics"""
anomalies = []
# Check for calibration drift
calibration_error = metrics['real_time']['calibration_error']
if calibration_error > 0.1: # Threshold
anomalies.append({
'type': 'CALIBRATION_DRIFT',
'severity': 'HIGH',
'message': f'Calibration error spike: {calibration_error:.3f}',
'current': calibration_error,
'threshold': 0.1
})
# Check for CTR drop
predicted_ctr = metrics['real_time']['ctr_prediction_avg']
actual_ctr = metrics['real_time']['actual_ctr']
if actual_ctr < predicted_ctr * 0.8: # 20% drop
anomalies.append({
'type': 'CTR_DROP',
'severity': 'HIGH',
'message': f'Actual CTR significantly lower than predicted',
'predicted': predicted_ctr,
'actual': actual_ctr
})
return anomalies
β οΈ
Critical Monitoring Points:
- Calibration: Model predictions must be well-calibrated for accurate bidding
- Feature drift: Monitor for changes in feature distributions
- Revenue impact: Track business metrics alongside model metrics
- A/B test results: Monitor experiment results for model updates
7. Scale Considerations and Trade-offs
7.1 Horizontal Scaling
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SCALING ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Prediction Requests β
β βββ Load balancer distributes across prediction servers β
β βββ Server 1: GPU-based neural network models β
β βββ Server 2: CPU-based GBDT models β
β βββ Server 3: Fallback models β
β β
β Feature Store β
β βββ Redis cluster with read replicas β
β βββ Primary: Write operations β
β βββ Replicas: Read operations β
β β
β Model Training β
β βββ Distributed training with parameter server β
β βββ Worker 1: Data parallelism β
β βββ Worker 2: Data parallelism β
β βββ Parameter Server: Model synchronization β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7.2 Cost vs Performance Trade-offs
| Dimension | Option A (Cost Optimized) | Option B (Performance Optimized) |
|---|---|---|
| Model Complexity | Logistic regression (fast) | Deep neural network (accurate) |
| Feature Freshness | Batch features (hourly) | Real-time features (streaming) |
| Model Retraining | Weekly retraining | Daily retraining |
| Inference Hardware | CPU instances | GPU instances |
| Caching | Aggressive caching | Minimal caching |
7.3 Latency Budget Breakdown
class LatencyBudget:
"""Track and optimize latency budget for CTR prediction"""
BUDGET = {
'feature_retrieval': 3, # ms
'feature_computation': 2, # ms
'model_inference': 4, # ms
'calibration': 0.5, # ms
'response_serialization': 0.5, # ms
'total': 10 # ms
}
def measure_latency_breakdown(self, request):
"""Measure actual latency breakdown"""
timings = {}
# Feature retrieval
start = time.time()
features = self.retrieve_features(request)
timings['feature_retrieval'] = (time.time() - start) * 1000
# Feature computation
start = time.time()
computed_features = self.compute_features(features)
timings['feature_computation'] = (time.time() - start) * 1000
# Model inference
start = time.time()
prediction = self.model.predict(computed_features)
timings['model_inference'] = (time.time() - start) * 1000
# Calibration
start = time.time()
calibrated = self.calibrator.predict(prediction)
timings['calibration'] = (time.time() - start) * 1000
return timings
βΉοΈ
Latency Discussion Points: When discussing latency, emphasize:
- Tail latency (p99) matters more than average
- Use async I/O for database calls
- Consider model distillation for faster inference
- Cache aggressively for repeated requests
- Use request batching for GPU efficiency
8. Advanced Topics
8.1 Real-time Bidding Integration
class RealTimeBiddingSystem:
"""Real-time bidding system using CTR predictions"""
def __init__(self):
self.ctr_predictor = CTRPredictor()
self.bid_optimizer = BidOptimizer()
async def handle_bid_request(self, bid_request):
"""Handle real-time bid request"""
# Step 1: Extract features
features = await self.extract_features(bid_request)
# Step 2: Get CTR prediction
ctr_prediction = await self.ctr_predictor.predict(features)
# Step 3: Calculate optimal bid
optimal_bid = self.bid_optimizer.calculate_bid(
ctr_prediction=ctr_prediction,
advertiser_budget=bid_request.advertiser_budget,
expected_value=bid_request.expected_value,
competition_level=bid_request.competition_level
)
# Step 4: Apply bid constraints
final_bid = self.apply_bid_constraints(
optimal_bid,
bid_request.advertiser_max_bid,
bid_request.floor_price
)
return {
'bid_amount': final_bid,
'ctr_prediction': ctr_prediction,
'expected_value': ctr_prediction * bid_request.expected_value,
'bid_strategy': 'value_optimization'
}
def calculate_bid(self, ctr_prediction, advertiser_budget, expected_value, competition_level):
"""Calculate optimal bid using value optimization"""
# Expected value per impression
expected_value_per_impression = ctr_prediction * expected_value
# Adjust for competition
competition_adjustment = self.get_competition_adjustment(competition_level)
# Calculate bid
bid = expected_value_per_impression * competition_adjustment
# Apply budget pacing
bid = self.apply_budget_pacing(bid, advertiser_budget)
return bid
8.2 Multi-Objective Optimization
class MultiObjectiveOptimizer:
"""Optimize for multiple objectives in ad ranking"""
def __init__(self):
self.ctr_model = CTRModel()
self.cvr_model = CVRModel() # Conversion rate
self.value_model = ValueModel() # Expected value
def rank_ads(self, candidates, user_context):
"""Rank ads considering multiple objectives"""
scores = []
for ad in candidates:
# Predict CTR
ctr = self.ctr_model.predict(user_context, ad)
# Predict CVR (given click)
cvr = self.cvr_model.predict(user_context, ad)
# Predict value (given conversion)
value = self.value_model.predict(user_context, ad)
# Calculate composite score
composite_score = self.calculate_composite_score(
ctr, cvr, value, ad.bid_amount
)
scores.append({
'ad': ad,
'ctr': ctr,
'cvr': cvr,
'value': value,
'composite_score': composite_score
})
# Sort by composite score
ranked_candidates = sorted(
scores,
key=lambda x: x['composite_score'],
reverse=True
)
return ranked_candidates
def calculate_composite_score(self, ctr, cvr, value, bid):
"""Calculate composite score for ad ranking"""
# Revenue optimization
revenue_score = ctr * cvr * value
# User experience (penalize low-quality ads)
quality_score = self.calculate_quality_score(ctr, cvr)
# Bid competitiveness
bid_score = bid / self.get_average_bid()
# Weighted combination
composite_score = (
0.5 * revenue_score +
0.3 * quality_score +
0.2 * bid_score
)
return composite_score
8.3 Explainable CTR Predictions
class CTRExplainer:
"""Generate explanations for CTR predictions"""
def __init__(self):
self.shap_explainer = shap.TreeExplainer(self.model)
def explain_prediction(self, features):
"""Generate explanation for CTR prediction"""
# Get SHAP values
shap_values = self.shap_explainer.shap_values(features)
# Get top contributing features
feature_importance = list(zip(
self.feature_names,
shap_values[0]
))
feature_importance.sort(key=lambda x: abs(x[1]), reverse=True)
# Generate human-readable explanation
explanation = self.generate_narrative(
features,
feature_importance[:5]
)
return {
'shap_values': shap_values,
'top_features': feature_importance[:5],
'narrative': explanation,
'confidence': self.compute_confidence(shap_values)
}
def generate_narrative(self, features, top_features):
"""Generate human-readable explanation"""
narrative_parts = []
for feature_name, importance in top_features:
if feature_name == 'user_interest_match' and features[feature_name] > 0.8:
narrative_parts.append(
"Strong interest match between user and ad category"
)
elif feature_name == 'historical_ctr' and features[feature_name] > 0.05:
narrative_parts.append(
"User has high historical click-through rate"
)
elif feature_name == 'ad_quality_score' and features[feature_name] > 0.9:
narrative_parts.append(
"High-quality ad with good engagement metrics"
)
return "Key factors: " + "; ".join(narrative_parts)
βΉοΈ
Production Best Practices:
- Always calibrate model predictions for accurate bidding
- Use A/B testing for model updates
- Monitor both model metrics and business metrics
- Implement feedback loops from advertiser performance
- Ensure explainability for advertiser transparency
9. Implementation Roadmap
Phase 1: Baseline (Weeks 1-3)
- Implement logistic regression baseline
- Set up feature engineering pipeline
- Create basic model serving
- Establish monitoring metrics
Phase 2: Advanced Models (Weeks 4-7)
- Implement DCN-v2 model
- Add deep interest network
- Create model ensemble
- Set up A/B testing framework
Phase 3: Real-time Features (Weeks 8-11)
- Implement real-time feature store
- Add streaming feature computation
- Optimize latency for real-time bidding
- Create advanced monitoring dashboard
Phase 4: Optimization (Weeks 12-14)
- Multi-objective optimization
- Advanced explainability
- Cost optimization
- Global distribution
10. Summary and Key Takeaways
Architecture Recap
- Multi-stage pipeline: Feature engineering β Model scoring β Calibration β Bidding
- Real-time features: Streaming features for up-to-date predictions
- Model ensemble: Combine multiple models for better performance
- Latency optimization: Critical for real-time bidding
Key Metrics
- Model Performance: AUC, log loss, calibration error
- Business Metrics: CTR, conversion rate, revenue per impression
- Operational Metrics: Latency, throughput, error rate
Common Interview Mistakes
- Ignoring class imbalance (most impressions don't get clicks)
- Not discussing calibration for bidding
- Forgetting about real-time feature requirements
- Not considering latency constraints
- Ignoring business metrics alongside model metrics
βΉοΈ
Final Interview Tip: Emphasize the business impact of CTR prediction. Discuss the trade-off between model accuracy and latency. Show understanding of both ML techniques and production requirements for advertising systems.
Further Reading
- "Deep & Cross Network for Ad Click Predictions" (Google)
- "Deep Interest Network for Click-Through Rate Prediction" (Alibaba)
- "FTRL-Proximal Online Learning Algorithm" (Google)
- "Real-Time Bidding for Display Advertising" (Yahoo)
- "Ad Click Prediction: a View from the Trenches" (Google)