Google & Netflix Interview

Model Deployment: A/B Testing, Model Serving & Drift Detection

From notebook to production: the deployment lifecycle

Interview Question

"Explain the challenges of deploying ML models in production. How do you set up A/B testing for model comparison? What is model drift and how do you detect it?"

Difficulty: Hard | Frequently asked at Google, Netflix, Amazon

Theoretical Foundation

Model Deployment Challenges

Scalability: Serving millions of requests per second
Latency: Real-time predictions in milliseconds
Reliability: 99.99% uptime requirements
Monitoring: Detecting model degradation
Versioning: Managing multiple model versions
Security: Protecting against adversarial attacks

Model Serving Architectures

Batch Serving

Pre-compute predictions offline
Store in database/cache
Serve via lookup
Use case: Recommendations, daily reports

Real-time Serving

Compute predictions on-demand
Requires low-latency inference
Use case: Fraud detection, search ranking

Streaming Serving

Process data streams in real-time
Update predictions incrementally
Use case: IoT, real-time monitoring

Model Serving Optimization

Model Compression: Pruning, quantization, knowledge distillation
Caching: Cache frequent predictions
Batching: Process multiple requests together
Hardware Acceleration: GPU, TPU, FPGA
Edge Deployment: Deploy to edge devices
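Two of the optimizations above, caching and batching, can be sketched in a few lines. This is a minimal illustration, not a production server: `expensive_model` is a hypothetical stand-in for real inference, and the cache size and batch size are arbitrary assumptions.

```python
from functools import lru_cache
from typing import Iterator, List, Sequence, Tuple

# Counts how often the (hypothetical) model is actually invoked.
CALLS = {"model": 0}


def expensive_model(features: Tuple[float, ...]) -> float:
    """Hypothetical stand-in for a slow inference call."""
    CALLS["model"] += 1
    return sum(features) / len(features)


@lru_cache(maxsize=10_000)
def cached_predict(features: Tuple[float, ...]) -> float:
    """Return a cached prediction when the same feature tuple repeats.

    Features must be hashable (a tuple here) to serve as a cache key.
    """
    return expensive_model(features)


def batched(requests: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive chunks so the model can score several requests at once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(requests), batch_size):
        yield list(requests[start:start + batch_size])
```

Caching only pays off when identical inputs recur and predictions may be slightly stale; batching trades a little per-request latency for higher throughput.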

A/B Testing for ML

Setup:

Split traffic between model variants
Random assignment ensures unbiased comparison
Statistical significance testing
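One common way to implement the traffic split is deterministic hashing rather than a per-request coin flip, so a given user always sees the same variant. The sketch below assumes string user IDs; the function name, the experiment salt, and the 50/50 default are illustrative choices, not part of any specific framework.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be in [0, 1]")
    # Salt with the experiment name so separate experiments split independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16 ** 8
    return "treatment" if bucket < treatment_share else "control"
```

Because the assignment depends only on the user ID and the experiment name, it needs no stored state and stays stable across sessions and servers.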

Key Metrics:

Online metrics: CTR, conversion rate, revenue
Model metrics: Accuracy, latency, throughput
Business metrics: ROI, customer satisfaction

Statistical Testing:

t-test: Compare means of two groups
Chi-squared test: Compare proportions
Bayesian testing: Probabilistic comparison
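For a proportion metric such as conversion rate, the comparison can be sketched as a two-proportion z-test, which for two groups is equivalent to the chi-squared test on a 2x2 table without continuity correction. The function name and the example counts below are assumptions for illustration, and the normal approximation needs reasonably large samples.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one observation")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # No variation at all: no evidence of a difference.
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Example with made-up counts: 100/1000 conversions for A, 150/1000 for B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value suggests the difference is unlikely to be noise, but the sample size and stopping rule should be fixed before the test starts to avoid peeking bias.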

Key Insight: A/B testing is crucial because offline metrics don't always correlate with online performance. A model with higher accuracy might perform worse due to latency or user experience factors.

Model Drift

Concept Drift

The relationship between features and target, $P(y \mid x)$, changes over time, so the same inputs now call for a different prediction.

Data Drift

The distribution of input features, $P(x)$, changes over time, even if the relationship between features and target stays the same.

Detection Methods

Statistical Tests:
- KS (Kolmogorov-Smirnov) test for distribution changes
- PSI (Population Stability Index)
- Chi-squared test for categorical features

Performance Monitoring:
- Track prediction accuracy over time
- Monitor error rates

Data Quality Checks:
- Missing value rates
- Feature distribution shifts
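As an illustration of the KS test on a single numeric feature, the sketch below computes the two-sample KS statistic with the standard library only and compares it to the usual large-sample critical value. The function names and the default `alpha` are assumptions; in practice a library routine such as SciPy's `ks_2samp` would normally be used instead.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Maximum gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(ref) | set(cur):
        f_ref = bisect.bisect_right(ref, x) / len(ref)
        f_cur = bisect.bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def ks_drift_detected(reference: Sequence[float], current: Sequence[float],
                      alpha: float = 0.05) -> bool:
    """Flag drift when the statistic exceeds the asymptotic critical value."""
    d = ks_statistic(reference, current)
    n, m = len(reference), len(current)
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m)).
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return d > critical
```

With very large samples even tiny, harmless shifts become statistically significant, so the statistic itself is often monitored alongside the pass/fail decision.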

PSI (Population Stability Index)

$$\mathrm{PSI} = \sum_{i=1}^{B} (p_i - q_i) \ln\frac{p_i}{q_i}$$

where $p_i$ is the proportion of current data in bin $i$, $q_i$ is the proportion of reference data in the same bin, and $B$ is the number of bins.

Interpretation:

PSI < 0.1: No significant change
0.1 ≤ PSI < 0.25: Moderate change
PSI ≥ 0.25: Significant change (investigate)
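The formula can be sketched for one numeric feature as follows. Several details here are assumptions rather than part of the definition: equal-width bins taken from the reference range, out-of-range values clamped into the edge bins, and a small floor `eps` that keeps empty bins from producing `log(0)`. Quantile-based bins are an equally common choice.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference`."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    # Fall back to width 1.0 when the reference feature is constant.
    width = (hi - lo) / n_bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # Clamp outliers into the edge bins.
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    q = proportions(reference)  # q_i: reference proportions
    p = proportions(current)    # p_i: current proportions
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def interpret_psi(value: float) -> str:
    """Apply the rule-of-thumb thresholds; boundary handling is an assumption."""
    if value < 0.1:
        return "no significant change"
    if value < 0.25:
        return "moderate change"
    return "significant change"
```

The result depends on the binning, so the bin edges should be computed once from the reference data and reused for every later comparison.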

MLOps Pipeline

Data Pipeline: Ingestion, validation, preprocessing
Training Pipeline: Model training, evaluation, versioning
Deployment Pipeline: Model serving, A/B testing
Monitoring Pipeline: Drift detection, alerting

Code Implementation

Real-World Applications

Google: Search Ranking Deployment

Model Serving: TensorFlow Serving at scale
A/B Testing: Continuous model improvement
Drift Detection: Monitoring search quality

Netflix: Recommendation Deployment

Real-time Serving: Sub-100ms latency requirements
Champion-Challenger: Always testing new models
Personalization: Per-user model selection

Google Interview Tip: Be prepared to discuss tradeoffs between model complexity and serving latency. Mention techniques like model distillation for production deployment.

Common Follow-Up Questions

Q1: How do you handle model versioning in production? Use a model registry (MLflow, SageMaker) to track versions, metadata, and lineage. Always keep previous versions for rollback.
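The register-promote-rollback idea can be illustrated with a toy in-memory registry. This is a hypothetical class for explanation only and does not mirror the API of MLflow, SageMaker, or any real registry.

```python
from typing import Any, Dict, List, Optional


class ModelRegistry:
    """Toy registry: stores every version and remembers promotion order."""

    def __init__(self) -> None:
        self._versions: Dict[int, Dict[str, Any]] = {}
        self._history: List[int] = []  # Versions promoted to production, oldest first.

    def register(self, model: Any, metadata: Optional[Dict[str, Any]] = None) -> int:
        """Store a model with its metadata and return the new version number."""
        version = len(self._versions) + 1
        self._versions[version] = {"model": model, "metadata": dict(metadata or {})}
        return version

    def promote(self, version: int) -> None:
        """Make a registered version the production model."""
        if version not in self._versions:
            raise KeyError(f"unknown version {version}")
        self._history.append(version)

    def rollback(self) -> int:
        """Revert to the previously promoted version and return its number."""
        if len(self._history) < 2:
            raise RuntimeError("no previous production version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def production_version(self) -> Optional[int]:
        return self._history[-1] if self._history else None
```

Old versions are never deleted here, which is what makes the rollback possible.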

Q2: What is shadow deployment? Run the new model alongside the old one on the same live traffic, but serve only the old model's predictions. Log the new model's predictions and compare them afterwards, so users are never affected.
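A minimal sketch of the shadow pattern, assuming both models are plain callables and the log is an in-memory list (all names are hypothetical). A real system would typically run the shadow call asynchronously so it adds no latency to the served response.

```python
from typing import Any, Callable, Dict, List


def serve_with_shadow(features: Any,
                      champion: Callable[[Any], Any],
                      challenger: Callable[[Any], Any],
                      shadow_log: List[Dict[str, Any]]) -> Any:
    """Return the champion's prediction; record the challenger's for later comparison."""
    served = champion(features)
    try:
        shadow = challenger(features)
        shadow_log.append({"features": features, "served": served, "shadow": shadow})
    except Exception as exc:
        # A failing challenger must never affect the user-facing response.
        shadow_log.append({"features": features, "served": served,
                           "shadow_error": repr(exc)})
    return served
```

The log can then be joined with ground-truth labels once they arrive to compare the two models on identical traffic.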

Q3: How do you handle cold start problems? Use default models, content-based filtering, or popularity-based recommendations until enough user data is collected.

Q4: What is the difference between online and batch learning? Batch learning retrains on historical data periodically. Online learning updates incrementally with new data.
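The contrast can be sketched with a one-feature linear model: the batch version solves least squares over the whole dataset at once, while the online version applies one stochastic gradient step per incoming example. The class name, learning rate, and toy data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def fit_batch(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Batch learning: closed-form least squares on the full dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("xs must contain at least two distinct values")
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return w, mean_y - w * mean_x


class OnlineLinearRegressor:
    """Online learning: update the weights one example at a time."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # One SGD step on the squared error of this single example.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


# Toy data following y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x + 1 for x in xs]

w_batch, b_batch = fit_batch(xs, ys)

online = OnlineLinearRegressor()
for _ in range(1000):  # Simulates a long stream by replaying the same examples.
    for x, y in zip(xs, ys):
        online.update(x, y)
```

The online model never needs the full history in memory and adapts as new data arrives, but it is sensitive to the learning rate and to the order of examples.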
