Model Deployment: A/B Testing, Model Serving & Drift Detection
From notebook to production: the deployment lifecycle
Interview Question
"Explain the challenges of deploying ML models in production. How do you set up A/B testing for model comparison? What is model drift and how do you detect it?"
Difficulty: Hard | Frequently asked at Google, Netflix, Amazon
Theoretical Foundation
Model Deployment Challenges
- Scalability: Serving millions of requests per second
- Latency: Real-time predictions in milliseconds
- Reliability: 99.99% uptime requirements
- Monitoring: Detecting model degradation
- Versioning: Managing multiple model versions
- Security: Protecting against adversarial attacks
Model Serving Architectures
Batch Serving
- Pre-compute predictions offline
- Store in database/cache
- Serve via lookup
- Use case: Recommendations, daily reports
Real-time Serving
- Compute predictions on-demand
- Requires low-latency inference
- Use case: Fraud detection, search ranking
Streaming Serving
- Process data streams in real-time
- Update predictions incrementally
- Use case: IoT, real-time monitoring
Model Serving Optimization
- Model Compression: Pruning, quantization, knowledge distillation
- Caching: Cache frequent predictions
- Batching: Process multiple requests together
- Hardware Acceleration: GPU, TPU, FPGA
- Edge Deployment: Deploy to edge devices
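Two of the optimizations above, caching and batching, can be sketched in a few lines. This is a minimal illustration only: `WEIGHTS`, `predict_one`, `cached_predict`, `micro_batches`, and `predict_batch` are hypothetical names, and the fixed linear score stands in for a real (expensive) model call.

```python
from functools import lru_cache
from typing import Iterator, List, Sequence, Tuple

# Hypothetical weights standing in for a trained model.
WEIGHTS = (0.5, -0.25, 1.0)


def predict_one(features: Tuple[float, ...]) -> float:
    """Stand-in for an expensive model call: a fixed linear score."""
    return sum(w * x for w, x in zip(WEIGHTS, features))


@lru_cache(maxsize=10_000)
def cached_predict(features: Tuple[float, ...]) -> float:
    """Caching: repeated feature tuples are answered from memory."""
    return predict_one(features)


def micro_batches(
    requests: Sequence[Tuple[float, ...]], batch_size: int
) -> Iterator[List[Tuple[float, ...]]]:
    """Batching: group pending requests so each group is scored together."""
    for start in range(0, len(requests), batch_size):
        yield list(requests[start:start + batch_size])


def predict_batch(batch: List[Tuple[float, ...]]) -> List[float]:
    """Score one batch; a real model would vectorize this on GPU/TPU."""
    return [cached_predict(features) for features in batch]
```

Caching only pays off when identical inputs recur, and the feature tuple must be hashable; batching trades a little latency for throughput.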
A/B Testing for ML
Setup:
- Split traffic between model variants
- Random assignment ensures unbiased comparison
- Statistical significance testing
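One common way to implement the traffic split is to hash a stable user identifier, so that assignment is effectively random across users but the same user always sees the same variant. The sketch below assumes string user IDs; `assign_variant`, the experiment name, and the 50/50 default are illustrative choices, not a prescribed API.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    Salting the hash with the experiment name keeps assignments
    independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32), scaled to [0, 1).
    bucket = int(digest[:8], 16) / 0x1_0000_0000
    return "treatment" if bucket < treatment_share else "control"
```

Because the mapping is a pure function of the user ID and experiment name, no assignment table has to be stored, and the split can be reproduced later during analysis.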
Key Metrics:
- Online metrics: CTR, conversion rate, revenue
- Model metrics: Accuracy, latency, throughput
- Business metrics: ROI, customer satisfaction
Statistical Testing:
- t-test: Compare means of two groups
- Chi-squared test: Compare proportions
- Bayesian testing: Probabilistic comparison
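As a rough illustration of the tests listed above, the sketch below compares conversion rates between two variants using only the standard library. The two-proportion z-test shown here is equivalent to a chi-squared test on the 2x2 table without continuity correction (the chi-squared statistic equals z squared). The Bayesian comparison assumes a uniform Beta(1, 1) prior and estimates P(B beats A) by Monte Carlo. Function names and the example counts are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws
```

For continuous metrics such as revenue per user, a t-test on per-user values would take the place of the proportion test.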
Key Insight: A/B testing is crucial because offline metrics don't always correlate with online performance. A model with higher accuracy might perform worse due to latency or user experience factors.
Model Drift
Concept Drift
The relationship between features and target, P(y | x), changes over time, so the same inputs should now lead to different predictions.
Data Drift
The distribution of input features, P(x), changes over time, even if the relationship between features and target stays the same.
Detection Methods
Statistical Tests:
- KS test for distribution changes
- PSI (Population Stability Index)
- Chi-squared test for categorical features
Performance Monitoring:
- Track prediction accuracy over time
- Monitor error rates
Data Quality Checks:
- Missing value rates
- Feature distribution shifts
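The KS test mentioned above can be sketched without external dependencies. This is a minimal version that computes the two-sample KS statistic exactly and approximates the p-value with the asymptotic Kolmogorov series, which is only reasonable for fairly large samples; where SciPy is available, `scipy.stats.ks_2samp` is the usual choice. The function name and the small-lambda cutoff are assumptions of this sketch.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_two_sample(reference, current):
    """Return (D, approximate p-value) comparing two samples of one feature."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # D is the largest gap between the two empirical CDFs; the maximum
    # is always reached at one of the observed values.
    d = 0.0
    for x in ref + cur:
        gap = abs(bisect_right(ref, x) / n - bisect_right(cur, x) / m)
        d = max(d, gap)

    # Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    lam = sqrt(n * m / (n + m)) * d
    if lam < 0.3:  # series converges slowly here; the true value is ~1
        return d, 1.0
    series = sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(2 * series, 0.0), 1.0)
```

A small p-value suggests the current feature distribution differs from the reference; with very large samples even tiny shifts become significant, so the statistic D itself is worth tracking as well.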
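PSI (Population Stability Index)
$$\mathrm{PSI} = \sum_{i} (p_i - q_i) \, \ln\frac{p_i}{q_i}$$
where $p_i$ is the proportion in bin $i$ for current data, and $q_i$ is the proportion in bin $i$ for reference data.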
Interpretation:
- PSI < 0.1: No significant change
- 0.1 ≤ PSI < 0.25: Moderate change
- PSI ≥ 0.25: Significant change (investigate)
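A minimal PSI computation for a single numeric feature might look like the following. Several choices here are assumptions of the sketch rather than part of the definition: equal-width bins derived from the reference range, ten bins, clamping out-of-range values into the edge bins, and a small floor `eps` so that empty bins do not produce a logarithm of zero. The function names are hypothetical.

```python
from math import log


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant reference feature: everything lands in bin 0

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp to edge bins
        return [max(c / len(values), eps) for c in counts]

    q = proportions(reference)  # q_i: reference proportions
    p = proportions(current)    # p_i: current proportions
    return sum((pi - qi) * log(pi / qi) for pi, qi in zip(p, q))


def interpret_psi(value):
    """Map a PSI value to the rule-of-thumb bands listed above."""
    if value < 0.1:
        return "no significant change"
    if value < 0.25:
        return "moderate change"
    return "significant change"
```

Quantile-based bins from the reference data are a frequent alternative to equal-width bins, since they avoid near-empty bins for skewed features.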
MLOps Pipeline
- Data Pipeline: Ingestion, validation, preprocessing
- Training Pipeline: Model training, evaluation, versioning
- Deployment Pipeline: Model serving, A/B testing
- Monitoring Pipeline: Drift detection, alerting
Code Implementation
Real-World Applications
Google: Search Ranking Deployment
- Model Serving: TensorFlow Serving at scale
- A/B Testing: Continuous model improvement
- Drift Detection: Monitoring search quality
Netflix: Recommendation Deployment
- Real-time Serving: Sub-100ms latency requirements
- Champion-Challenger: Always testing new models
- Personalization: Per-user model selection
Google Interview Tip: Be prepared to discuss tradeoffs between model complexity and serving latency. Mention techniques like model distillation for production deployment.
Common Follow-Up Questions
Q1: How do you handle model versioning in production? Use a model registry (for example MLflow or SageMaker Model Registry) to track versions, metadata, and lineage. Always keep previous versions available so you can roll back.
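The idea of versioning with rollback can be illustrated with a toy in-memory registry. This is not the MLflow or SageMaker API; `ModelVersion`, `InMemoryRegistry`, and the artifact paths are hypothetical and exist only to show the bookkeeping involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    version: int
    artifact_uri: str  # where the serialized model lives (hypothetical path)
    metrics: Dict[str, float] = field(default_factory=dict)


class InMemoryRegistry:
    """Toy registry: numbered versions plus a promotion history."""

    def __init__(self) -> None:
        self._versions: Dict[int, ModelVersion] = {}
        self._promotions: List[int] = []  # production history, newest last

    def register(self, artifact_uri: str, metrics: Optional[Dict[str, float]] = None) -> ModelVersion:
        entry = ModelVersion(len(self._versions) + 1, artifact_uri, dict(metrics or {}))
        self._versions[entry.version] = entry
        return entry

    def promote(self, version: int) -> None:
        if version not in self._versions:
            raise KeyError(f"unknown version {version}")
        self._promotions.append(version)

    def rollback(self) -> ModelVersion:
        """Return production to the previously promoted version."""
        if len(self._promotions) < 2:
            raise RuntimeError("no previous production version to roll back to")
        self._promotions.pop()
        return self._versions[self._promotions[-1]]

    @property
    def production(self) -> Optional[ModelVersion]:
        return self._versions[self._promotions[-1]] if self._promotions else None
```

Old versions are never deleted on promotion, which is what makes the rollback a cheap pointer change rather than a retraining job.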
Q2: What is shadow deployment? Run the new model alongside the old one on the same live traffic, but serve only the old model's predictions. Log the new model's outputs and compare the two without affecting users.
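A shadow deployment can be sketched as a thin wrapper around the two models. The names below (`serve_with_shadow`, `champion`, `challenger`, `shadow_log`) are hypothetical, and the challenger is called synchronously only to keep the example short; a production system would typically run it asynchronously so it adds no user-facing latency.

```python
from typing import Any, Callable, Dict, List


def serve_with_shadow(
    features: Any,
    champion: Callable[[Any], Any],
    challenger: Callable[[Any], Any],
    shadow_log: List[Dict[str, Any]],
) -> Any:
    """Serve the champion's prediction; record the challenger's for comparison."""
    served = champion(features)
    record = {"features": features, "served": served, "shadow": None, "error": None}
    try:
        record["shadow"] = challenger(features)
    except Exception as exc:
        # A failing challenger must never affect the user-facing response.
        record["error"] = repr(exc)
    shadow_log.append(record)
    return served
```

The logged pairs can later be joined with observed outcomes to estimate how the challenger would have performed.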
Q3: How do you handle cold start problems? Use default models, content-based filtering, or popularity-based recommendations until enough user data is collected.
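The popularity-based fallback can be expressed as a simple routing rule. In this sketch, `recommend`, `history`, `personalized`, and the threshold of five interactions are all illustrative assumptions.

```python
from typing import Callable, Dict, List


def recommend(
    user_id: str,
    history: Dict[str, List[str]],
    personalized: Callable[[str, List[str]], List[str]],
    popular_items: List[str],
    min_interactions: int = 5,
    k: int = 3,
) -> List[str]:
    """Use the personalized model only once a user has enough history."""
    seen = history.get(user_id, [])
    if len(seen) >= min_interactions:
        return personalized(user_id, seen)[:k]
    # Cold start: fall back to popular items the user has not seen yet.
    return [item for item in popular_items if item not in seen][:k]
```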
Q4: What is the difference between online and batch learning? Batch learning retrains on historical data periodically. Online learning updates incrementally with new data.
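The contrast can be made concrete with a one-feature linear model: the batch version refits from the full dataset in closed form, while the online version applies one stochastic gradient step per incoming example. Both functions and the learning rate are illustrative assumptions, not a recommended training setup.

```python
from typing import Sequence, Tuple


def fit_batch(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Batch learning: least-squares fit of y = w*x + b on all data at once."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x


class OnlineLinearModel:
    """Online learning: update w and b from one example at a time (SGD)."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        error = self.predict(x) - y  # gradient of 0.5 * error**2
        self.w -= self.lr * error * x
        self.b -= self.lr * error
```

The online model never needs the full history in memory and adapts as new examples arrive, at the cost of being sensitive to the learning rate and to noisy or adversarial inputs.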