Feature Stores — Managing Features for ML at Scale
Learn how feature stores provide a centralized repository for feature engineering, management, and serving, a core building block of production ML systems.
- Feature Engineering — Creating and transforming raw data into features
- Feature Serving — Providing consistent features for training and inference
- Feature Monitoring — Tracking feature drift and quality over time
"Good features are the foundation of good models."
Feature Stores — Complete Guide
Feature stores are centralized repositories for ML features, ensuring consistency between training and serving.
Feature Store Architecture
Offline vs Online Store
Offline Store
Stores historical features in data lakes/warehouses. Provides high-throughput access for training jobs. Higher latency is acceptable.
Online Store
Stores the latest feature values in low-latency key-value databases (Redis, DynamoDB). Provides millisecond-scale lookups by entity key for real-time serving.
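The split between the two stores can be illustrated with a toy, in-memory sketch. It is not how any real feature store is implemented: the class `ToyFeatureStore`, its methods, and the `orders_7d` feature are hypothetical names chosen for illustration. The offline side keeps every historical row, while the online side keeps only the newest row per entity.

```python
from datetime import datetime


class ToyFeatureStore:
    """Illustrative only: an append-only offline log plus a latest-value online map."""

    def __init__(self):
        self.offline = []  # full history: (entity_id, event_time, features)
        self.online = {}   # entity_id -> (event_time, features), newest row only

    def ingest(self, entity_id, event_time, features):
        # Batch or streaming pipelines would append rows here.
        self.offline.append((entity_id, event_time, dict(features)))

    def materialize(self, end_time):
        # Copy the newest row per entity (up to end_time) into the online map.
        for entity_id, event_time, features in self.offline:
            if event_time > end_time:
                continue
            current = self.online.get(entity_id)
            if current is None or event_time >= current[0]:
                self.online[entity_id] = (event_time, features)

    def get_online(self, entity_id):
        # Serving path: a single key lookup, no scan over history.
        row = self.online.get(entity_id)
        return None if row is None else row[1]

    def get_history(self, entity_id):
        # Training path: every historical value for the entity.
        return [(t, f) for e, t, f in self.offline if e == entity_id]


store = ToyFeatureStore()
store.ingest("user_1", datetime(2024, 1, 1), {"orders_7d": 2})
store.ingest("user_1", datetime(2024, 1, 8), {"orders_7d": 5})
store.materialize(end_time=datetime(2024, 1, 31))

latest = store.get_online("user_1")    # {'orders_7d': 5}
history = store.get_history("user_1")  # both rows, oldest first
```

In a real deployment the offline log would live in a warehouse or data lake and the online map in a key-value database, but the access patterns are the same: scans over history for training, single-key lookups for serving.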
Point-in-Time Correctness
A critical property of feature stores: when creating training data, each label is joined with the most recent feature values available at or before that label's timestamp, never with later ones. This prevents data leakage (using information from the future to predict the past).
Without point-in-time correctness, models can appear to perform well in training but fail in production because they used information that wouldn't be available at inference time.
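A point-in-time join can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used by any particular feature store; the function name `point_in_time_join` and the sample rows are assumptions made for the example. For each label it picks the newest feature value whose timestamp is at or before the label's timestamp.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(labels, feature_rows):
    """Attach to each label the latest feature value known at label time.

    labels:       list of (entity_id, label_time, label)
    feature_rows: list of (entity_id, event_time, value)
    Returns a list of (entity_id, label_time, label, value_or_None).
    """
    # Group feature history per entity, sorted by event time.
    history = {}
    for entity_id, event_time, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = history.setdefault(entity_id, ([], []))
        times.append(event_time)
        values.append(value)

    joined = []
    for entity_id, label_time, label in labels:
        times, values = history.get(entity_id, ([], []))
        # Number of feature rows with event_time <= label_time.
        idx = bisect_right(times, label_time)
        value = values[idx - 1] if idx > 0 else None
        joined.append((entity_id, label_time, label, value))
    return joined


features = [
    ("u1", datetime(2024, 1, 1), 2),
    ("u1", datetime(2024, 1, 10), 9),
]
labels = [
    ("u1", datetime(2024, 1, 5), 1),   # must see 2, not the later 9
    ("u1", datetime(2024, 1, 12), 0),  # may see 9
    ("u2", datetime(2024, 1, 5), 1),   # no feature history yet
]
training_rows = point_in_time_join(labels, features)
```

A naive join on entity key alone would give the January 5 label the value 9, which was only recorded on January 10. That is exactly the leakage described above. With pandas, `merge_asof` with `direction="backward"` performs the same kind of join on sorted frames.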
Feature Engineering Pipeline
Feast: Open-Source Feature Store
Feast Architecture
Core Concepts:
- Feature View: Defines a set of features, their schema, and the data source they are read from
- Entity: The key used to look up features (e.g., user_id, product_id)
- Feature Service: A named group of features, drawn from one or more feature views, that a particular model consumes
- Data Source: Where features come from (Parquet files, BigQuery, etc.)
Workflow:
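- Define entities, data sources, and feature views with the Python SDK (project settings live in feature_store.yaml)
- Apply the definitions to Feast: feast apply
- Materialize features to the online store: feast materialize (takes a start and an end timestamp)
- Retrieve features: store.get_historical_features() for training data or store.get_online_features() for serving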
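The retrieval step of the workflow might look like the sketch below. It assumes a feature repository has already been applied and materialized; the feature view `user_stats`, its features, and the `user_id` entity key are hypothetical names, and the helper functions are illustrative wrappers rather than part of Feast. Feast is imported inside the functions so the module can be loaded even where Feast is not installed.

```python
def feature_refs(view_name, feature_names):
    """Build Feast feature references in 'feature_view:feature' form."""
    return [f"{view_name}:{name}" for name in feature_names]


# Hypothetical feature view and feature names, for illustration only.
REFS = feature_refs("user_stats", ["orders_7d", "avg_basket_value"])


def build_training_frame(repo_path, entity_df):
    """Offline path: point-in-time join of features onto labeled rows.

    entity_df is a pandas DataFrame holding the entity key column
    (here assumed to be 'user_id') and an 'event_timestamp' column.
    """
    from feast import FeatureStore  # lazy import: requires Feast at call time

    store = FeatureStore(repo_path=repo_path)
    job = store.get_historical_features(entity_df=entity_df, features=REFS)
    return job.to_df()


def fetch_serving_features(repo_path, user_id):
    """Online path: look up the latest materialized values for one entity."""
    from feast import FeatureStore  # lazy import: requires Feast at call time

    store = FeatureStore(repo_path=repo_path)
    response = store.get_online_features(
        features=REFS,
        entity_rows=[{"user_id": user_id}],
    )
    return response.to_dict()
```

Both paths request the same feature references, which is what keeps training and serving consistent: the model sees features computed by the same definitions in both settings.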
Key Features:
- Point-in-time correct joins
- Feature versioning
- Hybrid batch + streaming
- Kubernetes-native deployment
Key Takeaways
Summary: Feature Stores
- Feature stores ensure training-serving consistency (same features everywhere)
- Offline store for batch features: training, backfilling (S3, BigQuery)
- Online store for real-time features: serving with millisecond-scale latency (Redis, DynamoDB)
- Point-in-time correctness prevents data leakage in training
- Feast is a widely used open-source feature store
- Feature stores enable feature reuse across teams and models
- Feature pipelines (batch + streaming) compute and update features
- Feature stores can shorten time to production, since new models reuse existing features instead of rebuilding pipelines