Feature Engineering — Complete Guide

Feature Engineering — Where Domain Knowledge Meets Data Science

Feature engineering transforms raw data into representations that dramatically improve model performance. It is often the single most impactful step in any machine learning pipeline.

  • Numerical Scaling — StandardScaler, MinMaxScaler, and RobustScaler prepare features for distance-based models
  • Categorical Encoding — one-hot, label, and target encoding convert categorical data into model-ready formats
  • Feature Creation — interaction terms, date components, and aggregations unlock hidden patterns in your data

"Coming up with features is difficult, time-consuming, requires expert knowledge. Applied machine learning is basically feature engineering." — Andrew Ng

Mathematical Foundations

Standardization (Z-score)

$$z = \frac{x - \mu}{\sigma}$$

where

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$$

Min-Max Scaling

$$x_{\text{scaled}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Information Gain (Feature Selection)

$$IG(D, A) = H(D) - H(D|A)$$

where

$$H(D) = -\sum_{k=1}^{K} p_k \log_2(p_k)$$

is the entropy.
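
The entropy and information gain formulas above can be sketched in a few lines of NumPy. This is a minimal illustration for discrete features only; the helper names `entropy` and `information_gain` and the toy arrays are hypothetical, not part of any library.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy H(D) in bits, computed from label frequencies."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels):
    """IG(D, A) = H(D) - H(D|A) for a discrete feature A."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight each subset's entropy by the share of rows it covers
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional


# Hypothetical toy data: "outlook" separates the labels, "noise" does not
outlook = np.array(["sunny", "sunny", "rain", "rain"])
noise = np.array(["a", "b", "a", "b"])
play = np.array([0, 0, 1, 1])

ig_outlook = information_gain(outlook, play)
ig_noise = information_gain(noise, play)
print(ig_outlook, ig_noise)
```

A feature that splits the labels perfectly recovers the full entropy of the target, while an uninformative feature gains nothing.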


Feature Engineering Pipeline

Figure: Feature engineering pipeline.

Raw Data (CSV, DB, API) → Cleaning (missing values, outliers, duplicates) → Scaling (StandardScaler, MinMax, Robust) → Encoding (One-Hot, Label, Target, Binary) → Selection (Filter, Wrapper, Embedded) → ML

Key Principle: Prevent Data Leakage

  • fit_transform() on TRAIN only → transform() on TEST
  • Use sklearn Pipeline to chain steps safely
  • Never fit on test data; this leaks future information
  • Pitfall: computing mean/std on the entire dataset before the split
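
The leakage rule in the pipeline overview (fit on the training split, then only transform the test split) might look like this in scikit-learn. The data here is synthetic and exists only to make the sketch runnable.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(100, 2))

# Split first, so the test rows never influence the scaler
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from TRAIN only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# The stored mean comes from the training split, not the full dataset
print(scaler.mean_, X_train.mean(axis=0))
```

The test split will generally not have exactly zero mean after scaling, which is expected: it is scaled with statistics it did not contribute to.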

Numerical Features

Definition: StandardScaler (Z-score)

Standardizes features by removing the mean and scaling to unit variance. Results in mean=0, std=1.

Z-score Standardization

$$z = \frac{x - \mu}{\sigma}$$

Here,

  • $z$ = Standardized value
  • $x$ = Original value
  • $\mu$ = Mean of feature
  • $\sigma$ = Standard deviation of feature
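
As a quick check of the z-score formula, the sketch below computes it by hand and compares the result with scikit-learn's `StandardScaler`. The four-value array is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single feature with four observations
x = np.array([[2.0], [4.0], [6.0], [8.0]])

mu = x.mean(axis=0)
sigma = x.std(axis=0)  # NumPy's default is the population std (divide by n)
z_manual = (x - mu) / sigma

z_sklearn = StandardScaler().fit_transform(x)

print(z_manual.ravel())
print(z_sklearn.ravel())
```

Both versions divide by the population standard deviation (1/n rather than 1/(n-1)), so the two arrays agree.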

Definition: MinMaxScaler

Scales features to a fixed range, typically [0, 1], by subtracting the minimum and dividing by the range.

Min-Max Scaling

$$x_{\text{scaled}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Here,

  • $x_{\text{scaled}}$ = Scaled value
  • $\min(x), \max(x)$ = Minimum and maximum values
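
The min-max formula can be verified the same way: compute it manually, then compare against `MinMaxScaler` with its default [0, 1] range. The input values are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented single feature
x = np.array([[10.0], [20.0], [25.0], [50.0]])

x_min = x.min(axis=0)
x_max = x.max(axis=0)
x_manual = (x - x_min) / (x_max - x_min)

x_sklearn = MinMaxScaler().fit_transform(x)

print(x_manual.ravel())
print(x_sklearn.ravel())
```

The minimum maps to 0 and the maximum to 1, so a single extreme value compresses every other point into a narrow band.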

Definition: RobustScaler

Uses median and interquartile range (IQR) instead of mean and variance. Robust to outliers.
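
To see why median and IQR are less sensitive to outliers, the sketch below scales a made-up feature containing one extreme value with both `RobustScaler` and `StandardScaler`. The manual computation assumes the scaler's default 25th to 75th percentile range.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Made-up feature with one obvious outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

median = np.median(x, axis=0)
q1, q3 = np.percentile(x, [25, 75], axis=0)
x_manual = (x - median) / (q3 - q1)  # center on median, scale by IQR

x_robust = RobustScaler().fit_transform(x)
x_standard = StandardScaler().fit_transform(x)

print(x_robust.ravel())
print(x_standard.ravel())
```

With the robust scaler the four typical values keep an even spacing around zero, whereas the standard scaler squeezes them together because the outlier inflates both the mean and the standard deviation.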

Encoding Methods Diagram

Figure: Categorical encoding methods comparison.

  • One-Hot Encoding: Color: [Red, Blue, Green]; Red → [1, 0, 0], Blue → [0, 1, 0], Green → [0, 0, 1]. For nominal categories with no order.
  • Label Encoding: Size: [S, M, L, XL]; S=0, M=1, L=2, XL=3. For ordinal categories (which have an order).
  • Target Encoding: City → mean(target); NYC: 0.73, LA: 0.45, CHI: 0.62. For high-cardinality features.
  • Binary Encoding: Color: [Red, Blue, Green]; Red=[0,0], Blue=[0,1], Green=[1,0]. Needs about ⌈log₂(k)⌉ columns for k categories; a good compromise.
  • Frequency Encoding: NYC: 0.4 (40% of data), LA: 0.35. Replaces each category with its count or relative frequency.
  • Embedding: NYC → [0.2, -0.5, 0.8], LA → [0.1, 0.3, 0.6]. Learned by a neural network.

Rule of thumb: Few categories → One-Hot | Many categories → Target/Embedding | Ordinal → Label
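
Four of the encodings compared above (one-hot, ordinal label, target, and frequency) can be sketched with plain pandas. The DataFrame, its column names, and the `size_order` mapping are hypothetical. Note that the target encoding is computed on the whole toy frame for brevity; in practice the category means should come from the training split only, for the same leakage reasons that apply to scalers.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "color": ["Red", "Blue", "Green", "Red"],
    "size": ["S", "M", "XL", "L"],
    "city": ["NYC", "LA", "NYC", "CHI"],
    "target": [1, 0, 0, 1],
})

# One-hot: one indicator column per nominal category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Label (ordinal): an explicit mapping preserves the natural order
size_order = {"S": 0, "M": 1, "L": 2, "XL": 3}
df["size_encoded"] = df["size"].map(size_order)

# Target: replace each category with the mean target for that category
city_means = df.groupby("city")["target"].mean()
df["city_target"] = df["city"].map(city_means)

# Frequency: replace each category with its share of rows
city_freq = df["city"].value_counts(normalize=True)
df["city_freq"] = df["city"].map(city_freq)

print(one_hot)
print(df[["size_encoded", "city_target", "city_freq"]])
```

An explicit mapping is used for the ordinal column because an automatic label encoder typically sorts categories alphabetically, which would not give S < M < L < XL.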
RobustScaler:
Uses median and IQR
Robust to outliers
Use for: Data with outliers

Log Transform:
x_log = log(x + 1)
Use for: Skewed distributions, power laws
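
A minimal sketch of the log transform, assuming the natural logarithm: NumPy's `log1p` computes log(x + 1) directly and `expm1` inverts it. The skewed sample values are made up.

```python
import numpy as np

# Made-up right-skewed values spanning several orders of magnitude
x = np.array([0.0, 9.0, 99.0, 999.0])

x_log = np.log1p(x)      # natural log of (x + 1); defined at x = 0
x_back = np.expm1(x_log)  # inverse transform recovers the original values

print(x_log)
print(x_back)
```

The + 1 keeps zero-valued inputs valid, since log(0) is undefined.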

When to Use Each Scaler

  • StandardScaler: Most algorithms (SVM, KNN, Neural Networks)
  • MinMaxScaler: Neural networks, image data
  • RobustScaler: Data with outliers
  • Log Transform: Skewed distributions, power laws

Feature Creation

Figure: Feature creation strategies, covering date, text (including named entities), interaction, and aggregation features, plus mathematical feature creation:

  • Polynomial: φ(x) = [1, x₁, x₂, x₁², x₁x₂, x₂²]
  • Binning: x' = ⌊x / Δ⌋ (discretize continuous features)
  • Power: x' = x^α (Box-Cox: find optimal α)
  • Log: x' = log(x + 1)
  • Sqrt: x' = √x
  • Reciprocal: x' = 1/(x + ε)
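
The polynomial expansion φ(x) = [1, x₁, x₂, x₁², x₁x₂, x₂²] and the binning rule x' = ⌊x / Δ⌋ can be sketched as follows. The input values and the bin width of 10 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with x1 = 2 and x2 = 3 (arbitrary values)
X = np.array([[2.0, 3.0]])

# Degree-2 expansion: bias, linear terms, squares, and the cross term
poly = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly.fit_transform(X)
print(X_poly)  # columns follow the order 1, x1, x2, x1^2, x1*x2, x2^2

# Equal-width binning with an assumed bin width of 10
ages = np.array([3.0, 17.0, 25.0, 64.0])
width = 10.0
bins = np.floor(ages / width).astype(int)
print(bins)
```

The number of polynomial columns grows quickly with the degree and the number of input features, so the expansion is usually paired with feature selection or regularization.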
Date features:
  Year, Month, Day, Hour
  Day of week, Is weekend
  Is holiday, Season
  Days since event

Text features:
  Word count, Character count
  TF-IDF vectors
  Word embeddings
  Sentiment scores

Interaction features:
  x₁ × x₂ (product)
  x₁ / x₂ (ratio)
  x₁ - x₂ (difference)
  x₁², x₂² (polynomial)

Aggregation features:
  Mean, Median, Std per group
  Count per category
  Rolling statistics
  Lag features
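
Several of the date, interaction, and aggregation features listed above can be built with a few pandas calls. The sales table and all column names below are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales records, already sorted by date
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"]),
    "store": ["A", "A", "B", "B"],
    "price": [10.0, 12.0, 8.0, 9.0],
    "units": [3, 5, 2, 4],
})

# Date features
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day_of_week"] = df["date"].dt.dayofweek  # Monday = 0, Sunday = 6
df["is_weekend"] = df["day_of_week"] >= 5

# Interaction feature: product of two columns
df["revenue"] = df["price"] * df["units"]

# Aggregation feature: group mean broadcast back to every row
df["store_mean_price"] = df.groupby("store")["price"].transform("mean")

# Lag feature: previous observation within the same store
df["units_lag_1"] = df.groupby("store")["units"].shift(1)

# Rolling statistic: mean over the current and previous row
df["units_rolling_mean_2"] = df["units"].rolling(window=2).mean()

print(df)
```

Lag and rolling features leave missing values in the first rows of each window, and they only make sense when the rows are ordered in time.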

Feature Selection

Definition: Feature Selection

The process of selecting a subset of relevant features for use in model construction. It can reduce overfitting, often improves accuracy, and shortens training time.

Feature Selection Methods

Figure: Three approaches to feature selection. Filter methods (model-free statistical tests) are fast but ignore feature interactions, so they suit a preprocessing step. Wrapper methods (model-based search) account for interactions but are computationally expensive. Embedded methods (built into model training; the figure also lists SHAP values here) offer the best balance of speed and quality but are model-specific.
Method 1: Filter (statistical tests)
  Correlation with target
  Chi-squared test
  Mutual information
  ANOVA F-test

Method 2: Wrapper (model-based)
  Forward selection
  Backward elimination
  Recursive feature elimination (RFE)
  Genetic algorithms

Method 3: Embedded (built into model)
  L1 regularization (Lasso)
  Feature importance (Tree-based)
  Permutation importance
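
One representative of each method family can be sketched with scikit-learn: mutual information as a filter, RFE as a wrapper, and an L1-penalized linear model as an embedded selector. The dataset is synthetic, and the estimator choices and hyperparameters (k=3, C=0.1) are arbitrary assumptions rather than recommendations.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic data: 10 features, of which 3 carry signal
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Method 1, filter: rank features by mutual information with the target
score_func = partial(mutual_info_classif, random_state=0)
filter_selector = SelectKBest(score_func=score_func, k=3).fit(X, y)
filter_mask = filter_selector.get_support()

# Method 2, wrapper: recursively drop the weakest feature of a fitted model
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
wrapper_mask = rfe.get_support()

# Method 3, embedded: the L1 penalty drives some coefficients to zero
l1_model = LinearSVC(C=0.1, penalty="l1", dual=False, random_state=0, max_iter=5000)
embedded = SelectFromModel(l1_model).fit(X, y)
embedded_mask = embedded.get_support()

print("filter:  ", np.flatnonzero(filter_mask))
print("wrapper: ", np.flatnonzero(wrapper_mask))
print("embedded:", np.flatnonzero(embedded_mask))
```

The three methods need not agree on the selected columns. The filter scores each feature independently, while the wrapper and embedded approaches depend on the model they are built around.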

Python Implementation

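import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative synthetic data (an assumption; replace with your own DataFrame)
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    'age': rng.integers(18, 70, n),
    'income': rng.normal(50_000, 15_000, n),
    'score': rng.random(n),
    'gender': rng.choice(['F', 'M'], n),
    'city': rng.choice(['NYC', 'LA', 'CHI'], n),
    'category': rng.choice(['A', 'B', 'C'], n),
})
y = (X['score'] > 0.5).astype(int)

# Define preprocessing: scale numerical columns, one-hot encode categorical ones
numerical = ['age', 'income', 'score']
categorical = ['gender', 'city', 'category']

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numerical),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical)
])

# Create pipeline: preprocessing is fit together with the model, on training data only
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Train: split first, then fit on the training portion
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline.fit(X_train, y_train)
print(f"Accuracy: {pipeline.score(X_test, y_test):.3f}")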

Key Takeaways

Summary: Feature Engineering

  1. Feature engineering is often more important than model choice
  2. Scale numerical features for distance-based algorithms
  3. One-hot encode low-cardinality categorical variables; prefer target encoding or embeddings for high-cardinality ones
  4. Create interaction features to capture relationships
  5. Feature selection removes noise and speeds up training
  6. Use pipelines to prevent data leakage
  7. Domain knowledge guides the best feature engineering
  8. Automated tools such as Featuretools can generate candidate features

What to Learn Next

-> Dimensionality Reduction: Reduce high-dimensional features using PCA, t-SNE, and UMAP while preserving key information.

-> Model Evaluation: Measure how much your engineered features actually improve model performance.

-> Linear Regression: See how feature scaling and encoding directly impact linear model accuracy.

-> Clustering: Use unsupervised techniques to discover hidden groups and create new features.

-> Model Selection: Choose the best algorithm and tune hyperparameters for your engineered features.

-> Model Deployment: Package your feature engineering pipeline into production-ready APIs and services.
