Regularization — Ridge, Lasso and Elastic Net Complete Guide

ML Foundations

Preventing Overfitting — Ridge, Lasso, and Elastic Net

Regularization constrains model complexity by adding penalty terms to the loss function, helping models generalize better to unseen data.

  • Ridge (L2) — shrinks weights toward zero to prevent overfitting when all features are potentially useful
  • Lasso (L1) — zeros out irrelevant features, performing automatic feature selection
  • Elastic Net — combines both penalties for the best of Ridge and Lasso

"Simplicity is the ultimate sophistication." — Leonardo da Vinci


The Problem

Definition: Overfitting

Overfitting occurs when a model learns the training data too well, including noise and random fluctuations, resulting in poor performance on new, unseen data.

L1 vs L2 Constraint Regions

[Figure: L1 (Lasso) vs L2 (Ridge) constraint geometry in the (w₁, w₂) plane. The L1 constraint region is a diamond, whose corners produce sparse solutions. The L2 constraint region is a circle, which produces small but non-zero weights.]

Without regularization:
  Model can fit the training data almost perfectly, noise included
  Weights can grow large, giving a more complex model
  High variance (overfitting)
  Poor generalization to new data

With regularization:
  Model balances fit and simplicity
  Smaller weights
  Lower variance (less overfitting), at the cost of some bias
  Better generalization when α is chosen well
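
The contrast above can be sketched with a small experiment. This is a minimal illustration rather than a benchmark: the synthetic dataset from make_regression, the split sizes, and alpha=10.0 are all assumptions chosen so that the unpenalized model has room to overfit.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Assumed setup: noisy target, many features relative to the training set size.
X, y = make_regression(n_samples=200, n_features=60, n_informative=10,
                       noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

ols = LinearRegression().fit(X_train, y_train)   # no penalty
ridge = Ridge(alpha=10.0).fit(X_train, y_train)  # L2 penalty (alpha is an assumed value)

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
ols_train_r2 = ols.score(X_train, y_train)
ridge_train_r2 = ridge.score(X_train, y_train)

# Compare fit on seen vs unseen data, and the size of the weight vector.
for name, model in [("OLS", ols), ("Ridge", ridge)]:
    print(f"{name}: train R^2={model.score(X_train, y_train):.3f}, "
          f"test R^2={model.score(X_test, y_test):.3f}, "
          f"||w||={np.linalg.norm(model.coef_):.1f}")
```

The unpenalized model always scores at least as well on the training split, and the Ridge weights always have the smaller norm. Whether the test score improves depends on the data and on the chosen alpha.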

Ridge Regression (L2)

Definition: Ridge Regression (L2 Regularization)

Adds the sum of squared weights as a penalty to the loss function. This shrinks weights toward zero but generally not exactly to zero.

Ridge Loss

$$L_{\text{Ridge}} = MSE + \alpha \sum_{i=1}^{n} w_i^2$$

Here,

  • $L_{\text{Ridge}}$ = Ridge loss
  • $MSE$ = Mean Squared Error
  • $\alpha$ = Regularization strength
  • $w_i$ = Model weights
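
For a linear model without an intercept, the Ridge loss above has a closed-form minimizer. The sketch below is illustrative only: the data, true_w, and the helper names ridge_loss and ridge_closed_form are hypothetical. It also assumes MSE is the mean of squared errors over m samples, which is why the factor m appears. scikit-learn's Ridge penalizes the sum of squared errors instead, so the equivalent setting there is alpha multiplied by m.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 50, 4
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0, 0.0, 0.5])          # hypothetical weights
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

def ridge_loss(w, X, y, alpha):
    """MSE + alpha * sum(w_i^2), as in the formula above (no intercept)."""
    residual = y - X @ w
    return np.mean(residual ** 2) + alpha * np.sum(w ** 2)

def ridge_closed_form(X, y, alpha):
    """Minimizer of ridge_loss: solve (X^T X + m * alpha * I) w = X^T y."""
    m, p = X.shape
    return np.linalg.solve(X.T @ X + m * alpha * np.eye(p), X.T @ y)

alpha = 0.1
w_ridge = ridge_closed_form(X, y, alpha)
w_ols = ridge_closed_form(X, y, 0.0)              # alpha = 0 recovers least squares

# scikit-learn uses the sum (not the mean) of squared errors, hence alpha * n_samples.
sk = Ridge(alpha=alpha * n_samples, fit_intercept=False).fit(X, y)

print("closed form :", np.round(w_ridge, 4))
print("scikit-learn:", np.round(sk.coef_, 4))
print("OLS         :", np.round(w_ols, 4))
```

Setting the gradient of the loss to zero gives the linear system solved in ridge_closed_form. The added diagonal term is what shrinks the solution relative to ordinary least squares.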

Coefficient Shrinkage Diagram

[Figure: Coefficient shrinkage, Ridge vs Lasso. Ridge (L2) shrinks all weights toward zero, with none exactly zero. Lasso (L1) sets some weights to exactly zero, giving a sparse solution (feature selection).]

When to Use Ridge

Use Ridge when you have many features that are all potentially useful. It prevents overfitting by shrinking all weights toward zero.


Lasso Regression (L1)

Definition: Lasso Regression (L1 Regularization)

Adds the sum of the absolute values of the weights as a penalty. This can shrink weights to exactly zero, performing automatic feature selection.

Lasso Loss

$$L_{\text{Lasso}} = MSE + \alpha \sum_{i=1}^{n} |w_i|$$

Here,

  • $L_{\text{Lasso}}$ = Lasso loss
  • $MSE$ = Mean Squared Error
  • $\alpha$ = Regularization strength
  • $w_i$ = Model weights

Feature Selection

Because Lasso can shrink some weights to exactly zero, the features that keep non-zero weights form a selected subset. Larger values of α remove more features.
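
Why the L1 penalty produces exact zeros while the L2 penalty does not can be seen in a one-weight simplification. The sketch assumes the data-fit term reduces to (w - z)^2, where z is the weight the unpenalized model would choose. The function names lasso_1d and ridge_1d and the values in z are hypothetical.

```python
import numpy as np

def lasso_1d(z, alpha):
    """Minimizer of (w - z)^2 + alpha * |w|: soft-thresholding at alpha / 2."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha / 2.0, 0.0)

def ridge_1d(z, alpha):
    """Minimizer of (w - z)^2 + alpha * w^2: proportional shrinkage."""
    return z / (1.0 + alpha)

z = np.array([3.0, 0.4, -0.2, -2.0])   # hypothetical unpenalized weights
alpha = 1.0

w_lasso = lasso_1d(z, alpha)
w_ridge = ridge_1d(z, alpha)
n_zero_lasso = int(np.sum(w_lasso == 0))
n_zero_ridge = int(np.sum(w_ridge == 0))

print("unpenalized:", z)
print("lasso      :", w_lasso, "zeros:", n_zero_lasso)
print("ridge      :", w_ridge, "zeros:", n_zero_ridge)
```

In this simplified setting, Lasso subtracts a fixed amount from every weight and clips at zero, so small weights vanish. Ridge divides every weight by the same factor, so a non-zero weight stays non-zero.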


Elastic Net

Definition: Elastic Net

Combines both L1 (Lasso) and L2 (Ridge) penalties. Provides a balance between feature selection and weight shrinkage.

Elastic Net Loss

$$L_{\text{Elastic Net}} = MSE + \alpha_1 \sum_{i=1}^{n} |w_i| + \alpha_2 \sum_{i=1}^{n} w_i^2$$

Here,

  • $L_{\text{Elastic Net}}$ = Elastic Net loss
  • $MSE$ = Mean Squared Error
  • $\alpha_1$ = L1 regularization strength
  • $\alpha_2$ = L2 regularization strength
  • $w_i$ = Model weights
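
scikit-learn's ElasticNet is parameterized by alpha and l1_ratio rather than by two separate strengths. The sketch below shows one way to translate between the two forms. It assumes MSE is the mean squared error with no intercept, and compares against the objective stated in the scikit-learn documentation, which carries a factor of 1/2 on the squared-error and L2 terms. The helper names to_sklearn_params, doc_loss, and sklearn_objective are hypothetical.

```python
import numpy as np

def to_sklearn_params(alpha_1, alpha_2):
    """Map (alpha_1, alpha_2) from the loss above to (alpha, l1_ratio)."""
    alpha = alpha_1 / 2.0 + alpha_2          # requires alpha_1 + alpha_2 > 0
    l1_ratio = (alpha_1 / 2.0) / alpha
    return alpha, l1_ratio

def doc_loss(w, X, y, alpha_1, alpha_2):
    """MSE + alpha_1 * sum|w_i| + alpha_2 * sum w_i^2 (no intercept)."""
    return (np.mean((y - X @ w) ** 2)
            + alpha_1 * np.sum(np.abs(w))
            + alpha_2 * np.sum(w ** 2))

def sklearn_objective(w, X, y, alpha, l1_ratio):
    """Objective documented for sklearn.linear_model.ElasticNet."""
    m = X.shape[0]
    return (np.sum((y - X @ w) ** 2) / (2 * m)
            + alpha * l1_ratio * np.sum(np.abs(w))
            + 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)
w = rng.normal(size=3)                       # arbitrary weights, just to compare values

alpha_1, alpha_2 = 0.1, 0.05
alpha, l1_ratio = to_sklearn_params(alpha_1, alpha_2)
print(f"alpha={alpha:.3f}, l1_ratio={l1_ratio:.3f}")
print(doc_loss(w, X, y, alpha_1, alpha_2),
      2 * sklearn_objective(w, X, y, alpha, l1_ratio))
```

The two objectives differ only by a constant factor of 2, so they share the same minimizer. Under this mapping, l1_ratio=1 corresponds to pure Lasso and l1_ratio=0 to pure Ridge.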

Regularization Path

[Figure: Regularization path, coefficients vs α. x-axis: log(α); y-axis: coefficient value; curves for w₁ to w₄. With Lasso, w₂ and w₃ reach 0. As α increases, more coefficients go to 0.]
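
A regularization path can be traced by refitting Lasso over a grid of α values and recording the coefficients. The following is a rough sketch on synthetic data: the dataset, the grid, and max_iter are assumptions. The features are standardized first so that the penalty treats them equally.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Assumed synthetic data: 8 features, only 3 of which carry signal.
X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

alphas = np.logspace(-2, 3, 6)          # 0.01, 0.1, ..., 1000
nonzero_counts = []
for a in alphas:
    coef = Lasso(alpha=a, max_iter=10_000).fit(X, y).coef_
    nonzero_counts.append(int(np.sum(coef != 0)))
    print(f"alpha={a:g}: {nonzero_counts[-1]} non-zero coefficients")
```

Plotting each coefficient against log(α) gives a path plot like the one in the figure. At the largest α on this grid the penalty dominates and every coefficient is zero.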

When to Use Elastic Net

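Use Elastic Net when features are correlated and you still need feature selection. Lasso alone tends to keep one feature from a correlated group and drop the rest somewhat arbitrarily. The added L2 penalty encourages correlated features to be kept or dropped together.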
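
The behaviour with correlated features can be illustrated with an extreme case: two identical columns. This is a sketch under stated assumptions, not a general result. The duplicated feature stands in for strong correlation, the data are synthetic, and the way Lasso splits the weight depends on the solver (here scikit-learn's default coordinate descent).

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x])                  # two perfectly correlated features
y = 3.0 * x + 0.1 * rng.normal(size=200)     # hypothetical target

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# How unevenly each model splits the weight between the two copies.
lasso_gap = abs(lasso.coef_[0] - lasso.coef_[1])
enet_gap = abs(enet.coef_[0] - enet.coef_[1])

print("Lasso coefficients      :", np.round(lasso.coef_, 3))
print("Elastic Net coefficients:", np.round(enet.coef_, 3))
```

With the L2 term present the objective is strictly convex, so Elastic Net's solution is unique and shares the weight between the two copies. Lasso's objective has many equally good splits, and the solver settles on a lopsided one.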


Python Implementation

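from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score
import numpy as np

# Synthetic stand-in data so the example runs as-is; substitute your own X and y.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Ridge
ridge = Ridge(alpha=1.0)
ridge_scores = cross_val_score(ridge, X, y, cv=5, scoring='r2')
print(f"Ridge mean R^2: {ridge_scores.mean():.3f}")

# Lasso
lasso = Lasso(alpha=0.1)
lasso_scores = cross_val_score(lasso, X, y, cv=5, scoring='r2')
print(f"Lasso mean R^2: {lasso_scores.mean():.3f}")
# cross_val_score fits clones of the estimator, so fit lasso itself before reading coef_
lasso.fit(X, y)
print(f"Lasso selected {np.sum(lasso.coef_ != 0)} features")

# Elastic Net
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet_scores = cross_val_score(enet, X, y, cv=5, scoring='r2')
print(f"Elastic Net mean R^2: {enet_scores.mean():.3f}")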
Choosing Alpha

α = 0: No regularization (original model)
α = ∞: All weights = 0 (trivial model)

Use cross-validation to find the optimal α:
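from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# X, y: feature matrix and target, as in the implementation above
alphas = [0.001, 0.01, 0.1, 1, 10, 100]
for a in alphas:
    model = Ridge(alpha=a)
    # Mean of the estimator's default score (R^2) over 5 folds
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"α={a}: {score:.3f}")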
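
scikit-learn also provides estimators that run this search internally. The sketch below uses RidgeCV with the same grid. The synthetic dataset is an assumption, included only to make the snippet self-contained.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Assumed synthetic data; substitute your own X and y.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)

best_alpha = ridge_cv.alpha_     # value from the grid with the best CV score
print(f"Selected alpha: {best_alpha}")
```

LassoCV and ElasticNetCV play the same role for the other two penalties. If the selected value sits at either end of the grid, it is usually worth widening the grid and searching again.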

Key Takeaways

Summary: Regularization

  1. Regularization prevents overfitting by penalizing complexity
  2. Ridge (L2) shrinks weights — good when all features matter
  3. Lasso (L1) performs feature selection — zeros out irrelevant features
  4. Elastic Net combines both penalties, a reasonable choice when features are correlated and only some are relevant
  5. Cross-validation is essential for choosing alpha
  6. Scale features before regularization (penalty is scale-dependent)
  7. Regularization is crucial for high-dimensional data
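  8. Tree-based models have no linear weights to penalize; they control complexity in other ways, such as limiting tree depth, requiring a minimum number of samples per leaf, or pruning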
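
Point 6 above can be checked directly. The sketch below changes the units of one feature and compares Ridge with and without standardization. The dataset, the factor of 1000, and alpha=50.0 are assumptions, and fit_predict is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, n_informative=5,
                       noise=10.0, random_state=0)
X_units = X.copy()
X_units[:, 0] *= 1000.0      # same feature in different units (e.g. m vs mm)

def fit_predict(model, features):
    """Fit on the given features and return in-sample predictions."""
    return model.fit(features, y).predict(features)

# Plain Ridge: the penalty acts on raw weights, so the unit change alters the fit.
raw_a = fit_predict(Ridge(alpha=50.0), X)
raw_b = fit_predict(Ridge(alpha=50.0), X_units)

# Standardizing inside a pipeline removes the dependence on units.
scaled_a = fit_predict(make_pipeline(StandardScaler(), Ridge(alpha=50.0)), X)
scaled_b = fit_predict(make_pipeline(StandardScaler(), Ridge(alpha=50.0)), X_units)

raw_changed = not np.allclose(raw_a, raw_b)
scaled_changed = not np.allclose(scaled_a, scaled_b)
print(f"Predictions change without scaling: {raw_changed}")
print(f"Predictions change with scaling:    {scaled_changed}")
```

Keeping the scaler inside the pipeline also means that, under cross-validation, it is fitted on each training fold only, which avoids leaking information from the validation fold.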

What to Learn Next

-> Linear Regression: Understand the foundational model where Ridge and Lasso regularization are applied.

-> Logistic Regression: Extend regularization to classification problems with penalized logistic models.

-> Model Evaluation: Learn cross-validation techniques for selecting the optimal regularization strength.

-> Model Selection: Compare algorithms and tune hyperparameters including regularization parameters.

-> Training Deep Networks: Apply dropout, weight decay, and batch normalization as regularization in deep learning.

-> SVM: Explore maximum margin classifiers that implicitly use L2 regularization.
