Elastic Net Regression

Regression Analysis

Best of Both Worlds: Ridge and Lasso Combined

Elastic Net blends L1 and L2 penalties, combining Ridge's stability with Lasso's feature selection. It handles correlated predictors better than Lasso alone while still producing sparse models, which makes it well suited to high-dimensional datasets with many correlated features.

Genomics — Select groups of correlated genes together rather than arbitrarily picking one
Marketing Analytics — Handle multicollinearity among campaign variables while selecting key drivers
Healthcare — Build predictive models from hundreds of correlated patient measurements

The l1_ratio parameter lets you slide between Ridge's shrinkage and Lasso's selection.
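To make the effect of l1_ratio concrete, here is a minimal sketch that fits scikit-learn's ElasticNet at several l1_ratio values and counts the surviving coefficients. The synthetic data, the fixed alpha=0.5, and the variable names are assumptions chosen for illustration, not part of the example further down.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data (an illustrative assumption): 10 informative features out of 40.
rng = np.random.default_rng(0)
X = rng.standard_normal((150, 40))
beta = np.zeros(40)
beta[:10] = rng.uniform(1.0, 3.0, size=10)
y = X @ beta + rng.standard_normal(150)

# Hold the overall strength fixed and vary only the L1/L2 mix.
nonzero_counts = {}
for l1_ratio in [0.05, 0.25, 0.5, 0.75, 1.0]:
    model = ElasticNet(alpha=0.5, l1_ratio=l1_ratio, max_iter=10000)
    model.fit(X, y)
    nonzero_counts[l1_ratio] = int(np.count_nonzero(model.coef_))
    print(f"l1_ratio={l1_ratio:.2f}: {nonzero_counts[l1_ratio]} nonzero coefficients")
```

On data like this, the count of nonzero coefficients should generally fall as l1_ratio moves toward 1, because the L1 share of the penalty grows.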

Elastic Net combines Ridge (L2) and Lasso (L1) penalties:

Elastic Net Objective

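$$\text{Minimize: } \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\left[\frac{1-\alpha}{2}\|\boldsymbol{\beta}\|_2^2 + \alpha\|\boldsymbol{\beta}\|_1\right]$$

Here,

$\alpha$ = Mixing parameter (0 = Ridge, 1 = Lasso)
$\lambda$ = Overall regularization strength
$\frac{1-\alpha}{2}\|\boldsymbol{\beta}\|_2^2$ = L2 (Ridge) component
$\alpha\|\boldsymbol{\beta}\|_1$ = L1 (Lasso) component

$\alpha = 0$: pure Ridge; $\alpha = 1$: pure Lasso; $0 < \alpha < 1$: Elastic Net

Note that scikit-learn names these differently: its l1_ratio argument is the mixing parameter $\alpha$ above, and its alpha argument is the overall strength, playing the role of $\lambda$.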
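The objective can also be written out directly in code. The sketch below evaluates it for a coefficient vector and checks that a fitted scikit-learn model sits at a minimum. It relies on one fact about scikit-learn: ElasticNet scales the squared-error term by $1/(2n)$, so its alpha corresponds to $\lambda = 2n \cdot \text{alpha}$ in the notation above, and its l1_ratio corresponds to $\alpha$. The helper elastic_net_objective and the synthetic data are hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def elastic_net_objective(X, y, beta, lam, alpha):
    """RSS + lam * [(1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1]."""
    rss = np.sum((y - X @ beta) ** 2)
    l2 = 0.5 * (1.0 - alpha) * np.sum(beta ** 2)
    l1 = alpha * np.sum(np.abs(beta))
    return rss + lam * (l2 + l1)

# Small synthetic problem (illustrative assumption).
rng = np.random.default_rng(1)
n, p = 100, 8
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -1.0, 0, 0, 1.5, 0, 0, 0]) + 0.5 * rng.standard_normal(n)

# Fit without an intercept so the model matches the objective exactly.
sk_alpha, sk_l1_ratio = 0.1, 0.5
model = ElasticNet(alpha=sk_alpha, l1_ratio=sk_l1_ratio,
                   fit_intercept=False, tol=1e-10, max_iter=100000)
model.fit(X, y)

# Translate scikit-learn's parameters into the notation of the formula.
lam = 2 * n * sk_alpha
alpha = sk_l1_ratio
fitted = elastic_net_objective(X, y, model.coef_, lam, alpha)

# Randomly perturbed coefficients should never score lower than the fit.
perturbed = min(
    elastic_net_objective(X, y, model.coef_ + 0.01 * rng.standard_normal(p), lam, alpha)
    for _ in range(200)
)
print(f"objective at fit: {fitted:.4f}, best perturbed: {perturbed:.4f}")
```

Setting alpha to 0 or 1 in elastic_net_objective recovers the Ridge and Lasso penalties respectively.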


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNet, ElasticNetCV, Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

np.random.seed(42)
n, p = 200, 50

# Create groups of correlated features
X = np.random.randn(n, p)
for group_start in range(0, 15, 5):  # 3 groups of 5 correlated features
    for j in range(group_start+1, group_start+5):
        X[:, j] = X[:, group_start] + np.random.randn(n)*0.3

# Only the first 15 features (the three correlated groups) carry signal
true_beta = np.zeros(p)
true_beta[:15] = [3,-2,2,-1.5,1, 2.5,-2,1.5,-1,0.8, -3,2,-2,1,-1.5]
y = X @ true_beta + np.random.randn(n)*2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Compare Ridge, Lasso, ElasticNet (each standardized inside a pipeline)
results = {}
for name, Model, kwargs in [
    ('Ridge', Ridge, {'alpha': 1.0}),
    ('Lasso', Lasso, {'alpha': 0.1, 'max_iter': 10000}),
    ('ElasticNet', ElasticNet, {'alpha': 0.1, 'l1_ratio': 0.5, 'max_iter': 10000})
]:
    model = Pipeline([('s', StandardScaler()), ('m', Model(**kwargs))])
    model.fit(X_train, y_train)
    test_mse = np.mean((y_test - model.predict(X_test))**2)
    # Ridge shrinks coefficients but does not set them to zero, so report p
    nonzero = (model.named_steps['m'].coef_ != 0).sum() if name != 'Ridge' else p
    results[name] = {'mse': test_mse, 'nonzero': nonzero,
                     'coef': model.named_steps['m'].coef_}
    print(f"{name}: Test MSE={test_mse:.4f}, Nonzero={nonzero}/{p}")

# ElasticNetCV to find best alpha and l1_ratio
enet_cv = Pipeline([
    ('s', StandardScaler()),
    ('m', ElasticNetCV(l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 1.0],
                       cv=5, random_state=42, max_iter=10000))
])
enet_cv.fit(X_train, y_train)
best_alpha = enet_cv.named_steps['m'].alpha_
best_l1 = enet_cv.named_steps['m'].l1_ratio_
test_mse_cv = np.mean((y_test - enet_cv.predict(X_test))**2)
print(f"\nElasticNetCV: best alpha={best_alpha:.4f}, l1_ratio={best_l1:.2f}, Test MSE={test_mse_cv:.4f}")

# Coefficient comparison (red = truly informative features, blue = noise features)
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, res) in zip(axes, results.items()):
    colors = ['red' if j<15 else 'steelblue' for j in range(p)]
    ax.bar(range(p), res['coef'], color=colors, alpha=0.7)
    ax.axhline(0, color='black', linewidth=0.5)
    ax.set_title(f'{name} (MSE={res["mse"]:.3f})\n{res["nonzero"]} nonzero coefficients')
    ax.set_xlabel('Feature Index')

plt.suptitle('Ridge vs Lasso vs Elastic Net Coefficients', fontsize=13)
plt.tight_layout()
plt.savefig('elastic_net.png', dpi=150)
plt.show()

When to Use Elastic Net

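Elastic Net is most useful when predictors form groups of correlated features. Lasso tends to pick one feature from a correlated group somewhat arbitrarily and zero out the rest. Elastic Net's L2 term pushes correlated features toward similar coefficients, so a group tends to enter or leave the model together.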
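The grouping behaviour can be checked on a small synthetic case. The sketch below builds one group of five nearly identical features plus ten unrelated ones, then counts how many group members each model keeps. The data-generating setup, the penalty values, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 200

# One latent signal z drives a group of 5 nearly identical features.
z = rng.standard_normal(n)
group = np.column_stack([z + 0.05 * rng.standard_normal(n) for _ in range(5)])
noise_features = rng.standard_normal((n, 10))  # 10 features unrelated to y
X = np.hstack([group, noise_features])
y = 3.0 * z + rng.standard_normal(n)

lasso = Lasso(alpha=0.5, max_iter=100000).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=100000).fit(X, y)

# Count how many of the 5 group members each model keeps.
lasso_in_group = int(np.count_nonzero(lasso.coef_[:5]))
enet_in_group = int(np.count_nonzero(enet.coef_[:5]))
print(f"Lasso keeps {lasso_in_group} of 5 group features")
print(f"Elastic Net keeps {enet_in_group} of 5 group features")
print("Elastic Net group coefficients:", np.round(enet.coef_[:5], 3))
```

With features this strongly correlated, Elastic Net should spread similar weights across the whole group, while Lasso typically concentrates the weight on a subset whose membership depends on the noise in the sample.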

Key Takeaways

Summary: Elastic Net

Elastic Net is most useful when predictors form groups of correlated features
Lasso tends to pick one feature from a correlated group arbitrarily; Elastic Net tends to keep or drop the group together
l1_ratio closer to 1: more like Lasso (sparser); closer to 0: more like Ridge
ElasticNetCV selects both alpha (the regularization strength) and l1_ratio via cross-validation
A common default choice for ML pipelines: Elastic Net with cross-validated hyperparameters
