
Multicollinearity — Detection and Solutions

When Predictors Correlate and Coefficients Become Unreliable

Multicollinearity inflates standard errors, making individual predictors appear insignificant even when the overall model is strong. Detection through VIF and condition numbers is essential before interpreting coefficients.

  • Economics — Disentangle effects of correlated macroeconomic indicators
  • Genomics — Handle highly correlated gene expression variables
  • Policy Analysis — Isolate individual policy impacts when interventions are bundled

High VIF signals that the model cannot distinguish one predictor's effect from another's.


Multicollinearity occurs when two or more predictors are highly correlated with each other. It doesn't bias OLS estimates but inflates standard errors, making individual coefficients unreliable.

Definition: Multicollinearity

A condition in regression where two or more predictor variables are highly correlated, leading to unstable coefficient estimates and inflated standard errors.
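
The link between predictor correlation and inflated standard errors can be written out with the standard OLS variance formula. The notation is an assumption introduced here, not taken from the lesson: R_j^2 is the R² obtained by regressing predictor x_j on all the other predictors, and σ² is the error variance.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2} \cdot \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

As R_j^2 approaches 1 the VIF grows without bound. For example, R_j^2 = 0.9 gives VIF = 10, which means the standard error of that coefficient is about √10 ≈ 3.16 times larger than it would be if x_j were uncorrelated with the other predictors.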

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(42)
n = 100

# Create correlated predictors
z = np.random.normal(0, 1, n)
x1 = z + np.random.normal(0, 0.3, n)      # strongly correlated with z
x2 = z + np.random.normal(0, 0.3, n)      # also correlated with z
x3 = np.random.normal(0, 1, n)            # independent
y = 2*x1 + 1.5*x3 + np.random.normal(0, 1, n)  # x2 has no direct effect on y

X = sm.add_constant(pd.DataFrame({'x1':x1,'x2':x2,'x3':x3}))

# Detect multicollinearity: Variance Inflation Factor
vif_data = pd.DataFrame()
vif_data['Feature'] = ['x1','x2','x3']
vif_data['VIF'] = [variance_inflation_factor(X.values, i+1) for i in range(3)]
print("VIF (Variance Inflation Factor):")
print(vif_data)
print("Rule of thumb: VIF > 10 (or >5) indicates problematic multicollinearity")

# Correlation matrix
corr = pd.DataFrame({'x1':x1,'x2':x2,'x3':x3}).corr()
print("\nCorrelation matrix:")
print(corr.round(3))

plt.figure(figsize=(6, 4))
sns.heatmap(corr, annot=True, fmt='.3f', cmap='RdBu_r', center=0)
plt.title('Predictor Correlation Matrix')
plt.tight_layout()
plt.savefig('multicollinearity.png', dpi=150)
plt.show()

# Show effect: unstable coefficients with multicollinearity.
# True model: y depends on x1 (coefficient 2) but not on x2 (coefficient 0).
# z is reused from above, so only the noise changes between seeds.
print("\nWith multicollinearity: coefficient instability")
for seed in [1, 2, 3, 4, 5]:
    np.random.seed(seed)
    x1s = z + np.random.normal(0, 0.3, n)
    x2s = z + np.random.normal(0, 0.3, n)
    # The middle term is an independent, unobserved predictor standing in for x3
    ys = 2*x1s + 1.5*np.random.normal(0, 1, n) + np.random.normal(0, 1, n)
    Xs = sm.add_constant(pd.DataFrame({'x1': x1s, 'x2': x2s}))
    m = sm.OLS(ys, Xs).fit()
    print(f"  Seed {seed}: β₁={m.params['x1']:.3f}, β₂={m.params['x2']:.3f}")
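
The condition number mentioned in the introduction can be checked with a few lines of NumPy. This is a minimal sketch, not part of the original lesson code: the helper name `condition_number`, the seed, and the choice to standardize each column first are assumptions made for illustration. A larger value means the predictor matrix is closer to being singular.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
n = 100
z = rng.normal(0, 1, n)
x1 = z + rng.normal(0, 0.3, n)  # shares the latent factor z with x2
x2 = z + rng.normal(0, 0.3, n)
x3 = rng.normal(0, 1, n)        # independent of the others


def condition_number(*columns):
    """Condition number (2-norm) of the standardized predictor matrix."""
    X = np.column_stack(columns)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # put columns on a common scale
    return np.linalg.cond(Xs)


cond_collinear = condition_number(x1, x2, x3)  # includes the correlated pair
cond_reduced = condition_number(x1, x3)        # x2 dropped

print(f"Condition number with x1, x2, x3: {cond_collinear:.2f}")
print(f"Condition number with x1, x3:     {cond_reduced:.2f}")
```

Standardizing first keeps the result from being driven by differences in measurement units rather than by correlation among the predictors.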

Solutions

Solution | When to Use
--- | ---
Remove one collinear predictor | If redundant (e.g., two versions of same variable)
Create composite (PCA) | When both carry signal
Ridge regression | Regularization shrinks correlated coefficients
Center/standardize variables | For polynomial terms and interactions
Collect more data | Increases precision
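
To illustrate the ridge regression row, here is a minimal closed-form sketch in NumPy. The function name `ridge_fit`, the seed, and the penalty value `lam=10.0` are assumptions chosen for illustration; in practice the penalty is usually tuned, for example by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed
n = 100
z = rng.normal(0, 1, n)
x1 = z + rng.normal(0, 0.3, n)
x2 = z + rng.normal(0, 0.3, n)
y = 2 * x1 + rng.normal(0, 1, n)  # x2 has no direct effect on y

# Center predictors and response so no intercept term is needed
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
yc = y - y.mean()


def ridge_fit(X, y, lam):
    """Solve (X'X + lam * I) beta = X'y; lam = 0 reduces to OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


beta_ols = ridge_fit(Xc, yc, 0.0)
beta_ridge = ridge_fit(Xc, yc, 10.0)  # hypothetical penalty value

print("OLS coefficients:  ", np.round(beta_ols, 3))
print("Ridge coefficients:", np.round(beta_ridge, 3))
```

The penalty shrinks the coefficient vector toward zero, which stabilizes the estimates for correlated predictors but introduces some bias in exchange.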

VIF Thresholds

VIF greater than 10 suggests serious multicollinearity. VIF greater than 5 warrants attention, but context matters — in some fields, higher VIF values may be acceptable.
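
To see where these numbers come from, the VIF can be computed by hand: regress each predictor on the others and take 1 / (1 - R²). The sketch below uses only NumPy; the function name `vif` and the simulated data are assumptions for illustration, not the lesson's own implementation.

```python
import numpy as np


def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(42)  # illustrative seed
n = 100
z = rng.normal(0, 1, n)
x1 = z + rng.normal(0, 0.3, n)
x2 = z + rng.normal(0, 0.3, n)
x3 = rng.normal(0, 1, n)

vifs = vif(np.column_stack([x1, x2, x3]))
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"{name}: VIF = {value:.2f}")
```

The two predictors that share the latent factor `z` should show clearly higher VIFs than the independent `x3`, whose VIF stays close to 1.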


Key Takeaways

Summary: Multicollinearity

  • VIF greater than 10 suggests serious multicollinearity; VIF greater than 5 warrants attention
  • Multicollinearity inflates standard errors -> wide CIs, large p-values, unstable coefficients
  • Point estimates remain unbiased, but their variance is inflated, so inference on individual coefficients suffers
  • Perfect collinearity makes XᵀX non-invertible -> OLS has no unique solution
  • Ridge regression is a common choice when you need to keep all predictors, at the cost of some bias
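
The perfect-collinearity case in the summary can be checked directly. In this minimal sketch (the variable names, seed, and the factor 3.0 are illustrative assumptions), one predictor is an exact multiple of another, so the design matrix loses a rank and XᵀX cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(7)  # illustrative seed
n = 50
x1 = rng.normal(0, 1, n)
x2 = 3.0 * x1  # exact linear function of x1
X = np.column_stack([np.ones(n), x1, x2])  # intercept, x1, x2

rank = np.linalg.matrix_rank(X)
n_columns = X.shape[1]

print(f"Columns in design matrix: {n_columns}")
print(f"Numerical rank:           {rank}")
print("Rank deficient:", rank < n_columns)
```

A rank below the number of columns means infinitely many coefficient vectors fit the data equally well, which is why one of the redundant predictors has to be dropped or a penalty added.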
