Heteroscedasticity — Detection, Consequences, Solutions

When Error Variance Isn't Constant

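Heteroscedasticity violates a core OLS assumption. The coefficient estimates remain unbiased, but the conventional standard errors are biased, so t-tests and confidence intervals become unreliable. The Breusch-Pagan and White tests detect the problem, and heteroscedasticity-robust standard errors restore valid inference in large samples.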

  • Income Analysis — Variance in spending increases with income levels
  • Healthcare — Treatment effect variability differs across patient subgroups
  • Finance — Volatility clustering in stock returns creates non-constant error variance

When errors grow louder with the signal, robust methods keep inference on track.


Heteroscedasticity means the variance of the error term is not constant across observations. It violates the homoscedasticity assumption of OLS.

Definition: Heteroscedasticity

A violation of the homoscedasticity assumption where the variance of the error term differs across observations, often appearing as a fan or trumpet shape in residual plots.
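
The definition can also be written in symbols. This is a sketch in standard regression notation; the symbols \varepsilon_i, x_i, \sigma^2 and \sigma_i^2 are assumptions introduced here and do not appear in the lesson.

```latex
% Linear model for observation i
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \mathbb{E}[\varepsilon_i \mid x_i] = 0

% Homoscedasticity: one variance shared by every observation
\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma^2 \quad \text{for all } i

% Heteroscedasticity: the variance differs across observations
\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma_i^2, \qquad \sigma_i^2 \neq \sigma_j^2 \ \text{for some } i \neq j
```

In the lesson's simulated data, the call np.random.normal(0, 0.5*X, n) corresponds to \sigma_i = 0.5\,x_i, so the error variance grows with the regressor.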

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

np.random.seed(42)
n = 200
X = np.random.uniform(1, 10, n)
X_dm = sm.add_constant(X)

# Heteroscedastic errors: variance grows with X
y_hetero = 2 + 3*X + np.random.normal(0, 0.5*X, n)
# Homoscedastic errors (for comparison)
y_homo = 2 + 3*X + np.random.normal(0, 2, n)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, y, label in [(axes[0], y_homo, 'Homoscedastic'),
                     (axes[1], y_hetero, 'Heteroscedastic')]:
    model = sm.OLS(y, X_dm).fit()
    ax.scatter(model.fittedvalues, model.resid, alpha=0.5)
    ax.axhline(0, color='red', linestyle='--')
    ax.set_title(f'{label} — Residuals vs Fitted')
    ax.set_xlabel('Fitted Values')
    ax.set_ylabel('Residuals')
plt.tight_layout()
plt.savefig('heteroscedasticity.png', dpi=150)
plt.show()

# Detection tests
model_h = sm.OLS(y_hetero, X_dm).fit()
bp_stat, bp_p, _, _ = het_breuschpagan(model_h.resid, model_h.model.exog)
wh_stat, wh_p, _, _ = het_white(model_h.resid, model_h.model.exog)
print(f"Breusch-Pagan test: χ²={bp_stat:.4f}, p={bp_p:.6f}")
print(f"White's test: χ²={wh_stat:.4f}, p={wh_p:.6f}")

# Solution 1: Robust standard errors (HC3)
model_robust = sm.OLS(y_hetero, X_dm).fit(cov_type='HC3')
print("\nOLS with HC3 robust standard errors:")
print(model_robust.summary().tables[1])

# Solution 2: Log transformation (only valid when every y is strictly positive)
if np.all(y_hetero > 0):
    y_log = np.log(y_hetero)
    model_log = sm.OLS(y_log, X_dm).fit()
    bp_log, p_log, _, _ = het_breuschpagan(model_log.resid, model_log.model.exog)
    print(f"\nAfter log(Y) transform: BP p-value = {p_log:.4f}")
else:
    print("\nSkipping log(Y): y contains non-positive values")
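
To make the Breusch-Pagan idea less of a black box, here is a minimal sketch of the test computed by hand in its studentized (n times R-squared) form, using only NumPy. It assumes one regressor plus a constant, regenerates data with the same recipe as the lesson under a hypothetical seed, and uses helper names (ols, r2_aux, lm_stat) that are illustrative rather than part of any library.

```python
import math
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed, separate from the lesson's
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)  # error SD grows with x

def ols(design, target):
    """Return OLS coefficients and residuals via least squares."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta, target - design @ beta

design = np.column_stack([np.ones(n), x])

# Step 1: fit the main regression and keep its residuals.
_, resid = ols(design, y)

# Step 2: auxiliary regression of squared residuals on the same regressors.
sq = resid ** 2
_, aux_resid = ols(design, sq)
r2_aux = 1 - (aux_resid @ aux_resid) / np.sum((sq - sq.mean()) ** 2)

# Step 3: LM statistic n * R^2, compared with chi-square(1) because the
# auxiliary regression has exactly one non-constant regressor.
lm_stat = n * r2_aux
p_value = math.erfc(math.sqrt(lm_stat / 2))  # upper tail of chi-square(1)

print(f"LM = {lm_stat:.3f}, p = {p_value:.6f}")
```

A small p-value says the squared residuals are linearly related to the regressor, which is the pattern the Breusch-Pagan test is built to detect. The erfc shortcut for the tail probability only works for one degree of freedom; with more regressors a chi-square distribution function would be needed.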

Primary Diagnostic

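The residuals vs fitted plot is the primary visual diagnostic for heteroscedasticity. Under homoscedasticity the spread of residuals is roughly constant across fitted values; a fan or trumpet shape, where the spread widens or narrows as fitted values change, signals heteroscedasticity.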
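
The visual check can be backed by a quick numerical one. The sketch below sorts residuals by fitted value, splits them into three bins, and compares the residual standard deviation in each. It is an informal companion to the plot rather than a formal test, and the seed, bin count, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)  # same recipe as the lesson's y_hetero

# Fit OLS with a constant and compute fitted values and residuals.
design = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ beta
resid = y - fitted

# Order residuals by fitted value and split them into three bins.
order = np.argsort(fitted)
bins = np.array_split(resid[order], 3)
spreads = [float(np.std(b)) for b in bins]

for name, s in zip(["low", "middle", "high"], spreads):
    print(f"{name:>6} fitted values: residual SD = {s:.3f}")
```

If the three standard deviations are similar, the residual plot should look like a horizontal band; a steady increase from the low bin to the high bin is the numerical counterpart of the fan shape.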


Key Takeaways

Summary: Heteroscedasticity

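  • Heteroscedasticity biases the conventional OLS standard errors but not the coefficient estimates, which remain unbiased though no longer efficient
  • Residuals vs fitted plot is the primary visual diagnostic (fan/trumpet shape)
  • Breusch-Pagan tests whether the error variance is a linear function of the regressors; White's test adds squares and cross-products, so it is more general
  • Robust SEs (HC3) are the easiest fix: the coefficients are unchanged, and inference stays valid in large samples even under heteroscedasticity
  • A log transformation of a strictly positive response often stabilizes the variance, though it changes the model being estimated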
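
To illustrate the first and fourth takeaways, the sketch below computes classical and HC3 standard errors by hand for the same coefficient vector. It assumes the usual HC3 sandwich formula, in which each squared residual is divided by (1 - h_ii)^2 with h_ii the leverage of observation i. The data, seed, and variable names are illustrative, and the result has not been checked here against statsmodels' cov_type='HC3' output.

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical seed
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)  # same recipe as the lesson's y_hetero

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y          # one set of point estimates for both SE types
resid = y - X @ beta

# Classical covariance s^2 (X'X)^-1, which assumes constant error variance.
k = X.shape[1]
s2 = resid @ resid / (n - k)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# HC3 covariance: sandwich estimator with leverage-adjusted squared residuals.
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # diagonal of the hat matrix
weights = (resid / (1 - leverage)) ** 2
meat = X.T @ (X * weights[:, None])
se_hc3 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients :", np.round(beta, 3))
print("classical SE :", np.round(se_classical, 3))
print("HC3 SE       :", np.round(se_hc3, 3))
```

Only the standard errors differ between the two rows; the coefficients are computed once. That is why robust standard errors repair inference without changing the fitted line.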
