Heteroscedasticity

When Error Variance Isn't Constant

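Heteroscedasticity violates a core OLS assumption. The coefficient estimates remain unbiased, but the conventional standard errors become biased, which invalidates the usual hypothesis tests and confidence intervals. The Breusch-Pagan and White tests detect the problem, and heteroscedasticity-robust standard errors restore valid inference.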

Income Analysis — Variance in spending increases with income levels
Healthcare — Treatment effect variability differs across patient subgroups
Finance — Volatility clustering in stock returns creates non-constant error variance

When errors grow louder with the signal, robust methods keep inference on track.

Heteroscedasticity means the variance of the error term is not constant across observations. It violates the homoscedasticity assumption of OLS.

Definition: Heteroscedasticity

A violation of the homoscedasticity assumption where the variance of the error term differs across observations, often appearing as a fan or trumpet shape in residual plots.
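
The definition can be written compactly for the simple linear model used in the example on this page. This is a sketch in standard textbook notation; the symbols \sigma^2 and \sigma_i^2 are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \mathbb{E}[\varepsilon_i \mid x_i] = 0 \\
\text{Homoscedasticity:}\quad \operatorname{Var}(\varepsilon_i \mid x_i) &= \sigma^2 \quad \text{for all } i \\
\text{Heteroscedasticity:}\quad \operatorname{Var}(\varepsilon_i \mid x_i) &= \sigma_i^2 \quad \text{(depends on } i\text{)}
\end{aligned}
```

In the simulation on this page the error standard deviation is 0.5 times X, so in this notation \sigma_i^2 = 0.25\,x_i^2.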

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

np.random.seed(42)
n = 200
X = np.random.uniform(1, 10, n)
X_dm = sm.add_constant(X)

# Heteroscedastic errors: variance grows with X
y_hetero = 2 + 3*X + np.random.normal(0, 0.5*X, n)
# Homoscedastic errors (for comparison)
y_homo = 2 + 3*X + np.random.normal(0, 2, n)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, y, label in [(axes[0], y_homo, 'Homoscedastic'),
                     (axes[1], y_hetero, 'Heteroscedastic')]:
    model = sm.OLS(y, X_dm).fit()
    ax.scatter(model.fittedvalues, model.resid, alpha=0.5)
    ax.axhline(0, color='red', linestyle='--')
    ax.set_title(f'{label} — Residuals vs Fitted')
    ax.set_xlabel('Fitted Values')
    ax.set_ylabel('Residuals')
plt.tight_layout()
plt.savefig('heteroscedasticity.png', dpi=150)
plt.show()

# Detection tests: each returns (LM statistic, LM p-value, F statistic, F p-value)
model_h = sm.OLS(y_hetero, X_dm).fit()
bp_stat, bp_p, _, _ = het_breuschpagan(model_h.resid, model_h.model.exog)
wh_stat, wh_p, _, _ = het_white(model_h.resid, model_h.model.exog)
print(f"Breusch-Pagan test: χ²={bp_stat:.4f}, p={bp_p:.6f}")
print(f"White's test: χ²={wh_stat:.4f}, p={wh_p:.6f}")

# Solution 1: Robust standard errors (HC3)
model_robust = sm.OLS(y_hetero, X_dm).fit(cov_type='HC3')
print("\nOLS with HC3 robust standard errors:")
print(model_robust.summary().tables[1])

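# Solution 2: Log transformation (only valid if Y is strictly positive)
# Note: taking log(Y) also changes the functional form of the mean model.
if np.all(y_hetero > 0):
    y_log = np.log(y_hetero)
    model_log = sm.OLS(y_log, X_dm).fit()
    bp_log, p_log, _, _ = het_breuschpagan(model_log.resid, model_log.model.exog)
    print(f"\nAfter log(Y) transform: BP p-value = {p_log:.4f}")
else:
    print("\nSkipping log(Y) transform: Y contains non-positive values")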
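
The Breusch-Pagan statistic used in the example can be reproduced by hand, which helps show what the test actually measures. The sketch below uses only NumPy and computes the n·R² (Koenker) form of the statistic from an auxiliary regression of squared residuals on the regressors. The seed, sample size, and variable names are assumptions chosen for illustration, so the numbers will not match the statsmodels output above.

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed for this sketch
n = 500
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])           # design matrix with intercept
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)      # error sd grows with x

# Step 1: fit OLS and take the residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: auxiliary regression of squared residuals on the same regressors
u2 = resid ** 2
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted_u2 = X @ gamma
r2_aux = 1 - np.sum((u2 - fitted_u2) ** 2) / np.sum((u2 - u2.mean()) ** 2)

# Step 3: LM statistic = n * R^2, compared with a chi-square with 1 df
# (one regressor besides the intercept); the 5% critical value is about 3.84
lm_stat = n * r2_aux
print(f"Auxiliary R^2: {r2_aux:.3f}, LM statistic: {lm_stat:.2f}")
```

A large LM statistic means the squared residuals are linearly related to the regressors, which is the pattern the test is built to pick up.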

Primary Diagnostic

The residuals vs fitted plot is the primary visual diagnostic for heteroscedasticity. Under homoscedasticity the spread of residuals is roughly constant across fitted values; a fan or trumpet shape, where the spread widens or narrows, signals heteroscedasticity.
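
A rough numeric companion to the plot is to compare the residual spread at low and high fitted values. The sketch below is an informal heuristic, loosely in the spirit of a Goldfeld-Quandt comparison but not that test itself. The helper name `spread_ratio`, the split into thirds, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def spread_ratio(fitted, resid):
    """Residual std. dev. in the top third of fitted values divided by the bottom third."""
    order = np.argsort(fitted)
    k = len(fitted) // 3
    low = resid[order[:k]]
    high = resid[order[-k:]]
    return high.std(ddof=1) / low.std(ddof=1)

rng = np.random.default_rng(1)                 # arbitrary seed for this sketch
n = 600
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])

ratios = {}
for label, noise_sd in [("homoscedastic", np.full(n, 2.0)),
                        ("heteroscedastic", 0.5 * x)]:
    y = 2 + 3 * x + rng.normal(0, noise_sd, n)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ratios[label] = spread_ratio(fitted, y - fitted)
    print(f"{label}: spread ratio = {ratios[label]:.2f}")
```

A ratio near 1 is consistent with constant spread, while a ratio well above or below 1 matches the fan shape seen in the plot. It is a quick check only and does not replace a formal test.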

Key Takeaways

Summary: Heteroscedasticity

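Heteroscedasticity biases the conventional standard errors but not the OLS point estimates, which stay unbiased though no longer efficient
Residuals vs fitted plot is the primary visual diagnostic (fan/trumpet shape)
Breusch-Pagan tests whether the error variance is a linear function of the regressors; White's test is more general because it also includes their squares and cross-products
Robust SEs (HC3) are the easiest fix: they leave the coefficients unchanged and remain valid in large samples under heteroscedasticity
Log transformation often reduces heteroscedasticity for strictly positive variables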
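
To make the robust standard error point concrete, the sketch below computes conventional and HC3 standard errors by hand with NumPy for the same kind of simulated data. It uses the standard sandwich form for HC3 and is meant as an illustration rather than a replacement for `fit(cov_type='HC3')`. The seed, sample size, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)                 # arbitrary seed for this sketch
n = 300
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)      # error sd grows with x

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                       # OLS point estimates
resid = y - X @ beta

# Conventional covariance: s^2 (X'X)^-1, which assumes constant error variance
s2 = resid @ resid / (n - X.shape[1])
se_classic = np.sqrt(np.diag(s2 * XtX_inv))

# HC3 covariance: (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages h_ii = x_i' (X'X)^-1 x_i
w = resid ** 2 / (1 - h) ** 2
meat = X.T @ (X * w[:, None])
se_hc3 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients:", np.round(beta, 3))
print("classic SEs: ", np.round(se_classic, 4))
print("HC3 SEs:     ", np.round(se_hc3, 4))
```

The coefficients are identical under both approaches; only the covariance estimate changes. How far the two sets of standard errors diverge depends on how strongly the error variance is tied to the regressors.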
