Heteroscedasticity
Regression Analysis
When Error Variance Isn't Constant
Heteroscedasticity violates a core OLS assumption, biasing the usual standard errors and invalidating the hypothesis tests built on them. The Breusch-Pagan and White tests detect it, and robust standard errors keep inference reliable when it is present.
- Income Analysis — Variance in spending increases with income levels
- Healthcare — Treatment effect variability differs across patient subgroups
- Finance — Volatility clustering in stock returns creates non-constant error variance
When errors grow louder with the signal, robust methods keep inference on track.
Heteroscedasticity means the variance of the error term is not constant across observations. It violates the homoscedasticity assumption of OLS.
Definition: Heteroscedasticity
A violation of the homoscedasticity assumption where the variance of the error term differs across observations, often appearing as a fan or trumpet shape in residual plots.
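The definition can be written compactly as follows. The notation ($\varepsilon_i$, $\sigma_i^2$, $\Omega$) is assumed here for illustration and does not come from the text above; the second display is the standard result showing why the usual OLS variance formula stops being correct.

```latex
\begin{align*}
\text{Homoscedasticity:}\quad
  & \operatorname{Var}(\varepsilon_i \mid x_i) = \sigma^2
    \quad \text{for all } i \\
\text{Heteroscedasticity:}\quad
  & \operatorname{Var}(\varepsilon_i \mid x_i) = \sigma_i^2
    \quad \text{(varies with } i\text{)} \\[6pt]
\operatorname{Var}(\hat\beta \mid X)
  &= (X^\top X)^{-1} X^\top \Omega X \,(X^\top X)^{-1},
  \qquad \Omega = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)
\end{align*}
```

The last expression collapses to the familiar $\sigma^2 (X^\top X)^{-1}$ only when every $\sigma_i^2$ equals the same $\sigma^2$, which is why standard errors computed from that formula are unreliable under heteroscedasticity.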
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan, het_white
np.random.seed(42)
n = 200
X = np.random.uniform(1, 10, n)
X_dm = sm.add_constant(X)
# Heteroscedastic errors: variance grows with X
y_hetero = 2 + 3*X + np.random.normal(0, 0.5*X, n)
# Homoscedastic errors (for comparison)
y_homo = 2 + 3*X + np.random.normal(0, 2, n)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Fit OLS to each series and plot residuals against fitted values
for ax, y, label in [(axes[0], y_homo, 'Homoscedastic'),
                     (axes[1], y_hetero, 'Heteroscedastic')]:
    model = sm.OLS(y, X_dm).fit()
    ax.scatter(model.fittedvalues, model.resid, alpha=0.5)
    ax.axhline(0, color='red', linestyle='--')
    ax.set_title(f'{label} — Residuals vs Fitted')
    ax.set_xlabel('Fitted Values')
    ax.set_ylabel('Residuals')
plt.tight_layout()
plt.savefig('heteroscedasticity.png', dpi=150)
plt.show()
# Detection tests
model_h = sm.OLS(y_hetero, X_dm).fit()
bp_stat, bp_p, _, _ = het_breuschpagan(model_h.resid, model_h.model.exog)
wh_stat, wh_p, _, _ = het_white(model_h.resid, model_h.model.exog)
print(f"Breusch-Pagan test: χ²={bp_stat:.4f}, p={bp_p:.6f}")
print(f"White's test: χ²={wh_stat:.4f}, p={wh_p:.6f}")
# Solution 1: Robust standard errors (HC3)
model_robust = sm.OLS(y_hetero, X_dm).fit(cov_type='HC3')
print("\nOLS with HC3 robust standard errors:")
print(model_robust.summary().tables[1])
# Solution 2: Log transformation (valid only when every Y is strictly positive)
assert (y_hetero > 0).all(), "log transform requires strictly positive Y"
y_log = np.log(y_hetero)
model_log = sm.OLS(y_log, X_dm).fit()
bp_log, p_log, _, _ = het_breuschpagan(model_log.resid, model_log.model.exog)
print(f"\nAfter log(Y) transform: BP p-value = {p_log:.4f}")
Primary Diagnostic
The residuals vs fitted plot is the primary visual diagnostic for heteroscedasticity. Look for a fan or trumpet shape — the spread of residuals should be constant across fitted values.
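A rough numeric companion to the plot is to compare the residual spread across bins of fitted values. The sketch below is one way to do that under assumed simulated data; the choice of four equal-sized bins is arbitrary, and this is an informal check rather than a formal test.

```python
import numpy as np

# Illustrative data: error standard deviation grows with x (assumed setup)
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)

# OLS fit by least squares
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Sort observations by fitted value and split into 4 equal-sized bins
order = np.argsort(fitted)
bins = np.array_split(order, 4)

# Residual standard deviation within each bin, lowest fitted values first
spread = np.array([resid[idx].std(ddof=1) for idx in bins])
print("residual SD by fitted-value quartile:", np.round(spread, 2))
```

Roughly equal values across bins are what constant variance would look like; a steady increase from the first bin to the last is the numeric counterpart of the fan shape.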
Key Takeaways
Summary: Heteroscedasticity
- Heteroscedasticity biases the usual OLS standard errors; the coefficient estimates stay unbiased but are no longer efficient
- Residuals vs fitted plot is the primary visual diagnostic (fan/trumpet shape)
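- Breusch-Pagan tests whether the error variance is a linear function of the regressors; White's test is more general because it also includes their squares and cross-products
- Robust SEs (HC3) are the easiest fix: inference stays valid under heteroscedasticity, and the coefficient estimates are unchanged
- Log transformation can reduce heteroscedasticity when the response is strictly positive and its spread grows with its level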
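The idea behind the Breusch-Pagan test can be sketched by hand: regress the squared OLS residuals on the regressors and compute n·R² from that auxiliary regression. This is the studentized (Koenker) form of the statistic, written with NumPy only; the simulated data and the helper name `fit_r2` are assumptions made for illustration.

```python
import numpy as np

# Illustrative data: error standard deviation grows with x (assumed setup)
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)
X = np.column_stack([np.ones(n), x])

def fit_r2(design, target):
    """Least-squares fit; returns (R-squared, residuals). Hypothetical helper."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    tss = ((target - target.mean()) ** 2).sum()
    return 1 - (resid ** 2).sum() / tss, resid

# Step 1: residuals from the main regression
_, resid = fit_r2(X, y)

# Step 2: auxiliary regression of squared residuals on the same regressors
aux_r2, _ = fit_r2(X, resid ** 2)

# Step 3: LM statistic, compared with a chi-square with 1 degree of freedom
# (one non-constant regressor); 3.841 is the 5% critical value
lm_stat = n * aux_r2
CHI2_CRIT_1DF_5PCT = 3.841
print(f"LM = {lm_stat:.2f}, reject constant variance: {lm_stat > CHI2_CRIT_1DF_5PCT}")
```

White's test follows the same recipe but adds squares and cross-products of the regressors to the auxiliary design matrix, which raises the degrees of freedom accordingly.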