Autocorrelation in Regression

Regression Analysis

When Residuals Are Not Independent

Autocorrelation in time-series regression biases OLS standard errors. With positive autocorrelation they are usually too small, which inflates t-statistics and creates false confidence in predictors. The Durbin-Watson test detects the problem, and Newey-West (HAC) standard errors correct inference for it.

Macroeconomics — Model GDP growth where errors correlate across quarters
Energy Markets — Forecast demand where daily patterns create serial dependence
Environmental Studies — Analyze pollution trends with temporally correlated residuals

Time-ordered data demands attention to the correlation structure hidden in residuals.

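Autocorrelation means the regression errors, as estimated by the residuals, are correlated across time (or space). This violates the OLS independence assumption. When the regressors are strictly exogenous, the coefficient estimates remain unbiased. The usual standard errors, however, are biased, so t-tests and confidence intervals can mislead.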
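The standard-error bias can be seen in a small Monte Carlo experiment, sketched below in plain NumPy. It reuses the setup of the example further down: a trending regressor and AR(1) errors with ρ = 0.7. The regressor is held fixed while the errors are redrawn many times. The helper names `ols_slope_and_se` and `simulate_se_gap` are illustrative, not library functions. If the conventional OLS standard error were trustworthy, its average would roughly match the actual spread of the slope estimates across replications.

```python
import numpy as np

def ols_slope_and_se(x, y):
    """OLS of y on [1, x]; return the slope and its conventional (iid-error) SE."""
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta[1], np.sqrt(sigma2 * XtX_inv[1, 1])

def simulate_se_gap(n=100, rho=0.7, reps=2000, seed=42):
    """Return (empirical SD of the OLS slope, average reported OLS SE)
    across `reps` simulated data sets with AR(1) errors."""
    rng = np.random.default_rng(seed)
    x = np.arange(n) / 10 + rng.normal(0, 0.5, n)  # regressor held fixed
    slopes = np.empty(reps)
    ses = np.empty(reps)
    for r in range(reps):
        shocks = rng.normal(0, 1, n)
        eps = np.empty(n)
        eps[0] = shocks[0]
        for t in range(1, n):
            eps[t] = rho * eps[t - 1] + shocks[t]  # AR(1) errors
        y = 2 + 0.5 * x + eps
        slopes[r], ses[r] = ols_slope_and_se(x, y)
    return slopes.std(ddof=1), ses.mean()

true_sd, avg_se = simulate_se_gap()
print(f"Empirical SD of slope: {true_sd:.4f}")
print(f"Average OLS SE:        {avg_se:.4f}")
```

With ρ = 0.7, the empirical standard deviation should come out well above the average reported SE. Calling `simulate_se_gap(rho=0.0)` should make the two roughly agree.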

Definition: Autocorrelation

A correlation between observations of the same variable at different time points (or spatial locations), violating the independence assumption of OLS.
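The usual way to quantify this is the lag-k sample autocorrelation. Below is a minimal NumPy sketch. It assumes the standard estimator, which divides each lag's sum of cross-products of deviations by the lag-0 sum of squares. As far as we know, `plot_acf` uses the same estimator by default. The helper `sample_autocorr` is hypothetical, not a library function.

```python
import numpy as np

def sample_autocorr(x, k):
    """Lag-k sample autocorrelation of a series x."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()  # deviations from the mean
    if k == 0:
        return 1.0
    # products of deviations k steps apart, scaled by the total sum of squares
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d * d))
```

An alternating series is strongly negatively autocorrelated at lag 1: `sample_autocorr([1, -1, 1, -1], 1)` returns -0.75, and at lag 2 the value is 0.5.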

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

np.random.seed(42)
n = 100
t = np.arange(n)

# Simulate time series regression with autocorrelated errors (AR(1))
X = t/10 + np.random.normal(0, 0.5, n)
rho = 0.7  # autocorrelation coefficient
epsilon = np.zeros(n)
epsilon[0] = np.random.normal(0, 1)
for i in range(1, n):
    epsilon[i] = rho * epsilon[i-1] + np.random.normal(0, 1)

y = 2 + 0.5*X + epsilon

X_dm = sm.add_constant(X)
model = sm.OLS(y, X_dm).fit()

# Durbin-Watson test
dw = durbin_watson(model.resid)
print(f"Durbin-Watson statistic = {dw:.4f}")
print(f"Interpretation: {'No autocorrelation' if 1.5<dw<2.5 else 'Positive autocorrelation' if dw<1.5 else 'Negative autocorrelation'}")

# ACF plot of residuals
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].scatter(t, model.resid, alpha=0.6)
axes[0].axhline(0, color='red', linestyle='--')
axes[0].set_title('Residuals over Time')
plot_acf(model.resid, ax=axes[1], lags=20, alpha=0.05)
axes[1].set_title('ACF of Residuals (significant lags = autocorrelation)')
plt.tight_layout()
plt.savefig('autocorrelation.png', dpi=150)
plt.show()

# Fix: Newey-West robust standard errors (HAC)
model_nw = sm.OLS(y, X_dm).fit(cov_type='HAC', cov_kwds={'maxlags':5})
print("\nNewey-West HAC standard errors:")
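# X_dm is a NumPy array, so bse is an array: index 1 is the slope (index 0 is the constant)
print(f"  β₁ SE (OLS): {model.bse[1]:.4f}")
print(f"  β₁ SE (HAC): {model_nw.bse[1]:.4f}")
print(f"  HAC/OLS ratio: {model_nw.bse[1] / model.bse[1]:.2f} (above 1 means OLS understated the uncertainty)")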
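The HAC correction itself is a sandwich formula. The NumPy sketch below spells out one version of it. It assumes Bartlett (linearly declining) weights and applies no small-sample correction. The function name `newey_west_se` is hypothetical. The default lag rule floor(4(n/100)^(2/9)) is a common rule of thumb, not something this article prescribes.

```python
import numpy as np

def newey_west_se(X, resid, maxlags=None):
    """Newey-West (HAC) standard errors with Bartlett weights.

    X       : (n, k) design matrix, including the constant column
    resid   : (n,) OLS residuals
    maxlags : truncation lag; if None, uses floor(4 * (n / 100) ** (2 / 9))
    No small-sample (n / (n - k)) correction is applied.
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    n = X.shape[0]
    if maxlags is None:
        maxlags = int(np.floor(4 * (n / 100) ** (2 / 9)))
    xu = X * resid[:, None]               # score contributions x_t * e_t
    S = xu.T @ xu                         # lag-0 term of the "meat"
    for lag in range(1, maxlags + 1):
        w = 1 - lag / (maxlags + 1)       # Bartlett weight, declines linearly
        gamma = xu[lag:].T @ xu[:-lag]    # cross-products `lag` periods apart
        S += w * (gamma + gamma.T)
    XtX_inv = np.linalg.inv(X.T @ X)      # the "bread"
    cov = XtX_inv @ S @ XtX_inv
    return np.sqrt(np.diag(cov))
```

For the example above, you can compare `newey_west_se(X_dm, model.resid, maxlags=5)` with `model_nw.bse`. A difference would point to another weighting or correction option in the library call. With `maxlags=0`, the formula collapses to White's heteroskedasticity-robust (HC0) standard errors.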

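Durbin-Watson Interpretation

DW ranges from 0 to 4. The cutoffs below are rules of thumb. A formal Durbin-Watson test compares the statistic with tabulated bounds that depend on the sample size and the number of regressors.

DW ≈ 2: no autocorrelation
DW < 1.5: positive autocorrelation
DW > 2.5: negative autocorrelation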
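The cutoffs follow from how the statistic is built from the residuals e_t. Expanding the numerator gives a standard short derivation, shown here for reference:

```latex
$$
DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
   = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}
   \approx 2\,(1 - \hat\rho_1),
\qquad
\hat\rho_1 = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}
$$
```

So ρ̂₁ = 0 gives DW ≈ 2. A ρ̂₁ near 1 pushes DW toward 0, and a ρ̂₁ near −1 pushes it toward 4. The 1.5 and 2.5 cutoffs correspond to ρ̂₁ of about 0.25 and −0.25.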

Key Takeaways

Summary: Autocorrelation in Regression

Durbin-Watson ≈ 2: no autocorrelation; < 1.5: positive; > 2.5: negative
ACF plot identifies lag structure — which lags are significant?
Consequences: standard errors are biased (usually too small -> inflated t-stats)
Newey-West HAC SEs correct inference without changing point estimates
For strong autocorrelation: use ARIMA or add lagged Y as predictor
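You can check the lagged-Y remedy on simulated data like the example's. The NumPy sketch below assumes the same AR(1) setup. It fits the static regression and a dynamic one that adds y_{t−1}, then compares the lag-1 autocorrelation of their residuals. The helper names `ols_fit` and `lag1_autocorr` are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def lag1_autocorr(e):
    """Lag-1 sample autocorrelation of a residual series."""
    d = e - e.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

# Simulated data in the spirit of the example above (assumed setup)
rng = np.random.default_rng(0)
n, rho = 200, 0.7
x = np.arange(n) / 10 + rng.normal(0, 0.5, n)
eps = np.zeros(n)
eps[0] = rng.normal()
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + rng.normal()
y = 2 + 0.5 * x + eps

# Static model: y_t on [1, x_t]
_, e_static = ols_fit(np.column_stack([np.ones(n), x]), y)

# Dynamic model: y_t on [1, x_t, y_{t-1}]; the first observation has no lag
X_dyn = np.column_stack([np.ones(n - 1), x[1:], y[:-1]])
beta_dyn, e_dyn = ols_fit(X_dyn, y[1:])

print(f"Lag-1 residual autocorrelation (static):  {lag1_autocorr(e_static):.3f}")
print(f"Lag-1 residual autocorrelation (dynamic): {lag1_autocorr(e_dyn):.3f}")
```

Two caveats apply. First, the coefficient on x_t in the dynamic model is a short-run effect, not the same quantity as in the static model. Second, the Durbin-Watson statistic is biased toward 2 when a lagged dependent variable is a regressor, so a Breusch-Godfrey test is the safer check there.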
