Panel Data Analysis — Fixed and Random Effects
Leveraging Cross-Sectional and Time-Series Dimensions
Panel data follows the same units over time, enabling control for unobserved heterogeneity. Fixed effects eliminate time-invariant confounders, while random effects exploit efficiency gains when assumptions hold.
- Labor Economics: Estimate wage growth effects while controlling for individual ability
- Public Policy: Evaluate policy changes using within-state variation over time
- Finance: Analyze firm performance across years with entity fixed effects
The Hausman test helps choose between the two: absorb individual differences with fixed effects, or borrow strength across groups with random effects.
Panel data combines cross-sectional and time-series dimensions — the same units are observed over multiple time periods. This structure allows controlling for unobserved heterogeneity.
Definition: Panel Data
Data that follow the same individuals, firms, countries, or other units over time. Also called longitudinal data.
Panel Data Structure
| Entity | Time 1 | Time 2 | Time 3 |
|--------|--------|--------|--------|
| Unit 1 | $Y_{11}$ | $Y_{12}$ | $Y_{13}$ |
| Unit 2 | $Y_{21}$ | $Y_{22}$ | $Y_{23}$ |
| Unit 3 | $Y_{31}$ | $Y_{32}$ | $Y_{33}$ |
Notation: $Y_{it}$ is the outcome for unit $i$ at time $t$.
The Pooled OLS Problem
Omitted Variable Bias
If unobserved factors (e.g., ability, culture) are correlated with both the dependent and independent variables, pooled OLS gives biased estimates. Panel data methods address this.
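The bias can be seen in a small simulation. The sketch below is illustrative only: the sample sizes, the true slope of 0.8, and the 0.5 loading of the regressor on the unobserved effect are assumptions chosen for the example. Under these choices the pooled slope converges to about 1.8 rather than 0.8, because the omitted effect loads onto the regressor.

```python
import numpy as np

rng = np.random.default_rng(0)

n_entities, n_periods = 200, 5
true_beta = 0.8  # assumed true slope for this illustration

# Unobserved entity effect (e.g., ability), repeated for every period
alpha = rng.normal(0.0, 2.0, size=n_entities)
alpha_panel = np.repeat(alpha, n_periods)

# The regressor is built to be correlated with the unobserved effect
x = 0.5 * alpha_panel + rng.normal(size=n_entities * n_periods)
y = 5.0 + alpha_panel + true_beta * x + rng.normal(size=n_entities * n_periods)

# Pooled OLS slope: Cov(x, y) / Var(x), ignoring the panel structure
pooled_slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(f"true slope:   {true_beta:.2f}")
print(f"pooled slope: {pooled_slope:.2f}")
```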
Fixed Effects (FE) Model
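Fixed Effects Model
$$Y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it}$$
Here,
- $\alpha_i$ = Entity-specific intercept (fixed effect)
- $\beta$ = Common slope across entities
- $X_{it}$ = Time-varying covariate
- $\varepsilon_{it}$ = Idiosyncratic error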
How FE Works
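Fixed effects are estimated by demeaning, that is, subtracting each entity's mean from all its observations:
$$Y_{it} - \bar{Y}_i = \beta (X_{it} - \bar{X}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
Because $\alpha_i$ is constant within an entity, it equals its own mean and drops out. This eliminates all time-invariant unobserved heterogeneity ($\alpha_i$).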
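The demeaning step can be sketched directly in NumPy. This is a minimal illustration on simulated data, not a replacement for a panel library: the helper `demean_by_group` and all simulation parameters are assumptions made for the example. The regressor is deliberately correlated with the entity effect, yet the within slope should land near the assumed true value of 0.8.

```python
import numpy as np

rng = np.random.default_rng(1)
n_entities, n_periods = 200, 5
true_beta = 0.8  # assumed true slope for this illustration

entity = np.repeat(np.arange(n_entities), n_periods)
alpha = rng.normal(0.0, 2.0, size=n_entities)[entity]
x = 0.5 * alpha + rng.normal(size=entity.size)
y = 5.0 + alpha + true_beta * x + rng.normal(size=entity.size)


def demean_by_group(values, groups, n_groups):
    """Subtract each group's mean from that group's observations."""
    sums = np.bincount(groups, weights=values, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return values - (sums / counts)[groups]


x_within = demean_by_group(x, entity, n_entities)
y_within = demean_by_group(y, entity, n_entities)

# OLS on the demeaned data; no intercept is needed because both
# demeaned series have mean zero within every entity
fe_slope = (x_within @ y_within) / (x_within @ x_within)
print(f"within (FE) slope: {fe_slope:.2f}")
```

The point estimate is the same one a least-squares dummy variable (LSDV) regression would give; standard errors from a plain OLS on demeaned data would still need a degrees-of-freedom correction for the estimated entity means.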
Random Effects (RE) Model
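Random Effects Model
$$Y_{it} = \beta_0 + \beta X_{it} + \alpha_i + \varepsilon_{it}$$
Here,
- $\beta_0$ = Common intercept
- $\alpha_i$ = Random effect: $\alpha_i \sim N(0, \sigma_\alpha^2)$
- $\varepsilon_{it}$ = Idiosyncratic error: $\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$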
RE Assumption
RE assumes the entity-specific effects are uncorrelated with the regressors: $\operatorname{Cov}(\alpha_i, X_{it}) = 0$. If this assumption fails, RE estimates are biased and inconsistent.
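For intuition on how RE sits between pooled OLS and FE, the GLS estimator can be written as OLS on quasi-demeaned data. The form below is the standard textbook expression for a balanced panel with $T$ periods per entity, with $\beta_0$ the common intercept and $\beta$ the slope. It is added here as a supplement and assumes the variance components are known:

$$Y_{it} - \theta \bar{Y}_i = \beta_0 (1 - \theta) + \beta (X_{it} - \theta \bar{X}_i) + (v_{it} - \theta \bar{v}_i), \qquad \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_\alpha^2}}$$

where $v_{it} = \alpha_i + \varepsilon_{it}$ is the composite error. When $\theta = 0$ (no entity-level variance) this reduces to pooled OLS. As $\theta \to 1$ (large $T$ or large $\sigma_\alpha^2$) it approaches the fixed effects demeaning.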
FE vs RE Comparison
| Feature | Fixed Effects | Random Effects |
|---------|--------------|----------------|
| Assumption | $\alpha_i$ may be correlated with $X_{it}$ | $\alpha_i$ uncorrelated with $X_{it}$ |
| Time-invariant variables | Cannot estimate | Can estimate |
| Efficiency | Less efficient | More efficient |
| Consistency | Consistent even if correlated | Consistent only if uncorrelated |
| Estimation | Demeaning / LSDV | GLS |
Hausman Test
The Hausman test compares FE and RE to determine which is appropriate.
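Hausman Test
$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$
Here,
- $H$ = Test statistic (asymptotically $\chi^2_k$ under the null, with $k$ the number of coefficients compared)
- $\hat{\beta}_{FE}$ = Fixed effects estimates
- $\hat{\beta}_{RE}$ = Random effects estimates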
| Decision | Interpretation |
|---------|---------------|
| Reject $H_0$ (entity effects uncorrelated with regressors) | Use Fixed Effects (correlation exists) |
| Fail to reject $H_0$ | Use Random Effects (more efficient) |
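The statistic is straightforward to compute once both sets of estimates and their covariance matrices are available. The sketch below assumes the inputs cover only the coefficients that appear in both models (time-varying regressors, no intercept). The function name `hausman_statistic` and the numeric inputs are hypothetical, chosen only to show the arithmetic.

```python
import numpy as np


def hausman_statistic(beta_fe, beta_re, cov_fe, cov_re):
    """Return (H, df) for coefficients that appear in both models."""
    diff = np.asarray(beta_fe, dtype=float) - np.asarray(beta_re, dtype=float)
    cov_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    # Solve the linear system instead of forming an explicit inverse
    h = float(diff @ np.linalg.solve(cov_diff, diff))
    return h, diff.size


# Hypothetical inputs for illustration only (not estimates from real data)
beta_fe = [0.50, 1.20]
beta_re = [0.45, 1.00]
cov_fe = [[0.010, 0.000], [0.000, 0.020]]
cov_re = [[0.006, 0.000], [0.000, 0.010]]

h, df = hausman_statistic(beta_fe, beta_re, cov_fe, cov_re)
print(f"H = {h:.3f} on {df} degrees of freedom")
```

With these made-up numbers the statistic is 4.625, below the 5% critical value of 5.991 for two degrees of freedom, so the null would not be rejected. A p-value can be obtained from a chi-square survival function such as `scipy.stats.chi2.sf(h, df)`. In finite samples the covariance difference is not guaranteed to be positive definite, in which case the statistic can be negative and should not be interpreted.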
First Differences Alternative
First Difference Estimator
$$\Delta Y_{it} = \beta \Delta X_{it} + \Delta \varepsilon_{it}$$
Here,
- $\Delta$ = First difference operator: $\Delta Y_{it} = Y_{it} - Y_{i,t-1}$
Differencing eliminates the entity-specific effect, just like demeaning.
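A quick way to check this is the two-period case, where first differencing and demeaning give numerically identical slopes. The sketch below uses simulated data with assumed parameters (300 entities, true slope 0.8) and compares the two estimators.

```python
import numpy as np

rng = np.random.default_rng(2)
n_entities, n_periods = 300, 2
true_beta = 0.8  # assumed true slope for this illustration

# Rows are entities, columns are time periods
alpha = rng.normal(0.0, 2.0, size=(n_entities, 1))
x = 0.5 * alpha + rng.normal(size=(n_entities, n_periods))
y = 5.0 + alpha + true_beta * x + rng.normal(size=(n_entities, n_periods))

# First differences along the time axis remove alpha_i and the constant
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()
fd_slope = (dx @ dy) / (dx @ dx)

# Within (demeaning) estimator on the same data
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
fe_slope = (xw @ yw) / (xw @ xw)

print(f"first-difference slope: {fd_slope:.4f}")
print(f"within (FE) slope:      {fe_slope:.4f}")
```

With more than two periods the two estimators generally differ, and which one is more efficient depends on the serial correlation of the idiosyncratic errors.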
Time Fixed Effects
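Time fixed effects control for factors that change over time but are constant across entities (e.g., economic shocks, nationwide policy changes).
Two-Way Fixed Effects
$$Y_{it} = \alpha_i + \lambda_t + \beta X_{it} + \varepsilon_{it}$$
Here,
- $\alpha_i$ = Entity fixed effects
- $\lambda_t$ = Time fixed effects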
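For a balanced panel, both sets of effects can be removed by double demeaning: subtract the entity mean and the period mean, then add back the grand mean. The sketch below is a minimal illustration on simulated data. The helper `two_way_demean`, the array name `lam` for the period effects, and all parameter values are assumptions for the example, and the shortcut does not carry over unchanged to unbalanced panels.

```python
import numpy as np

rng = np.random.default_rng(3)
n_entities, n_periods = 200, 8
true_beta = 0.8  # assumed true slope for this illustration

# Rows are entities, columns are time periods
alpha = rng.normal(0.0, 2.0, size=(n_entities, 1))  # entity effects
lam = rng.normal(0.0, 1.5, size=(1, n_periods))     # common period shocks

# The regressor is correlated with both kinds of effects
x = 0.5 * alpha + 0.5 * lam + rng.normal(size=(n_entities, n_periods))
y = alpha + lam + true_beta * x + rng.normal(size=(n_entities, n_periods))


def two_way_demean(a):
    """Remove entity means and period means, then add back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


xt = two_way_demean(x).ravel()
yt = two_way_demean(y).ravel()
twfe_slope = (xt @ yt) / (xt @ xt)
print(f"two-way FE slope: {twfe_slope:.2f}")
```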
Python Implementation
```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS, RandomEffects, compare

np.random.seed(42)

# Simulate a balanced panel: 100 entities observed over 10 periods
n_entities = 100
n_periods = 10
n = n_entities * n_periods
entity_id = np.repeat(np.arange(n_entities), n_periods)
time_id = np.tile(np.arange(n_periods), n_entities)

# Entity effects, drawn independently of X (so the RE assumption holds here)
alpha = np.random.randn(n_entities) * 2
alpha_panel = alpha[entity_id]

# Covariate and outcome (true slope on X is 0.8)
X = np.random.randn(n)
Y = 5 + alpha_panel + 0.8 * X + np.random.randn(n) * 1.5

# linearmodels expects a MultiIndex of (entity, time)
df = pd.DataFrame({
    'Y': Y, 'X': X,
    'entity': entity_id,
    'time': time_id
}).set_index(['entity', 'time'])

# Fixed Effects: with from_formula, entity effects are requested
# through the EntityEffects term rather than a keyword argument
fe_model = PanelOLS.from_formula('Y ~ 1 + X + EntityEffects', data=df)
fe_result = fe_model.fit()
print("Fixed Effects:")
print(fe_result.summary.tables[1])

# Random Effects
re_model = RandomEffects.from_formula('Y ~ 1 + X', data=df)
re_result = re_model.fit()
print("\nRandom Effects:")
print(re_result.summary.tables[1])

# Compare the two fits side by side
print("\nModel Comparison:")
print(compare({'FE': fe_result, 'RE': re_result}))
```
Worked Example
Example: Wage Determinants
Panel data: 500 workers over 5 years, examining effects of education and experience on wages.
| Model | Education Coef | Experience Coef | R² (within) |
|-------|---------------|----------------|-------------|
| Pooled OLS | 2.85*** | 0.42*** | 0.28 |
| Fixed Effects | 2.12*** | 0.38*** | 0.15 |
| Random Effects | 2.65*** | 0.40*** | 0.32 |
Hausman test: p < 0.001 -> Use Fixed Effects
The FE estimate of education (2.12) is smaller than pooled OLS (2.85), suggesting positive omitted variable bias (e.g., ability correlated with both education and wages).
Key Takeaways
Summary: Panel Data Analysis
- Panel data tracks the same units over time, enabling control for unobserved heterogeneity
- Fixed Effects eliminate entity-specific intercepts through demeaning
- Random Effects assume entity effects are uncorrelated with regressors
- The Hausman test helps determine whether FE or RE is more appropriate
- Time fixed effects control for temporal shocks constant across entities
- FE cannot estimate time-invariant covariates (e.g., gender, race)
- Always check for serial correlation and heteroscedasticity in panel data
Related Topics
- See Multilevel Modeling for hierarchical structures
- See Difference-in-Differences for policy evaluation
- See Instrumental Variables for endogeneity issues