Panel Data Analysis — Fixed and Random Effects


Leveraging Cross-Sectional and Time-Series Dimensions

Panel data follows the same units over time, enabling control for unobserved heterogeneity. Fixed effects eliminate time-invariant confounders, while random effects exploit efficiency gains when assumptions hold.

Labor Economics — Estimate wage growth effects while controlling for individual ability
Public Policy — Evaluate policy changes using within-state variation over time
Finance — Analyze firm performance across years with entity fixed effects

The Hausman test helps choose between the two: absorb individual differences (fixed effects) or borrow strength across groups (random effects).

Panel data combines cross-sectional and time-series dimensions — the same units are observed over multiple time periods. This structure allows controlling for unobserved heterogeneity.

Definition: Panel Data

Data that follow the same individuals, firms, countries, or other units over time. Also called longitudinal data.

Panel Data Structure

| | $t = 1$ | $t = 2$ | $t = 3$ |
|--------|--------|--------|--------|
| Unit 1 | $Y_{11}$ | $Y_{12}$ | $Y_{13}$ |
| Unit 2 | $Y_{21}$ | $Y_{22}$ | $Y_{23}$ |
| Unit 3 | $Y_{31}$ | $Y_{32}$ | $Y_{33}$ |

Notation: $Y_{it}$ — outcome for unit $i$ at time $t$
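The grid above is the "wide" layout. Most panel estimators expect the "long" layout instead, with one row per (entity, time) pair. The sketch below shows one way to convert between the two with pandas; the values and the column names `t1`, `t2`, `t3` are made up for illustration.

```python
import pandas as pd

# Hypothetical wide-format panel: one row per unit, one column per period.
wide_df = pd.DataFrame(
    {
        "entity": ["Unit 1", "Unit 2", "Unit 3"],
        "t1": [10.0, 20.0, 30.0],
        "t2": [11.0, 21.0, 31.0],
        "t3": [12.0, 22.0, 32.0],
    }
)

# Long format: one row per (entity, time) pair, indexed by both dimensions.
long_df = (
    wide_df.melt(id_vars="entity", var_name="time", value_name="Y")
    .set_index(["entity", "time"])
    .sort_index()
)

# Look up Y_{it} for i = "Unit 2", t = "t3".
y_23 = long_df.loc[("Unit 2", "t3"), "Y"]
print(long_df)
print("Y_23 =", y_23)
```

Indexing by (entity, time) mirrors the $Y_{it}$ notation: the first index level plays the role of $i$ and the second the role of $t$.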

The Pooled OLS Problem

Omitted Variable Bias

If unobserved factors (e.g., ability, culture) are correlated with both the dependent and independent variables, pooled OLS gives biased estimates. Panel data methods address this.
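A small simulation can make the bias concrete. The sketch below is an illustration under assumed parameter values (a true slope of 1.0 and a covariate built to load on the unobserved effect with weight 0.8), not a result from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 500, 5
entity = np.repeat(np.arange(n_entities), n_periods)

# Unobserved, time-invariant effect (think "ability"), one draw per entity.
alpha = rng.normal(size=n_entities)

# X is constructed to be correlated with alpha; this is what breaks pooled OLS.
x = 0.8 * alpha[entity] + rng.normal(size=entity.size)
true_beta = 1.0
y = true_beta * x + alpha[entity] + rng.normal(size=entity.size)

# Pooled OLS of y on a constant and x, leaving alpha in the error term.
design = np.column_stack([np.ones_like(x), x])
pooled_beta = np.linalg.lstsq(design, y, rcond=None)[0][1]

print("true slope:", true_beta)
print("pooled OLS slope:", pooled_beta)
```

In this design the large-sample bias is $\text{Cov}(X, \alpha) / \text{Var}(X) = 0.8 / 1.64 \approx 0.49$, so the pooled slope should land near 1.49 rather than 1.0.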

Fixed Effects (FE) Model

Fixed Effects Model

$$Y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it}$$

Here,

$\alpha_i$ = entity-specific intercept (fixed effect)
$\beta$ = common slope across entities
$X_{it}$ = time-varying covariate
$\varepsilon_{it}$ = idiosyncratic error

How FE Works

Fixed effects are estimated by demeaning — subtracting each entity's mean from all its observations:

$$(Y_{it} - \bar{Y}_i) = \beta (X_{it} - \bar{X}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

This eliminates all time-invariant unobserved heterogeneity: $\alpha_i$ is constant over time, so its entity mean is $\alpha_i$ itself and it cancels in the subtraction.
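The demeaning step can be sketched directly in NumPy. The example below uses simulated data with an assumed true slope of 0.5, and the helper name `demean_by_group` is hypothetical. It also checks that the within estimator matches least squares with dummy variables (LSDV), which estimates one intercept per entity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_entities, n_periods = 200, 6
entity = np.repeat(np.arange(n_entities), n_periods)

# Entity effect correlated with x, so pooled OLS would be biased here.
alpha = rng.normal(scale=2.0, size=n_entities)
x = 0.8 * alpha[entity] + rng.normal(size=entity.size)
y = 0.5 * x + alpha[entity] + rng.normal(size=entity.size)


def demean_by_group(values, groups, n_groups):
    """Subtract each group's mean from that group's observations."""
    sums = np.bincount(groups, weights=values, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return values - (sums / counts)[groups]


x_within = demean_by_group(x, entity, n_entities)
y_within = demean_by_group(y, entity, n_entities)

# Within estimator: OLS slope on demeaned data (the intercept drops out).
beta_within = (x_within @ y_within) / (x_within @ x_within)

# LSDV: regress y on x plus one dummy column per entity.
dummies = (entity[:, None] == np.arange(n_entities)[None, :]).astype(float)
lsdv_design = np.column_stack([x, dummies])
beta_lsdv = np.linalg.lstsq(lsdv_design, y, rcond=None)[0][0]

print("within:", beta_within)
print("LSDV:  ", beta_lsdv)
```

The two slopes agree up to floating-point error. Demeaning is usually preferred in practice because it avoids building a dummy column for every entity.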

Random Effects (RE) Model

Random Effects Model

$$Y_{it} = \beta X_{it} + (\alpha_i + \varepsilon_{it})$$

Here,

$\alpha_i$ = random effect: $\alpha_i \sim N(0, \sigma_\alpha^2)$
$\varepsilon_{it}$ = idiosyncratic error: $\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$

RE Assumption

RE assumes the entity-specific effects $\alpha_i$ are uncorrelated with the regressors $X_{it}$. If this assumption fails, RE estimates are biased and inconsistent.
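When the assumption does hold, the RE (GLS) estimator for a balanced panel can be written as OLS on quasi-demeaned data, where a fraction $\theta = 1 - \sqrt{\sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + T \sigma_\alpha^2)}$ of each entity mean is subtracted. The sketch below assumes the variance components are known, which is a simplification: in practice they are estimated first (feasible GLS). All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_entities, n_periods = 300, 5
sigma_alpha, sigma_eps = 2.0, 1.5
entity = np.repeat(np.arange(n_entities), n_periods)

alpha = rng.normal(scale=sigma_alpha, size=n_entities)
x = rng.normal(size=entity.size)  # drawn independently of alpha: RE assumption holds
y = 5.0 + 0.8 * x + alpha[entity] + rng.normal(scale=sigma_eps, size=entity.size)

# Quasi-demeaning weight for a balanced panel with known variance components.
theta = 1.0 - np.sqrt(sigma_eps**2 / (sigma_eps**2 + n_periods * sigma_alpha**2))


def entity_mean(values):
    """Entity means broadcast back to one value per observation."""
    return (np.bincount(entity, weights=values) / n_periods)[entity]


# Subtract a fraction theta of the entity mean; the constant becomes (1 - theta).
y_star = y - theta * entity_mean(y)
x_star = x - theta * entity_mean(x)
const_star = np.full_like(x, 1.0 - theta)

design = np.column_stack([const_star, x_star])
intercept_re, beta_re = np.linalg.lstsq(design, y_star, rcond=None)[0]

print("theta:", theta)
print("RE intercept:", intercept_re, "RE slope:", beta_re)
```

Setting $\theta = 0$ reproduces pooled OLS and $\theta = 1$ reproduces the within (FE) transformation, so RE sits between the two.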

FE vs RE Comparison

| Feature | Fixed Effects | Random Effects |
|---------|--------------|----------------|
| Assumption | Allows $\alpha_i$ to be correlated with $X$ | Requires $\alpha_i$ to be uncorrelated with $X$ |
| Time-invariant variables | Cannot estimate | Can estimate |
| Efficiency | Less efficient | More efficient (if its assumption holds) |
| Consistency | Consistent even if $\alpha_i$ is correlated with $X$ | Consistent only if uncorrelated |
| Estimation | Demeaning / LSDV | GLS |

Hausman Test

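The Hausman test compares the FE and RE estimates to determine which model is appropriate. Under the null hypothesis $H_0$ that $\alpha_i$ is uncorrelated with $X_{it}$, both estimators are consistent and RE is efficient, so the two sets of estimates should differ only by sampling error.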

Hausman Test

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left( \text{Var}(\hat{\beta}_{FE}) - \text{Var}(\hat{\beta}_{RE}) \right)^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

Here,

$H$ = test statistic (asymptotically $\chi^2_k$ under $H_0$, where $k$ is the number of coefficients compared)
$\hat{\beta}_{FE}$ = fixed effects estimates
$\hat{\beta}_{RE}$ = random effects estimates

| Decision | Interpretation |
|---------|---------------|
| Reject $H_0$ | Use Fixed Effects (correlation exists) |
| Fail to reject $H_0$ | Use Random Effects (more efficient) |
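The statistic can be computed directly from the two coefficient vectors and their covariance matrices. The function name `hausman_statistic` and the input numbers below are hypothetical, chosen only to make the arithmetic easy to follow. The p-value shown covers the single-coefficient case ($k = 1$); larger $k$ needs a chi-square distribution function from a statistics library, which is not shown here.

```python
import math

import numpy as np


def hausman_statistic(beta_fe, beta_re, cov_fe, cov_re):
    """Return (H, k) for the Hausman comparison of FE and RE estimates.

    Only coefficients estimated by both models should be passed in
    (FE cannot estimate time-invariant regressors or a common intercept).
    """
    diff = np.atleast_1d(np.asarray(beta_fe, dtype=float) - np.asarray(beta_re, dtype=float))
    cov_diff = np.atleast_2d(np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float))
    stat = float(diff @ np.linalg.solve(cov_diff, diff))
    return stat, diff.size


# Hypothetical single-coefficient inputs (not taken from any real study).
beta_fe, beta_re = 0.50, 0.62
var_fe, var_re = 0.0025, 0.0016

# H = 0.12**2 / 0.0009 = 16 with k = 1.
h_stat, dof = hausman_statistic(beta_fe, beta_re, var_fe, var_re)

# For k = 1 only: P(chi2_1 > h) = erfc(sqrt(h / 2)).
p_value = math.erfc(math.sqrt(h_stat / 2.0))

print("H =", h_stat, "dof =", dof, "p =", p_value)
```

In finite samples the variance difference may fail to be positive definite, in which case the statistic can come out negative or undefined; that is a known practical limitation of this form of the test.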

First Differences Alternative

First Difference Estimator

$$\Delta Y_{it} = \beta \Delta X_{it} + \Delta \varepsilon_{it}$$

Here,

$\Delta$ = first difference operator: $\Delta Y_{it} = Y_{it} - Y_{i,t-1}$

Differencing eliminates the entity-specific effect, just like demeaning.
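The following sketch illustrates the first difference estimator on simulated data with an assumed true slope of 1.5. It uses exactly two periods on purpose: in that case the first difference and within (demeaning) estimates are numerically identical, while with more than two periods they generally differ.

```python
import numpy as np

rng = np.random.default_rng(3)
n_entities, n_periods = 400, 2
alpha = rng.normal(scale=2.0, size=n_entities)

# Arrays of shape (entity, time); alpha enters every period of the same entity.
x = 0.8 * alpha[:, None] + rng.normal(size=(n_entities, n_periods))
y = 1.5 * x + alpha[:, None] + rng.normal(size=(n_entities, n_periods))

# First differences along the time axis remove alpha_i.
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()
beta_fd = (dx @ dy) / (dx @ dx)

# Within (demeaning) estimator for comparison.
x_within = (x - x.mean(axis=1, keepdims=True)).ravel()
y_within = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_fe = (x_within @ y_within) / (x_within @ x_within)

print("first differences:", beta_fd)
print("fixed effects:    ", beta_fe)
```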

Time Fixed Effects

Time fixed effects control for factors that change over time but are constant across entities (e.g., economy-wide shocks, nationwide policy changes).

Two-Way Fixed Effects

$$Y_{it} = \alpha_i + \lambda_t + \beta X_{it} + \varepsilon_{it}$$

Here,

$\alpha_i$ = entity fixed effects
$\lambda_t$ = time fixed effects
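For a balanced panel, the two-way model can be estimated by subtracting entity means and time means and adding back the grand mean. The sketch below assumes a balanced panel (the shortcut is not exact otherwise) and simulated data with a true slope of 0.8; the helper name `two_way_demean` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n_entities, n_periods = 150, 8
alpha = rng.normal(scale=2.0, size=(n_entities, 1))  # entity effects
lam = rng.normal(scale=1.5, size=(1, n_periods))     # time effects (common shocks)

# x loads on both effects, so ignoring either one biases the slope.
x = 0.6 * alpha + 0.6 * lam + rng.normal(size=(n_entities, n_periods))
y = 0.8 * x + alpha + lam + rng.normal(size=(n_entities, n_periods))


def two_way_demean(a):
    """Remove entity means and time means, adding back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


x_tw, y_tw = two_way_demean(x).ravel(), two_way_demean(y).ravel()
beta_twfe = (x_tw @ y_tw) / (x_tw @ x_tw)

# Entity-only demeaning leaves the time shocks in the error term.
x_e = (x - x.mean(axis=1, keepdims=True)).ravel()
y_e = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_entity_only = (x_e @ y_e) / (x_e @ x_e)

print("two-way FE: ", beta_twfe)
print("entity-only:", beta_entity_only)
```

Because the simulated covariate loads positively on the common time shocks, the entity-only estimate is pushed upward, while the two-way estimate stays close to the true slope.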

Python Implementation


```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS, RandomEffects, compare

np.random.seed(42)

# Simulate a balanced panel: 100 entities observed over 10 periods
n_entities = 100
n_periods = 10
n = n_entities * n_periods

entity_id = np.repeat(np.arange(n_entities), n_periods)
time_id = np.tile(np.arange(n_periods), n_entities)

# Entity effects (drawn independently of X, so the RE assumption holds here)
alpha = np.random.randn(n_entities) * 2
alpha_panel = alpha[entity_id]

# Covariate and outcome; the true slope on X is 0.8
X = np.random.randn(n)
Y = 5 + alpha_panel + 0.8 * X + np.random.randn(n) * 1.5

# linearmodels expects a MultiIndex of (entity, time)
df = pd.DataFrame({
    'Y': Y, 'X': X,
    'entity': entity_id,
    'time': time_id
}).set_index(['entity', 'time'])

# Fixed Effects: with from_formula, entity effects are requested by adding
# EntityEffects to the formula (entity_effects=True is a constructor argument)
fe_model = PanelOLS.from_formula('Y ~ 1 + X + EntityEffects', data=df)
fe_result = fe_model.fit()
print("Fixed Effects:")
print(fe_result.summary.tables[1])

# Random Effects
re_model = RandomEffects.from_formula('Y ~ 1 + X', data=df)
re_result = re_model.fit()
print("\nRandom Effects:")
print(re_result.summary.tables[1])

# Compare the two sets of estimates side by side
print("\nModel Comparison:")
print(compare({'FE': fe_result, 'RE': re_result}))
```

Worked Example

Example: Wage Determinants

Panel data: 500 workers over 5 years, examining effects of education and experience on wages.

| Model | Education | Experience | |
|-------|---------------|----------------|-------------|
| Pooled OLS | 2.85*** | 0.42*** | 0.28 |
| Fixed Effects | 2.12*** | 0.38*** | 0.15 |
| Random Effects | 2.65*** | 0.40*** | 0.32 |

Hausman test: $\chi^2 = 28.4$, $p < 0.001$, so use Fixed Effects.

The FE estimate of education (2.12) is smaller than pooled OLS (2.85), suggesting positive omitted variable bias (e.g., ability correlated with both education and wages).

Key Takeaways

Summary: Panel Data Analysis

Panel data tracks the same units over time, enabling control for unobserved heterogeneity
Fixed Effects eliminate entity-specific intercepts through demeaning
Random Effects assume entity effects are uncorrelated with regressors
Hausman test determines whether FE or RE is more appropriate
Time fixed effects control for temporal shocks constant across entities
FE cannot estimate time-invariant covariates (e.g., gender, race)
Always check for serial correlation and heteroscedasticity in panel data
