Causal Inference — Potential Outcomes Framework
Statistics
The Gold Standard for Answering Causal Questions
The potential outcomes framework (Rubin Causal Model) defines causal effects as comparisons between what happened and what would have happened. It provides the conceptual foundation for all modern causal inference methods.
- Medical Research: Define treatment effects rigorously in clinical trial analysis
- Policy Evaluation: Measure program impacts by comparing actual and counterfactual outcomes
- Social Science: Establish clear criteria for when causal claims are justified
The fundamental problem, that we never observe both potential outcomes for the same unit, drives all of causal inference.
Causal inference asks: What would have happened if a different treatment had been applied? The potential outcomes framework (Rubin Causal Model) gives a rigorous way to pose and answer this question.
Definition: Causal Effect
A treatment $T$ has a causal effect on outcome $Y$ for unit $i$ if $Y_i(1) \neq Y_i(0)$, where $Y_i(1)$ and $Y_i(0)$ are the potential outcomes under treatment and control.
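Potential Outcomes
$$Y_i = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)$$
Here,
- $Y_i$ = Observed outcome for unit $i$
- $Y_i(1)$ = Potential outcome if unit $i$ receives treatment
- $Y_i(0)$ = Potential outcome if unit $i$ receives control
- $T_i$ = Treatment indicator (1 = treated, 0 = control)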
Fundamental Problem of Causal Inference
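For any individual unit, we can observe only one potential outcome. If $T_i = 1$, we observe $Y_i(1)$ but not $Y_i(0)$; if $T_i = 0$, we observe $Y_i(0)$ but not $Y_i(1)$. We can never observe both for the same unit.
Therefore, the individual causal effect $\tau_i = Y_i(1) - Y_i(0)$ is fundamentally unobservable.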
We can only estimate population-level effects (average treatment effects).
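The missing-data nature of the problem can be made concrete with a small simulation. The sketch below is illustrative only: the sample size, effect size, and variable names are assumptions, and both potential outcomes are visible here only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # small illustrative sample (assumed)

# Both potential outcomes exist for every unit, but only in a simulation
y0 = rng.normal(loc=10.0, scale=2.0, size=n)       # outcome under control
y1 = y0 + rng.normal(loc=1.5, scale=1.0, size=n)   # outcome under treatment
t = rng.integers(0, 2, size=n)                     # treatment indicator

# Observed outcome: Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0)
y_obs = t * y1 + (1 - t) * y0

# What an analyst actually sees: the other potential outcome is missing
y1_seen = np.where(t == 1, y1, np.nan)
y0_seen = np.where(t == 0, y0, np.nan)
effect_seen = y1_seen - y0_seen  # NaN for every unit

for i in range(n):
    print(f"unit {i}: T={t[i]}  Y(1)={y1_seen[i]:6.2f}  "
          f"Y(0)={y0_seen[i]:6.2f}  effect={effect_seen[i]}")
```

Every row has exactly one missing potential outcome, so the unit-level difference cannot be computed from observed data alone.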
Average Treatment Effects
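ATE (Average Treatment Effect)
$$\text{ATE} = E[Y(1) - Y(0)]$$
Here,
- $\text{ATE}$ = Average effect across the entire population
ATT (Average Treatment Effect on the Treated)
$$\text{ATT} = E[Y(1) - Y(0)|T=1]$$
Here,
- $\text{ATT}$ = Average effect among those who actually received treatment
ATU (Average Treatment Effect on the Untreated)
$$\text{ATU} = E[Y(1) - Y(0)|T=0]$$
Here,
- $\text{ATU}$ = Average effect among those who did not receive treatment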
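The three estimands differ whenever treatment effects vary across units and treatment uptake is related to that variation. A minimal sketch, assuming a hypothetical data-generating process in which units with larger effects are more likely to be treated (all coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x = rng.normal(size=n)
y0 = x + rng.normal(size=n)
tau = 1.0 + 0.5 * x              # unit-level effect, larger for high-x units
y1 = y0 + tau
# Units with high x are more likely to take treatment (assumed mechanism)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))

# These averages are computable only because both potential outcomes are simulated
ate = np.mean(y1 - y0)
att = np.mean((y1 - y0)[t == 1])
atu = np.mean((y1 - y0)[t == 0])
p_treated = t.mean()

print(f"ATE: {ate:.3f}  ATT: {att:.3f}  ATU: {atu:.3f}")
print(f"p*ATT + (1-p)*ATU: {p_treated * att + (1 - p_treated) * atu:.3f}")
```

The last line checks the identity ATE = P(T=1)·ATT + P(T=0)·ATU, which holds exactly in the sample.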
SUTVA
The Stable Unit Treatment Value Assumption has two parts:
| Component | Meaning |
|-----------|---------|
| No interference | One unit's treatment does not affect another unit's outcome |
| Treatment variation irrelevance | There is only one version of each treatment level |
When SUTVA Fails
SUTVA is violated in:
- Interference: Vaccination (herd effects), education (peer effects)
- Multiple treatment versions: Different drug dosages considered the same treatment
Special methods (e.g., partial interference models) are needed when SUTVA fails.
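To see why interference matters, consider a simulated setting with spillovers inside clusters. This is a sketch under assumed effect sizes (`direct` and `spillover` are hypothetical), not a partial interference estimator: it only shows that a simple difference in means recovers the direct effect and misses the spillover component.

```python
import numpy as np

rng = np.random.default_rng(2)
n_clusters, size = 2_000, 10
direct, spillover = 1.0, 2.0   # hypothetical effect sizes

# Individually randomized treatment within clusters
t = rng.binomial(1, 0.5, size=(n_clusters, size))

# Share of *other* cluster members who are treated (the interference channel)
share_others = (t.sum(axis=1, keepdims=True) - t) / (size - 1)

# Each unit's outcome depends on its own treatment AND on its peers' treatment
y = direct * t + spillover * share_others + rng.normal(size=t.shape)

diff_in_means = y[t == 1].mean() - y[t == 0].mean()
total_effect = direct + spillover   # everyone treated vs. no one treated

print(f"Difference in means:            {diff_in_means:.3f}")
print(f"Effect of treating everyone:    {total_effect:.3f}")
```

Here a unit has more than two potential outcomes (one per configuration of peer treatments), so the two-outcome notation $Y_i(1)$, $Y_i(0)$ no longer describes the data.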
Selection Bias
The naive comparison of treated and control groups may be biased because treatment assignment is often not random.
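Selection Bias Decomposition
$$\underbrace{E[Y|T=1] - E[Y|T=0]}_{\text{naive comparison}} = \underbrace{E[Y(1) - Y(0)|T=1]}_{\text{ATT}} + \underbrace{E[Y(0)|T=1] - E[Y(0)|T=0]}_{\text{selection bias}}$$
Here,
- Selection bias = $E[Y(0)|T=1] - E[Y(0)|T=0]$: baseline differences between groups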
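The decomposition can be checked numerically on simulated data, where both potential outcomes are known. A minimal sketch, assuming a single confounder `x` that raises both the chance of treatment and the baseline outcome (coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

x = rng.normal(size=n)
y0 = 2 * x + rng.normal(size=n)
y1 = y0 + 1.5                                   # constant effect of 1.5
t = rng.binomial(1, 1 / (1 + np.exp(-x)))       # treatment depends on x
y = np.where(t == 1, y1, y0)                    # observed outcome

naive = y[t == 1].mean() - y[t == 0].mean()
att = (y1 - y0)[t == 1].mean()
selection_bias = y0[t == 1].mean() - y0[t == 0].mean()

print(f"Naive comparison:      {naive:.3f}")
print(f"ATT:                   {att:.3f}")
print(f"Selection bias:        {selection_bias:.3f}")
print(f"ATT + selection bias:  {att + selection_bias:.3f}")
```

The naive difference equals ATT plus selection bias exactly, and the bias is positive here because treated units have higher baseline outcomes.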
Randomization Eliminates Selection Bias
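With random assignment, treatment and control groups are comparable in expectation, so $E[Y(0)|T=1] = E[Y(0)|T=0]$, the selection bias is zero, and the naive comparison identifies the ATE (which equals the ATT under randomization).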
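A quick simulation illustrates the point. The sketch below assumes a hypothetical confounded outcome model but assigns treatment by a fair coin flip, so assignment is independent of the potential outcomes:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

x = rng.normal(size=n)
y0 = 2 * x + rng.normal(size=n)
y1 = y0 + 1.5                                   # true ATE = 1.5

# Random assignment: independent of x and of the potential outcomes
t_rand = rng.binomial(1, 0.5, size=n)
y = np.where(t_rand == 1, y1, y0)

naive = y[t_rand == 1].mean() - y[t_rand == 0].mean()
selection_bias = y0[t_rand == 1].mean() - y0[t_rand == 0].mean()

print(f"Naive comparison under randomization: {naive:.3f}")
print(f"Selection bias under randomization:   {selection_bias:.3f}")
```

The selection bias is not exactly zero in any finite sample, only in expectation; it shrinks as the sample grows.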
Causal Identifying Assumptions
| Assumption | Description | Violation Consequence |
|-----------|------------|----------------------|
| Unconfoundedness | $(Y(0), Y(1)) \perp T \mid X$ | Selection bias |
| Overlap | $0 < P(T=1 \mid X) < 1$ for all $X$ | Cannot estimate effects for some subgroups |
| SUTVA | No interference; one treatment version | Spillover effects bias estimates |
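Of these assumptions, overlap is the one that can be partly inspected in data. One common diagnostic is to look at the range of estimated propensity scores. The sketch below uses simulated covariates; the 0.05 and 0.95 cutoffs are illustrative choices, not a standard.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 5_000

# Simulated covariates and treatment (hypothetical coefficients)
X = rng.normal(size=(n, 2))
true_e = 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.3 * X[:, 1])))
t = rng.binomial(1, true_e)

# Estimated propensity scores P(T=1 | X)
e_hat = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

lo, hi = 0.05, 0.95   # illustrative thresholds (assumption)
outside = (e_hat < lo) | (e_hat > hi)

print(f"Propensity range: [{e_hat.min():.3f}, {e_hat.max():.3f}]")
print(f"Share of units outside [{lo}, {hi}]: {outside.mean():.3%}")
```

Scores piling up near 0 or 1 suggest subgroups with almost no treated or almost no control units. Unconfoundedness, by contrast, cannot be verified from observed data.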
Estimation Under Unconfoundedness
When treatment is unconfounded given covariates $X$, we can use matching, weighting, or regression adjustment.
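IPW Estimator
$$\hat{\tau}_{\text{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{T_i Y_i}{\hat{e}(X_i)} - \frac{(1 - T_i) Y_i}{1 - \hat{e}(X_i)} \right]$$
Here,
- $\hat{e}(X_i)$ = Estimated propensity score: $P(T=1|X_i)$
- $T_i$ = Treatment indicator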
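Why inverse weighting recovers the ATE can be sketched in a few lines. The derivation below works with the true propensity score $e(X) = P(T=1|X)$ rather than an estimate, and relies on the identifying assumptions (consistency, unconfoundedness, and overlap):

```latex
\begin{aligned}
E\!\left[\frac{T\,Y}{e(X)}\right]
  &= E\!\left[\frac{T\,Y(1)}{e(X)}\right]
     && \text{(consistency: } TY = TY(1)\text{)} \\
  &= E\!\left[\frac{E[T \mid X]\; E[Y(1) \mid X]}{e(X)}\right]
     && \text{(iterated expectations, unconfoundedness)} \\
  &= E\big[E[Y(1) \mid X]\big] = E[Y(1)]
     && \text{(} E[T \mid X] = e(X) > 0 \text{ by overlap)}
\end{aligned}
```

The same argument with $1 - T$ and $1 - e(X)$ gives $E[Y(0)]$, so the difference of the two weighted means has expectation $E[Y(1)] - E[Y(0)]$, the ATE.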
Python Implementation
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

np.random.seed(42)

# Simulate data with confounding: X drives both treatment and outcome
n = 1000
X = np.random.randn(n, 3)
propensity = 1 / (1 + np.exp(-(0.5*X[:,0] + 0.3*X[:,1] - 0.2*X[:,2])))
T = np.random.binomial(1, propensity)

# Potential outcomes with a constant treatment effect
Y0 = 2*X[:,0] + X[:,1] + np.random.randn(n)
Y1 = Y0 + 1.5  # True ATE = 1.5
Y = T * Y1 + (1 - T) * Y0  # Observed outcome

df = pd.DataFrame({'Y': Y, 'T': T, 'X1': X[:,0], 'X2': X[:,1], 'X3': X[:,2]})
covariates = ['X1', 'X2', 'X3']

# Naive comparison (biased)
naive_diff = df.loc[df['T'] == 1, 'Y'].mean() - df.loc[df['T'] == 0, 'Y'].mean()
print(f"Naive difference: {naive_diff:.3f}")

# IPW estimator: fit a propensity model, then weight outcomes by inverse propensity
prop_model = LogisticRegression().fit(df[covariates], df['T'])
e_hat = prop_model.predict_proba(df[covariates])[:, 1]
ipw = np.mean(df['T'] * df['Y'] / e_hat - (1 - df['T']) * df['Y'] / (1 - e_hat))
print(f"IPW estimate: {ipw:.3f}")
print("True ATE: 1.500")
```
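Regression adjustment, the other estimation strategy named above, can be sketched on a similar simulated data set. This is one simple variant (separate linear outcome models for treated and control units, then averaging the predicted difference over everyone); the sample size and seed are assumptions, and the linear outcome model is correct here only because the simulation was built that way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 5_000

# Same kind of confounded setup: X drives both treatment and outcome
X = rng.normal(size=(n, 3))
e = 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2])))
T = rng.binomial(1, e)
Y0 = 2 * X[:, 0] + X[:, 1] + rng.normal(size=n)
Y = Y0 + 1.5 * T                      # true ATE = 1.5

# Fit one outcome model per treatment arm
m1 = LinearRegression().fit(X[T == 1], Y[T == 1])
m0 = LinearRegression().fit(X[T == 0], Y[T == 0])

# Predict both potential outcomes for every unit and average the difference
ate_reg = np.mean(m1.predict(X) - m0.predict(X))

print(f"Regression adjustment estimate: {ate_reg:.3f}")
```

Unlike IPW, this approach depends on the outcome model being well specified rather than on the propensity model.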
Worked Example
Example: Job Training Program
Evaluating the effect of job training on earnings:
- Treatment: Enrolled in job training program
- Outcome: Annual earnings
- Confounders: Education, age, prior income
| Method | Estimated Effect |
|--------|-----------------|
| Naive comparison | $3,200 |
| IPW | $1,850 |
| Matching (ATE) | $1,920 |
| Regression adjustment | $1,780 |
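The naive comparison is biased upward because people who choose training differ from those who do not, for example in education, prior income, or motivation. After adjusting for the observed confounders, the estimates fall to between $1,780 and $1,920. If motivation is unobserved and affects both enrollment and earnings, these adjusted estimates may still be biased.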
Key Takeaways
Summary: Potential Outcomes Framework
- Causal effect: $\tau_i = Y_i(1) - Y_i(0)$, but only one potential outcome is ever observed (fundamental problem)
- ATE = average effect across population; ATT = effect among treated
- SUTVA assumes no interference between units
- Randomization eliminates selection bias by making groups comparable
- Unconfoundedness ($(Y(0), Y(1)) \perp T | X$) is needed for observational studies
- Methods: IPW, matching, regression adjustment
- Always report the assumptions underlying your causal estimates
Related Topics
- See Randomized Controlled Trials for the gold standard of causal inference
- See Propensity Score Matching for observational study methods
- See Instrumental Variables for when unconfoundedness fails