Propensity Score Matching
Creating Pseudo-Experiments From Observational Data
Propensity score matching pairs treated and control units with similar treatment probabilities, mimicking randomization. It balances observed covariates, reducing selection bias in observational studies.
- Healthcare: Compare outcomes for patients who chose different treatments
- Education: Evaluate school choice effects by matching applicants with similar backgrounds
- Marketing: Assess campaign effectiveness when exposure was not randomly assigned
Matching on the propensity score reduces many dimensions of confounding to a single number.
Propensity score matching (PSM) creates pseudo-experimental conditions in observational studies by matching treated and control units with similar probabilities of receiving treatment.
Definition: Propensity Score
The probability that a unit receives treatment, given its observed covariates:
$$e(X_i) = P(T_i = 1 \mid X_i)$$
Here,
- $X_i$ = Observed covariates for unit $i$
- $T_i$ = Treatment indicator (1 = treated, 0 = control)
- $e(X_i)$ = Propensity score
Key Theorem
Rosenbaum-Rubin Theorem (1983)
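If treatment assignment is unconfounded given the observed covariates $X$, it is also unconfounded given the propensity score $e(X)$:
$$(Y(0), Y(1)) \perp T \mid X \;\Longrightarrow\; (Y(0), Y(1)) \perp T \mid e(X)$$
Matching on the scalar propensity score therefore balances the observed multivariate covariates in expectation. It does nothing to balance unobserved covariates.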
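The balancing property can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings (two standard-normal covariates, a known logistic propensity model, and 20 equal-sized propensity strata, none of which come from the text): it compares the raw treated-minus-control difference in `X1` with the same difference computed within propensity-score strata.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)

# Assumed (hypothetical) treatment model: the true propensity score is known here
e = 1 / (1 + np.exp(-(0.8 * X1 + 0.8 * X2)))
T = rng.binomial(1, e)

# Raw imbalance in X1 between treated and control units
raw_diff = X1[T == 1].mean() - X1[T == 0].mean()

# Assign each unit to one of 20 equal-sized strata of the propensity score
edges = np.quantile(e, np.linspace(0, 1, 21))
stratum = np.clip(np.searchsorted(edges, e, side="right") - 1, 0, 19)

# Within-stratum imbalance, averaged with stratum sizes as weights
diffs, sizes = [], []
for s in range(20):
    in_s = stratum == s
    treated, control = in_s & (T == 1), in_s & (T == 0)
    if treated.any() and control.any():
        diffs.append(X1[treated].mean() - X1[control].mean())
        sizes.append(in_s.sum())
strat_diff = np.average(diffs, weights=sizes)

print(f"Raw difference in X1:        {raw_diff:.3f}")
print(f"Stratified difference in X1: {strat_diff:.3f}")
```

Conditioning on the single number `e` removes most of the imbalance in `X1`, even though `X2` was never conditioned on directly.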
Assumptions
| Assumption | Meaning | Testable? |
|-----------|---------|-----------|
| Unconfoundedness | No unobserved confounders | No |
| Overlap | $0 < P(T = 1 \mid X) < 1$ for all $X$ | Yes |
| SUTVA | No interference between units | Partially |
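Overlap is the one assumption in the table that can be checked directly from estimated scores. A minimal sketch, assuming the simple min/max rule for common support (other trimming rules exist) and a hypothetical helper name `common_support_mask`:

```python
import numpy as np

def common_support_mask(ps, t):
    """Flag units whose propensity score lies in the range shared by both groups."""
    ps = np.asarray(ps, dtype=float)
    t = np.asarray(t).astype(bool)
    lo = max(ps[t].min(), ps[~t].min())   # larger of the two group minima
    hi = min(ps[t].max(), ps[~t].max())   # smaller of the two group maxima
    return (ps >= lo) & (ps <= hi)

# Illustrative scores only: controls span 0.05-0.50, treated span 0.30-0.95
ps = np.array([0.05, 0.20, 0.35, 0.50, 0.30, 0.60, 0.80, 0.95])
t = np.array([0, 0, 0, 0, 1, 1, 1, 1])
mask = common_support_mask(ps, t)
print(mask)
```

Units outside the shared range have no comparable counterpart in the other group, so they are usually dropped or at least reported before matching.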
Estimation Steps
| Step | Action |
|------|--------|
| 1 | Estimate propensity score (logistic regression, ML) |
| 2 | Check overlap (common support) |
| 3 | Match treated to control units |
| 4 | Assess covariate balance |
| 5 | Estimate treatment effect |
| 6 | Conduct sensitivity analysis |
Matching Methods
| Method | Description |
|--------|------------|
| Nearest neighbor | Match each treated to closest control on propensity score |
| Caliper | Only match if propensity scores are within caliper distance |
| Full matching | Create matched sets that partition all units |
| Kernel matching | Weight all controls by kernel function of propensity score |
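Kernel matching is the least intuitive row of the table, so here is a rough sketch of it. The Gaussian kernel, the bandwidth of 0.02, the simulated data, and the function name `kernel_matching_att` are all assumptions made for illustration. Each treated unit's missing control outcome is estimated as a kernel-weighted average of all control outcomes.

```python
import numpy as np

def kernel_matching_att(y, t, ps, bandwidth=0.02):
    """ATT from Gaussian-kernel matching on the propensity score."""
    y, ps = np.asarray(y, dtype=float), np.asarray(ps, dtype=float)
    t = np.asarray(t).astype(bool)
    # Scaled propensity-score distance between every treated/control pair
    u = (ps[t][:, None] - ps[~t][None, :]) / bandwidth
    k = np.exp(-0.5 * u**2)                 # Gaussian kernel weights
    w = k / k.sum(axis=1, keepdims=True)    # each treated unit's weights sum to 1
    y0_hat = w @ y[~t]                      # kernel-weighted control outcome
    return (y[t] - y0_hat).mean()

# Simulated data with a constant treatment effect of 2.0 (assumed for the demo)
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
ps_true = 1 / (1 + np.exp(-0.5 * x))
t = rng.binomial(1, ps_true)
y = 3 * x + 2.0 * t + rng.normal(size=n)

att_kernel = kernel_matching_att(y, t, ps_true)
naive_diff = y[t == 1].mean() - y[t == 0].mean()
print(f"Naive difference: {naive_diff:.3f}, kernel-matching ATT: {att_kernel:.3f}")
```

The bandwidth plays the same role as the caliper in caliper matching: smaller values use only close controls (less bias, more variance), larger values borrow from distant ones.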
Caliper Matching
A common caliper is 0.2 × SD(propensity score), often computed on the logit scale. Caliper matching reduces bias from poor matches but may leave some treated units unmatched, which changes the population the estimate describes.
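A minimal sketch of caliper matching, assuming 1:1 nearest-neighbor matching with replacement and a caliper of 0.2 times the SD of the raw propensity scores. The helper name `caliper_match` and the scores are hypothetical.

```python
import numpy as np

def caliper_match(ps_treated, ps_control, caliper):
    """Index of the nearest control for each treated unit, or -1 if none is within the caliper."""
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    dist = np.abs(ps_treated[:, None] - ps_control[None, :])
    nearest = dist.argmin(axis=1)
    within = dist[np.arange(len(ps_treated)), nearest] <= caliper
    return np.where(within, nearest, -1)

# Illustrative propensity scores only
ps_treated = np.array([0.30, 0.52, 0.90])
ps_control = np.array([0.10, 0.28, 0.50, 0.60])

# Caliper = 0.2 x SD of all propensity scores
caliper = 0.2 * np.concatenate([ps_treated, ps_control]).std(ddof=1)
matches = caliper_match(ps_treated, ps_control, caliper)
print(f"caliper = {caliper:.3f}, matches = {matches}")
```

The third treated unit (score 0.90) has no control within the caliper and stays unmatched, which is the trade-off described above.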
Covariate Balance
After matching, check that covariates are balanced between groups.
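Standardized Mean Difference
$$\text{SMD} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{(s_T^2 + s_C^2) / 2}}$$
Here,
- $\bar{X}_T$ = Mean in treated group
- $\bar{X}_C$ = Mean in control group
- $s_T$, $s_C$ = Standard deviations in the treated and control groups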
| SMD | Interpretation |
|-----|---------------|
| < 0.1 | Excellent balance |
| 0.1 - 0.2 | Adequate balance |
| > 0.2 | Poor balance; matching failed |
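The balance check can be wrapped in two small helpers. This is a sketch, assuming the SMD uses the average of the two group variances in the denominator (as in the Python implementation later in this article). The function names are hypothetical.

```python
import numpy as np

def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the root of the average group variance."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

def balance_label(smd):
    """Map |SMD| to the thresholds in the table above."""
    a = abs(smd)
    if a < 0.1:
        return "excellent"
    if a <= 0.2:
        return "adequate"
    return "poor"

# Illustrative values: means 4 and 3, both variances 4, so SMD = 1 / 2
smd = standardized_mean_difference([2, 4, 6], [1, 3, 5])
print(smd, balance_label(smd))
```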
ATT Estimation
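ATT via IPW
$$\hat{\tau}_{ATT} = \frac{1}{N_T} \sum_{i: T_i = 1} Y_i - \frac{1}{N_T} \sum_{j: T_j = 0} w_j Y_j$$
Here,
- $w_j$ = Matching weight for control unit $j$ (with 1:1 matching with replacement, the number of treated units matched to $j$)
- $N_T$ = Number of treated units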
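A short sketch of the weighted estimator, assuming it takes the form "mean treated outcome minus the weighted sum of control outcomes divided by the number of treated units", with each control's weight equal to the number of times it was used as a match. The helper name `att_from_matches` and the numbers are hypothetical.

```python
import numpy as np

def att_from_matches(y_treated, y_control, match_idx):
    """ATT for 1:1 matching with replacement, written in terms of control weights."""
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    n_treated = len(y_treated)
    # Weight of control j = how many treated units were matched to it
    weights = np.bincount(match_idx, minlength=len(y_control))
    return y_treated.mean() - (weights * y_control).sum() / n_treated

# Illustrative outcomes: treated units 0, 1, 2 are matched to controls 0, 1, 1
y_treated = [10.0, 12.0, 14.0]
y_control = [7.0, 9.0, 11.0, 20.0]
match_idx = np.array([0, 1, 1])

att_value = att_from_matches(y_treated, y_control, match_idx)
print(f"ATT = {att_value:.4f}")
```

Controls that were never matched get weight zero and drop out of the estimate, while a control matched twice counts twice.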
Sensitivity Analysis
Unconfoundedness is Untestable
PSM assumes no unmeasured confounders. Sensitivity analysis (e.g., Rosenbaum bounds) assesses how strong an unmeasured confounder would need to be to change the conclusion.
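Rosenbaum's Gamma
$$\frac{1}{\Gamma} \le \frac{\pi_i / (1 - \pi_i)}{\pi_j / (1 - \pi_j)} \le \Gamma$$
Here,
- $\pi_i$, $\pi_j$ = Treatment probabilities of two matched units $i$ and $j$ with the same observed covariates
- $\Gamma$ = Bound on the degree of hidden bias
- $\Gamma = 1$ = No hidden bias
- Conclusions unchanged at large $\Gamma$ = Robust to hidden bias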
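One simple way to put numbers on this is the sign-test version of Rosenbaum bounds for matched pairs. The sketch below is one variant among several, and the function name and pair counts are hypothetical. If hidden bias can shift the odds of treatment within a pair by at most a factor Γ, the chance that the treated unit has the larger outcome under the null is at most Γ / (1 + Γ), which gives a worst-case one-sided p-value.

```python
from math import comb

def sign_test_upper_pvalue(n_pairs, n_positive, gamma):
    """Worst-case one-sided sign-test p-value when hidden bias is bounded by gamma."""
    p_plus = gamma / (1 + gamma)  # largest null probability of a positive pair difference
    return sum(
        comb(n_pairs, k) * p_plus**k * (1 - p_plus) ** (n_pairs - k)
        for k in range(n_positive, n_pairs + 1)
    )

# Illustrative counts: 20 matched pairs, treated outcome larger in 17 of them
p_values = {g: sign_test_upper_pvalue(20, 17, g) for g in (1.0, 1.5, 2.0, 3.0)}
for g, p in p_values.items():
    print(f"Gamma = {g}: worst-case p-value = {p:.4f}")
```

The smallest Γ at which the worst-case p-value crosses the chosen significance level is the value usually reported as the study's sensitivity to hidden bias.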
Python Implementation
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt
np.random.seed(42)
# Simulate observational data
n = 1000
X1 = np.random.randn(n)
X2 = np.random.binomial(1, 0.5, n)
# Propensity (confounded)
propensity = 1 / (1 + np.exp(-(0.5*X1 + 0.3*X2)))
T = np.random.binomial(1, propensity)
# Outcome (true ATE = 2.0)
Y0 = 3*X1 + 2*X2 + np.random.randn(n)
Y1 = Y0 + 2.0
Y = T * Y1 + (1 - T) * Y0
df = pd.DataFrame({'Y': Y, 'T': T, 'X1': X1, 'X2': X2})
# Estimate propensity score
logit = LogisticRegression().fit(df[['X1','X2']], df['T'])
df['ps'] = logit.predict_proba(df[['X1','X2']])[:, 1]
# Match each treated unit to its nearest control on the propensity score
# (1:1 nearest neighbor, with replacement)
treated_idx = df[df['T']==1].index
control_idx = df[df['T']==0].index
nn = NearestNeighbors(n_neighbors=1, metric='euclidean')
nn.fit(df.loc[control_idx, ['ps']])
distances, matches = nn.kneighbors(df.loc[treated_idx, ['ps']])
matched_control = control_idx[matches.flatten()]
# Balance check: standardized mean difference before and after matching
for col in ['X1', 'X2']:
    treated_col = df.loc[treated_idx, col]
    control_col = df.loc[control_idx, col]
    matched_col = df.loc[matched_control, col]
    before_smd = abs(treated_col.mean() - control_col.mean()) / \
        np.sqrt((treated_col.var() + control_col.var()) / 2)
    after_smd = abs(treated_col.mean() - matched_col.mean()) / \
        np.sqrt((treated_col.var() + matched_col.var()) / 2)
    print(f"{col}: Before SMD={before_smd:.3f}, After SMD={after_smd:.3f}")
# ATT estimate: mean treated outcome minus mean outcome of the matched controls
att = df.loc[treated_idx, 'Y'].mean() - df.loc[matched_control, 'Y'].mean()
print(f"\nATT estimate: {att:.3f} (true: 2.0)")
Worked Example
Example: Effect of Smoking on Birth Weight
Observational study comparing birth weights of smokers vs non-smokers:
Before matching:
| Covariate | SMD |
|-----------|-----|
| Age | 0.35 |
| Income | 0.52 |
| Education | 0.28 |
After matching:
| Covariate | SMD |
|-----------|-----|
| Age | 0.04 |
| Income | 0.08 |
| Education | 0.05 |
ATT estimate: -245 grams (95% CI: [-310, -180])
Among smokers, smoking is estimated to reduce birth weight by approximately 245 grams. Rosenbaum's Γ = 2.5: the result holds unless an unmeasured confounder changes the odds of treatment by a factor of more than 2.5.
Key Takeaways
Summary: Propensity Score Matching
- PSM creates pseudo-experimental conditions from observational data
- The propensity score summarizes all observed confounders in a single number
- Match on the propensity score, not raw covariates
- Check covariate balance (SMD < 0.1) after matching
- Overlap assumption: propensity scores must overlap between groups
- Unconfoundedness is untestable; use sensitivity analysis (Rosenbaum bounds)
- Common caliper: 0.2 × SD(propensity score)
Related Topics
- See Causal Inference for the potential outcomes framework
- See Randomized Controlled Trials for the gold standard
- See Missing Data for related topics on data quality