Propensity Score Matching


Creating Pseudo-Experiments From Observational Data

Propensity score matching pairs treated and control units with similar treatment probabilities, mimicking randomization. It balances observed covariates, reducing selection bias in observational studies.

Healthcare — Compare outcomes for patients who chose different treatments
Education — Evaluate school choice effects by matching applicants with similar backgrounds
Marketing — Assess campaign effectiveness when exposure was not randomly assigned

Matching on the propensity score reduces many dimensions of confounding to a single number.

Propensity score matching (PSM) creates pseudo-experimental conditions in observational studies by matching treated and control units with similar probabilities of receiving treatment.

Definition: Propensity Score

The probability that a unit receives treatment, given its observed covariates:

Propensity Score

$$e(X_i) = P(T_i = 1 \mid X_i)$$

Here,

$X_i$ = observed covariates for unit $i$
$T_i$ = treatment indicator ($1$ if unit $i$ is treated, $0$ otherwise)
$e(X_i)$ = propensity score
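When the covariates are discrete, the definition can be applied directly: $e(x)$ is simply the share of treated units among units with $X = x$. The sketch below assumes a hypothetical assignment rule with two binary covariates and checks that the cell-wise treated shares recover it. With continuous covariates a model such as logistic regression takes the place of these cell averages.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical example: two binary covariates, so X takes only four values
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)

# Assumed true assignment mechanism: e(X) = 0.2 + 0.3*x1 + 0.2*x2
true_ps = 0.2 + 0.3 * x1 + 0.2 * x2
t = rng.binomial(1, true_ps)

df = pd.DataFrame({"x1": x1, "x2": x2, "t": t})

# With discrete X, e(x) = P(T=1 | X=x) is estimated by the share treated in each cell
df["ps_hat"] = df.groupby(["x1", "x2"])["t"].transform("mean")

# Compare true and estimated scores cell by cell
summary = df.assign(true_ps=true_ps).groupby(["x1", "x2"])[["true_ps", "ps_hat"]].mean()
print(summary.round(3))
```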

Key Theorem

Rosenbaum-Rubin Theorem (1983)

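If treatment assignment is unconfounded given $X$, it is also unconfounded given the propensity score $e(X)$:

$$\{Y(0), Y(1)\} \perp T \mid X \implies \{Y(0), Y(1)\} \perp T \mid e(X)$$

The result rests on the balancing property $X \perp T \mid e(X)$: among units with the same propensity score, treated and control units have the same distribution of observed covariates. Matching on this single scalar therefore balances all observed covariates at once, in expectation; it does nothing for unobserved ones.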
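A quick simulation can illustrate the balancing property. This sketch assumes a hypothetical logistic assignment rule and uses the *true* propensity score, which is known only in a simulation. It compares covariate mean gaps between treated and control units overall with the average gap inside 20 quantile bins of $e(X)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical confounded design: X1 raises and X2 lowers the chance of treatment
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
ps = 1 / (1 + np.exp(-(0.8 * x1 - 0.5 * x2)))  # true propensity score e(X)
t = rng.binomial(1, ps)

def mean_gap(x, treated):
    """Difference in covariate means, treated minus control."""
    return x[treated == 1].mean() - x[treated == 0].mean()

# Overall, treated units have higher X1 and lower X2
overall = [mean_gap(x, t) for x in (x1, x2)]

# Within narrow quantile bins of e(X), the gaps largely vanish
edges = np.quantile(ps, np.linspace(0, 1, 21))
bins = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, 19)
within = []
for x in (x1, x2):
    gaps = [mean_gap(x[bins == b], t[bins == b]) for b in range(20)]
    sizes = [np.sum(bins == b) for b in range(20)]
    within.append(np.average(gaps, weights=sizes))  # size-weighted average gap

print("overall gaps:   ", np.round(overall, 3))
print("within-bin gaps:", np.round(within, 3))
```

Finer bins shrink the within-bin gaps further, at the cost of fewer units per bin.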

Assumptions

| Assumption | Meaning | Testable? |
|-----------|---------|-----------|
| Unconfoundedness | No unobserved confounders: $\{Y(0), Y(1)\} \perp T \mid X$ | No |
| Overlap | $0 < e(X) < 1$ for all $X$ | Yes (inspect estimated propensity scores) |
| SUTVA | No interference between units and no hidden versions of treatment | Partially |
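Overlap is the assumption in the table that can be checked from data. One simple convention, assumed here rather than prescribed by the text, is the min-max rule: keep units whose estimated score lies between the larger of the two group minima and the smaller of the two group maxima. `common_support` is a hypothetical helper name, and the simulated confounder is made strong on purpose so that overlap is poor.

```python
import numpy as np

def common_support(ps, t):
    """Min-max rule: return the overlap interval [lo, hi] of propensity
    scores and a boolean mask of the units that fall inside it."""
    ps, t = np.asarray(ps, dtype=float), np.asarray(t)
    lo = max(ps[t == 1].min(), ps[t == 0].min())
    hi = min(ps[t == 1].max(), ps[t == 0].max())
    return lo, hi, (ps >= lo) & (ps <= hi)

# Hypothetical example with poor overlap: a strong confounder pushes scores to the extremes
rng = np.random.default_rng(2)
x = rng.normal(size=2_000)
ps = 1 / (1 + np.exp(-3 * x))
t = rng.binomial(1, ps)

lo, hi, inside = common_support(ps, t)
print(f"common support: [{lo:.3f}, {hi:.3f}]")
print(f"units outside support: {np.sum(~inside)} of {len(ps)}")
```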

Estimation Steps

| Step | Action |
|------|--------|
| 1 | Estimate propensity score (logistic regression, ML) |
| 2 | Check overlap (common support) |
| 3 | Match treated to control units |
| 4 | Assess covariate balance |
| 5 | Estimate treatment effect |
| 6 | Conduct sensitivity analysis |

Matching Methods

| Method | Description |
|--------|------------|
| Nearest neighbor | Match each treated unit to the control with the closest propensity score |
| Caliper | Match only if the propensity scores differ by less than a caliper distance |
| Full matching | Form matched sets, each with at least one treated and one control unit, that together include all units |
| Kernel matching | Compare each treated unit with a weighted average of all controls, with weights given by a kernel of the propensity-score distance |
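As a sketch of kernel matching, the hypothetical function `kernel_att` below compares each treated unit with a Gaussian-kernel-weighted average of all controls. Two choices are assumptions, not part of the table: distances are measured on the logit of the propensity score, and the bandwidth is fixed at 0.1. The simulation also plugs in the true score instead of an estimate.

```python
import numpy as np

def kernel_att(y, t, ps, bandwidth=0.1):
    """ATT by Gaussian-kernel matching on logit(ps): each treated unit is
    compared with a kernel-weighted average of all control outcomes."""
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    score = np.log(ps / (1 - ps))  # kernel distance measured on the logit scale
    treated, control = t == 1, t == 0
    d = (score[treated][:, None] - score[control][None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)  # rows = treated units, columns = controls
    counterfactual = (w @ y[control]) / w.sum(axis=1)
    return float(np.mean(y[treated] - counterfactual))

# Hypothetical simulated data with a constant treatment effect of 2.0
rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))  # true propensity score, used directly for illustration
t = rng.binomial(1, ps)
y = 3 * x + 2.0 * t + rng.normal(size=n)

att = kernel_att(y, t, ps)
print(f"kernel-matching ATT: {att:.2f}")
```

A smaller bandwidth lowers bias but leans on fewer controls per treated unit, which raises variance.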

Caliper Matching

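A common caliper is 0.2 × SD of the propensity score; it is often applied on the logit scale, i.e. 0.2 × SD(logit e(X)). Caliper matching avoids the bias from poor matches but may leave some treated units unmatched, in which case the estimate applies only to the treated units that were matched.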
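The sketch below implements 1:1 nearest-neighbor matching with replacement and a caliper of 0.2 × SD on the logit scale. Taking the SD over all units (rather than pooling within-group SDs) and the helper name `caliper_match` are assumptions made for illustration. The simulated confounder is deliberately strong so that some treated units have no control inside the caliper.

```python
import numpy as np

def caliper_match(ps, t, k=0.2):
    """1:1 nearest-neighbor matching with replacement on logit(ps).
    Treated units whose nearest control is farther than k * SD(logit ps)
    are dropped. Returns (treated indices, matched control indices, caliper)."""
    ps, t = np.asarray(ps, dtype=float), np.asarray(t)
    score = np.log(ps / (1 - ps))     # logit of the propensity score
    caliper = k * score.std(ddof=1)   # assumption: SD over all units
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    dist = np.abs(score[treated][:, None] - score[control][None, :])
    nearest = dist.argmin(axis=1)     # closest control for each treated unit
    within = dist[np.arange(len(treated)), nearest] <= caliper
    return treated[within], control[nearest[within]], caliper

# Hypothetical data with a strong confounder, so overlap is poor in the tails
rng = np.random.default_rng(5)
x = rng.normal(size=2_000)
ps = 1 / (1 + np.exp(-3 * x))
t = rng.binomial(1, ps)

matched_t, matched_c, caliper = caliper_match(ps, t)
n_treated = int(np.sum(t == 1))
print(f"caliper on logit scale: {caliper:.3f}")
print(f"matched {len(matched_t)} of {n_treated} treated units")
```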

Covariate Balance

After matching, check that covariates are balanced between groups.

Standardized Mean Difference

$$\text{SMD} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{(s_T^2 + s_C^2)/2}}$$

Here,

$\bar{X}_T$ = mean in the treated group
$\bar{X}_C$ = mean in the control group
$s_T, s_C$ = standard deviations in the treated and control groups

| Absolute SMD | Interpretation |
|-----|---------------|
| < 0.1 | Excellent balance |
| 0.1 to 0.2 | Adequate balance |
| > 0.2 | Poor balance; revise the propensity model or matching method |

These cutoffs are rules of thumb, not formal tests.
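The SMD formula and the thresholds above translate directly into code. `smd` and `balance_label` are hypothetical helper names. The example uses sample variances (`ddof=1`) and a tiny data set that can be checked by hand: the means are 4 and 3 and both variances are 4, so SMD = 1/2.

```python
import numpy as np

def smd(x_treated, x_control):
    """Absolute standardized mean difference with pooled SD sqrt((s_T^2 + s_C^2) / 2)."""
    xt = np.asarray(x_treated, dtype=float)
    xc = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((xt.var(ddof=1) + xc.var(ddof=1)) / 2)
    return abs(xt.mean() - xc.mean()) / pooled_sd

def balance_label(value):
    """Rule-of-thumb labels from the balance table."""
    if value < 0.1:
        return "excellent"
    if value <= 0.2:
        return "adequate"
    return "poor"

# Hand-checkable example
treated = [2.0, 4.0, 6.0]  # mean 4, variance 4
control = [1.0, 3.0, 5.0]  # mean 3, variance 4
value = smd(treated, control)
print(f"SMD = {value:.2f} ({balance_label(value)})")
```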

ATT Estimation

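The average treatment effect on the treated, $\tau_{ATT} = E[Y(1) - Y(0) \mid T = 1]$, is estimated by comparing the treated mean with a weighted mean of control outcomes.

ATT via IPW

$$\hat{\tau}_{ATT} = \frac{1}{n_T}\sum_{i:T_i=1} Y_i - \frac{\sum_{j:T_j=0} w_j Y_j}{\sum_{j:T_j=0} w_j}$$

Here,

$w_j$ = weight for control unit $j$: the odds $e(X_j)/(1 - e(X_j))$ under inverse probability weighting, or the number of times $j$ is used as a match under 1:1 matching with replacement
$n_T$ = number of treated units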
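The sketch below applies the formula with inverse probability (odds) weights. For illustration it plugs in the true propensity score from a simulated logistic assignment rule with a constant treatment effect of 2.0; in practice $e(X_j)$ would be estimated. `att_weighted` is a hypothetical helper name, and the naive difference in means is printed for comparison.

```python
import numpy as np

def att_weighted(y, t, ps):
    """Treated mean minus a weighted control mean, with odds weights
    w_j = e(X_j) / (1 - e(X_j)) on the controls."""
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    treated, control = t == 1, t == 0
    w = ps[control] / (1 - ps[control])
    return y[treated].mean() - np.sum(w * y[control]) / np.sum(w)

# Hypothetical confounded data: x raises both treatment probability and outcome
rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-0.5 * x))
t = rng.binomial(1, ps)
y = 3 * x + 2.0 * t + rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()
est = att_weighted(y, t, ps)
print(f"naive difference: {naive:.2f}, weighted ATT: {est:.2f}")
```

The odds weights make the control group resemble the treated group on $X$, which is why the target is the ATT rather than the ATE.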

Sensitivity Analysis

Unconfoundedness is Untestable

PSM assumes no unmeasured confounders. Sensitivity analysis (e.g., Rosenbaum bounds) assesses how strong an unmeasured confounder would need to be to change the conclusion.

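Rosenbaum's Gamma

For two units $i$ and $j$ with the same observed covariates ($X_i = X_j$) but possibly different unobserved confounders $U$, Rosenbaum's model bounds the ratio of their odds of treatment:

$$\frac{1}{\Gamma} \le \frac{\pi_i / (1 - \pi_i)}{\pi_j / (1 - \pi_j)} \le \Gamma, \qquad \pi_i = P(T_i = 1 \mid X_i, U_i)$$

Here,

$\Gamma \ge 1$ = bound on the degree of hidden bias
$\Gamma = 1$ = no hidden bias (matched units have equal odds of treatment)
Large $\Gamma$ = a conclusion that still holds at a large $\Gamma$ is robust to hidden bias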
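For matched pairs, the simplest Rosenbaum bound uses the sign test: with hidden bias of at most $\Gamma$, each discordant pair favors the treated unit with probability at most $\Gamma/(1+\Gamma)$, so the worst-case p-value is a binomial tail. The sketch assumes pairs with zero difference have already been dropped; `sign_test_upper_p` is a hypothetical name and the 70-of-100 counts are made-up illustrative inputs. Rosenbaum's own analyses more often use the Wilcoxon signed-rank statistic.

```python
from math import comb

def sign_test_upper_p(n_positive, n_pairs, gamma):
    """Upper bound on the one-sided sign-test p-value for matched pairs
    under Rosenbaum's model with sensitivity parameter gamma >= 1."""
    p_plus = gamma / (1 + gamma)  # worst-case chance a pair favors treatment
    return sum(comb(n_pairs, k) * p_plus**k * (1 - p_plus) ** (n_pairs - k)
               for k in range(n_positive, n_pairs + 1))

# Hypothetical data: 100 discordant pairs, 70 favoring the treated unit
for gamma in (1.0, 1.5, 2.0, 2.5):
    print(f"Gamma={gamma}: p-value bound = {sign_test_upper_p(70, 100, gamma):.4f}")
```

At $\Gamma = 1$ this is the ordinary sign test; the smallest $\Gamma$ at which the bound crosses the chosen significance level is the reported sensitivity value.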

Python Implementation


import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

np.random.seed(42)

# Simulate observational data
n = 1000
X1 = np.random.randn(n)
X2 = np.random.binomial(1, 0.5, n)

# Treatment assignment depends on the covariates (confounding)
propensity = 1 / (1 + np.exp(-(0.5*X1 + 0.3*X2)))
T = np.random.binomial(1, propensity)

# Potential outcomes with a constant effect, so true ATE = ATT = 2.0
Y0 = 3*X1 + 2*X2 + np.random.randn(n)
Y1 = Y0 + 2.0
Y = T * Y1 + (1 - T) * Y0

df = pd.DataFrame({'Y': Y, 'T': T, 'X1': X1, 'X2': X2})

# Step 1: estimate the propensity score with logistic regression
logit = LogisticRegression().fit(df[['X1', 'X2']], df['T'])
df['ps'] = logit.predict_proba(df[['X1', 'X2']])[:, 1]

# Step 3: 1:1 nearest-neighbor matching on the propensity score, with replacement
treated = df[df['T'] == 1]
control = df[df['T'] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[['ps']])
distances, matches = nn.kneighbors(treated[['ps']])
matched_control = control.iloc[matches.flatten()]  # controls may repeat

def smd(a, b):
    """Absolute standardized mean difference with pooled SD."""
    return abs(a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)

# Step 4: covariate balance before and after matching
for col in ['X1', 'X2']:
    before_smd = smd(treated[col], control[col])
    after_smd = smd(treated[col], matched_control[col])
    print(f"{col}: Before SMD={before_smd:.3f}, After SMD={after_smd:.3f}")

# Step 5: ATT = treated mean outcome minus mean outcome of their matched controls
att = treated['Y'].mean() - matched_control['Y'].mean()
print(f"\nATT estimate: {att:.3f} (true: 2.0)")

Worked Example

Example: Effect of Smoking on Birth Weight

Observational study comparing birth weights of smokers vs non-smokers:

Before matching:

| Covariate | SMD |
|-----------|-----|
| Age | 0.35 |
| Income | 0.52 |
| Education | 0.28 |

After matching:

| Covariate | SMD |
|-----------|-----|
| Age | 0.04 |
| Income | 0.08 |
| Education | 0.05 |

ATT estimate: -245 grams (95% CI: [-310, -180])

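Among smokers (the ATT), smoking is estimated to reduce birth weight by approximately 245 grams. Rosenbaum's Γ = 2.5: the conclusion holds unless an unmeasured confounder makes the odds of smoking differ by more than a factor of 2.5 between matched units with the same observed covariates.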

Key Takeaways

Summary: Propensity Score Matching

PSM creates pseudo-experimental conditions from observational data
The propensity score $e(X) = P(T=1|X)$ summarizes all observed covariates in a single number; it cannot account for unobserved confounders
Match on the propensity score, not raw covariates
Check covariate balance (SMD < 0.1) after matching
Overlap assumption: propensity scores must overlap between groups
Unconfoundedness is untestable — use sensitivity analysis (Rosenbaum bounds)
Common caliper: 0.2 × SD(propensity score)
