Regression Discontinuity Design
Exploiting Threshold Rules for Causal Estimation
Regression discontinuity exploits cutoff-based treatment assignment. Units just above and just below the threshold are nearly identical, so the jump in outcomes at the cutoff reveals the causal effect.
- Education: Estimate scholarship effects using GPA eligibility cutoffs
- Policy Evaluation: Assess the effect of income-based benefit thresholds on employment outcomes
- Healthcare: Evaluate age-based screening programs at eligibility boundaries
At the cutoff, treatment assignment is as good as random, so the discontinuity is the causal effect.
Regression discontinuity (RD) exploits a threshold rule that assigns treatment based on whether a running variable crosses a cutoff. Units just above and just below the cutoff are assumed to be comparable.
Definition: Regression Discontinuity Design
A quasi-experimental method where treatment is assigned based on a running variable $X_i$ relative to a cutoff $c$. The causal effect is estimated as the discontinuity in the outcome at the cutoff.
Sharp RD
$$T_i = \mathbb{1}(X_i \geq c)$$
Here,
- $T_i$ = Treatment indicator for unit $i$
- $X_i$ = Running variable (forcing variable)
- $c$ = Cutoff value
- $\mathbb{1}(\cdot)$ = Indicator function
Treatment is deterministically assigned: everyone at or above the cutoff is treated, everyone below is not.
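The sharp assignment rule can be sketched in a few lines. The function name `sharp_assignment` and the example scores are illustrative assumptions, and the rule treats units exactly at the cutoff as treated, matching the `X >= 0` convention used in the implementation later on.

```python
import numpy as np

def sharp_assignment(x, cutoff):
    """Return 1 for units at or above the cutoff, 0 otherwise."""
    return (np.asarray(x) >= cutoff).astype(int)

# Hypothetical running-variable values around a cutoff of 70
scores = np.array([64.0, 69.9, 70.0, 75.5])
treated = sharp_assignment(scores, cutoff=70.0)
print(treated)  # [0 0 1 1]
```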
Fuzzy RD
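$$\lim_{x \downarrow c} P(T_i = 1 \mid X_i = x) \neq \lim_{x \uparrow c} P(T_i = 1 \mid X_i = x)$$
Here,
- $P(T_i = 1 \mid X_i = x)$ = Probability of treatment given running variable
Treatment assignment is probabilistic, but the probability of treatment jumps at the cutoff. Estimation is IV-like and local: the jump in the outcome at the cutoff is divided by the jump in the treatment probability.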
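As a rough illustration of the IV-like logic, the sketch below simulates a fuzzy design and takes the ratio of two jumps at the cutoff: the jump in the outcome over the jump in treatment take-up. Everything here is an assumption made for illustration: the simulated data, the take-up probabilities (0.2 below, 0.8 above), the bandwidth `h = 0.3`, and the helper `jump_at_cutoff`, which uses an unweighted local linear fit on each side. It omits standard errors and bias correction.

```python
import numpy as np

def jump_at_cutoff(v, x, cutoff, h):
    """Difference between right and left local linear fits of v at the cutoff."""
    xc = x - cutoff
    right = (xc >= 0) & (xc <= h)
    left = (xc < 0) & (xc >= -h)
    # np.polyfit(..., 1) returns [slope, intercept]; the intercept is the fit at xc = 0
    fit_right = np.polyfit(xc[right], v[right], 1)[1]
    fit_left = np.polyfit(xc[left], v[left], 1)[1]
    return fit_right - fit_left

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(-1, 1, n)                       # running variable
p_treat = np.where(X >= 0, 0.8, 0.2)            # take-up probability jumps at 0
T = (rng.uniform(size=n) < p_treat).astype(float)
Y = 2.0 * T + 3.0 * X + rng.normal(0, 0.5, n)   # simulated true effect = 2.0

h = 0.3
jump_y = jump_at_cutoff(Y, X, 0.0, h)   # discontinuity in the outcome
jump_t = jump_at_cutoff(T, X, 0.0, h)   # discontinuity in treatment take-up
fuzzy_estimate = jump_y / jump_t
print(f"jump in Y = {jump_y:.3f}, jump in T = {jump_t:.3f}, ratio = {fuzzy_estimate:.3f}")
```

Because only about 60% of units change treatment status at the cutoff in this simulation, the raw jump in the outcome understates the effect; dividing by the take-up jump rescales it.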
Key Assumption
Continuity Assumption
In the absence of treatment, the conditional expectation $E[Y_i(0) \mid X_i = x]$ would be continuous at the cutoff $x = c$. This means units just above and below the cutoff are comparable in all respects except treatment.
| Violation | Consequence |
|-----------|------------|
| Manipulation of running variable | Bias: people sort around the cutoff |
| Discrete running variable | Binning required; may introduce bias |
| Covariate imbalance at cutoff | Suggests manipulation or confounding |
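To see why continuity is enough for identification in the sharp design, it can help to write the argument out. The notation is a standard convention assumed here: $Y_i(1)$ and $Y_i(0)$ are potential outcomes with and without treatment, and $\tau_{RD}$ is the jump in the observed outcome at the cutoff.

```latex
\begin{aligned}
\tau_{RD}
  &= \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x] \\
  &= \lim_{x \downarrow c} E[Y_i(1) \mid X_i = x] - \lim_{x \uparrow c} E[Y_i(0) \mid X_i = x] \\
  &= E[Y_i(1) - Y_i(0) \mid X_i = c]
\end{aligned}
```

The second line holds because all units above the cutoff are treated and all units below are not. The third line is where continuity enters: if $E[Y_i(1) \mid X_i = x]$ and $E[Y_i(0) \mid X_i = x]$ are both continuous at $c$, each one-sided limit equals the corresponding value at $c$. Each violation in the table above breaks this last step.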
Local Estimation
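RD Estimator (Sharp)
$$\tau_{RD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$$
Here,
- $\tau_{RD}$ = Local Average Treatment Effect at the cutoff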
In practice, estimate local polynomial regressions on each side of the cutoff.
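A minimal sketch of local linear estimation, assuming a uniform kernel and a hand-picked bandwidth. Fitting separate lines on each side is equivalent to one pooled regression with a treatment dummy and an interaction term, which is what the hypothetical helper `sharp_rd_local_linear` does. The standard error is the classical homoskedastic OLS one; dedicated packages use bias-corrected, robust inference instead, so this is for intuition only.

```python
import numpy as np

def sharp_rd_local_linear(y, x, cutoff, h):
    """Local linear sharp RD estimate with a uniform kernel.

    Fits y = a + tau*T + b*(x - c) + g*T*(x - c) using observations with
    |x - c| <= h. Returns (tau, se), where se is the classical OLS standard error.
    """
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    keep = np.abs(xc) <= h
    xc, y = xc[keep], y[keep]
    t = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), t, xc, t * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[1], np.sqrt(cov[1, 1])

# Simulated sharp design (assumed for illustration); the true jump is 2.0
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, n)
T = (X >= 0).astype(int)
Y = 2.0 * T + 3.0 * X + rng.normal(0, 0.5, n)

tau_hat, se = sharp_rd_local_linear(Y, X, cutoff=0.0, h=0.5)
print(f"estimate = {tau_hat:.3f}, SE = {se:.3f}")
```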
Bandwidth Selection
The bandwidth $h$ determines the window around the cutoff used for estimation.
Bandwidth Trade-off
- Small $h$: Less bias but higher variance (fewer observations)
- Large $h$: Lower variance but more bias (includes distant observations)
Optimal bandwidth methods (e.g., Imbens-Kalyanaraman; Calonico-Cattaneo-Titiunik, CCT) balance this trade-off.
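The trade-off is easy to see in simulation. The sketch below assumes a data-generating process with a cubic term, so a straight line fits well only close to the cutoff; the bandwidth grid and the helper `jump_at_cutoff` are illustrative choices, not a bandwidth selector.

```python
import numpy as np

def jump_at_cutoff(y, x, cutoff, h):
    """Sharp RD estimate: gap between unweighted linear fits on each side within h."""
    xc = x - cutoff
    right = (xc >= 0) & (xc <= h)
    left = (xc < 0) & (xc >= -h)
    # The intercept of each fit is its predicted value at the cutoff
    return np.polyfit(xc[right], y[right], 1)[1] - np.polyfit(xc[left], y[left], 1)[1]

rng = np.random.default_rng(2)
n = 4000
X = rng.uniform(-1, 1, n)
T = (X >= 0).astype(float)
# True jump is 2.0; the cubic term makes the linear fit wrong far from the cutoff
Y = 2.0 * T + 3.0 * X + 3.0 * X**3 + rng.normal(0, 0.5, n)

estimates = {}
for h in (0.1, 0.25, 0.5, 1.0):
    n_used = int(np.sum(np.abs(X) <= h))
    estimates[h] = jump_at_cutoff(Y, X, 0.0, h)
    print(f"h = {h:<4}  n used = {n_used:<5}  estimate = {estimates[h]:.3f}")
```

In this setup the narrow windows should be noisy but centered near 2.0, while the widest window uses every observation and drifts well away from the true jump.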
Covariate Balance Check
Before interpreting results, check that baseline covariates are continuous at the cutoff:
Covariate Balance
$$\lim_{x \downarrow c} E[Z_i \mid X_i = x] = \lim_{x \uparrow c} E[Z_i \mid X_i = x]$$
Here,
- $Z_i$ = Baseline covariates
If covariates show discontinuities at the cutoff, the identifying assumption may be violated.
McCrary Density Test
Tests for manipulation of the running variable at the cutoff. If people can precisely control their running variable, they may sort around the cutoff.
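Manipulation Test
$$H_0: \lim_{x \downarrow c} f_X(x) = \lim_{x \uparrow c} f_X(x)$$
Here,
- $f_X$ = Density of the running variable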
A significant McCrary test suggests the running variable is manipulated, which undermines the RD design. This is a critical validity check.
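For intuition about what a density check looks for, here is a deliberately simplified stand-in. It is not McCrary's local linear density estimator; it is an exact binomial test on the counts of observations falling just below and just above the cutoff. It assumes that, absent sorting, an observation in a narrow window is equally likely to land on either side. The function name and the counts are hypothetical.

```python
from math import comb

def binomial_sorting_pvalue(n_left, n_right):
    """Two-sided exact binomial p-value for H0: P(right of cutoff) = 0.5."""
    n = n_left + n_right
    k = max(n_left, n_right)
    # Upper-tail probability of seeing k or more on one side under H0
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts within a narrow window around the cutoff
p_balanced = binomial_sorting_pvalue(48, 52)   # roughly even split
p_sorted = binomial_sorting_pvalue(30, 70)     # bunching just above the cutoff

print(f"balanced window: p = {p_balanced:.3f}")
print(f"bunched window:  p = {p_sorted:.6f}")
```

A small p-value in the bunched case points toward sorting, but this check ignores any slope in the density, which is why a local polynomial density test is preferred in practice.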
Python Implementation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from rdrobust import rdrobust
from rddensity import rddensity  # the density test ships in the separate rddensity package
np.random.seed(42)
# Simulate sharp RD data
n = 1000
X = np.random.uniform(-1, 1, n) # Running variable
T = (X >= 0).astype(int) # Treatment
Y = 2.0 * T + 3.0 * X + np.random.randn(n) * 0.5 # True jump at the cutoff is 2.0
# Main RD estimate
result = rdrobust(Y, X, c=0)
print("RD Estimate:")
print(result)
# Covariate balance check
Z = np.random.randn(n) # Covariate
print("\nCovariate balance at cutoff:")
rd_z = rdrobust(Z, X, c=0)
print(rd_z)
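# Density (manipulation) test; rddensity implements the Cattaneo-Jansson-Ma
# local polynomial density test, a successor to McCrary's original test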
density = rddensity(X, c=0)
print("\nMcCrary density test:")
print(density)
# Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Outcome
axes[0].scatter(X, Y, alpha=0.3, s=10)
axes[0].axvline(x=0, color='red', linestyle='--')
axes[0].set_title('Outcome vs Running Variable')
axes[0].set_xlabel('Running Variable')
axes[0].set_ylabel('Outcome')
# Density
axes[1].hist(X, bins=50, edgecolor='black')
axes[1].axvline(x=0, color='red', linestyle='--')
axes[1].set_title('Density of Running Variable')
plt.tight_layout()
plt.show()
Worked Example
Example: Scholarship Eligibility
Students scoring $X_i \geq 70$ on an entrance exam receive a scholarship ($T_i = 1$). The outcome is GPA at graduation.
| Bandwidth | Estimate | SE | 95% CI |
|-----------|----------|-----|---------|
| 5 points | 0.45 | 0.12 | [0.21, 0.69] |
| 10 points | 0.38 | 0.09 | [0.20, 0.56] |
| 20 points | 0.32 | 0.07 | [0.18, 0.46] |
McCrary test: p = 0.42 -> No manipulation detected
Covariate balance: All p-values > 0.30 -> Baseline characteristics are continuous at the cutoff
Conclusion: The scholarship has a positive effect on GPA (~0.4 grade points) for students near the eligibility threshold.
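The intervals in the table can be reproduced from the estimates and standard errors. The sketch below assumes the usual normal-approximation interval, estimate ± 1.96 × SE; the source does not say how the intervals were computed, so the 1.96 critical value is an assumption.

```python
# Rows copied from the worked example:
# (bandwidth in points, estimate, standard error, reported 95% CI)
rows = [
    (5, 0.45, 0.12, (0.21, 0.69)),
    (10, 0.38, 0.09, (0.20, 0.56)),
    (20, 0.32, 0.07, (0.18, 0.46)),
]
Z_CRIT = 1.96  # assumed: two-sided 95% normal critical value

recomputed = []
for bandwidth, est, se, reported in rows:
    ci = (round(est - Z_CRIT * se, 2), round(est + Z_CRIT * se, 2))
    recomputed.append(ci)
    print(f"{bandwidth:>2} points: {ci}  matches table: {ci == reported}")
```

All three intervals exclude zero, and the point estimate shrinks as the bandwidth widens, which is the pattern to watch for when judging sensitivity to the bandwidth choice.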
Key Takeaways
Summary: Regression Discontinuity
- RD identifies causal effects using a threshold rule for treatment assignment
- Sharp RD: Treatment is deterministic based on the cutoff
- Fuzzy RD: Treatment probability jumps at the cutoff (like IV)
- The key assumption is continuity of potential outcomes at the cutoff
- Use covariate balance checks and McCrary test to validate the design
- Bandwidth selection balances bias-variance trade-off
- RD provides local treatment effects at the cutoff only
Related Topics
- See Instrumental Variables for another quasi-experimental method
- See Difference-in-Differences for policy evaluation
- See Causal Inference for the potential outcomes framework