Causal Inference — Potential Outcomes Framework

The Gold Standard for Answering Causal Questions

The potential outcomes framework (Rubin Causal Model) defines causal effects as comparisons between what happened and what would have happened. It provides the conceptual foundation for all modern causal inference methods.

  • Medical Research — Define treatment effects rigorously in clinical trial analysis

  • Policy Evaluation — Measure program impacts by comparing actual and counterfactual outcomes

  • Social Science — Establish clear criteria for when causal claims are justified

The fundamental problem — we never observe both potential outcomes — drives all of causal inference.


Causal inference asks: what would have happened if a different treatment had been applied? The potential outcomes framework (Rubin Causal Model) makes this question precise by giving every unit one potential outcome per treatment level.

Definition: Causal Effect

A treatment $T$ has a causal effect on outcome $Y$ for unit $i$ if $Y_i(1) \neq Y_i(0)$, where $Y_i(1)$ and $Y_i(0)$ are the potential outcomes under treatment and control.


Potential Outcomes

$$Y_i = Y_i(T_i) = T_i Y_i(1) + (1 - T_i) Y_i(0)$$

Here,

  • $Y_i(1)$ = potential outcome if unit $i$ receives treatment
  • $Y_i(0)$ = potential outcome if unit $i$ receives control
  • $T_i$ = treatment indicator (1 = treated, 0 = control)

Fundamental Problem of Causal Inference

Fundamental Problem

For any individual unit, we can observe only one potential outcome. If $T_i = 1$, we observe $Y_i(1)$ but not $Y_i(0)$. We can never observe both simultaneously.

Therefore, the individual causal effect $\tau_i = Y_i(1) - Y_i(0)$ is fundamentally unobservable.

What we can estimate, under suitable assumptions, are population-level quantities such as average treatment effects.
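The fundamental problem can be made concrete with a small simulation. The sketch below is illustrative only: the distributions, the sample size, and names such as `y1_seen` and `tau_seen` are assumptions chosen for the example. Both potential outcomes are generated, which is possible only because the data are simulated, and then the unobserved one is masked the way real data would hide it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Both potential outcomes exist here only because the data are simulated.
y0 = rng.normal(loc=10.0, scale=2.0, size=n)        # Y_i(0)
y1 = y0 + rng.normal(loc=1.0, scale=0.5, size=n)    # Y_i(1), unit-level effects vary
t = rng.integers(0, 2, size=n)                      # treatment indicator T_i

# Switching equation: Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0)
y_obs = t * y1 + (1 - t) * y0

# What an analyst sees: the counterfactual outcome is missing for every unit.
y1_seen = np.where(t == 1, y1, np.nan)
y0_seen = np.where(t == 0, y0, np.nan)
tau_seen = y1_seen - y0_seen                        # NaN for every unit

print(" i  T    Y(1)   Y(0)  Y_obs  tau_i")
for i in range(n):
    print(f"{i:2d}  {t[i]}  {y1_seen[i]:6.2f} {y0_seen[i]:6.2f} "
          f"{y_obs[i]:6.2f} {tau_seen[i]:6.2f}")
```

Every row has exactly one missing potential outcome, so the `tau_i` column is missing throughout, even though the true unit-level effects exist inside the simulation.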


Average Treatment Effects

ATE (Average Treatment Effect)

$$\text{ATE} = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$$

Here,

  • $\text{ATE}$ = average effect across the entire population

ATT (Average Treatment Effect on the Treated)

$$\text{ATT} = E[Y_i(1) - Y_i(0) \mid T_i = 1]$$

Here,

  • $\text{ATT}$ = average effect among those who actually received treatment

ATU (Average Treatment Effect on the Untreated)

$$\text{ATU} = E[Y_i(1) - Y_i(0) \mid T_i = 0]$$

Here,

  • $\text{ATU}$ = average effect among those who did not receive treatment
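The three estimands differ whenever treatment effects vary across units and treatment uptake is related to that variation. The following sketch assumes a hypothetical data-generating process in which the unit-level effect grows with a covariate `x` and units with larger `x` are more likely to be treated. It also checks that the ATE is the average of ATT and ATU weighted by the treated share.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x = rng.normal(size=n)
y0 = x + rng.normal(size=n)          # Y_i(0)
tau = 1.0 + 0.5 * x                  # unit-level effect, larger for larger x (assumed)
y1 = y0 + tau                        # Y_i(1)

# Units with larger x, and hence larger effects, are more likely to be treated.
t = rng.binomial(1, 1 / (1 + np.exp(-x)))

ate = np.mean(y1 - y0)
att = np.mean((y1 - y0)[t == 1])
atu = np.mean((y1 - y0)[t == 0])

# ATE is the treated-share-weighted average of ATT and ATU.
share_treated = t.mean()
recombined = share_treated * att + (1 - share_treated) * atu

print(f"ATE: {ate:.3f}  ATT: {att:.3f}  ATU: {atu:.3f}")
print(f"share * ATT + (1 - share) * ATU: {recombined:.3f}")
```

In this setup the ATT exceeds the ATE, which exceeds the ATU, because the units that select into treatment are the ones that benefit most. With a constant effect the three quantities would coincide.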

SUTVA

The Stable Unit Treatment Value Assumption has two parts:

| Component | Meaning |
|-----------|---------|
| No interference | One unit's treatment does not affect another unit's outcome |
| Treatment variation irrelevance | There is only one version of each treatment level |

When SUTVA Fails

SUTVA is violated in:

  • Interference: Vaccination (herd effects), education (peer effects)

  • Multiple treatment versions: Different drug dosages considered the same treatment

Special methods (e.g., partial interference models) are needed when SUTVA fails.
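To see why interference matters, consider a toy simulation in which each unit's outcome depends on its own treatment and on the share of treated units in its cluster. All numbers here (`direct_effect`, `spillover`, cluster sizes) and the linear outcome model are assumptions made for illustration; this is not a partial interference estimator, only a demonstration of what a difference in means misses.

```python
import numpy as np

rng = np.random.default_rng(2)
n_clusters, cluster_size = 200, 50
direct_effect, spillover = 1.0, 0.5   # hypothetical effect sizes

def outcome(own_treatment, treated_share, noise):
    """Outcome depends on own treatment AND on the cluster's treated share."""
    return direct_effect * own_treatment + spillover * treated_share + noise

# Treat exactly half of every cluster, chosen at random within the cluster.
order = rng.random((n_clusters, cluster_size)).argsort(axis=1)
t = (order < cluster_size // 2).astype(int)
treated_share = t.mean(axis=1, keepdims=True)   # 0.5 in every cluster

noise = rng.normal(size=t.shape)
y = outcome(t, treated_share, noise)

# Treated and control units share the same spillover, so it cancels out.
diff_in_means = y[t == 1].mean() - y[t == 0].mean()

# Policy contrast: everyone treated versus no one treated.
total_effect = outcome(1, 1.0, noise).mean() - outcome(0, 0.0, noise).mean()

print(f"Difference in means: {diff_in_means:.3f}")
print(f"Treat-all vs treat-none effect: {total_effect:.3f}")
```

The difference in means recovers roughly the direct effect, while the effect of treating everyone instead of no one also includes the spillover. Under interference, "the" effect of treatment is no longer a single well-defined quantity.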


Selection Bias

The naive comparison of treated and control groups may be biased because treatment assignment is often not random.

Selection Bias Decomposition

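$$E[Y \mid T=1] - E[Y \mid T=0] = \text{ATT} + \text{Selection Bias}$$

Here,

  • $\text{ATT}$ = $E[Y(1) - Y(0) \mid T=1]$: this equals the ATE when treated and untreated units have the same average effect, for example under a constant treatment effect
  • $\text{Selection Bias}$ = $E[Y(0) \mid T=1] - E[Y(0) \mid T=0]$: baseline differences between groups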
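The decomposition follows from adding and subtracting $E[Y(0) \mid T=1]$: the naive difference splits into the average effect among the treated plus the selection-bias term, and the effect among the treated coincides with the ATE only when treated and untreated units have the same average effect. The sketch below checks this numerically on simulated data; the data-generating process is an assumption chosen so that effects are heterogeneous and treatment is self-selected.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

x = rng.normal(size=n)
y0 = x + rng.normal(size=n)                  # Y(0)
y1 = y0 + 1.0 + 0.5 * x                      # Y(1), effect varies with x (assumed)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))    # self-selection on x
y_obs = np.where(t == 1, y1, y0)

naive = y_obs[t == 1].mean() - y_obs[t == 0].mean()
att = (y1 - y0)[t == 1].mean()
ate = (y1 - y0).mean()
selection_bias = y0[t == 1].mean() - y0[t == 0].mean()

print(f"Naive difference:     {naive:.3f}")
print(f"ATT + selection bias: {att + selection_bias:.3f}")
print(f"ATE + selection bias: {ate + selection_bias:.3f}")
```

The first two printed numbers agree exactly, because the identity holds in any sample. The third differs here because the simulated effects are heterogeneous, so the ATT and the ATE are not the same quantity.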

Randomization Eliminates Selection Bias

With random assignment, treatment and control groups are comparable in expectation, so $E[Y(0) \mid T=1] = E[Y(0) \mid T=0]$. The selection bias term vanishes, the ATT equals the ATE, and the naive comparison identifies the ATE.
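A quick simulation illustrates the point. The same simulated population is assigned treatment in two ways: by self-selection that depends on a covariate, and by a fair coin flip. The data-generating process and the helper `naive_difference` are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

x = rng.normal(size=n)
y0 = 2 * x + rng.normal(size=n)   # Y(0) depends on x
y1 = y0 + 1.5                     # constant effect, so ATE = ATT = 1.5
ate = np.mean(y1 - y0)

def naive_difference(t):
    """Difference in observed mean outcomes between treated and control."""
    y_obs = np.where(t == 1, y1, y0)
    return y_obs[t == 1].mean() - y_obs[t == 0].mean()

# Self-selection: units with larger x are more likely to take treatment.
t_selected = rng.binomial(1, 1 / (1 + np.exp(-x)))
# Randomization: a fair coin flip, independent of x and of the potential outcomes.
t_random = rng.binomial(1, 0.5, size=n)

naive_selected = naive_difference(t_selected)
naive_random = naive_difference(t_random)

print(f"True ATE:                  {ate:.3f}")
print(f"Naive, self-selected:      {naive_selected:.3f}")
print(f"Naive, randomly assigned:  {naive_random:.3f}")
```

Under self-selection the naive difference is far from the true effect, while under random assignment it lands close to it, up to sampling noise.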


Causal Identifying Assumptions

| Assumption | Description | Consequence of Violation |
|-----------|------------|----------------------|
| Unconfoundedness | $Y(0), Y(1) \perp T \mid X$ | Selection bias |
| Overlap | $0 < P(T=1 \mid X) < 1$ for all $X$ | Cannot estimate effects for some subgroups |
| SUTVA | No interference; one treatment version | Spillover effects bias estimates |
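Of these assumptions, overlap is the one that can be checked directly from data. As a minimal sketch, assume a single discrete covariate with four strata and hypothetical true treatment probabilities, one of which is exactly 1. Estimating the propensity score within each stratum flags the stratum that has no control units.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

# One discrete covariate with four strata; stratum 3 is always treated (assumed).
x = rng.integers(0, 4, size=n)
true_p = np.array([0.2, 0.5, 0.8, 1.0])
t = rng.binomial(1, true_p[x])

# Estimated propensity score per stratum: share of treated units.
counts = np.bincount(x, minlength=4)
treated = np.bincount(x, weights=t, minlength=4)
e_hat = treated / counts

# Overlap requires 0 < e(x) < 1 in every stratum.
violations = np.flatnonzero((e_hat <= 0.0) | (e_hat >= 1.0))

for stratum in range(4):
    print(f"x = {stratum}: n = {counts[stratum]}, e_hat = {e_hat[stratum]:.3f}")
print("Strata violating overlap:", violations.tolist())
```

For the flagged stratum there are no control units to compare against, so no treatment effect can be estimated there without extrapolation. With continuous covariates the analogous check is to look for estimated propensity scores very close to 0 or 1.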


Estimation Under Unconfoundedness

When treatment is unconfounded given covariates $X$, we can use matching, weighting, or regression adjustment.
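As one illustration of regression adjustment, the sketch below fits a separate linear outcome model in each treatment arm and averages the difference in predictions over the whole sample. It is a minimal version under strong assumptions: the simulated outcome really is linear in the covariates, and the helpers `fit_linear` and `predict` are hypothetical names written for this example, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Simulated confounded data; the true ATE is 1.5 by construction.
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2])))
t = rng.binomial(1, p)
y = 2 * X[:, 0] + X[:, 1] + 1.5 * t + rng.normal(size=n)

def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict(coef, features):
    return coef[0] + features @ coef[1:]

# Fit one outcome model per arm, then predict both outcomes for every unit.
coef_treated = fit_linear(X[t == 1], y[t == 1])
coef_control = fit_linear(X[t == 0], y[t == 0])
ate_reg = np.mean(predict(coef_treated, X) - predict(coef_control, X))

naive = y[t == 1].mean() - y[t == 0].mean()
print(f"Naive difference:      {naive:.3f}")
print(f"Regression adjustment: {ate_reg:.3f}")
```

The adjusted estimate is only as good as the outcome models. If the true relationship were nonlinear, these linear fits could leave residual bias.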

IPW Estimator

$$\hat{\text{ATE}} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{T_i Y_i}{\hat{e}(X_i)} - \frac{(1-T_i)Y_i}{1-\hat{e}(X_i)}\right]$$

Here,

  • $\hat{e}(X_i)$ = estimated propensity score, $P(T=1 \mid X_i)$
  • $T_i$ = treatment indicator

Python Implementation


```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

np.random.seed(42)

# Simulate data with confounding: X1 and X2 affect both treatment and outcome
n = 1000
X = np.random.randn(n, 3)
propensity = 1 / (1 + np.exp(-(0.5*X[:,0] + 0.3*X[:,1] - 0.2*X[:,2])))
T = np.random.binomial(1, propensity)
Y0 = 2*X[:,0] + X[:,1] + np.random.randn(n)
Y1 = Y0 + 1.5  # True ATE = 1.5
Y = T * Y1 + (1 - T) * Y0  # Observed outcome (switching equation)

df = pd.DataFrame({'Y': Y, 'T': T, 'X1': X[:,0], 'X2': X[:,1], 'X3': X[:,2]})
covariates = ['X1', 'X2', 'X3']

# Naive comparison (biased)
naive_diff = df.loc[df['T'] == 1, 'Y'].mean() - df.loc[df['T'] == 0, 'Y'].mean()
print(f"Naive difference: {naive_diff:.3f}")

# IPW estimator: fit a propensity model, then weight by the inverse propensity
prop_model = LogisticRegression().fit(df[covariates], df['T'])
e_hat = prop_model.predict_proba(df[covariates])[:, 1]

ipw = np.mean(T * Y / e_hat - (1 - T) * Y / (1 - e_hat))
print(f"IPW estimate: {ipw:.3f}")
print("True ATE: 1.500")
```

Worked Example

Example: Job Training Program

Evaluating the effect of job training on earnings:

  • Treatment: Enrolled in job training program

  • Outcome: Annual earnings

  • Confounders: Education, age, prior income

| Method | Estimated Effect |
|--------|-----------------|
| Naive comparison | $3,200 |
| IPW | $1,850 |
| Matching (ATE) | $1,920 |
| Regression adjustment | $1,780 |

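The naive comparison is biased upward because people who choose training tend to differ from those who do not, for instance in motivation. IPW, matching, and regression adjustment correct only for the observed confounders (education, age, prior income); after adjusting, the estimated effect is approximately $1,850. If motivation is truly unobserved and affects earnings, some bias remains in the adjusted estimates.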


Key Takeaways

Summary: Potential Outcomes Framework

  • Causal effect: $Y_i(1) - Y_i(0)$, but only one potential outcome is ever observed (fundamental problem)

  • ATE = average effect across population; ATT = effect among treated

  • SUTVA assumes no interference between units and a single version of each treatment level

  • Randomization eliminates selection bias by making groups comparable

  • Unconfoundedness ($Y(0), Y(1) \perp T \mid X$) is needed for observational studies

  • Methods: IPW, matching, regression adjustment

  • Always report the assumptions underlying your causal estimates

