Randomized Controlled Trials — Design and Analysis
Statistics
The Gold Standard for Establishing Causation
RCTs eliminate confounding through random assignment, ensuring treatment groups are comparable in expectation. Proper design — blinding, power analysis, intention-to-treat — maximizes the credibility of causal conclusions.
- Drug Development: Establish pharmaceutical efficacy for regulatory approval
- Technology: Test feature impact through A/B testing on user populations
- Education: Evaluate curriculum changes with randomized classroom assignments
Randomization is the great equalizer — it balances known and unknown confounders simultaneously.
A randomized controlled trial (RCT) is the gold standard for establishing causal relationships because randomization ensures that treatment and control groups are comparable in expectation.
Definition: Randomized Controlled Trial
An experimental design where units are randomly assigned to treatment or control conditions, allowing causal effects to be estimated without confounding.
Key Components of an RCT
| Component | Description |
|-----------|------------|
| Randomization | Random assignment to treatment/control |
| Control group | Receives placebo or standard treatment |
| Blinding | Participants/researchers unaware of assignment |
| Sample size | Determined by power analysis |
| Pre-registration | Specify analysis plan before data collection |
Why Randomization Works
Balance Through Randomization
Randomization ensures that all confounders, observed ($X$) and unobserved ($U$), are equally distributed across groups in expectation:

$$E[X \mid T = 1] = E[X \mid T = 0], \qquad E[U \mid T = 1] = E[U \mid T = 0]$$
This eliminates selection bias and allows clean causal identification.
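The balancing property can be illustrated with a small simulation. This is a minimal sketch, not part of the original material: the confounder `u`, the logistic self-selection rule, and the helper `mean_gap` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical unobserved confounder (e.g. baseline health), standard normal.
u = rng.normal(size=n)

# Randomized assignment: a fair coin flip that ignores u.
t_random = rng.binomial(1, 0.5, size=n)

# Self-selected assignment (assumed rule): units with higher u opt in more often.
p_select = 1.0 / (1.0 + np.exp(-u))
t_select = rng.binomial(1, p_select)

def mean_gap(x, t):
    """Difference in the mean of x between treated (t == 1) and control (t == 0)."""
    return x[t == 1].mean() - x[t == 0].mean()

gap_random = mean_gap(u, t_random)
gap_select = mean_gap(u, t_select)
print(f"Gap in u under randomization:  {gap_random:.3f}")
print(f"Gap in u under self-selection: {gap_select:.3f}")
```

Under randomization the gap in `u` should be close to zero even though `u` is never used in the assignment; under self-selection the treated group has systematically higher `u`, which is exactly the selection bias that randomization removes.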
Treatment Effects in RCTs
ATE in RCTs

$$\widehat{\text{ATE}} = \bar{Y}_T - \bar{Y}_C$$

Here,
- $\bar{Y}_T$ = Mean outcome in treatment group
- $\bar{Y}_C$ = Mean outcome in control group
With randomization, the naive comparison identifies the ATE.
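The difference-in-means estimator can be sketched in a few lines. The function name `difference_in_means`, the simulated data, and the constant effect of 3.0 are assumptions for illustration; the standard error uses the unpooled (Welch-style) form, which is one common choice rather than the only one.

```python
import numpy as np

def difference_in_means(y, t):
    """Return (estimate, standard error) for mean(y | t == 1) - mean(y | t == 0)."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    y_treat, y_ctrl = y[t == 1], y[t == 0]
    estimate = y_treat.mean() - y_ctrl.mean()
    # Unpooled standard error: each group keeps its own sample variance.
    se = np.sqrt(y_treat.var(ddof=1) / y_treat.size + y_ctrl.var(ddof=1) / y_ctrl.size)
    return estimate, se

rng = np.random.default_rng(1)
n = 2_000
t = rng.binomial(1, 0.5, size=n)
y0 = 50 + rng.normal(scale=10, size=n)  # potential outcome without treatment
y1 = y0 + 3.0                           # assumed constant treatment effect of 3.0
y = np.where(t == 1, y1, y0)            # only one potential outcome is observed

ate_hat, se_hat = difference_in_means(y, t)
print(f"ATE estimate: {ate_hat:.2f} (SE {se_hat:.2f})")
```

Because assignment is random, the estimate should land near the assumed effect of 3.0, within a few standard errors.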
Sample Size and Power
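Sample Size for Two Means

$$n = \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\delta^2} \quad \text{per group}$$

Here,
- $\alpha$ = Significance level (typically 0.05)
- $\beta$ = Type II error rate (power = $1 - \beta$)
- $\sigma$ = Standard deviation of outcome
- $\delta$ = Minimum detectable effect size
- $z_q$ = $q$-th quantile of the standard normal distribution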
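As a rough sketch, the calculation can be coded directly, assuming the standard normal-approximation formula for comparing two means with equal group sizes, n = 2σ²(z_{1-α/2} + z_{1-β})² / δ² per group. The function name `n_per_group` and the example values (σ = 10, δ = 3) are illustrative assumptions.

```python
import math
from scipy.stats import norm

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means
    (normal approximation, equal group sizes, common standard deviation)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # quantile matching the target power (1 - beta)
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2
    return math.ceil(n)                # round up to a whole participant

n_80 = n_per_group(sigma=10.0, delta=3.0, power=0.80)
n_90 = n_per_group(sigma=10.0, delta=3.0, power=0.90)
print(f"Per-group n at 80% power: {n_80}")
print(f"Per-group n at 90% power: {n_90}")
```

An exact t-test calculation gives slightly larger numbers, so this approximation is best treated as a lower bound for planning.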
Power Considerations
- Higher power (e.g., 0.90) requires larger samples
- Smaller effects require larger samples
- More variability requires larger samples
- Always conduct a power analysis before the trial
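The first three points above can be checked numerically. The sketch below uses `TTestIndPower.solve_power` from statsmodels; the wrapper `required_n` and the scenario values are assumptions chosen for illustration.

```python
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()

def required_n(delta, sigma, power, alpha=0.05):
    """Per-group sample size for a two-sided, two-sample t-test with equal groups."""
    return solver.solve_power(effect_size=delta / sigma, power=power,
                              alpha=alpha, ratio=1.0)

baseline = required_n(delta=3.0, sigma=10.0, power=0.80)
higher_power = required_n(delta=3.0, sigma=10.0, power=0.90)    # more power
smaller_effect = required_n(delta=1.5, sigma=10.0, power=0.80)  # half the effect
more_noise = required_n(delta=3.0, sigma=15.0, power=0.80)      # more variability

for label, value in [("baseline", baseline), ("higher power", higher_power),
                     ("smaller effect", smaller_effect), ("more noise", more_noise)]:
    print(f"{label:>15}: {value:.0f} per group")
```

Halving the effect size roughly quadruples the required sample, since the sample size scales with the inverse square of the standardized effect.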
Types of Analysis
Intention-to-Treat (ITT)
Definition: ITT Analysis
Analyze participants in the group they were originally assigned to, regardless of whether they actually received the treatment.
| Advantage | Disadvantage |
|-----------|-------------|
| Preserves randomization | May underestimate effect |
| Handles non-compliance | Diluted by non-adherence |
| Clinically relevant | |
Per-Protocol Analysis
Analyze only participants who fully complied with the protocol. May introduce bias if non-compliance is related to outcomes.
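The contrast between the two analyses can be illustrated with a simulated trial. Everything here is an assumption made for the sketch: the prognostic factor `frailty`, the compliance model, the outcome model, and the true effect of -6.0 for patients who actually take the treatment.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
true_effect = -6.0  # assumed effect of actually receiving the treatment

frailty = rng.normal(size=n)             # hypothetical prognostic factor
assigned = rng.binomial(1, 0.5, size=n)  # randomized assignment

# Assumed non-compliance mechanism: frailer patients in the treatment arm are
# less likely to take the drug. Control patients never receive it.
p_comply = 1.0 / (1.0 + np.exp(-(1.5 - 1.5 * frailty)))
complied = rng.binomial(1, p_comply)
received = assigned * complied

# Frailty raises the outcome (higher is worse) regardless of treatment.
y = 135 + 4.0 * frailty + true_effect * received + rng.normal(scale=8, size=n)

# ITT: compare by assigned arm, ignoring what was actually received.
itt = y[assigned == 1].mean() - y[assigned == 0].mean()

# Per-protocol: compliant treatment-arm patients versus the full control arm.
pp = y[received == 1].mean() - y[assigned == 0].mean()

print(f"True effect of treatment received: {true_effect:.2f}")
print(f"ITT estimate:                      {itt:.2f}")
print(f"Per-protocol estimate:             {pp:.2f}")
```

In this setup the ITT estimate is diluted toward zero by non-adherence, while the per-protocol estimate overstates the benefit because the compliers are healthier than the control group they are compared with.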
Blinding
| Type | Who is blinded | Purpose |
|------|---------------|---------|
| Single-blind | Participants | Reduces placebo effect |
| Double-blind | Participants + researchers | Reduces observer bias |
| Triple-blind | Participants + researchers + analysts | Reduces analysis bias |
Common Pitfalls
Threats to Validity
- Attrition: Participants drop out differentially
- Contamination: Control group receives treatment
- Hawthorne effect: Behavior changes because of observation
- Non-compliance: Participants don't follow assigned treatment
- Multiple testing: Testing many outcomes inflates false positives
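The multiple-testing pitfall is easy to demonstrate by simulation. This sketch assumes 20 independent outcomes, no true treatment effect on any of them, and a Bonferroni correction as one simple remedy; the trial sizes are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_trials, n_outcomes, n_per_arm = 1_000, 20, 50
alpha = 0.05

# Null world: the treatment has no effect on any outcome.
treat = rng.normal(size=(n_trials, n_outcomes, n_per_arm))
ctrl = rng.normal(size=(n_trials, n_outcomes, n_per_arm))

# One t-test per (trial, outcome) pair; p_values has shape (n_trials, n_outcomes).
p_values = stats.ttest_ind(treat, ctrl, axis=2).pvalue

# Share of simulated trials reporting at least one "significant" outcome.
any_hit_uncorrected = (p_values < alpha).any(axis=1).mean()
any_hit_bonferroni = (p_values < alpha / n_outcomes).any(axis=1).mean()

print(f"At least one false positive, uncorrected: {any_hit_uncorrected:.2f}")
print(f"At least one false positive, Bonferroni:  {any_hit_bonferroni:.2f}")
```

With 20 independent tests at the 0.05 level, the chance of at least one false positive is about 1 - 0.95^20, roughly 0.64, which is why primary outcomes are usually pre-specified and secondary outcomes corrected.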
CONSORT Flow
A well-reported RCT follows the CONSORT guidelines:
- Enrollment: How many were assessed and randomized?
- Allocation: How many assigned to each group?
- Follow-up: How many lost to follow-up?
- Analysis: How many included in final analysis?
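The four questions above map naturally onto counts from a participant-level log. The sketch below is only illustrative: the table, its column names, and the helper `consort_counts` are hypothetical, and "analyzed" is taken here to mean randomized participants with follow-up data, which is an assumption rather than a CONSORT requirement.

```python
import pandas as pd

# Hypothetical participant log; column names are illustrative only.
participants = pd.DataFrame({
    "randomized": [True, True, True, True, True, True, False, False],
    "arm": ["treatment", "treatment", "treatment",
            "control", "control", "control", None, None],
    "lost_to_follow_up": [False, True, False, False, False, True, False, False],
})

def consort_counts(df):
    """Summarize enrollment, allocation, follow-up, and analysis as counts."""
    randomized = df[df["randomized"]]
    followed = randomized[~randomized["lost_to_follow_up"]]
    return {
        "assessed": len(df),
        "randomized": len(randomized),
        "allocated": randomized["arm"].value_counts().to_dict(),
        "lost_to_follow_up": randomized.groupby("arm")["lost_to_follow_up"].sum().to_dict(),
        "analyzed": followed["arm"].value_counts().to_dict(),
    }

counts = consort_counts(participants)
for stage, value in counts.items():
    print(f"{stage}: {value}")
```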
Python Implementation
```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.power import TTestIndPower

np.random.seed(42)

# Simulate an RCT with two baseline covariates
n = 500
X1 = np.random.randn(n)             # Age (standardized)
X2 = np.random.binomial(1, 0.5, n)  # Gender (0/1)

# Randomization: each unit is assigned by an independent fair coin flip
T = np.random.binomial(1, 0.5, n)

# Potential outcomes (true ATE = 3.0)
Y0 = 50 + 0.5*X1 + 2*X2 + np.random.randn(n)*10
Y1 = Y0 + 3.0
Y = T * Y1 + (1 - T) * Y0           # Only the assigned potential outcome is observed

df = pd.DataFrame({'Y': Y, 'T': T, 'age': X1, 'gender': X2})

# Check balance (should be balanced due to randomization)
treat = df[df['T']==1]
control = df[df['T']==0]
print("Balance check:")
print(f"Age: treat={treat['age'].mean():.2f}, control={control['age'].mean():.2f}")
print(f"Gender: treat={treat['gender'].mean():.2f}, control={control['gender'].mean():.2f}")

# Two-sample t-test
t_stat, p_val = stats.ttest_ind(treat['Y'], control['Y'])

# Difference in means with a 95% CI built from the estimated standard error
# (per-group sample variances and sizes, not a hard-coded SD and the total n)
ate_hat = treat['Y'].mean() - control['Y'].mean()
se = np.sqrt(treat['Y'].var(ddof=1)/len(treat) + control['Y'].var(ddof=1)/len(control))
print(f"\nTreatment effect: {ate_hat:.2f}")
print(f"95% CI: [{ate_hat - 1.96*se:.2f}, {ate_hat + 1.96*se:.2f}]")
print(f"p-value: {p_val:.4f}")

# Power analysis: standardized effect = 3.0 / 10, with about 250 units per arm
power_analysis = TTestIndPower()
power = power_analysis.power(effect_size=3.0/10, nobs1=250, ratio=1.0, alpha=0.05)
print(f"\nPower: {power:.3f}")
```
Worked Example
Example: Drug Efficacy Trial
A Phase III trial tests a new blood pressure drug:
| Metric | Treatment (n=200) | Control (n=200) | Difference |
|--------|-------------------|-----------------|------------|
| Mean SBP | 128.5 mmHg | 134.2 mmHg | -5.7 mmHg |
| SD | 12.3 | 11.8 | — |
| 95% CI | — | — | [-8.1, -3.3] |
| p-value | — | — | < 0.001 |
ITT analysis: 15 patients in the treatment group did not take the medication. The ITT analysis keeps them in the treatment group, which dilutes the estimated effect (-5.0 mmHg).
Per-protocol: Restricted to compliant patients, the estimated effect is larger (-6.8 mmHg).
Both approaches show a significant benefit. The per-protocol estimate is larger but may be biased, because compliance is not randomized.
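The confidence interval and p-value in the table can be reproduced from the reported summary statistics alone. This sketch assumes a normal-approximation interval with unpooled variances and uses `scipy.stats.ttest_ind_from_stats` for the test; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Summary statistics from the table above
mean_t, sd_t, n_t = 128.5, 12.3, 200  # treatment arm
mean_c, sd_c, n_c = 134.2, 11.8, 200  # control arm

diff = mean_t - mean_c
se = np.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)  # unpooled standard error
z = stats.norm.ppf(0.975)                    # about 1.96 for a 95% interval
ci_low, ci_high = diff - z * se, diff + z * se

# Welch t-test computed from summary statistics only
result = stats.ttest_ind_from_stats(mean_t, sd_t, n_t, mean_c, sd_c, n_c,
                                    equal_var=False)

print(f"Difference: {diff:.1f} mmHg")
print(f"95% CI: [{ci_low:.1f}, {ci_high:.1f}]")
print(f"p-value: {result.pvalue:.2g}")
```

The interval rounds to [-8.1, -3.3] and the p-value falls below 0.001, matching the reported row values.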
Key Takeaways
Summary: RCTs
- Randomization is the gold standard for causal inference
- It balances all confounders (observed and unobserved) across groups in expectation
- Conduct a power analysis before the trial to determine sample size
- Intention-to-treat analysis is preferred for preserving randomization
- Blinding reduces bias (double-blind is ideal)
- Follow CONSORT guidelines for transparent reporting
- Check balance on baseline characteristics to confirm that randomization worked as intended
Related Topics
- See Causal Inference for the potential outcomes framework
- See Propensity Score Matching for when randomization is not possible
- See Hypothesis Testing for p-values and significance