Meta-Analysis
Advanced Statistical Methods
Combining Evidence Across Studies for Stronger Conclusions
Meta-analysis statistically synthesizes results from multiple studies to produce a single summary estimate with greater precision than any individual study. Fixed-effect models assume one common true effect, while random-effects models allow the true effect to vary and so account for heterogeneity across studies.
- Clinical medicine — Combine trial results to establish definitive treatment guidelines
- Education — Synthesize intervention studies to identify effective teaching strategies
- Environmental policy — Aggregate epidemiological evidence for regulatory decision-making
Meta-analysis transforms a forest of individual studies into a clear, quantitative conclusion.
Definition: Meta-Analysis
A meta-analysis is a statistical procedure that combines results from multiple independent studies to produce a single summary estimate of an effect size. It quantifies the overall evidence, assesses consistency across studies, and identifies sources of heterogeneity.
"The goal of meta-analysis is not to produce a single number, but to understand the structure of evidence across studies." — Higgins & Green, Cochrane Handbook
Why Meta-Analysis?
Individual studies may be:
- Underpowered: Too small to detect the true effect
- Conflicting: Some studies find significance, others do not
- Context-specific: Results vary by population, intervention, or setting
Meta-analysis addresses these issues by:
- Increasing statistical power through pooled sample sizes
- Quantifying heterogeneity across studies
- Identifying moderators that explain variability
- Providing a transparent, replicable summary of evidence
Effect Size Measures
Before pooling, each study's result must be converted to a common metric.
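Standardized Mean Difference (Cohen's d)
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$
where $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ is the pooled standard deviation, and $\bar{x}_j$, $s_j$, and $n_j$ are the mean, standard deviation, and sample size of group $j$.
Hedges' g (Bias-Corrected)
$$g = d \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right)$$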
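As a minimal sketch of these two effect sizes, the functions below compute Cohen's d from group summary statistics and then apply the small-sample correction to obtain Hedges' g. The function names and the example numbers are hypothetical, and `smd_variance` uses a common large-sample approximation to the sampling variance of d rather than anything stated in this article.

```python
import math


def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def hedges_g(d, n_t, n_c):
    """Approximate small-sample correction: g = d * (1 - 3 / (4(n_t + n_c) - 9))."""
    return d * (1 - 3 / (4 * (n_t + n_c) - 9))


def smd_variance(d, n_t, n_c):
    """Assumption: the usual large-sample approximation to Var(d)."""
    return (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))


# Hypothetical study: treatment mean 105, control mean 100, SD 10, n = 20 per arm
d = cohens_d(105, 100, 10, 10, 20, 20)
g = hedges_g(d, 20, 20)
v = smd_variance(d, 20, 20)
print(f"d = {d:.3f}, g = {g:.3f}, var(d) = {v:.4f}")
```

The correction shrinks d slightly toward zero, and its effect fades as the total sample size grows.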
For binary outcomes:
Odds Ratio
$$\text{OR} = \frac{a \, d}{b \, c}$$
where $a$, $b$, $c$, $d$ are the cell frequencies in a 2×2 table:
| | Event | No Event |
|---|---|---|
| Treatment | a | b |
| Control | c | d |
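Odds ratios are usually pooled on the log scale, where the sampling distribution is closer to normal. The sketch below computes the odds ratio, its logarithm, and the standard error of the log odds ratio from the four cell counts. The function name and the example counts are hypothetical, and adding 0.5 to every cell when a zero is present is one common continuity correction, assumed here rather than taken from the article.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Return (OR, log OR, SE of log OR) for a 2x2 table.

    a, b: events / non-events in the treatment group
    c, d: events / non-events in the control group
    """
    if min(a, b, c, d) == 0:
        # Assumption: 0.5 continuity correction so the ratio and SE are defined
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, math.log(odds_ratio), se_log_or


# Hypothetical trial: 15/100 events under treatment, 30/100 under control
or_, log_or, se = log_odds_ratio(15, 85, 30, 70)
ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
print(f"OR = {or_:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The log odds ratio and its squared standard error are what would be passed to an inverse-variance pooling routine as the effect and variance.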
Fixed-Effect Model
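Definition: Fixed-Effect Model
The fixed-effect model assumes all studies share a common true effect size $\theta$. Variation across studies is due solely to sampling error. Study $i$'s observed effect is:
$$y_i = \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, v_i)$$
where $v_i$ is the within-study sampling variance of study $i$.
The pooled estimate is the inverse-variance weighted mean:
Fixed-Effect Pooled Estimate
$$\hat{\theta} = \frac{\sum_{i=1}^{K} w_i y_i}{\sum_{i=1}^{K} w_i}, \qquad w_i = \frac{1}{v_i}$$
The variance of the pooled estimate:
$$\operatorname{Var}(\hat{\theta}) = \frac{1}{\sum_{i=1}^{K} w_i}$$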
Cochran's Q
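Definition: Cochran's Q
Cochran's Q tests whether the observed effects are consistent with a common true effect:
$$Q = \sum_{i=1}^{K} w_i \left(y_i - \hat{\theta}\right)^2$$
where $y_i$ is the observed effect of study $i$, $w_i = 1/v_i$ is its inverse-variance weight, $\hat{\theta}$ is the fixed-effect pooled estimate, and $K$ is the number of studies.
Under the fixed-effect null hypothesis, $Q \sim \chi^2_{K-1}$.
Interpretation of Q
A significant Q (for example, $p < 0.05$) suggests heterogeneity beyond sampling error, indicating that a random-effects model may be more appropriate.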
Random-Effects Model
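Definition: Random-Effects Model
The random-effects model assumes true effect sizes vary across studies:
$$y_i = \mu + u_i + \varepsilon_i, \qquad u_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, v_i)$$
where $u_i$ is the study-specific deviation from the overall mean $\mu$, and $\varepsilon_i$ is the sampling error.
The total variance of study $i$ is:
Random-Effects Variance
$$\operatorname{Var}(y_i) = v_i + \tau^2$$
The pooled estimate:
Random-Effects Pooled Estimate
$$\hat{\mu} = \frac{\sum_{i=1}^{K} w_i^* y_i}{\sum_{i=1}^{K} w_i^*}, \qquad w_i^* = \frac{1}{v_i + \tau^2}$$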
Key Difference
In random-effects models, studies with larger variance (smaller samples) receive relatively more weight than they do under the fixed-effect model, since the added $\tau^2$ term dilutes the precision advantage of large studies.
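To make the weight shift concrete, here is a small illustration with three made-up sampling variances and an assumed between-study variance. All numbers and the helper name `relative_weights` are hypothetical; the only formula used is the inverse of the sampling variance plus the between-study variance.

```python
import numpy as np

# Hypothetical sampling variances: one large, one medium, one small study
variances = np.array([0.01, 0.04, 0.16])
tau2 = 0.05  # assumed between-study variance, for illustration only


def relative_weights(variances, tau2=0.0):
    """Percentage weights 1 / (v_i + tau2), normalized to sum to 100."""
    w = 1.0 / (np.asarray(variances, dtype=float) + tau2)
    return 100 * w / w.sum()


fe_w = relative_weights(variances)        # fixed-effect weights (tau2 = 0)
re_w = relative_weights(variances, tau2)  # random-effects weights

for v, f, r in zip(variances, fe_w, re_w):
    print(f"v = {v:.2f}: fixed-effect {f:5.1f}%, random-effects {r:5.1f}%")
```

With these inputs the largest study's share drops from roughly 76% to roughly 51%, while the smallest study's share rises from about 5% to about 15%.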
Estimating τ²
DerSimonian-Laird Method
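Definition: DerSimonian-Laird Estimator
The most common method for estimating $\tau^2$:
$$\hat{\tau}^2 = \max\left(0, \frac{Q - (K - 1)}{C}\right), \qquad C = \sum_{i=1}^{K} w_i - \frac{\sum_{i=1}^{K} w_i^2}{\sum_{i=1}^{K} w_i}$$
where $Q$ is Cochran's Q, $K$ is the number of studies, and $w_i = 1/v_i$ are the fixed-effect weights.
If $Q \le K - 1$, then $\hat{\tau}^2 = 0$.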
Other Estimators
| Method | Description | Property |
|---|---|---|
| REML | Restricted maximum likelihood | Less biased, often preferred |
| Paule-Mandel | Iterative matching of expected Q | Good small-sample properties |
| Hedges | Moment-based | Simple closed-form |
| PL | Profile likelihood | Better coverage in simulation |
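The Paule-Mandel row describes matching Q to its expected value. One way to sketch that idea is to find the value of the between-study variance at which the generalized Q statistic, computed with random-effects weights, equals K - 1. The function names below are hypothetical, and using `scipy.optimize.brentq` with a doubling search for the upper bracket is an implementation choice of this sketch, not part of the method's definition.

```python
import numpy as np
from scipy.optimize import brentq


def generalized_q(tau2, effects, variances):
    """Q computed with weights 1 / (v_i + tau2) around the matching pooled mean."""
    w = 1.0 / (variances + tau2)
    mu = np.sum(w * effects) / np.sum(w)
    return np.sum(w * (effects - mu) ** 2)


def paule_mandel_tau2(effects, variances):
    """Solve generalized_q(tau2) = K - 1 for tau2, truncating at zero."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    k = len(effects)

    def excess(tau2):
        return generalized_q(tau2, effects, variances) - (k - 1)

    if excess(0.0) <= 0:
        return 0.0  # no excess dispersion: estimate is zero

    # Generalized Q decreases in tau2, so double the upper bound until it brackets the root
    upper = float(np.max(variances))
    while excess(upper) > 0:
        upper *= 2
    return brentq(excess, 0.0, upper)


# Hypothetical data: five studies with equal sampling variances
tau2_pm = paule_mandel_tau2([0.1, 0.3, 0.5, 0.9, -0.2], [0.01] * 5)
print(f"Paule-Mandel tau^2: {tau2_pm:.4f}")
```

When all sampling variances are equal, the solution reduces to the sample variance of the effects minus the common sampling variance, which gives a convenient check on the root finder.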
Heterogeneity Measures
I² Statistic
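Definition: I² Statistic
$$I^2 = \max\left(0, \frac{Q - (K - 1)}{Q}\right) \times 100\%$$
$I^2$ represents the percentage of total variability in the effect estimates that is due to heterogeneity rather than sampling error. Commonly cited benchmarks:
- $I^2 = 0\%$: No observed heterogeneity
- $I^2 \approx 25\%$: Low heterogeneity
- $I^2 \approx 50\%$: Moderate heterogeneity
- $I^2 \approx 75\%$: High heterogeneity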
τ² and τ
- $\tau^2$ is the between-study variance (absolute heterogeneity)
- $\tau$ is the standard deviation of true effects
Prediction Interval
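A 95% prediction interval for a new study's effect:
$$\hat{\mu} \pm t_{K-2,\,0.975} \sqrt{\hat{\tau}^2 + \operatorname{SE}(\hat{\mu})^2}$$
where $\hat{\mu}$ is the random-effects pooled estimate, $\hat{\tau}^2$ is the estimated between-study variance, and $t_{K-2,\,0.975}$ is the 97.5th percentile of the t distribution with $K - 2$ degrees of freedom.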
This is wider than the confidence interval and quantifies the range of effects we might expect in a future study.
Publication Bias
Definition: Publication Bias
Publication bias occurs when the likelihood of a study being published depends on its results. Studies with statistically significant or positive findings are more likely to be published, inflating the meta-analytic estimate.
Funnel Plot
Definition: Funnel Plot
A scatter plot of effect sizes (x-axis) against a measure of precision (y-axis, typically standard error). In the absence of bias, studies should form a symmetric inverted funnel shape centered on the pooled estimate.
Egger's Test
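Definition: Egger's Test
Egger's test regresses the standardized effect sizes on precision:
$$\frac{y_i}{\text{SE}_i} = \beta_0 + \beta_1 \frac{1}{\text{SE}_i} + \epsilon_i$$
A significant intercept ($\beta_0 \neq 0$) at the chosen significance level $\alpha$ suggests small-study effects (asymmetry).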
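A minimal sketch of this regression, assuming ordinary least squares of the standardized effect (effect divided by its standard error) on precision (one over the standard error), with a two-sided t-test on the intercept. The function name `eggers_test` and the example data are hypothetical.

```python
import numpy as np
from scipy import stats


def eggers_test(effects, se):
    """OLS of effect/SE on 1/SE; returns the intercept, slope, and intercept test."""
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(se, dtype=float)
    y = effects / se   # standardized effects
    x = 1.0 / se       # precision

    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Residual variance and coefficient covariance for the t-test
    resid = y - X @ beta
    dof = len(y) - 2
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)

    t_stat = beta[0] / np.sqrt(cov[0, 0])
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return {"intercept": beta[0], "slope": beta[1], "t": t_stat, "p": p_value}


# Hypothetical data in which smaller studies (larger SE) report larger effects
se = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
effects = 0.3 + 1.0 * se + np.array([0.01, -0.01, 0.005, -0.005, 0.0])
result = eggers_test(effects, se)
print(f"intercept = {result['intercept']:.3f}, p = {result['p']:.4f}")
```

In this parameterization the slope estimates the underlying effect and the intercept captures the asymmetry, so the intercept is the quantity being tested.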
Trim-and-Fill
Definition: Trim-and-Fill
The trim-and-fill method (Duval & Tweedie, 2000) imputes missing studies to restore funnel plot symmetry:
- Estimate the number of missing studies ($k_0$) by trimming extreme values
- Impute $k_0$ studies on the sparse side of the funnel
- Recompute the pooled estimate including imputed studies
Limitations
Trim-and-fill assumes asymmetry is solely due to publication bias. Asymmetry may also result from true heterogeneity, small-study effects, or methodological differences. Use multiple methods.
Moderator Analysis (Meta-Regression)
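Definition: Meta-Regression
Meta-regression extends meta-analysis by modeling the relationship between study-level covariates and effect sizes:
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + u_i + \varepsilon_i$$
where $x_{i1}, \ldots, x_{ip}$ are study-level characteristics (e.g., dose, duration, population age), $u_i$ is the residual between-study deviation with variance $\tau^2_{\text{res}}$, and $\varepsilon_i$ is the sampling error.
The proportion of heterogeneity explained:
$$R^2 = \frac{\hat{\tau}^2_{\text{total}} - \hat{\tau}^2_{\text{res}}}{\hat{\tau}^2_{\text{total}}}$$
where $\hat{\tau}^2_{\text{total}}$ is estimated from the model without covariates and $\hat{\tau}^2_{\text{res}}$ from the model with them.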
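As a rough sketch, a meta-regression with one covariate can be fitted by weighted least squares, using weights equal to one over the sampling variance plus the residual between-study variance. The code below assumes that residual variance is supplied by the caller (for example, from a separate estimation step); the function names and data are hypothetical.

```python
import numpy as np


def meta_regression(effects, variances, covariate, tau2=0.0):
    """Weighted least squares of effects on one covariate.

    Weights are 1 / (v_i + tau2); tau2 is assumed to be given.
    Returns (coefficients, standard errors) for intercept and slope.
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(covariate, dtype=float)])
    W = np.diag(1.0 / (v + tau2))

    xtwx_inv = np.linalg.inv(X.T @ W @ X)
    beta = xtwx_inv @ X.T @ W @ y
    se = np.sqrt(np.diag(xtwx_inv))
    return beta, se


def r2_analog(tau2_total, tau2_residual):
    """Share of between-study variance explained, truncated at zero."""
    if tau2_total <= 0:
        return 0.0
    return max(0.0, (tau2_total - tau2_residual) / tau2_total)


# Hypothetical data: effects rise with dose
dose = np.array([1.0, 2.0, 3.0, 4.0])
effects = 0.1 + 0.2 * dose
variances = np.array([0.01, 0.02, 0.03, 0.04])
beta, se = meta_regression(effects, variances, dose, tau2=0.01)
print(f"intercept = {beta[0]:.3f}, slope = {beta[1]:.3f}")
print(f"R^2 analog: {r2_analog(0.04, 0.01):.2f}")
```

Because the example effects lie exactly on a line, the fitted coefficients recover the intercept and slope used to generate them; real data would also need the residual heterogeneity to be estimated rather than assumed.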
Network Meta-Analysis
Definition: Network Meta-Analysis
Network meta-analysis (NMA), also called mixed treatment comparisons, allows indirect comparisons of multiple treatments using a network of randomized trials. If Treatment A vs B and Treatment B vs C have been studied, NMA can estimate A vs C without direct head-to-head evidence.
Consistency Assumption
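Definition: Consistency
The consistency assumption states that direct and indirect evidence agree:
$$d_{AC}^{\text{direct}} = d_{AC}^{\text{indirect}} = d_{AB} + d_{BC}$$
where $d_{XY}$ denotes the effect of treatment $Y$ relative to treatment $X$.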
Inconsistency is assessed using node-splitting models or design-by-treatment interaction models.
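For the simplest network, a single A-B-C chain, the indirect estimate and a basic consistency check can be sketched as below. This is an adjusted indirect comparison in the style of Bucher, not the node-splitting or design-by-treatment models named above, and it assumes the A vs B and B vs C estimates come from independent sets of trials. Function names and numbers are hypothetical, and each effect is read as the second treatment relative to the first.

```python
import math


def indirect_comparison(d_ab, se_ab, d_bc, se_bc):
    """Indirect A vs C estimate through the common comparator B.

    Assumes the two inputs are independent, so their variances add.
    """
    d_ac = d_ab + d_bc
    se_ac = math.sqrt(se_ab**2 + se_bc**2)
    return d_ac, se_ac


def inconsistency_z(d_direct, se_direct, d_indirect, se_indirect):
    """z-statistic for the difference between direct and indirect estimates."""
    diff = d_direct - d_indirect
    se_diff = math.sqrt(se_direct**2 + se_indirect**2)
    return diff / se_diff


# Hypothetical pooled estimates on a common scale (e.g., log odds ratios)
d_ac, se_ac = indirect_comparison(d_ab=0.2, se_ab=0.3, d_bc=0.1, se_bc=0.4)
z = inconsistency_z(d_direct=0.5, se_direct=0.2, d_indirect=d_ac, se_indirect=se_ac)
print(f"indirect A vs C: {d_ac:.2f} (SE {se_ac:.2f}), inconsistency z = {z:.2f}")
```

The indirect estimate is less precise than either input, which is one reason direct head-to-head evidence is still valuable when it exists.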
SUCRA (Surface Under the Cumulative Ranking)
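SUCRA
$$\text{SUCRA}_j = \frac{1}{T - 1} \sum_{r=1}^{T-1} \text{cum}_{j,r}$$
where $T$ is the number of treatments and $\text{cum}_{j,r}$ is the cumulative probability that treatment $j$ is ranked among the best $r$ treatments.
SUCRA ranges from 0% (worst) to 100% (best), summarizing the probability that a treatment is ranked among the best.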
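Given a matrix of rank probabilities (for instance, from the posterior draws of a Bayesian network meta-analysis), SUCRA is the average of each treatment's cumulative ranking probabilities over the first T - 1 ranks. The sketch below assumes rows are treatments, columns are ranks with rank 1 as best, and each row sums to one. The function name and the probability matrix are hypothetical.

```python
import numpy as np


def sucra(rank_probs):
    """SUCRA per treatment from a (treatments x ranks) probability matrix.

    rank_probs[j, r] is the probability that treatment j has rank r + 1,
    where rank 1 is best. Returns values between 0 and 1.
    """
    p = np.asarray(rank_probs, dtype=float)
    n_ranks = p.shape[1]
    # Cumulative probability of being among the best r, for r = 1 .. T-1
    cumulative = np.cumsum(p, axis=1)[:, :-1]
    return cumulative.sum(axis=1) / (n_ranks - 1)


# Hypothetical rank probabilities for three treatments
rank_probs = np.array([
    [0.70, 0.20, 0.10],   # treatment A: usually ranked first
    [0.25, 0.50, 0.25],   # treatment B: usually in the middle
    [0.05, 0.30, 0.65],   # treatment C: usually ranked last
])
for name, value in zip("ABC", sucra(rank_probs)):
    print(f"Treatment {name}: SUCRA = {100 * value:.1f}%")
```

A treatment that is certain to rank first scores 1, one that is certain to rank last scores 0, and a treatment with equal probability for every rank scores 0.5.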
Python Implementation
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt


# --- Fixed-effect meta-analysis (inverse-variance method) ---
def fixed_effect_meta_analysis(effects, variances):
    """
    Fixed-effect meta-analysis using inverse-variance weighting.

    Parameters:
        effects: array of effect sizes (one per study)
        variances: array of sampling variances

    Returns:
        dict with pooled estimate, CI, Q statistic, I²
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    K = len(effects)

    theta_hat = np.sum(weights * effects) / np.sum(weights)
    var_theta = 1.0 / np.sum(weights)
    se_theta = np.sqrt(var_theta)

    # Cochran's Q and its chi-square p-value (K - 1 degrees of freedom)
    Q = np.sum(weights * (effects - theta_hat)**2)
    df = K - 1
    p_Q = stats.chi2.sf(Q, df)

    # I² as a percentage, truncated at zero
    I2 = max(0, (Q - df) / Q * 100) if Q > 0 else 0

    # 95% CI (normal approximation)
    ci_lower = theta_hat - 1.96 * se_theta
    ci_upper = theta_hat + 1.96 * se_theta

    return {
        'theta': theta_hat, 'se': se_theta,
        'ci_95': (ci_lower, ci_upper),
        'Q': Q, 'df': df, 'p_Q': p_Q, 'I2': I2,
        'weights': weights / np.sum(weights) * 100
    }


# --- Random-effects meta-analysis (DerSimonian-Laird) ---
def random_effects_meta_analysis(effects, variances):
    """
    Random-effects meta-analysis using DerSimonian-Laird.
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    K = len(effects)

    # Fixed-effect weights and estimate, needed for Q
    w_fe = 1.0 / variances
    theta_fe = np.sum(w_fe * effects) / np.sum(w_fe)
    Q = np.sum(w_fe * (effects - theta_fe)**2)

    # DerSimonian-Laird tau², truncated at zero
    C = np.sum(w_fe) - np.sum(w_fe**2) / np.sum(w_fe)
    tau2 = max(0, (Q - (K - 1)) / C)

    # Random-effects weights add tau² to each sampling variance
    w_re = 1.0 / (variances + tau2)
    theta_re = np.sum(w_re * effects) / np.sum(w_re)
    var_re = 1.0 / np.sum(w_re)
    se_re = np.sqrt(var_re)
    tau = np.sqrt(tau2)

    # 95% prediction interval (t distribution with K - 2 df; requires K >= 3)
    t_crit = stats.t.ppf(0.975, K - 2)
    pred_lower = theta_re - t_crit * np.sqrt(tau2 + var_re)
    pred_upper = theta_re + t_crit * np.sqrt(tau2 + var_re)

    return {
        'theta': theta_re, 'se': se_re,
        'ci_95': (theta_re - 1.96 * se_re, theta_re + 1.96 * se_re),
        'tau2': tau2, 'tau': tau,
        'Q': Q, 'df': K - 1,
        'I2': max(0, (Q - (K - 1)) / Q * 100) if Q > 0 else 0,
        'pred_interval': (pred_lower, pred_upper),
        'weights': w_re / np.sum(w_re) * 100
    }


# --- Funnel plot ---
def funnel_plot(effects, se):
    """Create a funnel plot for publication bias assessment."""
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(se, dtype=float)

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(effects, se, s=50, c='steelblue', edgecolors='black', alpha=0.7)

    # Inverse-variance (fixed-effect) pooled estimate as the funnel's center
    theta_pooled = np.average(effects, weights=1 / se**2)
    ax.axvline(x=theta_pooled, color='red', linestyle='--', label='Pooled estimate')

    # Pseudo 95% CI funnel
    se_range = np.linspace(0.01, se.max() * 1.1, 100)
    for z in [1.96, -1.96]:
        ax.plot(theta_pooled + z * se_range, se_range, 'gray', linestyle=':', alpha=0.5)

    ax.set_xlabel('Effect Size')
    ax.set_ylabel('Standard Error')
    ax.set_title('Funnel Plot')
    ax.invert_yaxis()  # most precise studies at the top
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('funnel_plot.png', dpi=150)
    plt.show()


# --- Example: 10 simulated studies on drug efficacy ---
np.random.seed(42)
K = 10
true_effect = 0.40  # True standardized mean difference
true_tau = 0.15     # True between-study standard deviation

# Generate study-specific true effects, then noisy observed effects
true_effects = np.random.normal(true_effect, true_tau, K)
sample_sizes = np.random.randint(30, 200, K)
variances = 2 / sample_sizes + true_tau**2 * np.random.uniform(0.5, 1.5, K)
observed_effects = np.random.normal(true_effects, np.sqrt(variances))

print("=== Fixed-Effect Meta-Analysis ===")
fe = fixed_effect_meta_analysis(observed_effects, variances)
print(f"Pooled effect: {fe['theta']:.3f} (SE: {fe['se']:.3f})")
print(f"95% CI: ({fe['ci_95'][0]:.3f}, {fe['ci_95'][1]:.3f})")
print(f"Cochran's Q: {fe['Q']:.2f}, df={fe['df']}, p={fe['p_Q']:.4f}")
print(f"I²: {fe['I2']:.1f}%")

print("\n=== Random-Effects Meta-Analysis (DerSimonian-Laird) ===")
re = random_effects_meta_analysis(observed_effects, variances)
print(f"Pooled effect: {re['theta']:.3f} (SE: {re['se']:.3f})")
print(f"95% CI: ({re['ci_95'][0]:.3f}, {re['ci_95'][1]:.3f})")
print(f"τ²: {re['tau2']:.4f}, τ: {re['tau']:.3f}")
print(f"I²: {re['I2']:.1f}%")
print(f"Prediction interval: ({re['pred_interval'][0]:.3f}, {re['pred_interval'][1]:.3f})")

# Forest plot: one row per study, pooled estimate at the bottom
fig, ax = plt.subplots(figsize=(10, 7))
y_positions = np.arange(K, 0, -1)
for i in range(K):
    ci_lower = observed_effects[i] - 1.96 * np.sqrt(variances[i])
    ci_upper = observed_effects[i] + 1.96 * np.sqrt(variances[i])
    ax.plot([ci_lower, ci_upper], [y_positions[i], y_positions[i]], 'b-', linewidth=1.5)
    ax.plot(observed_effects[i], y_positions[i], 'bs', markersize=8)

# Pooled estimate
ax.plot(re['theta'], 0, 'rD', markersize=10, label='Random-effects pooled')
ax.plot([re['ci_95'][0], re['ci_95'][1]], [0, 0], 'r-', linewidth=2)
ax.axvline(x=0, color='gray', linestyle='-', linewidth=0.5)
ax.set_yticks(y_positions.tolist() + [0])
ax.set_yticklabels([f'Study {i+1}' for i in range(K)] + ['Pooled'])
ax.set_xlabel('Effect Size (Standardized Mean Difference)')
ax.set_title('Forest Plot')
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('forest_plot.png', dpi=150)
plt.show()

# Funnel plot
funnel_plot(observed_effects, np.sqrt(variances))
```
Key Takeaways
Summary: Meta-Analysis
- Fixed-effect models assume a common true effect; random-effects models allow true effects to vary across studies.
- DerSimonian-Laird is the most widely used method for estimating the between-study variance $\tau^2$; alternatives such as REML and Paule-Mandel are often less biased.
- Cochran's Q tests for heterogeneity; I² quantifies the percentage of variability due to heterogeneity.
- Publication bias can be assessed via funnel plots, Egger's test, and trim-and-fill methods.
- Meta-regression identifies study-level moderators that explain heterogeneity.
- Network meta-analysis enables indirect comparisons across multiple treatments using a connected evidence network.
- Always report prediction intervals alongside confidence intervals to convey the range of effects in future studies.