Meta-Analysis

Advanced Statistical Methods

Combining Evidence Across Studies for Stronger Conclusions

Meta-analysis statistically synthesizes results from multiple studies to produce a single summary estimate with greater precision than any individual study. A fixed-effect model assumes one common true effect, while a random-effects model allows the true effect to vary across studies (heterogeneity).

Clinical medicine — Combine trial results to establish definitive treatment guidelines
Education — Synthesize intervention studies to identify effective teaching strategies
Environmental policy — Aggregate epidemiological evidence for regulatory decision-making

Meta-analysis transforms a forest of individual studies into a clear, quantitative conclusion.

Definition: Meta-Analysis

A meta-analysis is a statistical procedure that combines results from multiple independent studies to produce a single summary estimate of an effect size. It quantifies the overall evidence, assesses consistency across studies, and identifies sources of heterogeneity.

"The goal of meta-analysis is not to produce a single number, but to understand the structure of evidence across studies." — Higgins & Green, Cochrane Handbook

Why Meta-Analysis?

Individual studies may be:

Underpowered: Too small to detect the true effect
Conflicting: Some studies find significance, others do not
Context-specific: Results vary by population, intervention, or setting

Meta-analysis addresses these issues by:

Increasing statistical power through pooled sample sizes
Quantifying heterogeneity across studies
Identifying moderators that explain variability
Providing a transparent, replicable summary of evidence

Effect Size Measures

Before pooling, each study's result must be converted to a common metric.

Standardized Mean Difference (Cohen's d)

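$$d = \frac{\bar{X}_T - \bar{X}_R}{S_{\text{pooled}}}$$

where $\bar{X}_T$ and $\bar{X}_R$ are the means of the two groups being compared (treatment $T$ and reference $R$), $n_T$ and $n_R$ are their sample sizes, $S_T$ and $S_R$ are their standard deviations, and $S_{\text{pooled}} = \sqrt{\frac{(n_T - 1)S_T^2 + (n_R - 1)S_R^2}{n_T + n_R - 2}}$ is the pooled standard deviation.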

Hedges' g (Bias-Corrected)

$$g = d \cdot \left(1 - \frac{3}{4(n_T + n_R) - 9}\right)$$
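The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the summary statistics are hypothetical, chosen only to show how the small-sample correction shrinks $d$ slightly toward zero.

```python
import math

def cohens_d(mean_t, mean_r, sd_t, sd_r, n_t, n_r):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_r - 1) * sd_r**2) / (n_t + n_r - 2)
    return (mean_t - mean_r) / math.sqrt(pooled_var)

def hedges_g(d, n_t, n_r):
    """Apply the small-sample correction factor 1 - 3 / (4(n_T + n_R) - 9)."""
    correction = 1 - 3 / (4 * (n_t + n_r) - 9)
    return d * correction

# Hypothetical summary statistics for one study (illustrative only)
d = cohens_d(mean_t=105.0, mean_r=100.0, sd_t=10.0, sd_r=10.0, n_t=20, n_r=20)
g = hedges_g(d, n_t=20, n_r=20)
print(f"d = {d:.3f}, g = {g:.3f}")
```

With 20 participants per group the correction factor is $1 - 3/151$, so $g$ is about 2% smaller than $d$; the difference becomes negligible as sample sizes grow.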

For binary outcomes:

Odds Ratio

$$\text{OR} = \frac{a \cdot d}{b \cdot c}$$

where $a, b, c, d$ are the cell frequencies in a 2×2 table:

	Event	No Event
Treatment	a	b
Control	c	d
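Odds ratios are usually pooled on the log scale. The sketch below computes the log odds ratio from the 2×2 cell counts together with its standard large-sample variance, $1/a + 1/b + 1/c + 1/d$. The function name and the counts are hypothetical, and the code assumes every cell is nonzero (no continuity correction is applied).

```python
import math

def log_odds_ratio(a, b, c, d):
    """Return the log odds ratio and its approximate sampling variance."""
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

# Hypothetical 2x2 counts (illustrative only):
# treatment: 15 events, 85 non-events; control: 30 events, 70 non-events
a, b, c, d = 15, 85, 30, 70
log_or, var = log_odds_ratio(a, b, c, d)
se = math.sqrt(var)

# Build the 95% CI on the log scale, then back-transform to the OR scale
ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
print(f"OR = {math.exp(log_or):.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The pair `(log_or, var)` is what an inverse-variance meta-analysis would take as the study's effect size and sampling variance.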

Fixed-Effect Model

Definition: Fixed-Effect Model

The fixed-effect model assumes all studies share a common true effect size $\theta$. Variation across studies is due solely to sampling error. Study $i$'s observed effect is:

$$Y_i = \theta + \varepsilon_i, \quad \varepsilon_i \sim N(0, v_i)$$

The pooled estimate is the inverse-variance weighted mean:

Fixed-Effect Pooled Estimate

$$\hat{\theta}_{FE} = \frac{\sum_{i=1}^{K} w_i Y_i}{\sum_{i=1}^{K} w_i}, \quad w_i = \frac{1}{v_i}$$

The variance of the pooled estimate:

$$\text{Var}(\hat{\theta}_{FE}) = \frac{1}{\sum_{i=1}^{K} w_i}$$

Cochran's Q

Definition: Cochran's Q

Cochran's Q tests whether the observed effects are consistent with a common true effect:

$$Q = \sum_{i=1}^{K} w_i (Y_i - \hat{\theta}_{FE})^2$$

Under the fixed-effect null hypothesis, $Q \sim \chi^2_{K-1}$.

Interpretation of Q

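A significant Q (conventionally $p < 0.10$ rather than $0.05$, because the test has low power when the number of studies is small) suggests heterogeneity beyond sampling error, indicating that a random-effects model may be more appropriate.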

Random-Effects Model

Definition: Random-Effects Model

The random-effects model assumes true effect sizes vary across studies:

$$Y_i = \mu + u_i + \varepsilon_i$$

where $u_i \sim N(0, \tau^2)$ is the study-specific deviation from the overall mean $\mu$, and $\varepsilon_i \sim N(0, v_i)$ is the sampling error.

The total variance of study $i$ is:

Random-Effects Variance

$$\text{Var}(Y_i) = v_i + \tau^2 = \sigma_i^2$$

The pooled estimate of the overall mean $\mu$, written $\hat{\theta}_{RE}$ to parallel the fixed-effect notation:

Random-Effects Pooled Estimate

$$\hat{\theta}_{RE} = \frac{\sum_{i=1}^{K} w_i^* Y_i}{\sum_{i=1}^{K} w_i^*}, \quad w_i^* = \frac{1}{v_i + \hat{\tau}^2}$$

Key Difference

In random-effects models, studies with larger variance (smaller samples) receive relatively more weight compared to fixed-effect models, since the added $\tau^2$ term dilutes the precision advantage of large studies.
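A small numeric sketch makes the weight shift concrete. The variances and the value of $\tau^2$ below are hypothetical, picked so that one large study dominates under the fixed-effect model; `relative_weights` is an illustrative helper, not part of any library.

```python
import numpy as np

def relative_weights(variances, tau2=0.0):
    """Percentage weights 1 / (v_i + tau2), normalized to sum to 100."""
    w = 1.0 / (np.asarray(variances, dtype=float) + tau2)
    return w / w.sum() * 100

# Hypothetical sampling variances: one large study and two small ones
variances = [0.01, 0.10, 0.10]

fe_weights = relative_weights(variances)              # tau2 = 0: fixed-effect
re_weights = relative_weights(variances, tau2=0.05)   # assumed tau2, for illustration

print("Fixed-effect weights (%):  ", np.round(fe_weights, 1))
print("Random-effects weights (%):", np.round(re_weights, 1))
```

Here the large study's share drops from roughly 83% to roughly 56% once $\tau^2 = 0.05$ is added to every variance, while each small study's share rises from about 8% to about 22%.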

Estimating τ²

DerSimonian-Laird Method

Definition: DerSimonian-Laird Estimator

The most widely used method-of-moments estimator of $\tau^2$:

$$\hat{\tau}^2_{DL} = \frac{Q - (K - 1)}{\sum_{i=1}^{K} w_i - \frac{\sum_{i=1}^{K} w_i^2}{\sum_{i=1}^{K} w_i}}$$

where $w_i = 1/v_i$ are the fixed-effect weights. If $Q \leq K - 1$, the estimate is truncated to $\hat{\tau}^2_{DL} = 0$.

Other Estimators

Method	Description	Property
REML	Restricted maximum likelihood	Less biased, often preferred
Paule-Mandel	Iterative matching of expected Q	Good small-sample properties
Hedges	Moment-based	Simple closed-form
PL	Profile likelihood	Better coverage in simulation
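As an illustration of the iterative idea behind the Paule-Mandel row, the sketch below searches for the $\tau^2$ at which the Q statistic, recomputed with random-effects weights $1/(v_i + \tau^2)$, equals its expected value $K - 1$. It uses `scipy.optimize.brentq` as the root finder; the function names, the search bound `upper`, and the data are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import brentq

def generalized_q(tau2, effects, variances):
    """Q statistic computed with random-effects weights 1 / (v_i + tau2)."""
    w = 1.0 / (variances + tau2)
    mu = np.sum(w * effects) / np.sum(w)
    return np.sum(w * (effects - mu) ** 2)

def paule_mandel_tau2(effects, variances, upper=100.0):
    """Solve generalized_q(tau2) = K - 1 for tau2, truncating at zero."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    df = len(effects) - 1
    # Q shrinks as tau2 grows; if it is already at or below df, return 0
    if generalized_q(0.0, effects, variances) <= df:
        return 0.0
    # 'upper' is an assumed bracket that must be large enough for Q < df
    return brentq(lambda t: generalized_q(t, effects, variances) - df, 0.0, upper)

# Hypothetical effect sizes and sampling variances (illustrative only)
y = [0.10, 0.30, 0.35, 0.65, 0.45, 0.15]
v = [0.03, 0.03, 0.05, 0.01, 0.05, 0.02]
tau2_pm = paule_mandel_tau2(y, v)
print(f"Paule-Mandel tau^2 = {tau2_pm:.4f}")
```

Unlike the closed-form DerSimonian-Laird estimator, this approach has no explicit formula, which is why a numerical root finder is used.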

Heterogeneity Measures

I² Statistic

Definition: I² Statistic

$$I^2 = \max\left(0, \frac{Q - (K - 1)}{Q}\right) \times 100\%$$

$I^2$ represents the percentage of total variability in the observed effects that is due to heterogeneity rather than sampling error; it is set to 0 when $Q < K - 1$. Rough benchmarks:

$I^2 = 0\%$: No observed heterogeneity
$I^2 = 25\%$: Low heterogeneity
$I^2 = 50\%$: Moderate heterogeneity
$I^2 = 75\%$: High heterogeneity

τ² and τ

$\tau^2$ is the between-study variance (absolute heterogeneity)
$\tau = \sqrt{\tau^2}$ is the standard deviation of true effects

Prediction Interval

A 95% prediction interval for a new study's effect:

$$\hat{\mu} \pm t_{K-2, 0.025} \cdot \sqrt{\hat{\tau}^2 + \text{Var}(\hat{\mu})}$$

This is wider than the confidence interval and quantifies the range of effects we might expect in a future study.

Publication Bias

Definition: Publication Bias

Publication bias occurs when the likelihood of a study being published depends on its results. Studies with statistically significant or positive findings are more likely to be published, inflating the meta-analytic estimate.

Funnel Plot

Definition: Funnel Plot

A funnel plot is a scatter plot of effect sizes (x-axis) against a measure of precision (y-axis, typically the standard error, with the axis inverted so that the most precise studies sit at the top). In the absence of bias, studies should form a symmetric inverted funnel centered on the pooled estimate.

Egger's Test

Definition: Egger's Test

Egger's test regresses the standardized effect sizes on precision:

$$\frac{Y_i}{\sqrt{v_i}} = \beta_0 + \beta_1 \cdot \frac{1}{\sqrt{v_i}} + \varepsilon_i$$

A significant intercept ($\beta_0 \neq 0$) at $p < 0.10$ suggests small-study effects (asymmetry).
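The regression above can be sketched with ordinary least squares. This is a minimal version under stated assumptions: the intercept is tested with a two-sided t-test on $K - 2$ degrees of freedom, the function name `eggers_test` is illustrative, and the data are made up so that smaller studies report larger effects.

```python
import numpy as np
from scipy import stats

def eggers_test(effects, variances):
    """Regress Y_i / sqrt(v_i) on 1 / sqrt(v_i) and test the intercept."""
    effects = np.asarray(effects, dtype=float)
    se = np.sqrt(np.asarray(variances, dtype=float))
    z = effects / se                 # standardized effects
    precision = 1.0 / se
    X = np.column_stack([np.ones_like(precision), precision])

    # Ordinary least squares fit: beta[0] = intercept, beta[1] = slope
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    df = len(z) - 2
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(X.T @ X)

    # Two-sided t-test of H0: intercept = 0
    t_stat = beta[0] / np.sqrt(cov[0, 0])
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return {'intercept': beta[0], 'slope': beta[1], 't': t_stat, 'p': p_value}

# Hypothetical data: less precise studies show larger effects
y = [0.80, 0.65, 0.55, 0.40, 0.35, 0.30, 0.25, 0.22]
se = [0.40, 0.35, 0.30, 0.20, 0.15, 0.12, 0.10, 0.08]
v = [s**2 for s in se]

res = eggers_test(y, v)
print(f"Intercept = {res['intercept']:.3f}, t = {res['t']:.2f}, p = {res['p']:.4f}")
```

The slope $\beta_1$ estimates the effect that would be observed in an infinitely precise study, while the intercept captures the asymmetry.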

Trim-and-Fill

Definition: Trim-and-Fill

The trim-and-fill method (Duval & Tweedie, 2000) imputes missing studies to restore funnel plot symmetry:

Estimate the number of missing studies ( $m$ ) by trimming extreme values
Impute $m$ studies on the sparse side of the funnel
Recompute the pooled estimate including imputed studies

Limitations

Trim-and-fill assumes asymmetry is solely due to publication bias. Asymmetry may also result from true heterogeneity, other small-study effects, or methodological differences, so treat the adjusted estimate as a sensitivity analysis and use it alongside other methods.

Moderator Analysis (Meta-Regression)

Definition: Meta-Regression

Meta-regression extends meta-analysis by modeling the relationship between study-level covariates and effect sizes:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + u_i + \varepsilon_i$$

where $X_{ij}$ are study-level characteristics (e.g., dose, duration, population age).

The proportion of heterogeneity explained:

$$R^2 = \frac{\hat{\tau}^2_{\text{null}} - \hat{\tau}^2_{\text{model}}}{\hat{\tau}^2_{\text{null}}}$$
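A single-moderator meta-regression can be sketched with NumPy alone. The example below estimates the residual $\tau^2$ with a method-of-moments estimator that reduces to DerSimonian-Laird when the model contains only an intercept, then computes $R^2$ from the null and moderator models. The function names, the dose moderator, and the data are hypothetical, and truncating $R^2$ at zero is a convention assumed here rather than part of the formula.

```python
import numpy as np

def moment_tau2(y, v, X):
    """Method-of-moments residual tau^2 for design matrix X (first column = 1)."""
    W = np.diag(1.0 / v)
    k, p = X.shape
    # P removes the inverse-variance weighted fit, so y' P y is the residual Q
    P = W - W @ X @ np.linalg.inv(X.T @ W @ X) @ X.T @ W
    q_resid = y @ P @ y
    return max(0.0, (q_resid - (k - p)) / np.trace(P))

def meta_regression(effects, variances, moderator):
    """Single-moderator meta-regression with R^2 from null and model tau^2."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    x = np.asarray(moderator, dtype=float)
    X_null = np.ones((len(y), 1))
    X_model = np.column_stack([np.ones_like(x), x])

    tau2_null = moment_tau2(y, v, X_null)
    tau2_model = moment_tau2(y, v, X_model)

    # Coefficients by weighted least squares with weights 1 / (v_i + tau^2)
    w = 1.0 / (v + tau2_model)
    beta = np.linalg.solve(X_model.T @ (w[:, None] * X_model), X_model.T @ (w * y))

    # R^2 is truncated at 0 (assumption) and undefined when tau2_null = 0
    r2 = max(0.0, 1 - tau2_model / tau2_null) if tau2_null > 0 else float('nan')
    return {'beta': beta, 'tau2_null': tau2_null, 'tau2_model': tau2_model, 'r2': r2}

# Hypothetical studies in which the effect tends to grow with dose
dose = [10, 20, 30, 40, 50, 60]
y = [0.05, 0.30, 0.20, 0.50, 0.40, 0.70]
v = [0.004, 0.005, 0.004, 0.006, 0.005, 0.004]

fit = meta_regression(y, v, dose)
print(f"Slope per unit dose: {fit['beta'][1]:.4f}")
print(f"tau^2 null = {fit['tau2_null']:.4f}, tau^2 model = {fit['tau2_model']:.4f}")
print(f"R^2 = {fit['r2']:.2f}")
```

With few studies, $R^2$ computed this way is unstable, so it is best read as a rough descriptive summary.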

Network Meta-Analysis

Definition: Network Meta-Analysis

Network meta-analysis (NMA), also called mixed treatment comparisons, allows indirect comparisons of multiple treatments using a network of randomized trials. If Treatment A vs B and Treatment B vs C have been studied, NMA can estimate A vs C without direct head-to-head evidence.

Consistency Assumption

Definition: Consistency

The consistency assumption states that direct and indirect evidence agree:

$$\theta_{AC} = \theta_{AB} + \theta_{BC}$$

Inconsistency is assessed using node-splitting models or design-by-treatment interaction models.
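For a single A-B-C loop, the consistency equation gives a simple adjusted indirect comparison: the indirect A vs C estimate is the sum of the A vs B and B vs C estimates, and, assuming the two come from independent sets of trials, their variances add. The sketch below also compares the indirect estimate with a direct one using a z-statistic. This is a pairwise check for illustration only, not the node-splitting or design-by-treatment models mentioned above, and all numbers and function names are hypothetical.

```python
import math

def indirect_estimate(theta_ab, var_ab, theta_bc, var_bc):
    """Indirect A vs C effect under consistency; variances add if independent."""
    theta_ac = theta_ab + theta_bc
    var_ac = var_ab + var_bc
    return theta_ac, var_ac

def inconsistency_z(direct, var_direct, indirect, var_indirect):
    """z-statistic for the difference between direct and indirect estimates."""
    return (direct - indirect) / math.sqrt(var_direct + var_indirect)

# Hypothetical pooled effects (e.g., log odds ratios) and their variances
theta_ab, var_ab = -0.30, 0.02
theta_bc, var_bc = -0.20, 0.03
theta_ac, var_ac = indirect_estimate(theta_ab, var_ab, theta_bc, var_bc)

# Hypothetical direct A vs C evidence
direct_ac, var_direct = -0.35, 0.04
z = inconsistency_z(direct_ac, var_direct, theta_ac, var_ac)

print(f"Indirect A vs C: {theta_ac:.2f} (SE {math.sqrt(var_ac):.3f})")
print(f"Direct minus indirect: z = {z:.2f}")
```

A large absolute z-value would indicate that the direct and indirect evidence disagree in this loop.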

SUCRA (Surface Under the Cumulative Ranking)

SUCRA

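$$\text{SUCRA}_j = \frac{1}{T-1} \sum_{k=1}^{T-1} \text{cum}_{jk} \times 100\%$$

where $T$ is the number of treatments in the network and $\text{cum}_{jk}$ is the cumulative probability that treatment $j$ is ranked $k$-th or better. SUCRA ranges from 0% (certain to be worst) to 100% (certain to be best), so higher values indicate that a treatment tends to rank near the top.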
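SUCRA is usually computed from a matrix of rank probabilities by averaging, for each treatment, the cumulative probabilities of being ranked first, in the top two, and so on up to the second-to-last rank. The sketch below follows that definition; the rank probabilities are hypothetical (in practice they would come from a fitted network model), and `sucra` is an illustrative function name.

```python
import numpy as np

def sucra(rank_probs):
    """
    rank_probs[j, k] = probability that treatment j has rank k + 1 (rank 1 = best).
    Returns SUCRA as a percentage for each treatment.
    """
    rank_probs = np.asarray(rank_probs, dtype=float)
    n_treat = rank_probs.shape[1]
    # Cumulative probability of being ranked k-th or better; the last column
    # is always 1 and is excluded from the average
    cumulative = np.cumsum(rank_probs, axis=1)[:, :-1]
    return cumulative.sum(axis=1) / (n_treat - 1) * 100

# Hypothetical rank probabilities for treatments A, B, C (each row sums to 1)
probs = [
    [0.7, 0.2, 0.1],   # A
    [0.2, 0.6, 0.2],   # B
    [0.1, 0.2, 0.7],   # C
]
for name, value in zip("ABC", sucra(probs)):
    print(f"SUCRA({name}) = {value:.0f}%")
```

For treatment A the cumulative probabilities are 0.7 and 0.9, giving $(0.7 + 0.9)/2 = 80\%$; B and C come out at 50% and 20%.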

Python Implementation

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# --- Fixed-effect meta-analysis (inverse-variance method) ---
def fixed_effect_meta_analysis(effects, variances):
    """
    Fixed-effect meta-analysis using inverse-variance weighting.
    
    Parameters:
        effects: array of effect sizes (one per study)
        variances: array of sampling variances
    Returns:
        dict with pooled estimate, CI, Q statistic, I²
    """
    effects = np.asarray(effects)
    variances = np.asarray(variances)
    weights = 1.0 / variances
    K = len(effects)
    
    theta_hat = np.sum(weights * effects) / np.sum(weights)
    var_theta = 1.0 / np.sum(weights)
    se_theta = np.sqrt(var_theta)
    
    # Cochran's Q
    Q = np.sum(weights * (effects - theta_hat)**2)
    df = K - 1
    p_Q = stats.chi2.sf(Q, df)  # upper-tail probability; more accurate than 1 - cdf
    
    # I²
    I2 = max(0, (Q - df) / Q * 100) if Q > 0 else 0
    
    # 95% CI
    ci_lower = theta_hat - 1.96 * se_theta
    ci_upper = theta_hat + 1.96 * se_theta
    
    return {
        'theta': theta_hat, 'se': se_theta,
        'ci_95': (ci_lower, ci_upper),
        'Q': Q, 'df': df, 'p_Q': p_Q, 'I2': I2,
        'weights': weights / np.sum(weights) * 100
    }

# --- Random-effects meta-analysis (DerSimonian-Laird) ---
def random_effects_meta_analysis(effects, variances):
    """
    Random-effects meta-analysis using DerSimonian-Laird.
    """
    effects = np.asarray(effects)
    variances = np.asarray(variances)
    K = len(effects)
    
    # Fixed-effect for Q calculation
    w_fe = 1.0 / variances
    theta_fe = np.sum(w_fe * effects) / np.sum(w_fe)
    Q = np.sum(w_fe * (effects - theta_fe)**2)
    
    # DerSimonian-Laird tau²
    C = np.sum(w_fe) - np.sum(w_fe**2) / np.sum(w_fe)
    tau2 = max(0, (Q - (K - 1)) / C)
    
    # Random-effects weights
    w_re = 1.0 / (variances + tau2)
    theta_re = np.sum(w_re * effects) / np.sum(w_re)
    var_re = 1.0 / np.sum(w_re)
    se_re = np.sqrt(var_re)
    
    tau = np.sqrt(tau2)
    
    # Prediction interval (t distribution with K - 2 df, so it requires K >= 3)
    t_crit = stats.t.ppf(0.975, K - 2)
    pred_lower = theta_re - t_crit * np.sqrt(tau2 + var_re)
    pred_upper = theta_re + t_crit * np.sqrt(tau2 + var_re)
    
    return {
        'theta': theta_re, 'se': se_re,
        'ci_95': (theta_re - 1.96*se_re, theta_re + 1.96*se_re),
        'tau2': tau2, 'tau': tau,
        'Q': Q, 'df': K-1, 'I2': max(0, (Q-(K-1))/Q*100) if Q > 0 else 0,
        'pred_interval': (pred_lower, pred_upper),
        'weights': w_re / np.sum(w_re) * 100
    }

# --- Funnel plot ---
def funnel_plot(effects, se, labels=None):
    """Create a funnel plot for publication bias assessment."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(effects, se, s=50, c='steelblue', edgecolors='black', alpha=0.7)
    
    theta_pooled = np.average(effects, weights=1/np.array(se)**2)
    ax.axvline(x=theta_pooled, color='red', linestyle='--', label='Pooled estimate')
    
    # Pseudo 95% CI funnel
    se_range = np.linspace(0.01, max(se)*1.1, 100)
    for z in [1.96, -1.96]:
        ax.plot(theta_pooled + z * se_range, se_range, 'gray', linestyle=':', alpha=0.5)
    
    ax.set_xlabel('Effect Size')
    ax.set_ylabel('Standard Error')
    ax.set_title('Funnel Plot')
    ax.invert_yaxis()
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('funnel_plot.png', dpi=150)
    plt.show()

# --- Example: 10 studies on drug efficacy ---
np.random.seed(42)
K = 10
true_effect = 0.40  # True standardized mean difference
true_tau = 0.15

# Generate study effects
true_effects = np.random.normal(true_effect, true_tau, K)
sample_sizes = np.random.randint(30, 200, K)
variances = 2 / sample_sizes + true_tau**2 * np.random.uniform(0.5, 1.5, K)
observed_effects = np.random.normal(true_effects, np.sqrt(variances))

print("=== Fixed-Effect Meta-Analysis ===")
fe = fixed_effect_meta_analysis(observed_effects, variances)
print(f"Pooled effect: {fe['theta']:.3f} (SE: {fe['se']:.3f})")
print(f"95% CI: ({fe['ci_95'][0]:.3f}, {fe['ci_95'][1]:.3f})")
print(f"Cochran's Q: {fe['Q']:.2f}, df={fe['df']}, p={fe['p_Q']:.4f}")
print(f"I²: {fe['I2']:.1f}%")

print("\n=== Random-Effects Meta-Analysis (DerSimonian-Laird) ===")
re = random_effects_meta_analysis(observed_effects, variances)
print(f"Pooled effect: {re['theta']:.3f} (SE: {re['se']:.3f})")
print(f"95% CI: ({re['ci_95'][0]:.3f}, {re['ci_95'][1]:.3f})")
print(f"τ²: {re['tau2']:.4f}, τ: {re['tau']:.3f}")
print(f"I²: {re['I2']:.1f}%")
print(f"Prediction interval: ({re['pred_interval'][0]:.3f}, {re['pred_interval'][1]:.3f})")

# Forest plot
fig, ax = plt.subplots(figsize=(10, 7))
y_positions = np.arange(K, 0, -1)
for i in range(K):
    ci_lower = observed_effects[i] - 1.96 * np.sqrt(variances[i])
    ci_upper = observed_effects[i] + 1.96 * np.sqrt(variances[i])
    weight = fe['weights'][i]  # fixed-effect weight (%), used to scale the marker
    
    ax.plot([ci_lower, ci_upper], [y_positions[i], y_positions[i]], 'b-', linewidth=1.5)
    # Studies are named on the y-axis, so no per-study legend entries are needed
    ax.plot(observed_effects[i], y_positions[i], 'bs', markersize=4 + weight / 3)

# Pooled estimate
ax.plot(re['theta'], 0, 'rD', markersize=10, label='Random-effects pooled')
ax.plot([re['ci_95'][0], re['ci_95'][1]], [0, 0], 'r-', linewidth=2)
ax.axvline(x=0, color='gray', linestyle='-', linewidth=0.5)

ax.set_yticks(y_positions.tolist() + [0])
ax.set_yticklabels([f'Study {i+1}' for i in range(K)] + ['Pooled'])
ax.set_xlabel('Effect Size (Standardized Mean Difference)')
ax.set_title('Forest Plot')
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('forest_plot.png', dpi=150)
plt.show()

# Funnel plot
funnel_plot(observed_effects, np.sqrt(variances))

Key Takeaways

Summary: Meta-Analysis

Fixed-effect models assume a common true effect; random-effects models allow true effects to vary across studies.
DerSimonian-Laird is the most widely used estimator of the between-study variance $\tau^2$; REML and Paule-Mandel are common alternatives.
Cochran's Q tests for heterogeneity; I² quantifies the percentage of variability due to heterogeneity.
Publication bias can be assessed via funnel plots, Egger's test, and trim-and-fill methods.
Meta-regression identifies study-level moderators that explain heterogeneity.
Network meta-analysis enables indirect comparisons across multiple treatments using a connected evidence network.
Always report prediction intervals alongside confidence intervals to convey the range of effects in future studies.
