Systematic Review Methodology

Comprehensive, Reproducible Evidence Synthesis

Systematic reviews follow PRISMA guidelines to identify, evaluate, and synthesize all relevant evidence on a question using transparent, reproducible methodology. Risk of bias assessment evaluates how far the results of each included study can be trusted.

Healthcare guideline development — Form the evidence base for clinical practice recommendations
Policy evaluation — Assess the total evidence for government program effectiveness
Technology assessment — Compare interventions systematically for informed purchasing decisions

Systematic reviews replace narrative cherry-picking with comprehensive, reproducible evidence synthesis.

Definition: Systematic Review

A systematic review is a rigorous, transparent, and reproducible method for identifying, evaluating, and synthesizing all relevant evidence to answer a specific research question. Unlike narrative reviews, systematic reviews use explicit, pre-specified methods to minimize bias.

"A systematic review attempts to identify, appraise, and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question." — Cochrane Handbook

Systematic Review vs Meta-Analysis

Aspect	Systematic Review	Meta-Analysis
Scope	Qualitative synthesis of evidence	Quantitative statistical pooling
Output	Narrative summary with assessment	Pooled effect estimate with CI
When used	In every systematic review	Only when studies are comparable enough to pool
Heterogeneity	Addressed narratively	Quantified (I², τ²)
PRISMA	Required	Required (as part of SR)

Key Point

Every meta-analysis should be embedded within a systematic review, but not every systematic review includes a meta-analysis. When studies are too heterogeneous, methodologically diverse, or use incompatible outcome measures, synthesis without meta-analysis (SWiM) is appropriate.
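The heterogeneity statistics named in the comparison table (I², τ²) can be computed from study effect sizes and their variances. The following is a minimal sketch, assuming inverse-variance weights and the DerSimonian-Laird estimator for τ²; the function name and the input values are hypothetical and are not taken from any real trial.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, I-squared (in %), and DerSimonian-Laird tau-squared."""
    weights = [1.0 / v for v in variances]            # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird scaling constant and tau-squared
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


# Hypothetical log hazard ratios and variances, for illustration only
log_hr = [-0.15, -0.22, -0.05, -0.30]
variances = [0.004, 0.006, 0.005, 0.010]
q, i2, tau2 = heterogeneity(log_hr, variances)
print(f"Q = {q:.2f}, I2 = {i2:.1f}%, tau2 = {tau2:.4f}")
```

A high I² on its own does not settle whether to pool; it is one input into the judgment described above, alongside clinical and methodological diversity.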

PRISMA Guidelines

Definition: PRISMA

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) is a 27-item checklist and flow diagram that standardizes the reporting of systematic reviews. PRISMA 2020 (Page et al., 2021) updated the original 2009 guidelines.

PRISMA 2020 Flow Diagram

Identification
├── Records from databases (n = ___)
├── Records from other sources (n = ___)
└── Duplicates removed (n = ___)

Screening
├── Records screened (n = ___)
├── Records excluded (n = ___)
└── Reports sought for retrieval (n = ___)

Eligibility
├── Reports assessed for eligibility (n = ___)
└── Reports excluded (n = ___)
    ├── Reason 1 (n = ___)
    ├── Reason 2 (n = ___)
    └── Reason 3 (n = ___)

Included
├── Studies included in review (n = ___)
└── Studies included in meta-analysis (n = ___)
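The boxes in the flow diagram are linked by simple arithmetic, so the counts can be derived and checked rather than typed in by hand. Below is a minimal sketch; the function name `prisma_flow`, its parameters, and all counts are hypothetical. It also assumes a "reports not retrieved" count, which the PRISMA 2020 template includes but the outline above omits.

```python
def prisma_flow(db_records, other_records, duplicates,
                excluded_at_screening, exclusion_reasons, not_retrieved=0):
    """Derive the downstream PRISMA boxes from counts logged during the review."""
    screened = db_records + other_records - duplicates
    sought = screened - excluded_at_screening
    assessed = sought - not_retrieved
    excluded_full_text = sum(exclusion_reasons.values())
    included = assessed - excluded_full_text
    flow = {
        "Records screened": screened,
        "Reports sought for retrieval": sought,
        "Reports assessed for eligibility": assessed,
        "Reports excluded": excluded_full_text,
        "Studies included in review": included,
    }
    # A negative box means the logged counts cannot all be correct
    negative = [box for box, n in flow.items() if n < 0]
    if negative:
        raise ValueError(f"Counts do not reconcile: {negative}")
    return flow


# Hypothetical counts for illustration only
flow = prisma_flow(
    db_records=1200, other_records=35, duplicates=310,
    excluded_at_screening=800,
    exclusion_reasons={"Wrong population": 40, "Wrong outcome": 25, "Wrong design": 30},
    not_retrieved=5,
)
for box, n in flow.items():
    print(f"{box} (n = {n})")
```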

Search Strategy

Definition: Search Strategy

A search strategy is a systematic, reproducible method for identifying all relevant studies. It combines subject terms (MeSH, Emtree) with free-text keywords using Boolean operators.

PICO Framework

Definition: PICO

The PICO framework structures the research question:

P (Population): Who is being studied?
I (Intervention): What is the treatment or exposure?
C (Comparison): What is the comparator?
O (Outcome): What outcomes are measured?

Example Search Strategy

# Database: MEDLINE via PubMed
# Population: Adults with type 2 diabetes
# Intervention: SGLT2 inhibitors
# Comparison: Placebo or other antidiabetics
# Outcome: Cardiovascular outcomes (MACE, heart failure)

("diabetes mellitus, type 2"[MeSH] OR "type 2 diabetes"[tiab])
AND
("sodium-glucose transporter 2 inhibitors"[MeSH] OR "SGLT2 inhibitor*"[tiab] 
 OR "empagliflozin"[tiab] OR "dapagliflozin"[tiab] OR "canagliflozin"[tiab])
AND
("cardiovascular diseases"[MeSH] OR "heart failure"[MeSH] OR "MACE"[tiab] 
 OR "cardiovascular outcome*"[tiab])
NOT
("type 1 diabetes"[tiab] OR "gestational"[tiab])
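The structure of the query above (synonyms joined with OR inside each concept, concepts joined with AND, exclusions appended with NOT) can be generated from a term list, which makes the strategy easier to version and rerun. This is a minimal sketch; `or_block` and `build_query` are hypothetical helper names, and the output is a plain string that would still need to be pasted into or submitted to the database.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts, exclusions=None):
    """Join concept blocks with AND; optionally append a NOT block."""
    query = " AND ".join(or_block(terms) for terms in concepts.values())
    if exclusions:
        query = f"({query}) NOT {or_block(exclusions)}"
    return query


# Terms taken from the example strategy above
concepts = {
    "population": ['"diabetes mellitus, type 2"[MeSH]', '"type 2 diabetes"[tiab]'],
    "intervention": ['"sodium-glucose transporter 2 inhibitors"[MeSH]',
                     '"SGLT2 inhibitor*"[tiab]', '"empagliflozin"[tiab]'],
    "outcome": ['"cardiovascular diseases"[MeSH]', '"MACE"[tiab]'],
}
exclusions = ['"type 1 diabetes"[tiab]', '"gestational"[tiab]']
print(build_query(concepts, exclusions))
```

A NOT block can silently drop relevant records (for example, a trial abstract that mentions type 1 diabetes only as an exclusion criterion), so it is worth running the query with and without it and comparing the counts.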

Search Sensitivity vs Specificity

A highly sensitive search (broad terms, no filters) misses few relevant studies but returns many irrelevant records. A highly specific search is efficient but risks missing studies. For systematic reviews, maximize sensitivity: extra screening time is an acceptable cost, whereas a missed study can bias the synthesis.
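One way to quantify this trade-off is to test a draft search against a set of studies already known to be relevant. The sketch below assumes such a reference set exists; it reports sensitivity (recall) and uses precision as the practical stand-in for specificity, since the number of irrelevant records that were not retrieved is usually unknown. The function name and record IDs are hypothetical.

```python
def search_performance(retrieved_ids, relevant_ids):
    """Compare a search result set against a reference set of known relevant studies."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = retrieved & relevant
    sensitivity = len(hits) / len(relevant) if relevant else float("nan")
    precision = len(hits) / len(retrieved) if retrieved else float("nan")
    # Number needed to read: records screened per relevant study found
    nnr = len(retrieved) / len(hits) if hits else float("inf")
    return {"sensitivity": sensitivity, "precision": precision, "nnr": nnr}


# Hypothetical record IDs for illustration only
broad = search_performance(retrieved_ids=range(1, 201), relevant_ids=[3, 50, 120, 199])
narrow = search_performance(retrieved_ids=[3, 50, 77, 90], relevant_ids=[3, 50, 120, 199])
print("Broad search: ", broad)
print("Narrow search:", narrow)
```

In this made-up example the broad search finds every known study at the cost of reading 50 records per hit, while the narrow search reads 2 records per hit but misses half of the known studies.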

Inclusion and Exclusion Criteria

Definition: Eligibility Criteria

Eligibility criteria define which studies are included in the review. They must be pre-specified in a protocol and applied consistently.

Common Eligibility Domains

Domain	Inclusion	Exclusion
Study design	RCTs, quasi-experimental	Case reports, editorials
Population	Adults ≥18 years	Pediatric, pregnant
Intervention	SGLT2 inhibitors (any dose)	Combination therapy only
Comparator	Placebo, active control	No comparator
Outcome	MACE, all-cause mortality	Surrogate endpoints only
Timeframe	Published 2010–2025	Pre-2010
Language	English	Non-English (if justified)
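Applying the criteria in a fixed order, and recording the first criterion a record fails, gives one exclusion reason per record and keeps decisions consistent between reviewers. The following is a minimal sketch of that idea using some of the domains from the table above; the `Record` fields, the `screen` function, and the example records are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Record:
    study_id: str
    design: str
    min_age: int          # youngest eligible participant age in the study
    has_comparator: bool
    year: int
    language: str


ALLOWED_DESIGNS = {"RCT", "quasi-experimental"}


def screen(record):
    """Return (included, reason). Criteria are checked in a fixed, pre-specified order."""
    if record.design not in ALLOWED_DESIGNS:
        return False, "Wrong study design"
    if record.min_age < 18:
        return False, "Wrong population"
    if not record.has_comparator:
        return False, "No comparator"
    if not 2010 <= record.year <= 2025:
        return False, "Outside timeframe"
    if record.language != "English":
        return False, "Non-English"
    return True, None


# Hypothetical records for illustration only
records = [
    Record("A", "RCT", 18, True, 2019, "English"),
    Record("B", "case report", 40, False, 2021, "English"),
    Record("C", "RCT", 12, True, 2015, "English"),
    Record("D", "RCT", 18, True, 2008, "English"),
]
for r in records:
    print(r.study_id, screen(r))
```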

Risk of Bias Assessment

Definition: Risk of Bias

Risk of bias (RoB) refers to the likelihood that a study's design, conduct, or analysis introduced systematic error, leading to over- or under-estimation of the true effect.

Cochrane Risk of Bias Tool (RoB 2)

The Cochrane RoB 2 tool assesses five domains:

Domain	Key Question
D1: Randomization process	Was allocation truly random? Was it concealed?
D2: Deviations from intended interventions	Were participants aware of allocation?
D3: Missing outcome data	Was attrition balanced and explained?
D4: Measurement of the outcome	Was the outcome measure valid and assessed blindly?
D5: Selection of reported result	Was the analysis pre-specified?

Each domain is rated as Low risk, Some concerns, or High risk.

Overall Judgment

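If any domain is rated "High risk," the overall study is "High risk." If any domain has "Some concerns" and no domain is "High risk," the overall rating is "Some concerns." Only studies with "Low risk" across all domains receive an overall low risk of bias rating. RoB 2 guidance also allows an overall "High risk" judgment when several domains raise "Some concerns" in a way that substantially lowers confidence in the result.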
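The worst-domain rule can be written as a small function, which is useful for checking that recorded overall ratings agree with the domain ratings. This is a minimal sketch of that basic rule only; it does not model reviewer discretion, and the function name and rating labels are assumptions chosen for the example.

```python
VALID_RATINGS = {"Low", "Some concerns", "High"}


def overall_rob(domain_ratings):
    """Derive the overall RoB 2 judgment from a dict of domain -> rating."""
    ratings = set(domain_ratings.values())
    unknown = ratings - VALID_RATINGS
    if unknown:
        raise ValueError(f"Unrecognized ratings: {sorted(unknown)}")
    if "High" in ratings:
        return "High"
    if "Some concerns" in ratings:
        return "Some concerns"
    return "Low"


# Hypothetical study assessment for illustration only
example = {"D1": "Low", "D2": "Low", "D3": "Some concerns", "D4": "Low", "D5": "Low"}
print(overall_rob(example))
```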

Data Extraction

Definition: Data Extraction

Data extraction is the systematic process of recording study characteristics and results from included studies into a standardized form. Two independent reviewers typically extract data, with discrepancies resolved by consensus or a third reviewer.

Standard Data Fields

Category	Fields
Study	Author, year, country, design, sample size
Population	Age, sex, BMI, diabetes duration, HbA1c
Intervention	Drug, dose, duration
Comparator	Drug, dose, duration
Outcomes	Effect estimate (OR, HR, MD), 95% CI, n per group
Quality	RoB rating, GRADE certainty
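When two reviewers extract independently, the two forms can be compared field by field so that only the disagreements go to consensus discussion. Below is a minimal sketch, assuming each reviewer's form is stored as a dictionary keyed by (study, field); the function name, the keys, and the values are hypothetical.

```python
def find_discrepancies(reviewer_a, reviewer_b, tolerance=1e-9):
    """Return a list of (key, value_a, value_b) where the two extractions differ."""
    issues = []
    for key in sorted(set(reviewer_a) | set(reviewer_b)):
        a, b = reviewer_a.get(key), reviewer_b.get(key)
        both_numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
        # Numeric fields match within a tolerance; everything else must be identical
        same = abs(a - b) <= tolerance if both_numeric else a == b
        if not same:
            issues.append((key, a, b))
    return issues


# Hypothetical extraction forms for illustration only
form_a = {("Smith 2020", "sample_size"): 1200, ("Smith 2020", "hr"): 0.86,
          ("Jones 2021", "sample_size"): 950}
form_b = {("Smith 2020", "sample_size"): 1200, ("Smith 2020", "hr"): 0.68,
          ("Jones 2021", "sample_size"): 950, ("Jones 2021", "hr"): 0.91}

for key, a, b in find_discrepancies(form_a, form_b):
    print(f"{key}: reviewer A = {a}, reviewer B = {b}")
```

A field that one reviewer left blank shows up as `None`, so missing entries are flagged alongside conflicting ones.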

GRADE Quality Assessment

Definition: GRADE

GRADE (Grading of Recommendations Assessment, Development and Evaluation) is a systematic approach for rating the certainty of evidence and strength of recommendations. It rates evidence as high, moderate, low, or very low certainty.

GRADE Domains

Domain	Effect on Certainty
Risk of bias	↓ Downgrade if serious limitations
Inconsistency	↓ Downgrade if heterogeneity is substantial and unexplained (I² > 50% is a common rule of thumb)
Indirectness	↓ Downgrade if population, intervention, or outcome differs
Imprecision	↓ Downgrade if wide CI crosses clinical decision threshold
Publication bias	↓ Downgrade if funnel plot asymmetry or Egger's p < 0.10

Upgrade factors:

↑ Large effect (RR > 2 or < 0.5)
↑ Dose-response gradient
↑ All plausible residual confounding would reduce the observed effect

Starting Certainty

RCTs start at high certainty. Observational studies start at low certainty. Each of the five domains can move the rating down by one or two levels; the upgrade factors can move it up, which in practice applies mainly to observational studies.
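The rating arithmetic can be sketched as a function that starts from the study design and moves along the four-level scale. This is a simplified illustration, not the GRADE working group's procedure: it assumes upgrades are applied only to observational evidence, treats every downgrade as one or more whole levels, and uses hypothetical names throughout. Real GRADE ratings involve judgment that a tally cannot capture.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]


def grade_certainty(design, downgrades=0, upgrades=0):
    """Return the GRADE certainty after applying level changes to the starting point."""
    is_rct = design == "RCT"
    start = LEVELS.index("High") if is_rct else LEVELS.index("Low")
    # Simplifying assumption: upgrade factors only count for observational evidence
    applied_upgrades = 0 if is_rct else upgrades
    level = start - downgrades + applied_upgrades
    level = max(0, min(len(LEVELS) - 1, level))   # clamp to the four-level scale
    return LEVELS[level]


# RCT evidence downgraded once for risk of bias
print(grade_certainty("RCT", downgrades=1))
# RCT evidence downgraded for inconsistency and imprecision
print(grade_certainty("RCT", downgrades=2))
# Observational evidence upgraded once for a large effect
print(grade_certainty("observational", upgrades=1))
```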

Synthesis Without Meta-Analysis (SWiM)

Definition: SWiM

Synthesis without meta-analysis (SWiM) provides guidance for systematically reviewing evidence when quantitative pooling is inappropriate. It uses structured, transparent narrative synthesis with tabular and graphical summaries.

When to Use SWiM

Studies use different outcome measures or scales
Studies are too heterogeneous to pool meaningfully
Few studies (K < 3) are available
Methodological diversity prevents valid pooling

SWiM Methods

Method	Description
Vote counting	Count studies by direction of effect (favorable vs unfavorable), not by statistical significance
Harvest plots	Bar charts weighted by study quality
Blobbograms	Modified forest plots without pooling
Narrative synthesis	Structured textual summary by subgroups
Tabular summaries	Effect estimates, CIs, and quality ratings in tables
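Vote counting by direction of effect can be paired with an exact sign test, which asks how likely a split at least this lopsided would be if favorable and unfavorable directions were equally probable. Below is a minimal sketch using only the standard library; the function name and the input directions are hypothetical, and studies with no clear direction (coded 0) are left out of the test.

```python
from math import comb


def vote_count(effect_directions):
    """Count directions (+1 favors intervention, -1 favors comparator) and run a sign test."""
    favorable = sum(1 for d in effect_directions if d > 0)
    unfavorable = sum(1 for d in effect_directions if d < 0)
    n = favorable + unfavorable
    k = max(favorable, unfavorable)
    # Upper tail of Binomial(n, 0.5), doubled for a two-sided test and capped at 1
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return favorable, unfavorable, p_value


# Hypothetical directions for ten studies, for illustration only
directions = [1, 1, 1, 1, 1, 1, 1, 1, -1, -1]
fav, unfav, p = vote_count(directions)
print(f"Favorable: {fav}, Unfavorable: {unfav}, sign test p = {p:.3f}")
```

The test says nothing about the size of the effect, which is why vote counting is usually reported alongside a table of the individual effect estimates.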

Python Implementation

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# --- PRISMA flow diagram data ---
# 'Records' is the count entering each stage; 'Excluded' is the count removed at
# that stage (at Identification these are duplicates, not eligibility exclusions).
prisma_data = {
    'Stage': ['Identification', 'Screening', 'Eligibility', 'Included'],
    'Records': [4523, 2891, 312, 45],
    'Excluded': [1632, 2579, 267, 0]
}

print("=== PRISMA Flow ===")
for stage, records, excluded in zip(prisma_data['Stage'],
                                    prisma_data['Records'],
                                    prisma_data['Excluded']):
    print(f"{stage}: {records} records")
    if excluded > 0:
        print(f"  Excluded: {excluded}")

# --- Risk of Bias Assessment ---
studies = [
    {'Study': 'Smith 2020', 'D1': 'Low', 'D2': 'Low', 'D3': 'Some concerns', 
     'D4': 'Low', 'D5': 'Low', 'Overall': 'Some concerns'},
    {'Study': 'Jones 2021', 'D1': 'Low', 'D2': 'Low', 'D3': 'Low', 
     'D4': 'Low', 'D5': 'Low', 'Overall': 'Low'},
    {'Study': 'Lee 2022', 'D1': 'High', 'D2': 'Some concerns', 'D3': 'Low', 
     'D4': 'Some concerns', 'D5': 'Low', 'Overall': 'High risk'},
    {'Study': 'Chen 2023', 'D1': 'Low', 'D2': 'Low', 'D3': 'Low', 
     'D4': 'Some concerns', 'D5': 'Low', 'Overall': 'Some concerns'},
    {'Study': 'Wang 2024', 'D1': 'Low', 'D2': 'Low', 'D3': 'Low', 
     'D4': 'Low', 'D5': 'Low', 'Overall': 'Low'},
]

df_rob = pd.DataFrame(studies)
print("\n=== Risk of Bias Summary ===")
print(df_rob.to_string(index=False))

# Traffic light plot
rob_matrix = df_rob[['D1', 'D2', 'D3', 'D4', 'D5']].values
color_map = {'Low': '#2ecc71', 'Some concerns': '#f39c12', 'High': '#e74c3c',
             'High risk': '#e74c3c'}

fig, ax = plt.subplots(figsize=(10, 5))
for i in range(len(rob_matrix)):
    for j in range(5):
        color = color_map.get(rob_matrix[i, j], '#95a5a6')
        ax.add_patch(plt.Rectangle((j, len(rob_matrix) - i - 1), 1, 1, 
                     facecolor=color, edgecolor='white', linewidth=2))

ax.set_xlim(0, 5)
ax.set_ylim(0, len(rob_matrix))
ax.set_xticks([0.5, 1.5, 2.5, 3.5, 4.5])
ax.set_xticklabels(['Randomization', 'Deviations', 'Missing Data', 
                     'Measurement', 'Reporting'])
ax.set_yticks([0.5 + i for i in range(len(rob_matrix))])
ax.set_yticklabels(df_rob['Study'].tolist()[::-1])
ax.set_title('Risk of Bias Traffic Light Plot')
plt.tight_layout()
plt.savefig('rob_traffic_light.png', dpi=150)
plt.show()

# --- GRADE Evidence Profile ---
grade_data = {
    'Outcome': ['MACE (3-point)', 'All-cause mortality', 'Heart failure hospitalization'],
    'Studies': [5, 4, 6],
    'Participants': [45000, 38000, 52000],
    'Risk of bias': ['Serious (-1)', 'Not serious', 'Serious (-1)'],
    'Inconsistency': ['Not serious', 'Serious (-1)', 'Not serious'],
    'Indirectness': ['Not serious', 'Not serious', 'Not serious'],
    'Imprecision': ['Not serious', 'Serious (-1)', 'Not serious'],
    'Publication bias': ['Undetected', 'Undetected', 'Undetected'],
    'Starting level': ['High', 'High', 'High'],
    'Final certainty': ['Moderate', 'Low', 'Moderate'],
    'Effect estimate': ['HR 0.86 (0.80-0.93)', 'HR 0.92 (0.84-1.01)', 'HR 0.72 (0.64-0.82)']
}

df_grade = pd.DataFrame(grade_data)
print("\n=== GRADE Evidence Profile ===")
print(df_grade.to_string(index=False))

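# --- Study selection funnel ---
# Illustrative counts only. Note that 892 for the title/abstract screen does not
# reconcile with prisma_data above (2891 screened, 2579 excluded, 312 assessed in
# full text); in a real review every count should trace back to the screening log.
selection_data = {
    'Phase': ['Database search', 'Duplicate removal', 'Title/abstract screen',
              'Full-text review', 'Data extraction', 'Quality assessment', 'Final synthesis'],
    'Records': [4523, 2891, 892, 312, 45, 45, 45]
}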

fig, ax = plt.subplots(figsize=(10, 6))
phases = selection_data['Phase']
counts = selection_data['Records']
colors = plt.cm.Blues(np.linspace(0.3, 0.9, len(phases)))
ax.barh(phases[::-1], counts[::-1], color=colors[::-1], edgecolor='black')
for i, (count, phase) in enumerate(zip(counts[::-1], phases[::-1])):
    ax.text(count + 50, i, str(count), va='center', fontsize=10)
ax.set_xlabel('Number of Records')
ax.set_title('Study Selection Funnel')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig('study_selection.png', dpi=150)
plt.show()

# --- Inclusion/exclusion summary ---
print("\n=== Exclusion Reasons (Full-Text) ===")
exclusion_reasons = {
    'Wrong population': 87,
    'Wrong intervention': 62,
    'Wrong outcome': 45,
    'Wrong study design': 38,
    'Duplicate data': 21,
    'Insufficient data': 14
}
for reason, count in sorted(exclusion_reasons.items(), key=lambda x: -x[1]):
    print(f"  {reason}: {count} studies")

Key Takeaways

Summary: Systematic Review Methodology

Systematic reviews use explicit, pre-specified methods to minimize bias — they are not narrative summaries.
PRISMA 2020 provides the standard reporting framework with a 27-item checklist and flow diagram.
Search strategies should maximize sensitivity using PICO-framed Boolean queries across multiple databases.
Risk of bias assessment (Cochrane RoB 2) evaluates five domains: randomization, deviations, missing data, measurement, and reporting.
GRADE rates evidence certainty from high to very low, starting from study design and adjusting for bias, inconsistency, indirectness, imprecision, and publication bias.
SWiM provides structured methods for narrative synthesis when meta-analysis is inappropriate.
Data extraction should be performed independently by two reviewers with pre-specified data fields.
