Systematic Review Methodology

Comprehensive, Reproducible Evidence Synthesis

Systematic reviews identify, evaluate, and synthesize all relevant evidence on a question using transparent, reproducible methods, and they are reported according to the PRISMA guidelines. Risk of bias assessment gauges how far the results of each included study can be trusted.

  • Healthcare guideline development: Form the evidence base for clinical practice recommendations
  • Policy evaluation: Assess the total evidence for government program effectiveness
  • Technology assessment: Compare interventions systematically for informed purchasing decisions

Systematic reviews replace narrative cherry-picking with comprehensive, reproducible evidence synthesis.


Definition: Systematic Review

A systematic review is a rigorous, transparent, and reproducible method for identifying, evaluating, and synthesizing all relevant evidence to answer a specific research question. Unlike narrative reviews, systematic reviews use explicit, pre-specified methods to minimize bias.

"A systematic review attempts to identify, appraise, and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question." (Cochrane Handbook)


Systematic Review vs Meta-Analysis

Aspect | Systematic Review | Meta-Analysis
Scope | Qualitative synthesis of evidence | Quantitative statistical pooling
Output | Narrative summary with assessment | Pooled effect estimate with CI
When used | Always for systematic reviews | Only when studies are comparable enough
Heterogeneity | Addressed narratively | Quantified (I², τ²)
PRISMA | Required | Required (as part of SR)

Key Point

Every meta-analysis should be embedded within a systematic review, but not every systematic review includes a meta-analysis. When studies are too heterogeneous, methodologically diverse, or use incompatible outcome measures, synthesis without meta-analysis (SWiM) is appropriate.
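
When studies are comparable enough to pool, the heterogeneity statistics named in the comparison table can be computed from the study effect estimates and their variances. The following is a minimal sketch, assuming inverse-variance weights and the DerSimonian-Laird estimator for τ² (one common choice among several). The function name `heterogeneity` and the example inputs are illustrative and do not come from a real review.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, I^2 (in percent), and the DerSimonian-Laird tau^2.

    effects   -- study effect estimates on an additive scale (e.g. log hazard ratios)
    variances -- their within-study variances (squared standard errors)
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a variance")
    weights = [1.0 / v for v in variances]            # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum_w - sum(w ** 2 for w in weights) / sum_w  # scaling constant for tau^2
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2


# Hypothetical inputs: two studies with effects 0.0 and 2.0, each with variance 0.5
q, i2, tau2 = heterogeneity([0.0, 2.0], [0.5, 0.5])
print(f"Q = {q:.2f}, I2 = {i2:.0f}%, tau2 = {tau2:.2f}")
```

Both I² and τ² are truncated at zero, so a Q statistic below its degrees of freedom is reported as no detectable heterogeneity rather than as a negative value.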


PRISMA Guidelines

Definition: PRISMA

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) is a 27-item checklist and flow diagram that standardizes the reporting of systematic reviews. PRISMA 2020 (Page et al., 2021) updated the original 2009 guidelines.

PRISMA 2020 Flow Diagram

Identification
├── Records from databases (n = ___)
├── Records from other sources (n = ___)
└── Duplicates removed (n = ___)

Screening
├── Records screened (n = ___)
├── Records excluded (n = ___)
└── Reports sought for retrieval (n = ___)

Eligibility
├── Reports assessed for eligibility (n = ___)
└── Reports excluded (n = ___)
    ├── Reason 1 (n = ___)
    ├── Reason 2 (n = ___)
    └── Reason 3 (n = ___)

Included
├── Studies included in review (n = ___)
└── Studies included in meta-analysis (n = ___)
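
Before filling in the flow diagram, it helps to confirm that the counts at each stage add up. The sketch below is one way to do that, using the illustrative counts from this lesson's Python section. It assumes that the 1,632 records dropped at identification are duplicates and that every report sought is retrieved. The dictionary keys and the function name `check_prisma_flow` are hypothetical.

```python
def check_prisma_flow(flow):
    """Return a list of inconsistencies between stages (empty if the counts add up)."""
    problems = []

    expected = flow["identified"] - flow["duplicates_removed"]
    if flow["screened"] != expected:
        problems.append(f"screened: expected {expected}, got {flow['screened']}")

    expected = flow["screened"] - flow["excluded_at_screening"]
    if flow["assessed"] != expected:
        problems.append(f"assessed: expected {expected}, got {flow['assessed']}")

    # Full-text exclusions are recorded per reason, as the diagram requires
    expected = flow["assessed"] - sum(flow["full_text_exclusions"].values())
    if flow["included"] != expected:
        problems.append(f"included: expected {expected}, got {flow['included']}")

    return problems


flow = {
    "identified": 4523,
    "duplicates_removed": 1632,
    "screened": 2891,
    "excluded_at_screening": 2579,
    "assessed": 312,
    "full_text_exclusions": {
        "Wrong population": 87, "Wrong intervention": 62, "Wrong outcome": 45,
        "Wrong study design": 38, "Duplicate data": 21, "Insufficient data": 14,
    },
    "included": 45,
}
print(check_prisma_flow(flow) or "Flow counts are consistent")
```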

Search Strategy

Definition: Search Strategy

A search strategy is a systematic, reproducible method for identifying all relevant studies. It combines subject terms (MeSH, Emtree) with free-text keywords using Boolean operators.

PICO Framework

Definition: PICO

The PICO framework structures the research question:

  • P (Population): Who is being studied?
  • I (Intervention): What is the treatment or exposure?
  • C (Comparison): What is the comparator?
  • O (Outcome): What outcomes are measured?

Example Search Strategy

# Database: MEDLINE via PubMed
# Population: Adults with type 2 diabetes
# Intervention: SGLT2 inhibitors
# Comparison: Placebo or other antidiabetics (no search block below)
# Outcome: Cardiovascular outcomes (MACE, heart failure)

("diabetes mellitus, type 2"[MeSH] OR "type 2 diabetes"[tiab])
AND
("sodium-glucose transporter 2 inhibitors"[MeSH] OR "SGLT2 inhibitor*"[tiab] 
 OR "empagliflozin"[tiab] OR "dapagliflozin"[tiab] OR "canagliflozin"[tiab])
AND
("cardiovascular diseases"[MeSH] OR "heart failure"[MeSH] OR "MACE"[tiab] 
 OR "cardiovascular outcome*"[tiab])
NOT
("type 1 diabetes"[tiab] OR "gestational"[tiab])
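
The structure of the query above (synonyms joined with OR inside each concept, concepts joined with AND, exclusions appended with NOT) can be generated from term lists, which keeps the strategy reproducible across databases. This is a minimal sketch with hypothetical helper names (`or_block`, `build_query`). It only assembles a string and does not query PubMed, and the term lists are shortened versions of the ones shown above.

```python
def or_block(terms):
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts, exclude=None):
    """AND together one OR-block per concept, optionally NOT-ing an exclusion block."""
    query = " AND ".join(or_block(terms) for terms in concepts)
    if exclude:
        # Wrap the AND-ed part so the NOT clearly applies to the whole query
        query = f"({query}) NOT {or_block(exclude)}"
    return query


population = ['"diabetes mellitus, type 2"[MeSH]', '"type 2 diabetes"[tiab]']
intervention = ['"sodium-glucose transporter 2 inhibitors"[MeSH]', '"SGLT2 inhibitor*"[tiab]']
outcome = ['"cardiovascular diseases"[MeSH]', '"MACE"[tiab]']
excluded = ['"type 1 diabetes"[tiab]', '"gestational"[tiab]']

print(build_query([population, intervention, outcome], exclude=excluded))
```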

Search Sensitivity vs Specificity

A highly sensitive search (broad terms, no filters) misses few relevant studies but returns many irrelevant records. A highly specific search is efficient but risks missing studies. For systematic reviews, maximize sensitivity: extra screening time is an acceptable price for not missing relevant studies.
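
The trade-off can be quantified by testing a draft search against a set of studies already known to be relevant. The sketch below assumes such a reference set exists and reports precision as a practical stand-in for specificity, because the number of correctly ignored records in a whole database is rarely known. The function name, record IDs, and both example searches are hypothetical.

```python
def search_performance(retrieved_ids, relevant_ids):
    """Compare the records a search returned with a set of known relevant records."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    if not retrieved or not relevant:
        raise ValueError("both sets must be non-empty")
    hits = len(retrieved & relevant)
    sensitivity = hits / len(relevant)      # share of relevant studies found
    precision = hits / len(retrieved)       # share of retrieved records that are relevant
    # Number needed to read: records screened per relevant study found
    nnr = 1 / precision if hits else float("inf")
    return {"sensitivity": sensitivity, "precision": precision,
            "number_needed_to_read": nnr}


known_relevant = {"s1", "s2", "s3", "s4"}
broad_search = known_relevant | {f"x{i}" for i in range(1, 17)}   # finds all, plus noise
narrow_search = {"s1", "s2", "x1"}                                # efficient but misses two

for name, result in [("broad", broad_search), ("narrow", narrow_search)]:
    print(name, search_performance(result, known_relevant))
```

In this invented example the broad search finds every known study at the cost of more screening, while the narrow search misses half of them, which is the outcome a systematic review tries to avoid.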


Inclusion and Exclusion Criteria

Definition: Eligibility Criteria

Eligibility criteria define which studies are included in the review. They must be pre-specified in a protocol and applied consistently.

Common Eligibility Domains

Domain | Inclusion | Exclusion
Study design | RCTs, quasi-experimental | Case reports, editorials
Population | Adults ≥18 years | Pediatric, pregnant
Intervention | SGLT2 inhibitors (any dose) | Combination therapy only
Comparator | Placebo, active control | No comparator
Outcome | MACE, all-cause mortality | Surrogate endpoints only
Timeframe | Published 2010–2025 | Pre-2010
Language | English | Non-English (if justified)
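
Pre-specified criteria are easiest to apply consistently when they are written down as explicit checks. The following sketch encodes a subset of the table above (design, age, comparator, timeframe, language) and returns the first reason a record fails. The `Record` fields, the order of the checks, and the example records are assumptions made for illustration only.

```python
from dataclasses import dataclass

ELIGIBLE_DESIGNS = {"RCT", "quasi-experimental"}


@dataclass
class Record:
    study_id: str
    design: str
    min_age: int          # youngest age eligible for enrolment, in years
    has_comparator: bool
    year: int
    language: str


def exclusion_reason(rec):
    """Return the first criterion the record fails, or None if it is eligible."""
    if rec.design not in ELIGIBLE_DESIGNS:
        return "Wrong study design"
    if rec.min_age < 18:
        return "Wrong population"
    if not rec.has_comparator:
        return "No comparator"
    if not 2010 <= rec.year <= 2025:
        return "Outside timeframe"
    if rec.language != "English":
        return "Non-English"
    return None


# Hypothetical records, invented for illustration only
records = [
    Record("A", "RCT", 18, True, 2019, "English"),
    Record("B", "case report", 40, False, 2015, "English"),
    Record("C", "RCT", 12, True, 2021, "English"),
    Record("D", "RCT", 18, True, 2008, "English"),
]
for rec in records:
    print(rec.study_id, exclusion_reason(rec) or "Include")
```

Recording a single, ordered exclusion reason per report is a design choice that makes the per-reason counts in the PRISMA flow diagram add up without double counting.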

Risk of Bias Assessment

Definition: Risk of Bias

Risk of bias (RoB) refers to the likelihood that a study's design, conduct, or analysis introduced systematic error, leading to over- or under-estimation of the true effect.

Cochrane Risk of Bias Tool (RoB 2)

The Cochrane RoB 2 tool assesses five domains:

Domain | Key Question
D1: Randomization process | Was allocation truly random? Was it concealed?
D2: Deviations from intended interventions | Were participants aware of allocation?
D3: Missing outcome data | Was attrition balanced and explained?
D4: Measurement of the outcome | Was the outcome measure valid and assessed blindly?
D5: Selection of reported result | Was the analysis pre-specified?

Each domain is rated as Low risk, Some concerns, or High risk.

Overall Judgment

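If any domain is rated "High risk," the overall judgment is "High risk." If at least one domain has "Some concerns" and none is "High risk," the overall judgment is "Some concerns." Only results rated "Low risk" in all five domains receive an overall low risk of bias rating. RoB 2 also allows an overall "High risk" judgment when several domains raise some concerns in a way that substantially lowers confidence in the result.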
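
The basic worst-domain rule can be written as a small function, which avoids typing overall ratings by hand. This is a sketch of that rule only, using the domain labels from this lesson's Python section ("Low", "Some concerns", "High"). It does not cover cases where reviewers judge that concerns across several domains add up to high risk, and the function name `overall_rob` is hypothetical.

```python
ALLOWED_RATINGS = {"Low", "Some concerns", "High"}


def overall_rob(domain_ratings):
    """Derive the overall judgment from a dict of domain -> rating."""
    ratings = list(domain_ratings.values())
    unknown = set(ratings) - ALLOWED_RATINGS
    if unknown:
        raise ValueError(f"unrecognised ratings: {sorted(unknown)}")
    if "High" in ratings:
        return "High"                 # any high-risk domain dominates
    if "Some concerns" in ratings:
        return "Some concerns"
    return "Low"                      # every domain was rated Low


# Domain ratings for three of the example studies used later in this lesson
lee_2022 = {"D1": "High", "D2": "Some concerns", "D3": "Low",
            "D4": "Some concerns", "D5": "Low"}
smith_2020 = {"D1": "Low", "D2": "Low", "D3": "Some concerns",
              "D4": "Low", "D5": "Low"}
jones_2021 = {"D1": "Low", "D2": "Low", "D3": "Low", "D4": "Low", "D5": "Low"}

for name, ratings in [("Lee 2022", lee_2022), ("Smith 2020", smith_2020),
                      ("Jones 2021", jones_2021)]:
    print(name, "->", overall_rob(ratings))
```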


Data Extraction

Definition: Data Extraction

Data extraction is the systematic process of recording study characteristics and results from included studies into a standardized form. Two independent reviewers typically extract data, with discrepancies resolved by consensus or a third reviewer.
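
Dual extraction only helps if the two forms are compared field by field. The sketch below flags every field where two reviewers disagree so that it can go to consensus or a third reviewer. The function name, the field names, and the extracted values are all hypothetical.

```python
def find_discrepancies(reviewer_a, reviewer_b, tolerance=1e-9):
    """Return {field: (value_a, value_b)} for every field where the forms differ."""
    discrepancies = {}
    for field in sorted(set(reviewer_a) | set(reviewer_b)):
        a, b = reviewer_a.get(field), reviewer_b.get(field)   # None if a field is missing
        both_numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
        same = abs(a - b) <= tolerance if both_numeric else a == b
        if not same:
            discrepancies[field] = (a, b)
    return discrepancies


# Hypothetical extraction forms for one study
form_a = {"author": "Smith", "year": 2020, "hazard_ratio": 0.86, "dose": "10 mg"}
form_b = {"author": "Smith", "year": 2020, "hazard_ratio": 0.68}

for field, (a, b) in find_discrepancies(form_a, form_b).items():
    print(f"{field}: reviewer A = {a!r}, reviewer B = {b!r}")
```

A field that one reviewer left blank is reported as a discrepancy too, since a missing value is as much a disagreement as a different one.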

Standard Data Fields

Category | Fields
Study | Author, year, country, design, sample size
Population | Age, sex, BMI, diabetes duration, HbA1c
Intervention | Drug, dose, duration
Comparator | Drug, dose, duration
Outcomes | Effect estimate (OR, HR, MD), 95% CI, n per group
Quality | RoB rating, GRADE certainty

GRADE Quality Assessment

Definition: GRADE

GRADE (Grading of Recommendations Assessment, Development and Evaluation) is a systematic approach for rating the certainty of evidence and strength of recommendations. It rates evidence as high, moderate, low, or very low certainty.

GRADE Domains

Domain | Effect on Certainty
Risk of bias | ↓ Downgrade if serious limitations
Inconsistency | ↓ Downgrade if I² > 50% or unexplained heterogeneity
Indirectness | ↓ Downgrade if population, intervention, or outcome differs
Imprecision | ↓ Downgrade if wide CI crosses clinical decision threshold
Publication bias | ↓ Downgrade if funnel plot asymmetry or Egger's p < 0.10

Upgrade factors:

  • ↑ Large effect (RR > 2 or < 0.5)
  • ↑ Dose-response gradient
  • ↑ All plausible residual confounding would reduce the observed effect

Starting Certainty

RCTs start at high certainty. Observational studies start at low certainty. Each of the five domains can lower the rating, and the upgrade factors can raise it, most commonly for observational studies with large effects.
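
The bookkeeping behind a GRADE rating can be sketched as a walk along the four certainty levels. The function below starts from the study design, subtracts one level per downgrade, and adds one per upgrade, clamping the result to the scale. It is a simplified illustration: real GRADE judgments involve reviewer discretion and are not purely arithmetic, and the function name and arguments are hypothetical.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]
STARTING_INDEX = {"rct": 3, "observational": 1}   # High and Low respectively


def grade_certainty(study_design, downgrades=0, upgrades=0):
    """Return the certainty level after applying downgrades and upgrades."""
    if study_design not in STARTING_INDEX:
        raise ValueError("study_design must be 'rct' or 'observational'")
    index = STARTING_INDEX[study_design] - downgrades + upgrades
    index = max(0, min(index, len(LEVELS) - 1))    # stay within Very low..High
    return LEVELS[index]


# RCT evidence with one serious concern (e.g. risk of bias)
print(grade_certainty("rct", downgrades=1))
# RCT evidence with two serious concerns (e.g. inconsistency and imprecision)
print(grade_certainty("rct", downgrades=2))
# Observational evidence upgraded once for a large effect
print(grade_certainty("observational", upgrades=1))
```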


Synthesis Without Meta-Analysis (SWiM)

Definition: SWiM

Synthesis without meta-analysis (SWiM) provides guidance for systematically reviewing evidence when quantitative pooling is inappropriate. It uses structured, transparent narrative synthesis with tabular and graphical summaries.

When to Use SWiM

  • Studies use different outcome measures or scales
  • Studies are too heterogeneous to pool meaningfully
  • Few studies (K < 3) are available
  • Methodological diversity prevents valid pooling

SWiM Methods

Method | Description
Vote counting | Count studies directionally favorable/unfavorable
Harvest plots | Bar charts weighted by study quality
Blobbograms | Modified forest plots without pooling
Narrative synthesis | Structured textual summary by subgroups
Tabular summaries | Effect estimates, CIs, and quality ratings in tables
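
Vote counting is the simplest of these methods to sketch in code. The example below counts studies by direction of effect (not by statistical significance) and applies a two-sided sign test to ask whether the split differs from what chance alone would produce. The function name and the example directions are hypothetical, and the sign test is one reasonable choice, not the only one.

```python
from math import comb


def vote_count(directions):
    """Count effect directions and run a two-sided sign test.

    directions -- one entry per study: +1 favours the intervention,
                  -1 favours the comparator, 0 means no clear direction.
    """
    directions = list(directions)
    favourable = sum(1 for d in directions if d > 0)
    unfavourable = sum(1 for d in directions if d < 0)
    n = favourable + unfavourable            # studies with no direction are dropped
    if n == 0:
        return favourable, unfavourable, 1.0
    k = min(favourable, unfavourable)
    # Exact binomial tail under a 50/50 null, doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return favourable, unfavourable, min(1.0, 2 * tail)


# Hypothetical review: 8 studies favour the intervention, 2 favour the comparator
fav, unfav, p = vote_count([1] * 8 + [-1] * 2)
print(f"{fav} favourable, {unfav} unfavourable, sign test p = {p:.3f}")
```

Vote counting says nothing about the size of an effect, so it is best reported alongside the tabular and graphical summaries listed above.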

Python Implementation

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# --- PRISMA flow diagram data ---
prisma_data = {
    'Stage': ['Identification', 'Screening', 'Eligibility', 'Included'],
    'Records': [4523, 2891, 312, 45],
    'Excluded': [1632, 2579, 267, 0]
}

print("=== PRISMA Flow ===")
for i, stage in enumerate(prisma_data['Stage']):
    print(f"{stage}: {prisma_data['Records'][i]} records")
    if prisma_data['Excluded'][i] > 0:
        print(f"  Excluded: {prisma_data['Excluded'][i]}")

# --- Risk of Bias Assessment ---
studies = [
    {'Study': 'Smith 2020', 'D1': 'Low', 'D2': 'Low', 'D3': 'Some concerns', 
     'D4': 'Low', 'D5': 'Low', 'Overall': 'Some concerns'},
    {'Study': 'Jones 2021', 'D1': 'Low', 'D2': 'Low', 'D3': 'Low', 
     'D4': 'Low', 'D5': 'Low', 'Overall': 'Low'},
    {'Study': 'Lee 2022', 'D1': 'High', 'D2': 'Some concerns', 'D3': 'Low', 
     'D4': 'Some concerns', 'D5': 'Low', 'Overall': 'High risk'},
    {'Study': 'Chen 2023', 'D1': 'Low', 'D2': 'Low', 'D3': 'Low', 
     'D4': 'Some concerns', 'D5': 'Low', 'Overall': 'Some concerns'},
    {'Study': 'Wang 2024', 'D1': 'Low', 'D2': 'Low', 'D3': 'Low', 
     'D4': 'Low', 'D5': 'Low', 'Overall': 'Low'},
]

df_rob = pd.DataFrame(studies)
print("\n=== Risk of Bias Summary ===")
print(df_rob.to_string(index=False))

# Traffic light plot
rob_matrix = df_rob[['D1', 'D2', 'D3', 'D4', 'D5']].values
color_map = {'Low': '#2ecc71', 'Some concerns': '#f39c12', 'High': '#e74c3c',
             'High risk': '#e74c3c'}

fig, ax = plt.subplots(figsize=(10, 5))
for i in range(len(rob_matrix)):
    for j in range(5):
        color = color_map.get(rob_matrix[i, j], '#95a5a6')
        ax.add_patch(plt.Rectangle((j, len(rob_matrix) - i - 1), 1, 1, 
                     facecolor=color, edgecolor='white', linewidth=2))

ax.set_xlim(0, 5)
ax.set_ylim(0, len(rob_matrix))
ax.set_xticks([0.5, 1.5, 2.5, 3.5, 4.5])
ax.set_xticklabels(['Randomization', 'Deviations', 'Missing Data', 
                     'Measurement', 'Reporting'])
ax.set_yticks([0.5 + i for i in range(len(rob_matrix))])
ax.set_yticklabels(df_rob['Study'].tolist()[::-1])
ax.set_title('Risk of Bias Traffic Light Plot')
plt.tight_layout()
plt.savefig('rob_traffic_light.png', dpi=150)
plt.show()

# --- GRADE Evidence Profile ---
grade_data = {
    'Outcome': ['MACE (3-point)', 'All-cause mortality', 'Heart failure hospitalization'],
    'Studies': [5, 4, 6],
    'Participants': [45000, 38000, 52000],
    'Risk of bias': ['Serious (-1)', 'Not serious', 'Serious (-1)'],
    'Inconsistency': ['Not serious', 'Serious (-1)', 'Not serious'],
    'Indirectness': ['Not serious', 'Not serious', 'Not serious'],
    'Imprecision': ['Not serious', 'Serious (-1)', 'Not serious'],
    'Publication bias': ['Undetected', 'Undetected', 'Undetected'],
    'Starting level': ['High', 'High', 'High'],
    'Final certainty': ['Moderate', 'Low', 'Moderate'],
    'Effect estimate': ['HR 0.86 (0.80-0.93)', 'HR 0.92 (0.84-1.01)', 'HR 0.72 (0.64-0.82)']
}

df_grade = pd.DataFrame(grade_data)
print("\n=== GRADE Evidence Profile ===")
print(df_grade.to_string(index=False))

# --- Study selection funnel ---
selection_data = {
    'Phase': ['Database search', 'Duplicate removal', 'Title/abstract screen',
              'Full-text review', 'Data extraction', 'Quality assessment', 'Final synthesis'],
    'Records': [4523, 2891, 892, 312, 45, 45, 45]
}

fig, ax = plt.subplots(figsize=(10, 6))
phases = selection_data['Phase']
counts = selection_data['Records']
colors = plt.cm.Blues(np.linspace(0.3, 0.9, len(phases)))
ax.barh(phases[::-1], counts[::-1], color=colors[::-1], edgecolor='black')
for i, (count, phase) in enumerate(zip(counts[::-1], phases[::-1])):
    ax.text(count + 50, i, str(count), va='center', fontsize=10)
ax.set_xlabel('Number of Records')
ax.set_title('Study Selection Funnel')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig('study_selection.png', dpi=150)
plt.show()

# --- Inclusion/exclusion summary ---
print("\n=== Exclusion Reasons (Full-Text) ===")
exclusion_reasons = {
    'Wrong population': 87,
    'Wrong intervention': 62,
    'Wrong outcome': 45,
    'Wrong study design': 38,
    'Duplicate data': 21,
    'Insufficient data': 14
}
for reason, count in sorted(exclusion_reasons.items(), key=lambda x: -x[1]):
    print(f"  {reason}: {count} studies")

Key Takeaways

Summary: Systematic Review Methodology

  1. Systematic reviews use explicit, pre-specified methods to minimize bias; they are not narrative summaries.
  2. PRISMA 2020 provides the standard reporting framework with a 27-item checklist and flow diagram.
  3. Search strategies should maximize sensitivity using PICO-framed Boolean queries across multiple databases.
  4. Risk of bias assessment (Cochrane RoB 2) evaluates five domains: randomization, deviations, missing data, measurement, and reporting.
  5. GRADE rates evidence certainty from high to very low, starting from study design and adjusting for bias, inconsistency, indirectness, imprecision, and publication bias.
  6. SWiM provides structured methods for narrative synthesis when meta-analysis is inappropriate.
  7. Data extraction should be performed independently by two reviewers with pre-specified data fields.
