
Pre-registration of Studies


Advanced Statistical Methods

Locking In Hypotheses Before Seeing Results

Pre-registration documents research plans, hypotheses, and analysis strategies before data collection, which guards against HARKing (hypothesizing after the results are known). It separates confirmatory from exploratory research.

  • Clinical trials: Prevent outcome switching by registering primary endpoints in advance
  • Social sciences: Distinguish planned analyses from exploratory fishing expeditions
  • Drug development: Provide regulatory assurance that trial results are not selectively reported

Pre-registration makes the boundary between prediction and postdiction crystal clear.


Pre-registration is the practice of documenting a study's hypotheses, design, analysis plan, and any deviations from standard protocols in a time-stamped, publicly accessible registry before data collection begins.

Definition: Pre-registration

A pre-registration is a time-stamped document that specifies, prior to data collection: (1) research hypotheses, (2) study design, (3) sample size justification, (4) primary and secondary outcome measures, (5) exclusion criteria, and (6) planned statistical analyses.


What to Pre-register

Essential Pre-registration Components

  1. Hypotheses: clearly stated primary and secondary hypotheses
  2. Study design: between-subjects, within-subjects, longitudinal, etc.
  3. Sample size justification: a priori power analysis or information-based sizing
  4. Outcome measures: primary, secondary, and exploratory endpoints
  5. Exclusion criteria: rules for excluding data (pre-specified, not post hoc)
  6. Analysis plan: specific statistical tests, models, and software
  7. Inference criteria: significance threshold, one- vs. two-tailed, correction methods

Exploratory vs. Confirmatory Analysis

Definition: Confirmatory Data Analysis (CDA)

CDA tests a priori hypotheses using pre-specified analyses. The Type I error rate is controlled at α, and the analysis is deductive: theory → hypothesis → data → conclusion.

Definition: Exploratory Data Analysis (EDA)

EDA generates new hypotheses from data using flexible, post hoc analyses. It is inductive: data → patterns → hypothesis. Results require independent confirmation.

The CDA–EDA Boundary

  • CDA is valid only if analyses were truly specified before seeing the data
  • EDA becomes problematic when presented as confirmatory (HARKing)
  • Pre-registration creates a clear, auditable boundary between the two
  • Deviations from the pre-registered plan must be transparently reported
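
One way to make this boundary auditable is to compare the analyses in a final report against the registered plan. The sketch below is illustrative only: the function name `label_analyses` and the convention of identifying each analysis by a plain string are assumptions, not part of any registry's tooling.

```python
def label_analyses(registered, reported):
    """Label each reported analysis as confirmatory or exploratory.

    registered: analysis identifiers listed in the pre-registration
    reported:   analysis identifiers that appear in the final report
    (Both are hypothetical string identifiers used for illustration.)
    """
    registered_set = set(registered)
    labels = {
        name: "confirmatory" if name in registered_set else "exploratory"
        for name in reported
    }
    # Registered analyses that were never reported are deviations
    # and should be disclosed rather than silently dropped.
    unreported = sorted(registered_set - set(reported))
    return labels, unreported


labels, unreported = label_analyses(
    registered=["t-test: accuracy ~ condition",
                "ANCOVA: accuracy ~ condition + baseline"],
    reported=["t-test: accuracy ~ condition",
              "correlation: accuracy ~ age"],
)
print(labels)
print("Registered but not reported:", unreported)
```

A string match like this only shows whether an analysis was listed; it cannot tell whether the reported analysis was run exactly as specified, so it complements a careful reading of the registration rather than replacing it.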

Mathematical Framework: Decision Theory for Pre-registration

Expected Value of Pre-registration

\text{EV}(\text{PR}) = \underbrace{P(\text{true}) \cdot V(\text{true})}_{\text{Credibility gain}} - \underbrace{\bigl(C(\text{time}) + C(\text{flexibility})\bigr)}_{\text{Costs}}

Here,

  • P(true) = Probability that the study yields true results
  • V(true) = Value of credible, reproducible findings
  • C(time) = Time cost of writing the pre-registration
  • C(flexibility) = Cost of reduced analytical flexibility

The key insight is that pre-registration reduces flexibility but increases credibility. The net value depends on the field's replication norms and the study's stakes.
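
As a worked illustration of the expected-value expression, the sketch below plugs in made-up numbers on an arbitrary common scale. The function name `expected_value_prereg` and every input value are hypothetical; the framework itself does not say how these quantities should be measured.

```python
def expected_value_prereg(p_true, v_true, c_time, c_flexibility):
    """EV(PR) = P(true) * V(true) - (C(time) + C(flexibility))."""
    if not 0.0 <= p_true <= 1.0:
        raise ValueError("p_true must be a probability in [0, 1]")
    return p_true * v_true - (c_time + c_flexibility)


# Hypothetical inputs, all expressed in the same arbitrary units.
ev = expected_value_prereg(p_true=0.5, v_true=10.0, c_time=1.0, c_flexibility=2.0)
print(f"EV(PR) = {ev:.1f}")  # 0.5 * 10 - (1 + 2) = 2.0
```

With these numbers the credibility gain (5.0) outweighs the combined costs (3.0). Halving V(true), as might happen in a low-stakes pilot, would make the net value negative.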


OSF Pre-registration

The Open Science Framework (OSF) is among the most widely used pre-registration platforms.

Definition: OSF Pre-registration Structure

  1. Summary: brief description of the study
  2. Hypotheses: numbered, specific predictions
  3. Design: factorial, between/within, blocking variables
  4. Sampling Plan: data collection stopping rule, sample size justification
  5. Variables: IVs, DVs, covariates, manipulation checks
  6. Analysis Plan: specific tests, models, software, α-level
  7. Other: deviations, unanticipated events, exploratory analyses

Power Analysis for Pre-registration

A Priori Power Analysis

n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{\delta}\right)^2 \cdot 2\sigma^2

Here,

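  • n = Required sample size per group
  • α = Significance level (typically 0.05)
  • β = Type II error rate (power = 1 − β)
  • δ = Minimum detectable difference between group means, in raw units (δ/σ is the standardized effect size, Cohen's d)
  • σ = Population standard deviation (assumed equal in both groups)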
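
The sample-size formula can be evaluated with the Python standard library alone. The sketch below is a minimal illustration under the formula's normal approximation; the function name `n_per_group` and the `two_sided` switch (which swaps z at 1 − α/2 for z at 1 − α) are additions made here for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for comparing two means (normal approximation).

    delta: smallest raw mean difference worth detecting
    sigma: assumed common standard deviation
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power = 1 - beta
    n = ((z_alpha + z_beta) / delta) ** 2 * 2 * sigma ** 2
    return math.ceil(n)  # round up to a whole participant


print(n_per_group(delta=0.5, sigma=1.0))              # two-sided, 80% power
print(n_per_group(delta=0.5, sigma=1.0, power=0.90))  # two-sided, 90% power
```

For a standardized difference of 0.5 this gives 63 per group at 80% power and 85 per group at 90% power. Because the formula uses normal rather than t quantiles, an exact t-test calculation typically asks for one or two more participants per group.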

Power Analysis Best Practices

  • Conduct power analysis before pre-registration, not after
  • Specify minimum effect size of practical significance
  • Use simulation-based power for complex designs (Bayesian, multilevel)
  • Report sensitivity power: "What effect size can we detect with n = X at 80% power?"
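
A sensitivity analysis can be sketched by rearranging the sample-size formula to solve for δ given a fixed n. This is a minimal illustration under the same normal approximation and two-sided test; the function name `minimum_detectable_difference` is hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_difference(n_per_group, sigma, alpha=0.05, power=0.80):
    """Smallest raw mean difference detectable with the given n per group.

    Rearranged from n = ((z_alpha + z_beta) / delta)^2 * 2 * sigma^2.
    """
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sigma * math.sqrt(2 / n_per_group)


mdd = minimum_detectable_difference(n_per_group=50, sigma=1.0)
print(f"Detectable difference with n = 50 per group: {mdd:.2f} SD units")
```

With σ = 1 the result is already on the Cohen's d scale, so 50 participants per group can detect roughly d = 0.56 at 80% power with a two-sided α of 0.05.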

Python Implementation: Pre-registration Template Generator

OSF Pre-registration Template

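import json
from datetime import datetime, timezone


class PreregistrationTemplate:
    """Collects the sections of a pre-registration and exports them as JSON."""

    def __init__(self, title, authors, research_question):
        self.title = title
        self.authors = authors
        self.research_question = research_question
        self.hypotheses = []
        self.design = {}
        self.sampling_plan = {}
        self.variables = {"IV": [], "DV": [], "covariates": []}
        self.analysis_plan = []
        self.exclusions = []
        # Record creation time in UTC so the timestamp is unambiguous.
        # The registry assigns the authoritative timestamp on submission.
        self.timestamp = datetime.now(timezone.utc).isoformat()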

    def add_hypothesis(self, number, statement, direction="two-sided"):
        self.hypotheses.append({
            "H_number": number,
            "statement": statement,
            "direction": direction
        })

    def set_design(self, design_type, factors=None, within_subjects=False):
        self.design = {
            "type": design_type,
            "factors": factors or [],
            "within_subjects": within_subjects
        }

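    def set_sampling_plan(self, n_per_group, power, effect_size, alpha=0.05,
                          stopping_rule="fixed"):
        # Call set_design() first: total_n depends on the design type.
        # A between-subjects design is assumed to have exactly two groups;
        # any other design is treated as a single group of n_per_group.
        is_between = "between" in self.design.get("type", "")
        self.sampling_plan = {
            "n_per_group": n_per_group,
            "power": power,
            "effect_size": effect_size,
            "alpha": alpha,
            "stopping_rule": stopping_rule,
            "total_n": n_per_group * (2 if is_between else 1)
        }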

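    def add_variable(self, var_type, name, measure, coding=None):
        # var_type must be one of the keys created in __init__.
        if var_type not in self.variables:
            raise ValueError(f"var_type must be one of {list(self.variables)}")
        entry = {"name": name, "measure": measure}
        if coding:
            entry["coding"] = coding
        self.variables[var_type].append(entry)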

    def add_analysis(self, test_name, variables, model=None, software="R"):
        self.analysis_plan.append({
            "test": test_name,
            "variables": variables,
            "model": model,
            "software": software
        })

    def add_exclusion_criterion(self, criterion):
        self.exclusions.append(criterion)

    def to_osf_format(self):
        return {
            "title": self.title,
            "authors": self.authors,
            "registration_type": "OSF Standard",
            "timestamp": self.timestamp,
            "sections": {
                "1_summary": self.research_question,
                "2_hypotheses": self.hypotheses,
                "3_design": self.design,
                "4_sampling_plan": self.sampling_plan,
                "5_variables": self.variables,
                "6_analysis_plan": self.analysis_plan,
                "7_exclusions": self.exclusions,
                "8_other": "No additional information at this time."
            }
        }

    def export_json(self, filename="preregistration.json"):
        data = self.to_osf_format()
        with open(filename, 'w') as f:
            json.dump(data, f, indent=2)
        print(f"Pre-registration exported to {filename}")
        return data

# Example: Create a pre-registration for a two-sample t-test
prereg = PreregistrationTemplate(
    title="Effect of Sleep Deprivation on Cognitive Performance",
    authors=["Smith, J.", "Doe, A."],
    research_question="Does 24-hour sleep deprivation impair working memory performance?"
)

prereg.add_hypothesis(
    number=1,
    statement="Sleep-deprived participants will show lower accuracy on the n-back task "
              "than well-rested controls.",
    direction="one-sided"
)

prereg.set_design(
    design_type="between-subjects",
    factors=["sleep_condition"],
    within_subjects=False
)

prereg.set_sampling_plan(
    n_per_group=50,
    power=0.80,  # one-sided test with d = 0.5 and alpha = 0.05 gives about 80% power at n = 50
    effect_size=0.5,  # Cohen's d
    alpha=0.05
)

prereg.add_variable("IV", "sleep_condition", "Manipulated (deprived vs. control)")
prereg.add_variable("DV", "nback_accuracy", "Proportion correct on 2-back task")
prereg.add_variable("covariates", "baseline_cognition", "Pre-study n-back score")

prereg.add_analysis(
    test_name="Welch's t-test",
    variables=["nback_accuracy ~ sleep_condition"],
    software="R (version 4.3.1)"
)

prereg.add_exclusion_criterion("Participants who fail the manipulation check (subjective sleepiness < 6/10 in the sleep-deprived group)")
prereg.add_exclusion_criterion("Participants with >20% missing trial data")

# Export
data = prereg.export_json("sleep_deprivation_prereg.json")
print(json.dumps(data, indent=2))

Threats to Pre-registration

Common Threats to Pre-registration Integrity

  1. Vague hypotheses: pre-registering overly broad predictions that can match any outcome
  2. Outcome switching: changing primary outcomes after seeing results
  3. Analytical flexibility: pre-registering multiple analyses and reporting only significant ones
  4. Leakage: sharing pre-registration privately to bias reviewers
  5. File drawer of pre-registrations: never publishing pre-registered studies that fail
  6. Post hoc justification: claiming deviations were "necessary" without pre-specification
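
Outcome switching in particular lends itself to a mechanical check: compare the primary outcomes named in the registration with those named in the report. The sketch below is a minimal illustration; the function name `check_outcome_switching` and the outcome names are hypothetical, and it assumes outcomes are recorded under identical names in both documents.

```python
def check_outcome_switching(registered_primary, reported_primary):
    """Compare registered and reported primary outcomes (lists of names)."""
    registered = set(registered_primary)
    reported = set(reported_primary)
    return {
        # Registered as primary but not reported as primary
        "dropped": sorted(registered - reported),
        # Reported as primary but never registered as primary
        "added": sorted(reported - registered),
        "consistent": registered == reported,
    }


report = check_outcome_switching(
    registered_primary=["nback_accuracy"],
    reported_primary=["reaction_time"],
)
print(report)
```

A non-empty "dropped" or "added" list is not proof of misconduct, but each entry is a deviation that should be disclosed and justified in the report.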

Pre-registration Platforms

Major Pre-registration Platforms

  • OSF (osf.io): free, open, supports many formats (OSF Standard, AsPredicted, Registered Report)
  • AsPredicted.org: quick, 9-question template for simple studies
  • ClinicalTrials.gov: mandatory for FDA-regulated clinical trials
  • ISRCTN: international clinical trial registry
  • EGAP: Experiments in Governance and Politics (pre-registration with mandatory replication)
  • AEA RCT Registry: American Economic Association randomized controlled trials

Evaluating Pre-registration Quality

Pre-registration Completeness Index

C_{\text{PR}} = \frac{\sum_{i=1}^{k} \mathbb{1}(\text{item } i \text{ specified})}{k}

Here,

  • C_PR = Completeness index (0–1)
  • k = Total number of required items
  • 𝟙(·) = Indicator function (1 if specified, 0 otherwise)

Higher completeness is associated with more rigorous research practices, though quality of specification matters more than mere presence of items.
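
The index is simple to compute once the required items are fixed. The sketch below assumes the seven essential components listed earlier as the required items; the key names in `REQUIRED_ITEMS`, the function name `completeness_index`, and the rule that an empty entry counts as "not specified" are all illustrative choices.

```python
# Hypothetical keys mirroring the seven essential components.
REQUIRED_ITEMS = [
    "hypotheses", "design", "sample_size_justification",
    "outcome_measures", "exclusion_criteria", "analysis_plan",
    "inference_criteria",
]


def completeness_index(registration, required=REQUIRED_ITEMS):
    """Fraction of required items that are present and non-empty."""
    specified = sum(1 for item in required if registration.get(item))
    return specified / len(required)


draft = {
    "hypotheses": ["H1: deprived < control on n-back accuracy"],
    "design": "between-subjects",
    "sample_size_justification": "a priori power analysis",
    "outcome_measures": ["nback_accuracy"],
    "exclusion_criteria": [],  # empty, so counted as not specified
    "analysis_plan": ["Welch's t-test"],
    # "inference_criteria" is missing entirely
}
score = completeness_index(draft)
print(f"C_PR = {score:.2f}")  # 5 of 7 items specified
```

The score only counts whether an item is present. It says nothing about how precisely that item is specified, which is the limitation noted above.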


Key Takeaways

Summary: Pre-registration of Studies

  1. Pre-registration separates confirmatory from exploratory research with a time-stamped record
  2. What to pre-register: hypotheses, design, sample size, outcomes, exclusions, and analysis plan
  3. OSF and AsPredicted are the most widely used pre-registration platforms
  4. Power analysis must be conducted before pre-registration, not after
  5. Threats include vague hypotheses, outcome switching, and analytical flexibility
  6. Pre-registration does not prevent EDA; it clarifies which analyses are confirmatory vs. exploratory
  7. Registered Reports build on pre-registration by providing in-principle acceptance before results
⭐

Premium Content

Pre-registration of Studies

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert Statistics Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement