Odds Ratios

Interpreting Associations in Binary Outcomes

Odds ratios quantify the strength of association between exposures and binary outcomes. They are the primary effect measure in logistic regression and case-control studies, providing intuitive multiplicative comparisons.

Epidemiology — Measure risk factor associations in disease studies
Clinical Trials — Report treatment effects on binary endpoints
Social Sciences — Quantify how factors like education affect binary decisions

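An odds ratio of 2 means the odds of the outcome are twice as high in one group as in the other. This is a statement about odds, not probabilities: doubling the odds does not double the probability.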

The odds ratio (OR) measures the association between a binary predictor and a binary outcome. It is the ratio of two odds: the odds of the outcome in one group divided by the odds of the outcome in the other.

Odds Ratio

$$OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$$

Here,

$OR$ = Odds ratio
$p_1$ = Probability of outcome in group 1
$p_2$ = Probability of outcome in group 2
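To make the formula concrete, here is a minimal sketch that converts probabilities to odds and then to an odds ratio. The helper names `odds`, `odds_ratio`, and `prob_from_odds` are illustrative assumptions, not part of any library, and the probabilities are hypothetical. It also shows that doubling the odds does not double the probability.

```python
def odds(p):
    """Convert a probability (0 <= p < 1) to odds p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds of the outcome in group 1 divided by the odds in group 2."""
    return odds(p1) / odds(p2)


def prob_from_odds(o):
    """Invert the odds transform: p = o / (1 + o)."""
    return o / (1 + o)


# Hypothetical probabilities chosen only for illustration
p2 = 0.20                           # group 2 probability
p1 = prob_from_odds(2 * odds(p2))   # group 1 has exactly twice the odds

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}")
print(f"OR = {odds_ratio(p1, p2):.2f}, probability ratio = {p1 / p2:.2f}")
```

In this example the odds ratio is 2 while the ratio of probabilities is only about 1.67, because odds grow faster than probabilities as the outcome becomes more likely.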


import numpy as np
import pandas as pd
from scipy import stats

# 2×2 contingency table: smoking vs heart disease
# Rows: exposure group; columns: disease, no disease
data = np.array([[80, 120],   # Smokers: disease, no disease
                 [30, 270]])  # Non-smokers: disease, no disease

# Odds within each group = cases / non-cases
smoker_odds = data[0, 0] / data[0, 1]
nonsmoker_odds = data[1, 0] / data[1, 1]
OR = smoker_odds / nonsmoker_odds

print("Smoking and Heart Disease:")
print(f"  Smokers: {data[0,0]} disease, {data[0,1]} no disease -> odds = {smoker_odds:.3f}")
print(f"  Non-smokers: {data[1,0]} disease, {data[1,1]} no disease -> odds = {nonsmoker_odds:.3f}")
print(f"  Odds Ratio = {OR:.3f}")
print(f"  Smokers have {OR:.1f}× the odds of heart disease vs non-smokers")

# 95% CI for OR on the log scale (Woolf's method):
# SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d) over the four cell counts
log_OR = np.log(OR)
SE_log_OR = np.sqrt((1 / data).sum())
CI_lower = np.exp(log_OR - 1.96 * SE_log_OR)
CI_upper = np.exp(log_OR + 1.96 * SE_log_OR)
print(f"  95% CI: ({CI_lower:.3f}, {CI_upper:.3f})")

# Fisher's exact test (two-sided by default)
oddsratio_fisher, p_fisher = stats.fisher_exact(data)
# Use general format so a very small p-value is not printed as 0.000000
print(f"  Fisher's exact p-value: {p_fisher:.3g}")

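# OR from logistic regression on simulated data
import statsmodels.api as sm

np.random.seed(42)
n = 500
smoking = np.random.binomial(1, 0.4, n)
# Simulated risk: 10% for non-smokers, 30% for smokers
heart_disease = np.random.binomial(1, 0.1 + 0.2 * smoking)

# Use a DataFrame so the fitted parameters are labelled by column name
# (with plain NumPy arrays, params and conf_int() are unlabelled arrays)
X = sm.add_constant(pd.DataFrame({"smoking": smoking}))
logit_model = sm.Logit(heart_disease, X).fit(disp=False)
or_logit = np.exp(logit_model.params["smoking"])
ci_logit = np.exp(logit_model.conf_int().loc["smoking"])
print(f"\nOR from logistic regression: {or_logit:.3f}")
print(f"95% CI: ({ci_logit[0]:.3f}, {ci_logit[1]:.3f})")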

# OR vs Risk Ratio
p1 = data[0, 0] / data[0].sum()   # risk among smokers
p2 = data[1, 0] / data[1].sum()   # risk among non-smokers
RR = p1 / p2
print(f"\nOR = {OR:.3f}, Risk Ratio (RR) = {RR:.3f}")
print("OR overstates the RR when the outcome is common (>10%)")
print("Use RR for cohort studies, OR for case-control studies")

OR vs RR

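The OR always lies further from 1 than the RR, and it overstates the RR noticeably when the outcome is common (>10%). Prefer RR for cohort studies and RCTs, where risks can be estimated directly; use OR for case-control studies, where they cannot.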
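The divergence between OR and RR can be illustrated with a short sketch. It holds the risk ratio fixed at 2 and raises the baseline risk; the baseline values and the names `odds`, `RISK_RATIO`, and `results` are hypothetical choices for illustration only.

```python
def odds(p):
    """Convert a probability to odds p / (1 - p)."""
    return p / (1 - p)


RISK_RATIO = 2.0   # fixed risk ratio, chosen for illustration
results = []       # (baseline risk, exposed risk, odds ratio)

for p2 in (0.01, 0.05, 0.10, 0.20, 0.40):
    p1 = RISK_RATIO * p2               # risk in the exposed group
    or_value = odds(p1) / odds(p2)     # odds ratio implied by the two risks
    results.append((p2, p1, or_value))
    print(f"baseline risk {p2:.2f}: RR = {RISK_RATIO:.2f}, OR = {or_value:.2f}")
```

With the risk ratio held at 2, the odds ratio is about 2.02 at a 1% baseline risk but reaches 6.0 at a 40% baseline risk, so reading an OR as if it were an RR is only reasonable for rare outcomes.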

Key Takeaways

Summary: Odds Ratios

OR = 1: no association; OR > 1: positive association; OR < 1: negative association
OR ≈ RR only when the outcome is rare (<10%)
The logistic regression coefficient is log(OR); exponentiate it to get the OR
A 95% CI that excludes 1 indicates a statistically significant association at the 5% level
Case-control studies use OR; cohort/RCT studies can use either RR or OR
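The link between a logistic regression coefficient, its odds ratio, and the confidence interval check can be sketched as follows. The function names `or_from_coef` and `excludes_one` and the coefficient values are hypothetical; the sketch assumes a normal-approximation (Wald) interval with z = 1.96.

```python
import math


def or_from_coef(beta, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error.

    The interval is built on the log scale and then exponentiated.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def excludes_one(lower, upper):
    """True if the interval lies entirely above or entirely below 1."""
    return lower > 1 or upper < 1


# Hypothetical coefficients and standard errors, for illustration only
for beta, se in [(0.693, 0.25), (0.300, 0.25)]:
    or_value, lower, upper = or_from_coef(beta, se)
    verdict = "excludes 1" if excludes_one(lower, upper) else "includes 1"
    print(f"beta = {beta:.3f}: OR = {or_value:.2f}, "
          f"95% CI = ({lower:.2f}, {upper:.2f}) -> {verdict}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, which is why the upper limit sits further from the estimate than the lower one.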
