Confidence Intervals for Two Samples — Comparing Groups

Foundations of Statistics

Comparing Groups with Precision

Two-sample intervals estimate the difference between population parameters, providing the foundation for comparing treatments, groups, or conditions. They answer the practical question: how different are these groups really?

A/B Testing — Quantifying the true difference in website conversion rates
Clinical Trials — Estimating treatment versus control group differences
Social Science — Measuring effect sizes in observational studies

The difference between groups is often more important than the groups themselves.

Core Concepts

Two-sample confidence intervals estimate the difference between two population parameters (means or proportions). They are the foundation for comparing groups.

DfTwo-Sample Confidence Interval

A $(1-\alpha)\times 100\\%$ confidence interval for $\\mu_1 - \\mu_2$ (or $p_1 - p_2$ ) quantifies the uncertainty in the difference between two population parameters.

CI for Difference of Means (Independent, Equal Variance)

(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, n_1+n_2-2} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

Here,

$\bar{x}_1, \bar{x}_2$ =Sample means
$s_p$ =Pooled standard deviation
$n_1, n_2$ =Sample sizes
$t_{\alpha/2, n_1+n_2-2}$ =Critical value with pooled df

Pooled Standard Deviation

s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}

Here,

$s_1, s_2$ =Sample standard deviations
$n_1, n_2$ =Sample sizes

Welch's t-Interval (Unequal Variance)

Welch's Approximate Degrees of Freedom

\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}

Here,

$s_1^2, s_2^2$ =Sample variances
$n_1, n_2$ =Sample sizes
$\nu$ =Effective degrees of freedom (not necessarily integer)

When to Use Welch's t

Welch's t-interval is the safer default because it does not assume equal population variances. The equal-variance pooled t-interval is only appropriate when there is strong evidence that $\sigma_1^2 = \sigma_2^2$ .

Derivation: The Two-Sample Pivot

ThTwo-Sample t-Pivot (Equal Variance)

Let $X_1, \\ldots, X_{n_1} \\sim \mathcal{N}(\\mu_1, \sigma^2)$ and $Y_1, \\ldots, Y_{n_2} \\sim \mathcal{N}(\\mu_2, \sigma^2)$ be independent samples. Then

T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}

where $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ is the pooled variance.

Proof sketch: The numerator $\bar{X} - \bar{Y} - (\\mu_1 - \\mu_2)$ is normal with mean 0 and variance $\sigma^2(1/n_1 + 1/n_2)$ . Standardize: $Z = \frac{\bar{X} - \bar{Y} - (\\mu_1 - \\mu_2)}{\sigma\\sqrt{1/n_1 + 1/n_2}} \\sim \mathcal{N}(0,1)$ . By Cochran's theorem, $(n_1-1)S_1^2/\sigma^2 + (n_2-1)S_2^2/\sigma^2 \\sim \\chi^2_{n_1+n_2-2}$ , independent of $\bar{X}$ and $\bar{Y}$ . Therefore $T = Z/\\sqrt{Q/(n_1+n_2-2)}$ where $Q \\sim \\chi^2_{n_1+n_2-2}$ , giving $t_{n_1+n_2-2}$ by definition.

CI for Difference of Proportions

(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

Here,

$\hat{p}_1, \hat{p}_2$ =Sample proportions
$n_1, n_2$ =Sample sizes

Wilson Score Interval

The standard Wald interval above can have poor coverage when $\hat{p}$ is near 0 or 1. The Wilson score interval provides better finite-sample coverage:

\frac{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2 + z_{\alpha/2}^2/(2n_1 + 2n_2)}{(1 + z_{\alpha/2}^2/(n_1 + n_2))^2}

with analogous adjustments to the center.

Worked Example: Clinical Trial

Two drugs are compared for blood pressure reduction. Drug A: $n_1 = 40$ , $\bar{x}_1 = 8.2$ mmHg, $s_1 = 3.1$ . Drug B: $n_2 = 35$ , $\bar{x}_2 = 6.5$ mmHg, $s_2 = 2.8$ . Construct a 95% CI for $\\mu_1 - \\mu_2$ .

Step 1: Check if equal variance is reasonable: $s_1^2/s_2^2 = 9.61/7.84 = 1.23$ . The ratio is less than 2, so pooled t is reasonable.

Step 2: Compute pooled standard deviation:

s_p = \sqrt{\frac{39 \times 9.61 + 34 \times 7.84}{73}} = \sqrt{\frac{374.79 + 266.56}{73}} = \sqrt{8.786} = 2.964

Step 3: Compute standard error:

SE = 2.964\sqrt{\frac{1}{40} + \frac{1}{35}} = 2.964\sqrt{0.025 + 0.02857} = 2.964 \times 0.2314 = 0.686

Step 4: Critical value: $t_{0.025, 73} \\approx 1.993$ (interpolating from t-table).

Step 5: Construct CI:

(8.2 - 6.5) \pm 1.993 \times 0.686 = 1.7 \pm 1.367 = [0.333, \ 3.067]

Interpretation

Since the 95% CI for $\\mu_1 - \\mu_2$ does not contain 0, we conclude Drug A provides significantly greater blood pressure reduction than Drug B at the $\alpha = 0.05$ level. The effect size is estimated at 1.7 mmHg with margin of error 1.37 mmHg.

Key Takeaways

Summary: Confidence Intervals for Two Samples

Estimates the difference between two population parameters
Means (equal variance): pooled t-interval with $df = n_1 + n_2 - 2$
Means (unequal variance): Welch's t-interval with approximate df (safer default)
Proportions: $z$ -interval based on difference of sample proportions; prefer Wilson score for small $n$
If the CI includes 0, the difference is not statistically significant at that level
The pooled t-interval is not robust to unequal variances; always check or use Welch

Confidence Intervals for Two Samples — Comparing Groups

Confidence Intervals for Two Samples — Comparing Groups

Comparing Groups with Precision

Core Concepts

DfTwo-Sample Confidence Interval

CI for Difference of Means (Independent, Equal Variance)

Pooled Standard Deviation

Welch's t-Interval (Unequal Variance)

Welch's Approximate Degrees of Freedom

Derivation: The Two-Sample Pivot

ThTwo-Sample t-Pivot (Equal Variance)

CI for Difference of Proportions

CI for Difference of Proportions

Worked Example: Clinical Trial

Key Takeaways

Summary: Confidence Intervals for Two Samples

Premium Content

Need Expert Statistics Help?