Confidence Intervals for Variance — Chi-Square Interval

Foundations of Statistics

Quantifying Uncertainty in Variability

Variance intervals use the chi-square distribution's asymmetry, producing unequal bounds around the point estimate. Understanding this asymmetry is crucial for interpreting precision in variability estimates.

Manufacturing — Assessing process consistency and setting tolerance specifications
Finance — Estimating volatility ranges for risk management
Quality Engineering — Monitoring measurement system variability

Variance intervals reveal that precision itself is uncertain.

Core Concepts

Confidence intervals for variance use the chi-square distribution. Unlike intervals for the mean, these intervals are asymmetric — the lower and upper bounds are not equidistant from the point estimate.

Definition: Chi-Square Confidence Interval for Variance

For a random sample from a normal population, a $(1-\alpha)\times 100\%$ confidence interval for the population variance $\sigma^2$ is based on the pivotal quantity $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.

Confidence Interval for Variance

\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2, n-1}}, \quad \frac{(n-1)s^2}{\chi^2_{1-\alpha/2, n-1}}\right]

Here,

$n$ = sample size
$s^2$ = sample variance (divisor $n-1$)
$\chi^2_{\alpha/2, n-1}$ = upper critical value (area $\alpha/2$ to its right)
$\chi^2_{1-\alpha/2, n-1}$ = lower critical value (area $\alpha/2$ to its left)

Asymmetric Intervals

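The chi-square interval is not symmetric about $s^2$. Its endpoints are $s^2$ multiplied by $(n-1)/\chi^2_{\alpha/2, n-1}$ and by $(n-1)/\chi^2_{1-\alpha/2, n-1}$; for the usual confidence levels the first factor is below 1 and the second is above 1. Because the chi-square distribution is right-skewed and the small lower critical value sits in the denominator of the upper bound, the interval extends farther above $s^2$ than below it, making it wider on the right. The effect is strongest for small $n$ and fades as $n$ grows.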
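A quick way to see this asymmetry is to tabulate the two multipliers that turn $s^2$ into the interval endpoints. The sketch below assumes a 95% level and uses `scipy.stats.chi2.ppf` for the critical values; the sample sizes in the loop and the `multipliers` dictionary are illustrative choices, not part of the original text.

```python
from scipy import stats

alpha = 0.05
multipliers = {}  # n -> (lower multiplier, upper multiplier) applied to s^2

for n in (5, 10, 25, 100, 1000):
    df = n - 1
    upper_q = stats.chi2.ppf(1 - alpha / 2, df)  # chi^2_{alpha/2, n-1}: right-tail area alpha/2
    lower_q = stats.chi2.ppf(alpha / 2, df)      # chi^2_{1-alpha/2, n-1}: left-tail area alpha/2
    lo_mult, hi_mult = df / upper_q, df / lower_q
    multipliers[n] = (lo_mult, hi_mult)
    print(f"n={n:4d}: lower = {lo_mult:.3f} s^2, upper = {hi_mult:.3f} s^2, "
          f"below by {1 - lo_mult:.3f}, above by {hi_mult - 1:.3f}")
```

The "above" distance exceeds the "below" distance at every $n$, and the gap between them shrinks as $n$ increases.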

Confidence Interval for Standard Deviation

CI for Standard Deviation

\left[\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2, n-1}}}, \quad \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2, n-1}}}\right]

Here,

$s$ = sample standard deviation
$n$ = sample size
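Both formulas translate directly into code. This is a minimal sketch assuming an i.i.d. normal sample; the helper names `chi2_variance_ci` and `chi2_sd_ci` are hypothetical, and the standard deviation interval simply takes square roots of the variance endpoints, which is valid because the square root is increasing.

```python
import math
from scipy import stats

def chi2_variance_ci(s2, n, conf=0.95):
    """Chi-square CI for sigma^2 from sample variance s2 (divisor n-1) and size n.
    Assumes the data are an i.i.d. normal sample."""
    if n < 2:
        raise ValueError("need n >= 2")
    alpha = 1 - conf
    df = n - 1
    upper_q = stats.chi2.ppf(1 - alpha / 2, df)  # chi^2_{alpha/2, n-1} (upper critical value)
    lower_q = stats.chi2.ppf(alpha / 2, df)      # chi^2_{1-alpha/2, n-1} (lower critical value)
    # Dividing by the upper critical value gives the LOWER bound, and vice versa.
    return df * s2 / upper_q, df * s2 / lower_q

def chi2_sd_ci(s, n, conf=0.95):
    """CI for sigma: square roots of the variance-interval endpoints."""
    lo, hi = chi2_variance_ci(s ** 2, n, conf)
    return math.sqrt(lo), math.sqrt(hi)

# Hypothetical inputs: s^2 = 2.5 from a sample of n = 15
var_lo, var_hi = chi2_variance_ci(2.5, 15)
sd_lo, sd_hi = chi2_sd_ci(math.sqrt(2.5), 15)
print(f"sigma^2: [{var_lo:.3f}, {var_hi:.3f}]   sigma: [{sd_lo:.3f}, {sd_hi:.3f}]")
```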

Derivation from the Sampling Distribution

Theorem: Chi-Square Pivot for Variance

Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\mathcal{N}(\mu, \sigma^2)$. Define the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then the pivotal quantity

Q = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}

has a chi-square distribution with $n-1$ degrees of freedom, and this distribution does not depend on $\mu$.

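Proof sketch: Standardize each observation: $Z_i = (X_i - \mu)/\sigma \sim \mathcal{N}(0,1)$. The sum of squared standard normals is $\sum Z_i^2 \sim \chi^2_n$. Decompose it algebraically: $\sum Z_i^2 = \sum(X_i - \bar{X})^2/\sigma^2 + n(\bar{X} - \mu)^2/\sigma^2$. The first term on the right is $(n-1)S^2/\sigma^2$, and the second is the square of the standard normal $\sqrt{n}(\bar{X} - \mu)/\sigma$, so it is $\chi^2_1$. For normal samples, $\bar{X}$ and $S^2$ are independent (this follows from Cochran's theorem, or from Basu's theorem: with $\sigma^2$ fixed, $\bar{X}$ is complete sufficient for $\mu$ and $S^2$ is ancillary). Independence lets the moment generating functions multiply, $(1-2t)^{-n/2} = M_Q(t)\,(1-2t)^{-1/2}$, so $M_Q(t) = (1-2t)^{-(n-1)/2}$, which is the MGF of $\chi^2_{n-1}$.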
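The theorem can be checked numerically. The Monte Carlo sketch below draws many normal samples with arbitrarily chosen $n = 10$, $\mu = 5$, $\sigma = 2$, then compares the simulated pivot $Q$ with $\chi^2_{n-1}$ (mean $n-1$, variance $2(n-1)$) and checks that $\bar{X}$ and $S^2$ are essentially uncorrelated, consistent with their independence.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu, sigma = 10, 5.0, 2.0      # arbitrary illustrative parameters
reps = 50_000

x = rng.normal(mu, sigma, size=(reps, n))   # each row is one sample
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                  # sample variance with divisor n-1
q = (n - 1) * s2 / sigma**2                 # the pivot Q

print(f"mean of Q: {q.mean():.3f} (chi^2 mean = {n - 1})")
print(f"var of Q:  {q.var():.3f} (chi^2 variance = {2 * (n - 1)})")
ks = stats.kstest(q, stats.chi2(df=n - 1).cdf)   # goodness of fit against chi^2_{n-1}
print(f"KS p-value vs chi^2_{n - 1}: {ks.pvalue:.3f}")
print(f"corr(xbar, s^2): {np.corrcoef(xbar, s2)[0, 1]:.4f}")
```

Changing `mu` leaves the distribution of `q` unchanged, which is what makes $Q$ a pivot.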

Why the Chi-Square Distribution Appears

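The chi-square distribution arises as the distribution of a sum of squared independent standard normals. When properly scaled, the sample variance is such a sum, $(n-1)S^2/\sigma^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/\sigma^2$, but the degrees of freedom are reduced by 1 because $\bar{X}$ is estimated from the data rather than known: the $n$ deviations satisfy the linear constraint $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$, so only $n-1$ of them are free to vary.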
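The lost degree of freedom shows up directly in simulation. This sketch (with an arbitrarily chosen $n = 5$ and standard normal data) compares squared deviations about the true mean, which average about $n$, with squared deviations about the sample mean, which average about $n - 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma, reps = 5, 0.0, 1.0, 200_000   # illustrative settings
x = rng.normal(mu, sigma, size=(reps, n))

# Deviations about the TRUE mean: n independent squared standard normals -> chi^2_n
ss_known = ((x - mu) ** 2).sum(axis=1) / sigma**2

# Deviations about the SAMPLE mean: they always sum to zero (one linear constraint)
dev = x - x.mean(axis=1, keepdims=True)
ss_est = (dev ** 2).sum(axis=1) / sigma**2

print(f"mean with known mu:  {ss_known.mean():.3f}  (n = {n})")
print(f"mean with xbar:      {ss_est.mean():.3f}  (n - 1 = {n - 1})")
print(f"largest |sum of deviations|: {np.abs(dev.sum(axis=1)).max():.2e}")
```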

Worked Example: Quality Control

A quality control engineer measures the diameter of 25 ball bearings. The sample variance is $s^2 = 0.0036\ \text{mm}^2$. Construct a 95% CI for $\sigma^2$.

Step 1: Identify parameters: $n = 25$ , $s^2 = 0.0036$ , $\alpha = 0.05$ , $df = 24$ .

Step 2: Find chi-square critical values:

\chi^2_{0.025, 24} = 39.364, \quad \chi^2_{0.975, 24} = 12.401

Step 3: Compute the interval:

\left[\frac{24 \times 0.0036}{39.364}, \quad \frac{24 \times 0.0036}{12.401}\right] = [0.00219, \quad 0.00697]

Step 4: For standard deviation, take square roots:

[\sqrt{0.00219}, \quad \sqrt{0.00697}] = [0.0468, \quad 0.0835]\ \text{mm}

Interpretation

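We are 95% confident that the true variance lies in $[0.00219, 0.00697]\ \text{mm}^2$ and the true standard deviation lies in $[0.0468, 0.0835]\ \text{mm}$. Note the asymmetry: the lower bound lies $0.00141$ below $s^2 = 0.0036$, while the upper bound lies $0.00337$ above it. The upper bound is about $3.17\times$ the lower bound, which is exactly the ratio of the critical values, $39.364/12.401$.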
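The four steps of the worked example can be reproduced with `scipy.stats.chi2.ppf`, which replaces the table lookup in Step 2. The script below hard-codes the example's inputs ($n = 25$, $s^2 = 0.0036$) and should match the hand-computed values up to rounding.

```python
from scipy import stats

# Step 1: parameters from the worked example
n, s2, alpha = 25, 0.0036, 0.05
df = n - 1

# Step 2: critical values (upper quantile first, since it yields the lower bound)
upper_q = stats.chi2.ppf(1 - alpha / 2, df)   # chi^2_{0.025, 24}
lower_q = stats.chi2.ppf(alpha / 2, df)       # chi^2_{0.975, 24}

# Step 3: interval for the variance
var_ci = (df * s2 / upper_q, df * s2 / lower_q)

# Step 4: interval for the standard deviation
sd_ci = tuple(v ** 0.5 for v in var_ci)

print(f"critical values: {upper_q:.3f}, {lower_q:.3f}")
print(f"95% CI for sigma^2: [{var_ci[0]:.5f}, {var_ci[1]:.5f}] mm^2")
print(f"95% CI for sigma:   [{sd_ci[0]:.4f}, {sd_ci[1]:.4f}] mm")
print(f"distance below s^2: {s2 - var_ci[0]:.5f}, above s^2: {var_ci[1] - s2:.5f}")
```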

Sensitivity to Non-Normality

Theorem: Robustness Failure of Chi-Square CI

The chi-square confidence interval for $\sigma^2$ is not robust to departures from normality. If the underlying distribution has excess kurtosis $\kappa_4 > 0$ (heavier tails than the normal), the actual coverage probability can be substantially lower than the nominal $1 - \alpha$, and the shortfall does not vanish as $n$ grows.

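Proof sketch: For a population with excess kurtosis $\kappa_4$, the variance of the sample variance is exactly $\operatorname{Var}(S^2) = \sigma^4\left(\frac{2}{n-1} + \frac{\kappa_4}{n}\right)$, so $\operatorname{Var}\left[(n-1)S^2/\sigma^2\right] = 2(n-1) + \kappa_4 (n-1)^2/n$, and $(n-1)S^2/\sigma^2$ no longer follows $\chi^2_{n-1}$, whose variance is only the first term. The inflation factor relative to $\chi^2_{n-1}$ is $1 + \kappa_4 (n-1)/(2n) \approx 1 + \kappa_4/2$, which does not shrink with $n$. For heavy-tailed distributions such as $t_5$ ($\kappa_4 = 6$), the factor is about 4, and a large-sample normal approximation puts the coverage of a nominal 95% interval near $P(|Z| \le 1.96/2) \approx 0.67$.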
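The coverage shortfall is easy to observe by simulation. This sketch assumes $n = 25$, a 95% level, and $t_5$ as the heavy-tailed population (variance $5/3$, excess kurtosis 6); the helper `chi2_coverage` is a hypothetical name. It typically reports coverage near 0.95 for normal data and a noticeably lower value for $t_5$ data; the exact figure depends on $n$ and the random draws.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
n, M, conf = 25, 20_000, 0.95      # sample size, number of simulated samples, nominal level
df = n - 1
upper_q = stats.chi2.ppf((1 + conf) / 2, df)   # chi^2_{alpha/2, n-1}
lower_q = stats.chi2.ppf((1 - conf) / 2, df)   # chi^2_{1-alpha/2, n-1}

def chi2_coverage(samples, true_var):
    """Fraction of rows whose chi-square interval contains true_var."""
    s2 = samples.var(axis=1, ddof=1)
    lo, hi = df * s2 / upper_q, df * s2 / lower_q
    return np.mean((lo <= true_var) & (true_var <= hi))

normal = rng.normal(0.0, 1.0, size=(M, n))   # variance 1, excess kurtosis 0
heavy = rng.standard_t(5, size=(M, n))       # t_5: variance 5/3, excess kurtosis 6

cov_normal = chi2_coverage(normal, 1.0)
cov_t5 = chi2_coverage(heavy, 5 / 3)
print(f"coverage, normal data: {cov_normal:.3f}")
print(f"coverage, t_5 data:    {cov_t5:.3f}")
```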

Practical Consequence

Unlike confidence intervals for the mean (which are robust via the CLT), the variance interval requires normality. With skewed or heavy-tailed data, use bootstrap methods instead.
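SciPy (version 1.7 or later) provides `scipy.stats.bootstrap`, whose default method is the bias-corrected and accelerated (BCa) interval, often a better choice than the plain percentile interval for a skewed statistic such as $s^2$. The sketch below uses simulated exponential data (true variance 4, excess kurtosis 6) purely as an illustration; the helper `sample_var` is a hypothetical name, and no bootstrap method guarantees nominal coverage at small $n$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.exponential(scale=2.0, size=40)   # skewed illustrative data, true variance 4.0

def sample_var(x, axis=-1):
    # Accepting `axis` lets scipy.stats.bootstrap evaluate all resamples at once;
    # ddof=1 matches the s^2 used throughout this section.
    return np.var(x, axis=axis, ddof=1)

res = stats.bootstrap((data,), sample_var, confidence_level=0.95,
                      n_resamples=9999, method="BCa")
ci = res.confidence_interval
print(f"s^2 = {sample_var(data):.3f}, BCa 95% CI: [{ci.low:.3f}, {ci.high:.3f}]")
```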

Python Implementation: Bootstrap Comparison

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 25
sigma_true = 1.0
data = rng.normal(loc=0.0, scale=sigma_true, size=n)
s2 = np.var(data, ddof=1)  # sample variance with divisor n-1

# Chi-square CI (parametric). The UPPER chi-square quantile gives the LOWER bound.
q_upper = stats.chi2.ppf(0.975, df=n-1)   # chi^2_{0.025, n-1}
q_lower = stats.chi2.ppf(0.025, df=n-1)   # chi^2_{0.975, n-1}
ci_parametric = [(n-1)*s2 / q_upper, (n-1)*s2 / q_lower]
print(f"Parametric CI for σ²: [{ci_parametric[0]:.4f}, {ci_parametric[1]:.4f}]")

# Percentile bootstrap CI (non-parametric): B resamples with replacement, drawn at once
B = 10000
resamples = rng.choice(data, size=(B, n), replace=True)
boot_vars = resamples.var(axis=1, ddof=1)
ci_bootstrap = np.percentile(boot_vars, [2.5, 97.5])
print(f"Bootstrap CI for σ²:  [{ci_bootstrap[0]:.4f}, {ci_bootstrap[1]:.4f}]")

# Compare empirical coverage over M simulated normal samples.
# With normal data the parametric interval should be near 95%; the percentile
# bootstrap for a variance can undercover at n = 25.
M, B_sim = 1000, 1000
coverage_param = 0
coverage_boot = 0
for _ in range(M):
    sample = rng.normal(0.0, sigma_true, n)
    sv = np.var(sample, ddof=1)
    lo_p, hi_p = (n-1)*sv / q_upper, (n-1)*sv / q_lower
    if lo_p <= sigma_true**2 <= hi_p:
        coverage_param += 1
    boot_v = rng.choice(sample, size=(B_sim, n), replace=True).var(axis=1, ddof=1)
    lo_b, hi_b = np.percentile(boot_v, [2.5, 97.5])
    if lo_b <= sigma_true**2 <= hi_b:
        coverage_boot += 1
print(f"Parametric coverage: {coverage_param/M:.3f}")
print(f"Bootstrap coverage:  {coverage_boot/M:.3f}")

Key Takeaways

Summary: Confidence Intervals for Variance

Based on $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$
Asymmetric interval: lower and upper bounds are not equidistant from $s^2$
Requires the population to be normally distributed (sensitive to non-normality)
For standard deviation, take square roots of the variance interval endpoints
Not robust to kurtosis: heavy tails cause under-coverage; prefer bootstrap for non-normal data
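Critical values are not symmetric about the mean $\nu$ of $\chi^2_\nu$ (for $\nu = 24$, $\alpha = 0.05$: $39.364$ lies $15.364$ above 24, while $12.401$ lies $11.599$ below it); this skew, together with the reciprocal in the bounds, is why the interval is asymmetric about $s^2$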
