Confidence Intervals for Variance — Chi-Square Interval
Foundations of Statistics
Quantifying Uncertainty in Variability
Variance intervals use the chi-square distribution's asymmetry, producing unequal bounds around the point estimate. Understanding this asymmetry is crucial for interpreting precision in variability estimates.
- Manufacturing — Assessing process consistency and setting tolerance specifications
- Finance — Estimating volatility ranges for risk management
- Quality Engineering — Monitoring measurement system variability
Variance intervals reveal that precision itself is uncertain.
Core Concepts
Confidence intervals for variance use the chi-square distribution. Unlike intervals for the mean, these intervals are asymmetric — the lower and upper bounds are not equidistant from the point estimate.
Definition: Chi-Square Confidence Interval for Variance
A confidence interval for the population variance $\sigma^2$ is based on the pivotal quantity $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$.
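Confidence Interval for Variance
$$\left[ \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \right]$$
Here,
- $n$ = Sample size
- $s^2$ = Sample variance
- $\chi^2_{\alpha/2,\,n-1}$ = Upper critical value (upper-tail area $\alpha/2$)
- $\chi^2_{1-\alpha/2,\,n-1}$ = Lower critical value (upper-tail area $1-\alpha/2$)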
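The interval can be sketched in a few lines of Python. This is a minimal illustration, assuming SciPy is available; the function name `variance_ci` and the input value `s2=4.0` are hypothetical and chosen only for demonstration.

```python
from scipy import stats

def variance_ci(s2, n, conf_level=0.95):
    """Chi-square CI for the variance of a normal population (illustrative helper)."""
    if n < 2:
        raise ValueError("need at least two observations")
    alpha = 1.0 - conf_level
    df = n - 1
    # Quantile with upper-tail area alpha/2 (the larger critical value)
    chi2_upper = stats.chi2.ppf(1.0 - alpha / 2.0, df)
    # Quantile with upper-tail area 1 - alpha/2 (the smaller critical value)
    chi2_lower = stats.chi2.ppf(alpha / 2.0, df)
    # The larger critical value produces the lower bound, and vice versa
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Hypothetical input: sample variance 4.0 from n = 25 observations
lower, upper = variance_ci(s2=4.0, n=25)
print(f"95% CI for the variance: [{lower:.4f}, {upper:.4f}]")
```

Note that the larger critical value sits in the denominator of the lower bound, which is an easy place to swap the two by mistake.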
Asymmetric Intervals
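The chi-square interval is not symmetric about $s^2$. The chi-square distribution is right-skewed and bounded below by 0, so at conventional confidence levels dividing $(n-1)s^2$ by the two critical values puts the upper bound farther from $s^2$ than the lower bound, making the interval wider on the right.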
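To see how strong the asymmetry is, the following sketch expresses both bounds as multiples of the sample variance and reports how far each lies from it. The helper name `bound_offsets` and the chosen sample sizes are illustrative assumptions, not part of the original text.

```python
from scipy import stats

def bound_offsets(n, conf_level=0.95):
    """Distance of each CI bound from s^2, expressed as a multiple of s^2."""
    alpha = 1.0 - conf_level
    df = n - 1
    lower_mult = df / stats.chi2.ppf(1.0 - alpha / 2.0, df)  # lower bound / s^2
    upper_mult = df / stats.chi2.ppf(alpha / 2.0, df)        # upper bound / s^2
    return 1.0 - lower_mult, upper_mult - 1.0

offsets = []
for n in (5, 25, 100, 1000):  # illustrative sample sizes
    below, above = bound_offsets(n)
    offsets.append((below, above))
    print(f"n={n:5d}: lower bound is {below:.3f}*s^2 below, "
          f"upper bound is {above:.3f}*s^2 above")
```

Under these assumptions the upper offset exceeds the lower one at every sample size, and the gap narrows as the sample grows.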
Confidence Interval for Standard Deviation
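CI for Standard Deviation
$$\left[ s\sqrt{\frac{n-1}{\chi^2_{\alpha/2,\,n-1}}},\ s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2,\,n-1}}} \right]$$
Here,
- $s$ = Sample standard deviation
- $n$ = Sample size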
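Because the square root is monotone, the standard deviation interval is just the square root of the variance interval. A minimal sketch, assuming SciPy; the name `std_ci` and the input `s=2.0` are hypothetical:

```python
import math
from scipy import stats

def std_ci(s, n, conf_level=0.95):
    """Chi-square CI for the standard deviation of a normal population (illustrative)."""
    alpha = 1.0 - conf_level
    df = n - 1
    lower = s * math.sqrt(df / stats.chi2.ppf(1.0 - alpha / 2.0, df))
    upper = s * math.sqrt(df / stats.chi2.ppf(alpha / 2.0, df))
    return lower, upper

# Hypothetical input: sample standard deviation 2.0 from n = 25 observations
sd_lower, sd_upper = std_ci(s=2.0, n=25)
print(f"95% CI for the standard deviation: [{sd_lower:.4f}, {sd_upper:.4f}]")
```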
Derivation from the Sampling Distribution
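Theorem: Chi-Square Pivot for Variance
Let $X_1, \dots, X_n$ be i.i.d. $N(\mu, \sigma^2)$. Define the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then the pivotal quantity
$$\frac{(n-1)S^2}{\sigma^2}$$
has a chi-square distribution with $n-1$ degrees of freedom, independent of $\bar{X}$.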
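Proof sketch: Standardize each observation: $Z_i = (X_i - \mu)/\sigma \sim N(0, 1)$. The sum of squared standard normals is $\sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$. Decompose it using Cochran's theorem: $\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$. The first term on the right is the pivotal quantity and the second is $\chi^2_1$, since $\sqrt{n}(\bar{X} - \mu)/\sigma \sim N(0, 1)$. The two terms are independent because $\bar{X}$ and $S^2$ are independent in the normal family (for example by Basu's theorem: $\bar{X}$ is complete sufficient for $\mu$, while $S^2$ is ancillary for $\mu$ at fixed $\sigma^2$). Comparing moment generating functions on both sides then shows that the first term is $\chi^2_{n-1}$.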
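The step from the pivot to the interval is a direct inversion of a probability statement. The derivation below is a sketch under one notational assumption: $\chi^2_{p,\,n-1}$ denotes the value with upper-tail area $p$ under the $\chi^2_{n-1}$ distribution, so $\chi^2_{\alpha/2,\,n-1}$ is the larger critical value.

```latex
\begin{align*}
1 - \alpha
  &= P\!\left( \chi^2_{1-\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1} \right) \\
  &= P\!\left( \frac{1}{\chi^2_{\alpha/2,\,n-1}} \le \frac{\sigma^2}{(n-1)S^2} \le \frac{1}{\chi^2_{1-\alpha/2,\,n-1}} \right) \\
  &= P\!\left( \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \right)
\end{align*}
```

Taking reciprocals reverses the inequalities, which is why the larger critical value ends up in the lower bound.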
Why the Chi-Square Distribution Appears
The chi-square distribution arises as the distribution of a sum of squared independent standard normals. The key insight is that the sample variance, when properly scaled, is such a sum, but the degrees of freedom are reduced by 1 because $\mu$ is estimated from the data (by $\bar{X}$) rather than known.
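A quick simulation can make the loss of one degree of freedom visible. The sketch below assumes NumPy and SciPy; the values of `n`, `mu`, `sigma`, the seed, and the number of replications are arbitrary illustrative choices. A $\chi^2_{n-1}$ variable has mean $n-1$ and variance $2(n-1)$, so the simulated pivot should match those moments rather than those of $\chi^2_n$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu, sigma = 10, 5.0, 2.0   # illustrative values
reps = 20000

# Each row is one normal sample of size n
samples = rng.normal(mu, sigma, size=(reps, n))
# Pivot (n-1) * S^2 / sigma^2 for every sample
pivot = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2

print(f"mean of pivot:     {pivot.mean():.3f}  (chi2_{n-1} mean is {n - 1})")
print(f"variance of pivot: {pivot.var():.3f}  (chi2_{n-1} variance is {2 * (n - 1)})")

# Kolmogorov-Smirnov distance to the chi-square(n-1) CDF
ks = stats.kstest(pivot, stats.chi2(df=n - 1).cdf)
print(f"KS statistic vs chi2({n - 1}): {ks.statistic:.4f}")
```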
Worked Example: Quality Control
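A quality control engineer measures the diameter of 25 ball bearings and computes the sample variance $s^2$. Construct a 95% CI for $\sigma^2$.
Step 1: Identify parameters: $n = 25$, sample variance $s^2$, $\alpha = 0.05$, $\text{df} = n - 1 = 24$.
Step 2: Find chi-square critical values: $\chi^2_{0.025,\,24} = 39.364$ and $\chi^2_{0.975,\,24} = 12.401$.
Step 3: Compute the interval: $\left[\frac{24\,s^2}{39.364},\ \frac{24\,s^2}{12.401}\right] \approx [0.610\,s^2,\ 1.935\,s^2]$.
Step 4: For standard deviation, take square roots: $[0.781\,s,\ 1.391\,s]$.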
Interpretation
We are 95% confident that the true variance lies in $[0.610\,s^2,\ 1.935\,s^2]$ and the true standard deviation lies in $[0.781\,s,\ 1.391\,s]$. Note the asymmetry: the upper bound is about $3.17\times$ the lower bound for the variance, so the interval is not symmetric about $s^2$.
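The table lookups for this example can be checked numerically. The sketch below assumes SciPy and uses only $n = 25$ and the 95% level from the example; it reports the bounds as multipliers of the sample variance rather than assuming any particular value for it.

```python
import math
from scipy import stats

n, alpha = 25, 0.05
df = n - 1

chi2_upper = stats.chi2.ppf(1 - alpha / 2, df)  # upper-tail area 0.025
chi2_lower = stats.chi2.ppf(alpha / 2, df)      # upper-tail area 0.975

# Bounds as multiples of the sample variance and of the sample standard deviation
var_mult = (df / chi2_upper, df / chi2_lower)
sd_mult = (math.sqrt(var_mult[0]), math.sqrt(var_mult[1]))
ratio = var_mult[1] / var_mult[0]               # upper bound / lower bound

print(f"critical values: {chi2_lower:.3f}, {chi2_upper:.3f}")
print(f"variance CI: [{var_mult[0]:.3f} * s2, {var_mult[1]:.3f} * s2]")
print(f"std dev CI:  [{sd_mult[0]:.3f} * s, {sd_mult[1]:.3f} * s]")
print(f"upper/lower ratio for the variance: {ratio:.2f}")
```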
Sensitivity to Non-Normality
Theorem: Robustness Failure of Chi-Square CI
The chi-square confidence interval for $\sigma^2$ is not robust to departures from normality. If the underlying distribution has excess kurtosis $\kappa > 0$, the actual coverage probability can be substantially lower than the nominal $1 - \alpha$.
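Proof sketch: For a non-normal population with excess kurtosis $\kappa$, the statistic $(n-1)S^2/\sigma^2$ no longer follows exactly $\chi^2_{n-1}$: for large $n$ its variance is approximately $(n-1)(2 + \kappa)$ rather than $2(n-1)$. A Cornish-Fisher expansion shows the leading correction is proportional to $\kappa$. For heavy-tailed distributions (large positive $\kappa$), the true coverage can be 90% when 95% is nominal.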
Practical Consequence
Unlike confidence intervals for the mean (which are robust via the CLT), the variance interval requires normality. With skewed or heavy-tailed data, use bootstrap methods instead.
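The under-coverage can be illustrated with a small Monte Carlo experiment. This is a sketch, assuming NumPy and SciPy; the heavy-tailed population used here (Student's $t$ with 5 degrees of freedom, variance $5/3$), the sample size, seed, and replication count are all assumptions made for the illustration, and the helper name `chi2_ci_coverage` is hypothetical.

```python
import numpy as np
from scipy import stats

def chi2_ci_coverage(sampler, true_var, n=25, reps=20000, conf_level=0.95):
    """Fraction of chi-square CIs that contain the true variance (illustrative)."""
    alpha = 1.0 - conf_level
    df = n - 1
    samples = sampler(size=(reps, n))           # one sample per row
    s2 = samples.var(axis=1, ddof=1)
    lower = df * s2 / stats.chi2.ppf(1.0 - alpha / 2.0, df)
    upper = df * s2 / stats.chi2.ppf(alpha / 2.0, df)
    return np.mean((lower <= true_var) & (true_var <= upper))

rng = np.random.default_rng(1)
t_df = 5  # assumed heavy-tailed example; variance of t_df is t_df / (t_df - 2)

cov_normal = chi2_ci_coverage(lambda size: rng.normal(0.0, 1.0, size), true_var=1.0)
cov_t = chi2_ci_coverage(lambda size: rng.standard_t(t_df, size),
                         true_var=t_df / (t_df - 2))

print(f"Coverage, normal data: {cov_normal:.3f}")
print(f"Coverage, t({t_df}) data:  {cov_t:.3f}")
```

Under normal data the empirical coverage should sit close to the nominal level, while the heavy-tailed case falls noticeably short.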
Python Implementation: Bootstrap Comparison
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 25
alpha = 0.05
sigma_true = 1.0
data = rng.normal(loc=0.0, scale=sigma_true, size=n)
s2 = np.var(data, ddof=1)  # unbiased sample variance

# Chi-square CI (parametric).
# The larger critical value goes in the denominator of the lower bound.
chi2_upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
chi2_lower = stats.chi2.ppf(alpha / 2, df=n - 1)
ci_parametric = [(n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower]
print(f"Parametric CI for σ²: [{ci_parametric[0]:.4f}, {ci_parametric[1]:.4f}]")

# Bootstrap percentile CI (non-parametric): resample rows with replacement
B = 10000
boot_vars = rng.choice(data, size=(B, n), replace=True).var(axis=1, ddof=1)
ci_bootstrap = np.percentile(boot_vars, [2.5, 97.5])
print(f"Bootstrap CI for σ²: [{ci_bootstrap[0]:.4f}, {ci_bootstrap[1]:.4f}]")

# Compare coverage (repeat 1000 times)
coverage_param = 0
coverage_boot = 0
M = 1000
for _ in range(M):
    sample = rng.normal(0.0, sigma_true, n)
    sv = np.var(sample, ddof=1)

    # Parametric interval for this sample
    lo_p, hi_p = (n - 1) * sv / chi2_upper, (n - 1) * sv / chi2_lower
    if lo_p <= sigma_true**2 <= hi_p:
        coverage_param += 1

    # Bootstrap percentile interval for this sample (1000 resamples)
    boot_v = rng.choice(sample, size=(1000, n), replace=True).var(axis=1, ddof=1)
    lo_b, hi_b = np.percentile(boot_v, [2.5, 97.5])
    if lo_b <= sigma_true**2 <= hi_b:
        coverage_boot += 1

print(f"Parametric coverage: {coverage_param / M:.3f}")
print(f"Bootstrap coverage: {coverage_boot / M:.3f}")
Key Takeaways
Summary: Confidence Intervals for Variance
- Based on the pivot $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$
- Asymmetric interval: lower and upper bounds are not equidistant from $s^2$
- Requires the population to be normally distributed (sensitive to non-normality)
- For standard deviation, take square roots of the variance interval endpoints
- Not robust to kurtosis: heavy tails cause under-coverage; prefer bootstrap for non-normal data
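- The lower and upper critical values are not equidistant from the distribution's mean $n-1$, because the chi-square density is right-skewed; this is why the interval is asymmetric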