Bootstrap Methods — Resampling for Inference

Computer-Intensive Inference Without Distributional Assumptions

Bootstrapping estimates the sampling distribution of nearly any statistic by resampling with replacement from the observed data. It provides standard errors, confidence intervals, and hypothesis tests when theoretical formulas are unavailable or unreliable.

Finance — Estimate VaR confidence intervals for complex portfolio distributions
Ecology — Build confidence intervals for species diversity indices
Machine Learning — Assess variability of feature importance measures

Let the data generate its own reference distribution through the power of resampling.

Definition: Bootstrap

A computer-intensive method that approximates the sampling distribution of a statistic by repeatedly resampling (with replacement) from the observed data and recomputing the statistic for each resample.

Bootstrap Principle

Key Insight

The empirical distribution of the sample approximates the true population distribution. Therefore, the distribution of statistics computed from bootstrap samples approximates the true sampling distribution.

Algorithm

| Step | Action |
|------|--------|
| 1 | Draw a bootstrap sample $X^*_1, \ldots, X^*_n$ by sampling with replacement from the original data |
| 2 | Compute the statistic $\hat{\theta}^*$ from the bootstrap sample |
| 3 | Repeat steps 1-2 $B$ times (typically $B$ = 1,000 to 10,000) |
| 4 | Use the distribution of $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$ for inference |
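The four steps in the table can be sketched as a small helper. This is a minimal illustration, not a reference implementation: the function name `bootstrap_replicates`, the default of 2,000 resamples, and the exponential test data are all assumptions made for the example.

```python
import numpy as np

def bootstrap_replicates(data, statistic, n_resamples=2000, seed=0):
    """Return n_resamples bootstrap replicates of statistic(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    replicates = np.empty(n_resamples)
    for b in range(n_resamples):          # Step 3: repeat B times
        # Step 1: draw n indices with replacement from 0..n-1
        idx = rng.integers(0, n, size=n)
        # Step 2: recompute the statistic on the resampled data
        replicates[b] = statistic(data[idx])
    # Step 4: the caller uses this array for SEs, CIs, or tests
    return replicates

# Illustrative skewed sample (hypothetical data)
sample = np.random.default_rng(1).exponential(scale=2.0, size=100)
reps = bootstrap_replicates(sample, np.median)
print(reps.shape, reps.std(ddof=1))
```

Passing the statistic as a function keeps the resampling loop unchanged whether the target is a mean, a median, or something more elaborate.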

Bootstrap Standard Error

$$SE_{boot} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}(\hat{\theta}^*_b - \bar{\theta}^*)^2}$$

Here,

$\hat{\theta}^*_b$ = statistic from bootstrap sample $b$
$\bar{\theta}^*$ = mean of the bootstrap statistics
$B$ = number of bootstrap resamples

Bootstrap Confidence Intervals

Percentile Method

Percentile CI

$$[\hat{\theta}^*_{\alpha/2}, \hat{\theta}^*_{1-\alpha/2}]$$

Here,

$\hat{\theta}^*_{p}$ = $p$ quantile (the $100p$-th percentile) of the bootstrap distribution

BCa (Bias-Corrected and Accelerated)

BCa CI

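$$[\hat{\theta}^*_{\alpha_1}, \hat{\theta}^*_{\alpha_2}]$$

Here,

$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}\right)$ = adjusted lower quantile level
$\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}\right)$ = adjusted upper quantile level
$z_p$ = $p$ quantile of the standard normal distribution, with CDF $\Phi$
$\hat{z}_0$ = bias correction factor
$\hat{a}$ = acceleration factor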

BCa vs Percentile

The BCa interval adjusts for bias and skewness in the bootstrap distribution. It is generally preferred over the simple percentile method. When $\hat{z}_0 = 0$ and $\hat{a} = 0$, the BCa interval reduces to the percentile interval.
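For reference, the two correction factors are commonly estimated as follows. This is the standard jackknife-based form, stated here as a supplement rather than taken from the text above; $\hat{\theta}_{(i)}$ denotes the statistic recomputed with observation $i$ left out, and $\bar{\theta}_{(\cdot)}$ is the mean of those $n$ leave-one-out values.

```latex
\hat{z}_0 = \Phi^{-1}\!\left(\frac{\#\{\hat{\theta}^*_b < \hat{\theta}_{obs}\}}{B}\right),
\qquad
\hat{a} = \frac{\sum_{i=1}^{n}\left(\bar{\theta}_{(\cdot)} - \hat{\theta}_{(i)}\right)^3}
               {6\left[\sum_{i=1}^{n}\left(\bar{\theta}_{(\cdot)} - \hat{\theta}_{(i)}\right)^2\right]^{3/2}}
```

If exactly half of the bootstrap statistics fall below the observed value, then $\hat{z}_0 = \Phi^{-1}(0.5) = 0$ and no bias correction is applied.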

Types of Bootstrap

| Type | Resampling Unit | When to Use |
|------|----------------|-------------|
| Nonparametric | Individual observations | Default; no distributional assumptions |
| Parametric | Draws from a fitted parametric distribution | When the distributional family is known or well supported |
| Block | Blocks of consecutive observations | Time series and other dependent data |
| Wild | Residuals multiplied by random weights (e.g., random sign flips) | Regression with heteroscedastic errors |
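As one illustration of the table, the block variant can be sketched with a moving block bootstrap, which glues together randomly chosen runs of consecutive observations so that short-range dependence survives resampling. The helper name `moving_block_bootstrap`, the block length of 20, and the simulated AR(1) series are assumptions made for this example; choosing a block length in practice is a separate problem.

```python
import numpy as np

def moving_block_bootstrap(series, block_length, rng):
    """Return one resampled series built from overlapping blocks."""
    series = np.asarray(series)
    n = series.shape[0]
    n_blocks = int(np.ceil(n / block_length))
    # Random start index for each block of consecutive observations
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [series[s:s + block_length] for s in starts]
    # Concatenate and trim to the original length
    return np.concatenate(blocks)[:n]

rng = np.random.default_rng(0)

# Illustrative AR(1) series with positive autocorrelation (hypothetical data)
n, phi = 300, 0.6
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

block_means = np.array([moving_block_bootstrap(x, 20, rng).mean() for _ in range(1000)])
iid_means = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(1000)])
print(f"Block bootstrap SE: {block_means.std(ddof=1):.3f}")
print(f"Naive i.i.d. bootstrap SE: {iid_means.std(ddof=1):.3f}")
```

For a positively autocorrelated series like this one, the naive bootstrap typically understates the standard error of the mean because it destroys the dependence between neighbouring observations.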

Bootstrap Hypothesis Testing

Bootstrap p-value

$$p = \frac{\#\{|\hat{\theta}^*_b - \hat{\theta}_{obs}| \geq |\hat{\theta}_{obs} - \theta_0|\} + 1}{B + 1}$$

Here,

$\hat{\theta}_{obs}$ = observed statistic
$\theta_0$ = null hypothesis value
$B$ = number of bootstrap samples
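A quick arithmetic check of the formula, using made-up numbers: suppose $B = 999$ resamples are drawn and 24 of them deviate from $\hat{\theta}_{obs}$ by at least $|\hat{\theta}_{obs} - \theta_0|$.

```latex
p = \frac{24 + 1}{999 + 1} = \frac{25}{1000} = 0.025
```

The added 1 in the numerator and denominator keeps the p-value strictly positive: even with zero exceedances, the smallest attainable value here is $1/1000$.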

Subsampling

A related method that draws subsamples without replacement, each of a smaller size $m < n$.

Subsampling vs Bootstrap

Subsampling does not require the data to be exchangeable and works for some problems where the bootstrap fails (e.g., unit roots). However, it requires choosing the subsample size $m$.
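A minimal subsampling sketch for the median is shown below. It assumes the estimator converges at the usual $\sqrt{n}$ rate, so deviations computed at size $m$ are rescaled by $\sqrt{m}$ and then mapped back to size $n$. The exponential data, $m = 50$, and the 2,000 subsamples are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=500)   # illustrative sample
n, m = data.size, 50                          # subsample size m < n (arbitrary)
theta_n = np.median(data)

n_subsamples = 2000
roots = np.empty(n_subsamples)
for i in range(n_subsamples):
    # Draw m observations WITHOUT replacement
    sub = rng.choice(data, size=m, replace=False)
    # Rescaled deviation of the subsample statistic (assumes sqrt-n rate)
    roots[i] = np.sqrt(m) * (np.median(sub) - theta_n)

# Invert the quantiles of the rescaled deviations into a 95% interval
q_lo, q_hi = np.quantile(roots, [0.025, 0.975])
ci = (theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n))
print(f"Median: {theta_n:.3f}, subsampling 95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The result can be sensitive to $m$, so rerunning with a few different values is a reasonable sanity check.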

Python Implementation


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(42)

# Original data: n draws from a normal distribution with known parameters
n = 200
true_mean = 5.0
true_std = 2.0
data = rng.normal(true_mean, true_std, n)

# Observed statistic
obs_mean = np.mean(data)
print(f"Observed mean: {obs_mean:.3f}")

# Bootstrap: resample with replacement and recompute the mean B times
B = 5000
boot_means = np.zeros(B)
for b in range(B):
    sample = rng.choice(data, size=n, replace=True)
    boot_means[b] = np.mean(sample)

# Bootstrap SE, compared with the analytical SE based on the true sigma
boot_se = np.std(boot_means, ddof=1)
print(f"Bootstrap SE: {boot_se:.3f}")
print(f"Analytical SE: {true_std/np.sqrt(n):.3f}")

# Percentile CI
alpha = 0.05
ci_perc = np.percentile(boot_means, [100*alpha/2, 100*(1-alpha/2)])
print(f"Percentile 95% CI: [{ci_perc[0]:.3f}, {ci_perc[1]:.3f}]")

# BCa CI
# Bias correction: normal quantile of the share of replicates below the estimate
z0 = norm.ppf(np.mean(boot_means < obs_mean))
# Acceleration: estimated from jackknife (leave-one-out) means
jack_means = (data.sum() - data) / (n - 1)
jack_dev = jack_means.mean() - jack_means
a_hat = np.sum(jack_dev**3) / (6 * np.sum(jack_dev**2)**1.5)
# Adjusted quantile levels alpha_1 and alpha_2
z_lo, z_hi = norm.ppf([alpha/2, 1 - alpha/2])
alpha1 = norm.cdf(z0 + (z0 + z_lo) / (1 - a_hat*(z0 + z_lo)))
alpha2 = norm.cdf(z0 + (z0 + z_hi) / (1 - a_hat*(z0 + z_hi)))
ci_bca = np.percentile(boot_means, [100*alpha1, 100*alpha2])
print(f"BCa 95% CI: [{ci_bca[0]:.3f}, {ci_bca[1]:.3f}]")

# Bootstrap distribution
plt.figure(figsize=(8, 5))
plt.hist(boot_means, bins=50, edgecolor='black', alpha=0.7)
plt.axvline(x=obs_mean, color='red', linestyle='--', label='Observed')
plt.axvline(x=true_mean, color='green', linestyle='--', label='True')
plt.xlabel('Bootstrap Mean')
plt.ylabel('Frequency')
plt.title('Bootstrap Distribution of the Mean')
plt.legend()
plt.show()

# Bootstrap hypothesis test: H0: mean = 4.5
# Count replicates at least as far from the observed mean as theta0 is,
# with the +1 correction from the p-value formula
theta0 = 4.5
n_extreme = np.sum(np.abs(boot_means - obs_mean) >= np.abs(obs_mean - theta0))
p_value = (n_extreme + 1) / (B + 1)
print(f"\nBootstrap test (H0: mean=4.5): p={p_value:.4f}")


Worked Example

Example: Median with Bootstrap CI

Computing a 95% confidence interval for the median of a skewed distribution:

| Method | Estimate | 95% CI |
|--------|----------|--------|
| Normal theory | 4.85 | [4.21, 5.49] |
| Bootstrap percentile | 4.82 | [4.18, 5.62] |
| Bootstrap BCa | 4.82 | [4.25, 5.71] |

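The distribution is right-skewed, but the normal-theory CI is symmetric about its estimate by construction (4.85 ± 0.64). The bootstrap intervals are asymmetric, extending further above the estimate than below, and so typically provide more accurate coverage for a skewed distribution.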
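The kind of calculation behind such a table can be sketched as follows. The data here are simulated from a lognormal distribution purely for illustration, so the printed numbers will not match the table; the parameters and sample size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical right-skewed sample
skewed = rng.lognormal(mean=1.5, sigma=0.6, size=150)

median_hat = np.median(skewed)

# Vectorized resampling: each row of idx indexes one bootstrap sample
idx = rng.integers(0, skewed.size, size=(5000, skewed.size))
boot_medians = np.median(skewed[idx], axis=1)

# 95% percentile interval for the median
lower, upper = np.percentile(boot_medians, [2.5, 97.5])
print(f"Median = {median_hat:.2f}, 95% percentile CI = [{lower:.2f}, {upper:.2f}]")
print(f"Distance to lower = {median_hat - lower:.2f}, to upper = {upper - median_hat:.2f}")
```

Comparing the two printed distances shows whether the interval is asymmetric around the estimate for a given sample.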

Key Takeaways

Summary: Bootstrap Methods

Bootstrap resamples with replacement to approximate the sampling distribution
Works for a wide range of statistics (mean, median, regression coefficients, etc.), though it can fail for extremes such as the sample maximum
Use B = 1,000 to 10,000 bootstrap resamples
Percentile CI is simple but may be biased; BCa adjusts for bias and skewness
Bootstrap provides standard errors and confidence intervals without distributional assumptions
For time series, use block bootstrap to preserve dependence
Subsampling (without replacement) is an alternative for some problems
