Bootstrap Methods — Resampling for Inference


Computer-Intensive Inference Without Distributional Assumptions

Bootstrapping estimates the sampling distribution of any statistic by resampling with replacement from the data. It provides standard errors, confidence intervals, and hypothesis tests when theoretical formulas are unavailable or unreliable.

  • Finance — Estimate VaR confidence intervals for complex portfolio distributions

  • Ecology — Build confidence intervals for species diversity indices

  • Machine Learning — Assess variability of feature importance measures

Let the data generate its own reference distribution through the power of resampling.


Because every resample is drawn from the observed data, the resulting standard errors and confidence intervals do not require a parametric model for the population.

Definition: Bootstrap

A computer-intensive method that approximates the sampling distribution of a statistic by repeatedly resampling (with replacement) from the observed data and recomputing the statistic for each resample.


Bootstrap Principle

Key Insight

The empirical distribution of the sample approximates the true population distribution. Therefore, the distribution of statistics computed from bootstrap samples approximates the true sampling distribution.
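In symbols, the principle can be read as a plug-in approximation. The notation below ($F$ for the population distribution, $\hat{F}_n$ for the empirical distribution, $T$ for the functional that defines the statistic) is introduced here for illustration and is an assumption of this sketch rather than the lesson's own notation.

```latex
\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \qquad
\theta = T(F), \quad \hat{\theta} = T(\hat{F}_n), \quad \hat{\theta}^* = T(\hat{F}^*_n)

\mathrm{Law}_{F}\big(\hat{\theta} - \theta\big) \;\approx\; \mathrm{Law}_{\hat{F}_n}\big(\hat{\theta}^* - \hat{\theta}\big)
```

Here $\hat{F}^*_n$ denotes the empirical distribution of a resample of size $n$ drawn from $\hat{F}_n$, so the right-hand side can be simulated even though the left-hand side cannot.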


Algorithm

| Step | Action |
|------|--------|
| 1 | Draw a bootstrap sample $X^*_1, \ldots, X^*_n$ by sampling with replacement from the original data |
| 2 | Compute the statistic $\hat{\theta}^*$ from the bootstrap sample |
| 3 | Repeat steps 1-2 $B$ times (typically $B$ = 1,000 to 10,000) |
| 4 | Use the distribution of $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$ for inference |
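The four steps can be sketched as a small reusable function. This is a minimal illustration using only the Python standard library; the function name `bootstrap_replicates`, its parameters, and the sample values are hypothetical and chosen just for this example.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_resamples=2000, seed=0):
    """Return one value of `statistic` per bootstrap resample of `data`."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):  # step 3: repeat B times
        resample = rng.choices(data, k=n)  # step 1: draw n values with replacement
        replicates.append(statistic(resample))  # step 2: recompute the statistic
    return replicates  # step 4: these values feed the inference


# Made-up data, purely for illustration
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 7.1, 3.9, 4.4]
medians = bootstrap_replicates(sample, statistics.median, n_resamples=1000)
print(len(medians), min(medians), max(medians))
```

Passing the statistic as a function keeps the resampling loop unchanged when switching from, say, the median to a trimmed mean.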


Bootstrap Standard Error

$$SE_{boot} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}(\hat{\theta}^*_b - \bar{\theta}^*)^2}$$

Here,

  • $\hat{\theta}^*_b$ = Statistic from bootstrap sample $b$
  • $\bar{\theta}^*$ = Mean of bootstrap statistics
  • $B$ = Number of bootstrap resamples
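As a quick check of the standard error formula, the sketch below compares the bootstrap SE of a sample mean with the textbook value $s/\sqrt{n}$. The simulated data, seed, and variable names are assumptions made for this illustration.

```python
import math
import random
import statistics

rng = random.Random(1)
# Illustrative sample: 50 draws from a normal distribution (assumed for the demo)
sample = [rng.gauss(5.0, 2.0) for _ in range(50)]
n = len(sample)

B = 2000
boot_means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(B)]

# statistics.stdev divides by B - 1, as in the bootstrap SE formula
se_boot = statistics.stdev(boot_means)
# Textbook standard error of the mean, for comparison
se_formula = statistics.stdev(sample) / math.sqrt(n)
print(f"bootstrap SE = {se_boot:.3f}, s/sqrt(n) = {se_formula:.3f}")
```

For the mean the two values should be close. The bootstrap version becomes useful for statistics, such as the median, where no simple formula is available.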

Bootstrap Confidence Intervals

Percentile Method

Percentile CI

$$[\hat{\theta}^*_{\alpha/2}, \hat{\theta}^*_{1-\alpha/2}]$$

Here,

  • $\hat{\theta}^*_{p}$ = $p$-th percentile of the bootstrap distribution
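A percentile interval only needs the sorted bootstrap statistics. The sketch below uses nearest-rank quantiles, which is one of several reasonable conventions; the helper name `percentile_ci` and the simulated skewed sample are hypothetical.

```python
import math
import random
import statistics


def percentile_ci(replicates, alpha=0.05):
    """Return the alpha/2 and 1 - alpha/2 nearest-rank quantiles of the replicates."""
    ordered = sorted(replicates)
    B = len(ordered)

    def nearest_rank(p):
        # Index of the p-th quantile, clipped to the valid range
        return ordered[min(B - 1, max(0, math.ceil(p * B) - 1))]

    return nearest_rank(alpha / 2), nearest_rank(1 - alpha / 2)


rng = random.Random(2)
# Illustrative right-skewed sample (exponential draws, assumed for the demo)
sample = [rng.expovariate(1.0) for _ in range(60)]
n = len(sample)
boot_medians = [statistics.median(rng.choices(sample, k=n)) for _ in range(2000)]

theta_hat = statistics.median(sample)
lo, hi = percentile_ci(boot_medians, alpha=0.05)
print(f"median = {theta_hat:.3f}, 95% percentile CI = [{lo:.3f}, {hi:.3f}]")
```

Interpolating quantile rules (such as the default in `numpy.percentile`) may give slightly different endpoints from the nearest-rank rule used here.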

BCa (Bias-Corrected and Accelerated)

BCa CI

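$$[\hat{\theta}^*_{\alpha_1}, \hat{\theta}^*_{\alpha_2}]$$

Here,

  • $\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}\right)$ = Adjusted lower percentile
  • $\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}\right)$ = Adjusted upper percentile
  • $\hat{z}_0$ = Bias correction factor
  • $\hat{a}$ = Acceleration factor
  • $z_p$ = $p$-th quantile of the standard normal distribution, with CDF $\Phi$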

BCa vs Percentile

The BCa interval adjusts for bias and skewness in the bootstrap distribution. It is generally preferred over the simple percentile method.
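One common way to obtain the two BCa ingredients is to take the bias correction from the share of bootstrap statistics below the observed estimate and the acceleration from jackknife (leave-one-out) estimates. The sketch below follows that recipe under stated assumptions: the function name `bca_ci` is hypothetical, the data are simulated, and `data` must be a list.

```python
import math
import random
import statistics
from statistics import NormalDist


def bca_ci(data, statistic, alpha=0.05, n_resamples=2000, seed=0):
    """BCa interval with jackknife acceleration (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = statistic(data)
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    std_normal = NormalDist()

    # Bias correction: normal quantile of the share of replicates below the estimate.
    # inv_cdf raises an error if that share is exactly 0 or 1.
    share_below = sum(r < theta_hat for r in reps) / n_resamples
    z0 = std_normal.inv_cdf(share_below)

    # Acceleration from leave-one-out estimates
    jack = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a_hat = num / den

    def adjusted_level(level):
        z = std_normal.inv_cdf(level)
        return std_normal.cdf(z0 + (z0 + z) / (1 - a_hat * (z0 + z)))

    def nearest_rank(p):
        return reps[min(n_resamples - 1, max(0, math.ceil(p * n_resamples) - 1))]

    return (nearest_rank(adjusted_level(alpha / 2)),
            nearest_rank(adjusted_level(1 - alpha / 2)))


rng = random.Random(3)
sample = [rng.expovariate(1.0) for _ in range(40)]  # skewed demo data
theta_hat = statistics.fmean(sample)
lo, hi = bca_ci(sample, statistics.fmean)
print(f"mean = {theta_hat:.3f}, 95% BCa CI = [{lo:.3f}, {hi:.3f}]")
```

When both $\hat{z}_0$ and $\hat{a}$ are zero, the adjusted levels reduce to $\alpha/2$ and $1-\alpha/2$, so the BCa interval coincides with the percentile interval.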


Types of Bootstrap

| Type | Resampling Unit | When to Use |
|------|-----------------|-------------|
| Nonparametric | Individual observations | Default; no distributional assumptions |
| Parametric | Draws from a fitted distribution | When a parametric model for the data is credible |
| Block | Blocks of consecutive observations | Time series data |
| Wild | Residuals multiplied by random weights (e.g., random signs) | Heteroscedastic data |
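To make the "Block" row concrete, here is a minimal sketch of a moving-block resample, one of several block bootstrap variants. The function name `moving_block_resample` and the block length of 5 are arbitrary choices for this illustration.

```python
import math
import random


def moving_block_resample(series, block_len, rng):
    """Concatenate randomly chosen blocks of consecutive values, then trim to length n."""
    n = len(series)
    n_blocks = math.ceil(n / block_len)
    out = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_len + 1)  # any block that fits inside the series
        out.extend(series[start:start + block_len])
    return out[:n]  # trim so the resample has the original length


rng = random.Random(0)
series = list(range(20))  # stand-in for a time series; each value equals its time index
resample = moving_block_resample(series, block_len=5, rng=rng)
print(resample)
```

Within each block the original ordering is kept, which is how short-range dependence survives the resampling. The block length controls how much dependence is preserved and has to be chosen by the analyst.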


Bootstrap Hypothesis Testing

Bootstrap p-value

$$p = \frac{\#\{|\hat{\theta}^*_b - \hat{\theta}_{obs}| \geq |\hat{\theta}_{obs} - \theta_0|\} + 1}{B + 1}$$

Here,

  • $\hat{\theta}_{obs}$ = Observed statistic
  • $\theta_0$ = Null hypothesis value
  • $B$ = Number of bootstrap samples
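The p-value formula translates directly into a counting loop. In the sketch below, the function name `bootstrap_p_value`, its defaults, and the simulated data are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_p_value(data, theta0, statistic=statistics.fmean,
                      n_resamples=2000, seed=0):
    """Two-sided bootstrap p-value with the +1 correction in numerator and denominator."""
    rng = random.Random(seed)
    n = len(data)
    theta_obs = statistic(data)
    observed_gap = abs(theta_obs - theta0)
    extreme = 0
    for _ in range(n_resamples):
        theta_star = statistic(rng.choices(data, k=n))
        # Count resamples that deviate from the observed statistic at least as much
        # as the observed statistic deviates from the null value
        if abs(theta_star - theta_obs) >= observed_gap:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)


rng = random.Random(4)
sample = [rng.gauss(5.0, 2.0) for _ in range(100)]  # demo data
p = bootstrap_p_value(sample, theta0=4.5)
print(f"p-value for H0: mean = 4.5 is {p:.4f}")
```

The +1 terms keep the p-value strictly positive, so its smallest possible value is $1/(B+1)$ rather than 0.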

Subsampling

Subsampling is a related method that draws samples of a smaller size $m < n$ without replacement.

Subsampling vs Bootstrap

Subsampling does not require the data to be exchangeable and works for some problems where the bootstrap fails (e.g., unit roots). However, it requires choosing the subsample size $m$.
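A subsampling interval can be sketched as follows. The sketch assumes the statistic converges at the usual $\sqrt{n}$ rate, which is an assumption of this example and does not hold in every setting where subsampling is used; the function name `subsampling_ci` and the choice $m = 30$ are also illustrative.

```python
import math
import random
import statistics


def subsampling_ci(data, statistic, m, alpha=0.05, n_subsamples=2000, seed=0):
    """Subsampling interval, assuming a sqrt(n) convergence rate (illustrative)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = statistic(data)
    # Rescaled deviations of size-m subsample statistics from the full-sample estimate;
    # rng.sample draws WITHOUT replacement
    roots = sorted(
        math.sqrt(m) * (statistic(rng.sample(data, m)) - theta_hat)
        for _ in range(n_subsamples)
    )

    def nearest_rank(p):
        return roots[min(n_subsamples - 1, max(0, math.ceil(p * n_subsamples) - 1))]

    q_lo, q_hi = nearest_rank(alpha / 2), nearest_rank(1 - alpha / 2)
    # Undo the scaling at the full-sample rate sqrt(n)
    return theta_hat - q_hi / math.sqrt(n), theta_hat - q_lo / math.sqrt(n)


rng = random.Random(5)
sample = [rng.gauss(5.0, 2.0) for _ in range(200)]  # demo data
theta_hat = statistics.fmean(sample)
lo, hi = subsampling_ci(sample, statistics.fmean, m=30)
print(f"mean = {theta_hat:.3f}, 95% subsampling CI = [{lo:.3f}, {hi:.3f}]")
```

The result depends on $m$: it should grow with $n$ while remaining a small fraction of it, and trying a few values is a reasonable sensitivity check.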


Python Implementation


```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(42)  # seeded generator for reproducibility

# Original data
n = 200
true_mean = 5.0
true_std = 2.0
data = rng.normal(true_mean, true_std, n)

# Observed statistic
obs_mean = np.mean(data)
print(f"Observed mean: {obs_mean:.3f}")

# Bootstrap: resample with replacement and recompute the mean B times
B = 5000
boot_means = np.zeros(B)
for b in range(B):
    sample = rng.choice(data, size=n, replace=True)
    boot_means[b] = np.mean(sample)

# Bootstrap SE
boot_se = np.std(boot_means, ddof=1)
print(f"Bootstrap SE: {boot_se:.3f}")
print(f"Analytical SE (known sigma): {true_std/np.sqrt(n):.3f}")

# Percentile CI
alpha = 0.05
ci_perc = np.percentile(boot_means, [100*alpha/2, 100*(1-alpha/2)])
print(f"Percentile 95% CI: [{ci_perc[0]:.3f}, {ci_perc[1]:.3f}]")

# BCa CI: bias correction z0 and jackknife acceleration a_hat
z0 = norm.ppf(np.mean(boot_means < obs_mean))
jack = np.array([np.mean(np.delete(data, i)) for i in range(n)])  # leave-one-out means
d = jack.mean() - jack
a_hat = np.sum(d**3) / (6 * np.sum(d**2)**1.5)
z = norm.ppf([alpha/2, 1 - alpha/2])
adj = norm.cdf(z0 + (z0 + z) / (1 - a_hat * (z0 + z)))  # adjusted percentile levels
ci_bca = np.percentile(boot_means, 100 * adj)
print(f"BCa 95% CI: [{ci_bca[0]:.3f}, {ci_bca[1]:.3f}]")

# Bootstrap distribution
plt.figure(figsize=(8, 5))
plt.hist(boot_means, bins=50, edgecolor='black', alpha=0.7)
plt.axvline(x=obs_mean, color='red', linestyle='--', label='Observed')
plt.axvline(x=true_mean, color='green', linestyle='--', label='True')
plt.xlabel('Bootstrap Mean')
plt.ylabel('Frequency')
plt.title('Bootstrap Distribution of the Mean')
plt.legend()
plt.show()

# Bootstrap hypothesis test: H0: mean = 4.5 (with the +1 correction from the formula)
theta0 = 4.5
n_extreme = np.sum(np.abs(boot_means - obs_mean) >= np.abs(obs_mean - theta0))
p_value = (n_extreme + 1) / (B + 1)
print(f"\nBootstrap test (H0: mean=4.5): p={p_value:.4f}")
```

Worked Example

Example: Median with Bootstrap CI

Computing a 95% confidence interval for the median of a skewed distribution:

| Method | Estimate | 95% CI |
|--------|----------|--------|
| Normal theory | 4.85 | [4.21, 5.49] |
| Bootstrap percentile | 4.82 | [4.18, 5.62] |
| Bootstrap BCa | 4.82 | [4.25, 5.71] |

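The distribution is right-skewed, yet the normal-theory CI is symmetric about its estimate by construction (4.85 ± 0.64). Both bootstrap intervals extend further to the right of 4.82 than to the left, reflecting the skew, and so typically give more accurate coverage for a skewed distribution.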


Key Takeaways

Summary: Bootstrap Methods

  • Bootstrap resamples with replacement to approximate the sampling distribution

  • Works for a wide range of statistics (mean, median, regression coefficients, etc.), though it can fail in some settings such as unit-root processes

  • Use B = 1,000-10,000 bootstrap resamples

  • Percentile CI is simple but may be biased; BCa adjusts for bias and skewness

  • Bootstrap provides standard errors and confidence intervals without distributional assumptions

  • For time series, use block bootstrap to preserve dependence

  • Subsampling (without replacement) is an alternative for some problems

