# Bootstrapping

The bootstrap is a general, nonparametric procedure for estimating the variance and distribution of a statistic. The procedure itself is simple to explain, but the underlying mathematics is fairly heavy.

Very Normal did a [video](https://www.youtube.com/watch?v=BiNcdYbyiWw) on this for the third Summer of Math Exposition.

## Notation

Let $\mathbf{x} = (x_1, \ldots, x_n)$ be an observed dataset drawn from some true (unknown) probability distribution $\mathcal{P}$ that is parameterized by $\theta$. We are interested in estimating $\theta$ from the data via some estimator $T(\mathbf{x}) = \hat{\theta}$. This estimator can also be viewed as a functional of $\mathcal{P}$, evaluated at the empirical distribution of the data.

However, we are unsure about the variance of $\hat{\theta}$ (due to violated assumptions, an unknown analytic form, etc.). We can estimate the *distribution* of $\hat{\theta}$ (and, by extension, functions of that distribution such as its variance) with bootstrapping.

## The Procedure

1. Resample the dataset $\mathbf{x}$ *with replacement* to create a bootstrap dataset $\mathbf{x}_b^*$ of the same size $n$
2. Based on the bootstrap dataset, calculate the relevant statistic $T(\mathbf{x}^*_b) = \hat{\theta}^*_b$
3. Repeat steps 1-2 for $b = 1, \ldots, B$ for some large $B$
4. Assess the statistical properties of the estimator via the collection of bootstrap statistics $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$ (e.g. standard error, quantiles, p-values, confidence intervals)

## Remarks

- Some sources mistakenly state that the bootstrap is a
- Hesterberg (2015) notes that bootstrap confidence intervals, percentile intervals in small samples in particular, tend to be too narrow and therefore undercover
- When model assumptions are violated (or suspected to be), bootstrapping can produce more robust variance estimates
- Depending on the problem, the unit that needs to be resampled may differ. For instance, in longitudinal data it makes more sense to resample individuals *and all their data*, rather than single observations, because observations within an individual are not independent.

---

# References

- [[Applied Linear Regression#7. Variances]]
- Hesterberg, Tim C. “What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.” The American Statistician. Informa UK Limited, October 2, 2015. https://doi.org/10.1080/00031305.2015.1089789.
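
As a rough illustration of the four-step procedure described in this note, here is a minimal sketch using only the Python standard library. The sample `x`, the choice of the median as the statistic, the helper names `bootstrap` and `percentile_interval`, and the values `n_boot=2000` and `seed=0` are all assumptions made for the example, not part of the original note. The interval shown is the plain percentile interval, which (per the Hesterberg remark) may be too narrow in small samples.

```python
import random
import statistics


def bootstrap(data, statistic, n_boot=2000, seed=0):
    """Return n_boot bootstrap replicates of statistic(data)."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):  # step 3: repeat for b = 1, ..., B
        # Step 1: resample n observations with replacement.
        resample = rng.choices(data, k=n)
        # Step 2: recompute the statistic on the bootstrap dataset.
        replicates.append(statistic(resample))
    return replicates


def percentile_interval(replicates, level=0.95):
    """Percentile interval from simple index-based quantiles (no interpolation)."""
    ordered = sorted(replicates)
    alpha = (1.0 - level) / 2.0
    low = ordered[int(alpha * (len(ordered) - 1))]
    high = ordered[int((1.0 - alpha) * (len(ordered) - 1))]
    return low, high


# Hypothetical sample, made up purely for illustration.
x = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
theta_hat = statistics.median(x)
reps = bootstrap(x, statistics.median)

# Step 4: summarize the bootstrap distribution.
se_boot = statistics.stdev(reps)
ci_low, ci_high = percentile_interval(reps)
print(f"estimate={theta_hat:.2f}, bootstrap SE={se_boot:.2f}, "
      f"95% percentile CI=({ci_low:.2f}, {ci_high:.2f})")
```

For clustered or longitudinal data, the same loop would apply with `data` holding one entry per individual (each entry carrying all of that individual's observations), so that whole individuals are drawn with replacement.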