# Bootstrapping
The bootstrap is a general, nonparametric procedure for estimating the variance and sampling distribution of a statistic. The procedure itself is simple to describe, but the theory that justifies it is considerably more involved. Very Normal did a [video](https://www.youtube.com/watch?v=BiNcdYbyiWw) on this for the third Summer of Math Exposition.
## Notation
Let $\mathbf{x} = (x_1, \dots, x_n)$ be an observed dataset drawn from a true (unknown) probability distribution $\mathcal{P}$ that is parameterized by $\theta$.
We are interested in estimating $\theta$ from the data via some estimator $T(\mathbf{x}) = \hat{\theta}$. This estimator can also be seen as a functional of $\mathcal{P}$.
However, we are unsure about the variance of $\hat{\theta}$ (due to violated assumptions, an unknown analytic form, etc.).
We can estimate the *distribution* of $\hat{\theta}$ (and, by extension, functions of that distribution such as the variance) with bootstrapping.
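The functional view of the estimator can be written out explicitly. As a sketch, write $\hat{\mathcal{P}}_n$ for the empirical distribution that puts mass $1/n$ on each observed $x_i$, and $\hat{\mathcal{P}}_n^*$ for the empirical distribution of a sample of size $n$ drawn from $\hat{\mathcal{P}}_n$ (both symbols are notation assumed here for illustration). Then

$$
\theta = T(\mathcal{P}), \qquad \hat{\theta} = T(\hat{\mathcal{P}}_n), \qquad \hat{\theta}^* = T(\hat{\mathcal{P}}_n^*).
$$

The bootstrap idea is to treat $\hat{\mathcal{P}}_n$ as a stand-in for $\mathcal{P}$, so that the unknown sampling distribution of $\hat{\theta} - \theta$ is approximated by the computable distribution of $\hat{\theta}^* - \hat{\theta}$.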
## The Procedure
1. Resample the dataset $\mathbf{x}$ *with replacement* to create a bootstrap dataset $\mathbf{x}_b^*$ of the same size $n$.
2. Based on the bootstrap dataset, calculate the relevant statistic $T(\mathbf{x}^*_b) = \hat{\theta}^*_b$.
3. Repeat steps 1-2 for $b = 1, \dots, B$ for some large $B$.
4. Assess the statistical properties of the estimator via the collection of bootstrap statistics (e.g. standard error, quantiles, p-values, confidence intervals).
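The steps above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `bootstrap`, the simulated exponential data, the choice of the median as the statistic, and $B = 2000$ are all assumptions made for the example.

```python
import numpy as np


def bootstrap(x, statistic, n_boot=2000, seed=0):
    """Return n_boot bootstrap replicates of statistic(x).

    Hypothetical helper written for this note; not a library function.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        # Step 1: draw n indices with replacement to form the bootstrap dataset.
        idx = rng.integers(0, n, size=n)
        # Step 2: recompute the statistic on the bootstrap dataset.
        replicates[b] = statistic(x[idx])
    # Step 3 is the loop itself; the caller handles step 4.
    return replicates


# Illustrative data (assumed): 200 draws from an exponential distribution.
data_rng = np.random.default_rng(42)
x = data_rng.exponential(scale=2.0, size=200)

theta_hat = np.median(x)          # estimate on the observed data
reps = bootstrap(x, np.median)    # bootstrap replicates of the median

# Step 4: summarize the bootstrap distribution.
se_boot = reps.std(ddof=1)                            # bootstrap standard error
ci_low, ci_high = np.quantile(reps, [0.025, 0.975])   # 95% percentile interval

print(f"median = {theta_hat:.3f}, bootstrap SE = {se_boot:.3f}")
print(f"95% percentile interval = ({ci_low:.3f}, {ci_high:.3f})")
```

The percentile interval is used here only because it is the simplest to compute from the replicates; other interval constructions can be built from the same `reps` array.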
## Remarks
- Some sources mistakenly state that the bootstrap is a
- Hesterberg (2015) notes that bootstrap confidence intervals (the percentile interval in particular) tend to be too narrow in small samples, which leads to undercoverage
- When model assumptions are violated (or suspected to be), bootstrapping can produce more robust variance estimates
- Depending on the problem, the unit that needs to be resampled might differ. For instance, in longitudinal data, it makes more sense to resample individuals *and all their data*, rather than single observations.
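The last remark, resampling individuals together with all their observations, can be sketched as follows. This is one possible way to do it under stated assumptions: the function name `cluster_bootstrap`, the simulated longitudinal data (30 individuals with 5 visits each and a shared subject-level effect), and the choice of the overall mean as the statistic are all hypothetical.

```python
import numpy as np


def cluster_bootstrap(ids, values, statistic, n_boot=1000, seed=0):
    """Bootstrap that resamples whole individuals rather than single rows.

    Hypothetical helper written for this note; not a library function.
    """
    rng = np.random.default_rng(seed)
    ids = np.asarray(ids)
    values = np.asarray(values)
    unique_ids = np.unique(ids)
    # Row positions belonging to each individual, computed once up front.
    rows = {i: np.flatnonzero(ids == i) for i in unique_ids}
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample individuals with replacement...
        sampled = rng.choice(unique_ids, size=len(unique_ids), replace=True)
        # ...and carry along every observation of each sampled individual.
        idx = np.concatenate([rows[i] for i in sampled])
        replicates[b] = statistic(values[idx])
    return replicates


# Simulated longitudinal data (assumed): repeated visits share a subject effect.
data_rng = np.random.default_rng(1)
n_subjects, n_visits = 30, 5
ids = np.repeat(np.arange(n_subjects), n_visits)
subject_effect = data_rng.normal(0.0, 1.0, size=n_subjects)
values = subject_effect[ids] + data_rng.normal(0.0, 0.5, size=ids.size)

cluster_reps = cluster_bootstrap(ids, values, np.mean)
se_cluster = cluster_reps.std(ddof=1)
print(f"cluster bootstrap SE of the mean = {se_cluster:.3f}")
```

Because observations within an individual are correlated in this simulation, resampling single rows would treat them as independent and would likely understate the standard error; resampling at the individual level keeps that dependence intact.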
---
# References
- [[Applied Linear Regression#7. Variances]]
- Hesterberg, Tim C. “What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.” The American Statistician. Informa UK Limited, October 2, 2015. https://doi.org/10.1080/00031305.2015.1089789.