# Holdout methods
Holdout methods are used to assess the predictive ability of an estimated model. Once a model is estimated from a training set, its predictive ability can be tested on a dataset that was not used in training (the held-out test set).
A drawback is that the resulting accuracy estimate depends heavily on which subset of the data happens to be used as the test set.
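Below is a minimal sketch of a single holdout split, assuming simulated data and an ordinary least squares fit with NumPy (the data, sample size, and split fraction are illustrative choices, not from the source).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumption: a simple linear model with noise)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)

# Hold out 25% of the observations as the test set
idx = rng.permutation(n)
n_test = int(0.25 * n)
test_idx, train_idx = idx[:n_test], idx[n_test:]

# Fit by OLS on the training set only
Xtr = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
beta, *_ = np.linalg.lstsq(Xtr, y[train_idx], rcond=None)

# Assess predictive ability on the held-out test set
Xte = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
test_mse = np.mean((y[test_idx] - Xte @ beta) ** 2)
print(f"held-out test MSE: {test_mse:.3f}")
```

Rerunning with a different random split would give a different test MSE, which is exactly the dependence on the test set that cross-validation addresses.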
## Cross-validation
K-fold cross-validation is a way to reduce the dependence on any single test set. Rather than using just one test set, the idea is to use different partitions of the data for training and testing. With $K$ partitions ("folds"), we get $K$ out-of-sample estimates of the prediction error.
1. Split the data up into $K$ roughly equal-sized pieces
2. For $k = 1,... K$:
- Fit the model on the other $K-1$ pieces
- Evaluate the out-of-sample error on the $k$th piece
3. Combine the $K$ out-of-sample error estimates
Then we can characterize the typical test prediction error as well as its variance.
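A sketch of this procedure, again assuming an OLS fit and the same simulated data as in the holdout sketch above (the `kfold_mse` helper and the choice $K = 5$ are illustrative, not from the source):

```python
import numpy as np

def kfold_mse(X, y, K=5, seed=0):
    """Return the K out-of-sample MSE estimates from K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Split the data into K roughly equal-sized pieces
    folds = np.array_split(rng.permutation(len(y)), K)
    errors = []
    for k in range(K):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        # Fit OLS on the other K-1 pieces
        Xtr = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train_idx], rcond=None)
        # Evaluate the out-of-sample error on the k-th piece
        Xte = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        errors.append(np.mean((y[test_idx] - Xte @ beta) ** 2))
    return np.array(errors)

# Example usage with simulated data (same setup as the holdout sketch above)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)
cv_errors = kfold_mse(X, y, K=5)
print(f"mean CV MSE: {cv_errors.mean():.3f}, std: {cv_errors.std(ddof=1):.3f}")
```

The mean of the $K$ fold errors summarizes the typical out-of-sample prediction error, and their spread indicates how sensitive that estimate is to the particular split.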
Out-of-sample criteria are the preferred way to assess a model's predictive performance, but they require a reasonably large sample; when the sample is small, we can use [[In-sample Variable Selection Criteria|in-sample criteria]] instead.
---
# References
[[Applied Linear Regression#10. Variable Selection]]