# Holdout methods

Holdout methods are used to assess the predictive ability of an estimated model. Once a model is estimated from a training set, its predictive ability can be tested on a dataset that was not used in training (the holdout set). Estimates of accuracy will depend greatly on which subset of the data is used as the test set.

## Cross-validation

K-fold cross-validation is a method for accounting for the specific biases that come with any single test set. Rather than use just one test set, the idea of cross-validation is to use different partitions of the data for training and testing. With $K$ partitions ("folds"), we get $K$ out-of-sample estimates of the prediction error.

1. Split the data into $K$ roughly equal-sized pieces
2. For $k = 1, \ldots, K$:
	- Train the model on the other $K-1$ pieces
	- Evaluate the out-of-sample error on the $k$th piece
3. Characterize the resulting $K$ test error estimates

Then, we can characterize the typical test error as well as its variance (a minimal code sketch of the procedure appears at the end of this note).

It is generally best to use out-of-sample criteria to assess a model's predictive performance. These work best when the sample size is large; if it is small, we can use [[In-sample Variable Selection Criteria|in-sample criteria]] instead.

---

# References

[[Applied Linear Regression#10. Variable Selection]]
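
---

# Appendix: code sketch

A minimal sketch of the K-fold procedure above, assuming an ordinary least squares fit with NumPy; the simulated data, the OLS model, and the squared-error metric are placeholders and can be swapped for any dataset, estimator, and loss.

```python
import numpy as np

def k_fold_cv(X, y, K=5, seed=0):
    """Return the K per-fold out-of-sample MSEs for an OLS fit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # 1. Split the (shuffled) indices into K roughly equal-sized pieces
    folds = np.array_split(rng.permutation(n), K)

    errors = []
    for k in range(K):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])

        # 2a. Train on the other K-1 pieces (ordinary least squares)
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)

        # 2b. Evaluate out-of-sample error on the held-out k-th piece
        resid = y[test_idx] - X[test_idx] @ beta
        errors.append(np.mean(resid**2))

    return np.array(errors)

# 3. Characterize the typical test error and its variance on simulated data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=100)
cv_errors = k_fold_cv(X, y, K=5)
print(cv_errors.mean(), cv_errors.std())
```

The mean of `cv_errors` estimates the typical out-of-sample prediction error, and its standard deviation indicates how much that estimate varies across folds.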