# Stepwise regression

Stepwise regression is a technique for [[Variable Selection Methods|variable selection]] that seeks to build a good model from a (possibly) large set of predictors. Variables are added to or removed from the model depending on whether they significantly affect the fit to the data under some optimality criterion. There are several variants:

## Backward Elimination

Backward elimination starts with a regression containing all the predictors, then works backwards to remove them:

1. Start with all variables in the model
2. Calculate the conditional contribution of each variable
3. Eliminate the least statistically significant variable that does not meet the criteria for remaining in the model (largest p-value, or whose removal most decreases the [[In-sample Variable Selection Criteria#Akaike Information Criterion (AIC)|AIC]], etc.)
4. Repeat steps 2-3 until all remaining variables meet the minimum requirement for inclusion

## Forward Selection

Forward selection starts with an intercept-only model and iteratively adds variables to it:

1. Start with a model containing only an intercept
2. Calculate the contribution each variable would make to some criterion if it were added to the model
3. Add the best variable according to that criterion (smallest p-value, or biggest decrease in AIC)
4. Repeat steps 2-3 until no remaining variable meets the requirement for inclusion

## Stepwise Selection

A hybrid of the two approaches above:

1. Start with an intercept-only model
2. Calculate the contribution each variable would make if it were added to the model
3. Add the variable with the best contribution (smallest p-value, or biggest decrease in AIC)
4. Re-evaluate the entire model and remove any variables that no longer meet the minimum criteria for staying in the model
5. Repeat steps 2-4 until no further variable can be added to the model

## Note on usage

Stepwise regression is better suited to improving predictive power than to reducing the effect of confounding.
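The selection procedures above can be sketched in code. Below is a minimal illustration of forward selection by AIC using ordinary least squares; the function names, the variable names, and the simulated data are assumptions for the example, not part of any standard library API:

```python
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit, up to an additive constant: n*log(RSS/n) + 2p."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, p = X.shape
    return n * np.log(rss / n) + 2 * p

def forward_selection(X, y, names):
    """Greedy forward selection: start intercept-only, repeatedly add the
    variable that lowers AIC the most, stop when no addition improves AIC."""
    n = len(y)
    selected = []                      # column indices currently in the model
    remaining = list(range(X.shape[1]))

    def design(cols):
        # Design matrix: intercept column plus the chosen predictors
        return np.column_stack([np.ones(n)] + [X[:, c] for c in cols])

    current_aic = ols_aic(design(selected), y)
    while remaining:
        # Score every candidate addition by the AIC of the enlarged model
        scores = [(ols_aic(design(selected + [c]), y), c) for c in remaining]
        best_aic, best_c = min(scores)
        if best_aic >= current_aic:    # no candidate improves the criterion
            break
        selected.append(best_c)
        remaining.remove(best_c)
        current_aic = best_aic
    return [names[c] for c in selected]

# Example: the outcome depends only on x0 and x2
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * rng.normal(size=300)
print(forward_selection(X, y, ["x0", "x1", "x2", "x3"]))
```

Backward elimination is the mirror image (start with all columns, repeatedly drop the one whose removal most lowers AIC), and stepwise selection interleaves the two moves. Note that AIC's penalty of 2 per parameter is fairly liberal, so spurious variables are sometimes retained.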
In experimental settings, stepwise regression should not be needed, since [[Confounders|confounders]] are theoretically balanced by the design. In observational settings, there is usually a covariate of interest to examine. Stepwise regression can then be used to find a set of observed confounders that have significant associations with the outcome, WITHOUT the covariate of interest in the model. Once that set of confounders is found, the covariate of interest is added to the model and inference is performed on it.

Stepwise models do not account for human knowledge or prior evidence, so criteria-based selection should be balanced against current scientific knowledge.

---

# References

[[Applied Linear Regression#10. Variable Selection]]