# Analysis of Variance (ANOVA) Model

## Usual framing

Say we are comparing $K$ groups. We'd like to check whether any of the group means is significantly different from the others. The null hypothesis is that all the group means, $\mu_1, ..., \mu_K$, are the same:

$$
H_0: \mu_1 = \mu_2 = ... = \mu_K
$$

While the alternative is:

$$
H_1: \text{At least one of the means is different}
$$

Notice that the alternative hypothesis doesn't state *which* group mean is different, just that at least one of them is: the complement of "all equal" is "any one differs". Note that we also have to assume that the *variance of each group* is the same (homoskedasticity). To find out *which* group mean is different, multiple pairwise hypothesis tests have to be done, which brings about a [[Multiple Testing Problem|multiplicity problem]].

The test statistic used to test this hypothesis is the F-statistic, the ratio of the *variance of the group means* to the *variance within a group*:

$$
F = \frac{\text{Var}(\mu_k)}{\sigma^2}
$$

If the null hypothesis is true, there is no variance among the group means (i.e. $\text{Var}(\mu_k) = 0$), so the F-statistic will be close to 0. Conversely, if the group means are significantly different, the variance among them will be higher, supporting the alternative hypothesis. (A numerical sketch of this F-test is included at the end of this note.)

The F-test is an instance of the [[Likelihood ratio test|likelihood ratio test]] taking on a well-known distribution: with normal likelihoods, the likelihood ratio leads to the F-statistic.

## Linear regression framing

With [[Simple Linear Regression|simple linear regression]] and [[Multiple linear regression|multiple linear regression]], remember that the non-intercept coefficients are interpreted in terms of *changes to the mean*. Only the intercept is interpreted as a mean.

If we have a simple linear regression with one binary predictor indicating treatment vs placebo, then the two group means are given as:

$$
\begin{align}
&E(Y \mid X = 0) = \beta_0 \\
&E(Y \mid X = 1) = \beta_0 + \beta_1
\end{align}
$$

If the two group means are not significantly different, this corresponds to $\beta_1 = 0$; testing whether the means differ is the same as testing $H_0: \beta_1 = 0$. We can also perform ANOVA by comparing two nested models (one model's coefficients are all contained in the other): the intercept-only model versus the model that includes the group indicator (see the regression sketch at the end of this note).

---
# References

[[Applied Linear Regression#6. Testing and Analysis of Variance]]
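---
To make the "usual framing" concrete, here is a minimal sketch of the one-way ANOVA F-test in Python. The groups, their means, and the sample sizes are made up for illustration, and the between/within decomposition follows the standard mean-square definitions rather than anything specific to the reference above.

```python
# Minimal one-way ANOVA sketch: compute the F-statistic by hand and
# cross-check it against SciPy. The data below are simulated, not real.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = [
    rng.normal(loc=5.0, scale=2.0, size=30),  # group 1
    rng.normal(loc=5.0, scale=2.0, size=30),  # group 2 (same mean as group 1)
    rng.normal(loc=7.0, scale=2.0, size=30),  # group 3 (shifted mean)
]

K = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total sample size
grand_mean = np.mean(np.concatenate(groups))

# Between-group mean square: variability of the group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (K - 1)

# Within-group mean square: pooled variability inside each group
# (this is where the equal-variance / homoskedasticity assumption enters).
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (N - K)

F = ms_between / ms_within
p = stats.f.sf(F, K - 1, N - K)      # right-tail probability under H0
print(f"F = {F:.3f}, p = {p:.4f}")

# Cross-check against SciPy's built-in one-way ANOVA.
print(stats.f_oneway(*groups))
```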
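Similarly, a sketch of the linear regression framing, assuming `statsmodels` and `pandas` are available. The binary predictor `x` (0 = placebo, 1 = treatment), the group means, and the sample size are invented for illustration; the point is only that comparing the intercept-only model to the model with the group indicator reproduces the F-test for $H_0: \beta_1 = 0$.

```python
# Regression framing of ANOVA: two groups coded by a binary predictor.
# Data are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 40
df = pd.DataFrame({
    "x": np.repeat([0, 1], n),
    "y": np.concatenate([
        rng.normal(loc=5.0, scale=2.0, size=n),  # placebo group: mean = beta_0
        rng.normal(loc=6.5, scale=2.0, size=n),  # treatment group: mean = beta_0 + beta_1
    ]),
})

# Full model: E(Y | X) = beta_0 + beta_1 * X.
# beta_0 is the placebo mean; beta_1 is the difference between the group means.
full = smf.ols("y ~ x", data=df).fit()
print(full.params)

# Null (nested) model: intercept only, i.e. both groups share one mean (beta_1 = 0).
null = smf.ols("y ~ 1", data=df).fit()

# Comparing the nested models gives the ANOVA F-test; with a single binary
# predictor it is equivalent to testing whether the two group means differ.
print(anova_lm(null, full))
```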