# Unobserved covariates can drastically change estimated coefficients

In a linear regression model, an unobserved covariate is a variable that has a true relationship with both an observed predictor and the outcome, but is not included in the model. Its omission can greatly change the estimated values of the coefficients, or even flip their signs, altering their interpretation completely.

Suppose some outcome $Y$ has a true, linear relationship with two predictors $X_1$ and $X_2$:

$ E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 $

Now suppose that $X_2$ is a covariate we cannot actually observe. Using the [[Law of total expectation|law of total expectation]], we can see how the coefficients will change:

$ \begin{align} E\left[ E(Y \mid X_1, X_2) \mid X_1 \right] &= E\left[ \beta_0 + \beta_1 X_1 + \beta_2 X_2 \mid X_1 \right] \\ &= \beta_0 + \beta_1 X_1 + \beta_2 E(X_2 \mid X_1) \end{align} $

If $X_2$ in fact has a linear relationship with $X_1$:

$ E(X_2 \mid X_1) = \gamma_0 + \gamma_1 X_1 $

then substituting this in has a downstream effect on the observed relationship between $Y$ and $X_1$:

$ \begin{align} E\left[ E(Y \mid X_1, X_2) \mid X_1 \right] &= \beta_0 + \beta_1 X_1 + \beta_2 (\gamma_0 + \gamma_1 X_1) \\ &= (\beta_0 + \beta_2\gamma_0) + (\beta_1 + \beta_2\gamma_1)X_1 \end{align} $

Depending on the specific values of $\gamma_0$ and $\gamma_1$, the estimated values of $\beta_0$ and $\beta_1$ may change greatly in magnitude or even change sign. Furthermore, the relationship between $X_1$ and $X_2$ also implies that, if both predictors were included, the variance of the estimates would grow due to [[High multicollinearity leads to high variance for the OLS estimates|multicollinearity]].

---
# References

[[Applied Linear Regression#4. Interpretation of Main Effects]]
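
---
A quick simulation sketch (Python with NumPy; all parameter values are invented for illustration) checks the algebra above: omitting $X_2$ shifts the estimated intercept toward $\beta_0 + \beta_2\gamma_0$ and the slope toward $\beta_1 + \beta_2\gamma_1$, flipping the sign of the $X_1$ coefficient in this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed true model (illustrative values only):
#   E(Y | X1, X2) = 1 + 2*X1 - 3*X2
#   E(X2 | X1)    = 0.5 + 1.5*X1
x1 = rng.normal(size=n)
x2 = 0.5 + 1.5 * x1 + rng.normal(scale=0.5, size=n)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(size=n)

# Regressing on both predictors recovers roughly (1, 2, -3).
X_full = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.lstsq(X_full, y, rcond=None)[0])

# Omitting X2: intercept -> beta0 + beta2*gamma0 = 1 + (-3)(0.5) = -0.5,
# slope     -> beta1 + beta2*gamma1 = 2 + (-3)(1.5) = -2.5 (sign flips).
X_reduced = np.column_stack([np.ones(n), x1])
print(np.linalg.lstsq(X_reduced, y, rcond=None)[0])
```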