# Simple Linear Regression
A simple linear regression is a statistical model that approximates the relationship between two variables $X$ and $Y$ with a linear equation:
$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
$
The variables $X$, $Y$ and $\varepsilon$ go by several names:
- $Y$ can be referred to as the *outcome* or dependent variable
- $X$ can be called the *covariate*, *predictor* or *independent variable*
- $\varepsilon$ is the *error*, representing the gap between the observed outcome and the value the model gives
The model structure itself is a form of an assumption (a modeling assumption). It should be viewed as an approximation rather than true reality.
The model parameters, $(\beta_0, \beta_1, \sigma^2)$, can be estimated in several ways (a short sketch in R follows this list):
- Most famously, there are the [[Ordinary least square (OLS) estimators|ordinary least squares estimators]].
- [[Maximum likelihood estimators for regression|Maximum likelihood estimators]] may be used, but the MLE for variance is biased.
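As a minimal sketch of estimation in R (the data are simulated, and the true values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 0.5$ are arbitrary choices for illustration), the OLS fit can be obtained with `lm()` or directly from the closed-form formulas:
```r
# Simulate data from the model Y_i = beta_0 + beta_1 * X_i + eps_i
set.seed(42)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, mean = 0, sd = 0.5)   # beta_0 = 1, beta_1 = 2, sigma = 0.5

# OLS estimates via lm()
fit <- lm(y ~ x)
coef(fit)      # estimated intercept and slope
sigma(fit)^2   # unbiased estimate of sigma^2 (residual variance)

# Closed-form OLS estimates; these match coef(fit)
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)
```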
## Assumptions
The assumptions of linear regression concern the errors, not the covariate or the outcome. Typically, the errors are assumed to have zero mean, constant variance, and to form a simple random sample. More info on these assumptions and their consequences can be found in the [[Statistical properties of the OLS estimators|statistical properties of the OLS estimators]].
In short, they are the following:
$
\begin{align}
&E(\varepsilon \mid X = x) = 0 \\
&\text{Var}(\varepsilon \mid X = x) = \sigma^2
\end{align}
$
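As an informal check of these two assumptions (continuing the simulated `fit` from the sketch above), a residuals-versus-fitted plot should show a horizontal band centred at 0 with roughly constant spread:
```r
# Residuals vs. fitted values: deviations from a flat, evenly spread band
# suggest a violated mean or variance assumption
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```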
## Interpreting the parameters
To interpret the model coefficients, we need to isolate each of them in the model.
### Intercept
For the intercept, we take the conditional expectation of the model and substitute 0 for the predictor.
$
E(Y \mid X = 0) = \beta_0
$
Therefore, we interpret the intercept as *the average value of the outcome when the covariate equals 0*. In more concrete terms, we often refer to $X = 0$ as the *baseline*, since the other parameter is interpreted with respect to it.
In the case of a binary predictor:
>[!example]
>If $X = 1$ means someone is on treatment A and $X = 0$ means that someone is on placebo, then $\beta_0$ represents the average outcome of someone in the placebo group.
In the case of a continuous predictor:
>[!example]
>If $X$ represents number of hours spent exercising, then $\beta_0$ represents the average outcome of someone who has zero hours of exercise.
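A numerical sketch of the binary case (hypothetical treatment/placebo data, simulated purely for illustration) shows that the fitted intercept is exactly the sample mean of the placebo group:
```r
set.seed(1)
treat   <- rep(c(0, 1), each = 50)              # 0 = placebo, 1 = treatment A
outcome <- 10 + 3 * treat + rnorm(100, sd = 2)  # hypothetical outcomes

fit_bin <- lm(outcome ~ treat)
coef(fit_bin)[1]           # estimated intercept beta_0
mean(outcome[treat == 0])  # sample mean of the placebo group -- identical
```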
### Non-intercept
For the non-intercept coefficient, we compare two conditional expectations: one where the predictor equals 0 and another where it equals 1.
$
\begin{align}
&E(Y \mid X = 0) = \beta_0 \\
&E(Y \mid X = 1) = \beta_0 + \beta_1 \\
\end{align}
$
If we take the difference of these two equations, we can isolate $\beta_1$:
$
E(Y \mid X = 1) - E(Y \mid X = 0) = \beta_1
$
Therefore, we interpret the non-intercept as *the average change in the outcome* for a unit increase (+1) in the covariate. Depending on the type of the covariate, this "unit increase" can have different interpretations.
In the case of a binary predictor:
>[!example]
>If $X = 1$ means someone is on treatment A and $X = 0$ means that someone is on placebo, then $\beta_1$ represents the average change in the outcome associated with being on treatment A, relative to the placebo group.
In the case of a continuous predictor:
>[!example]
>If $X$ represents number of hours spent exercising, then $\beta_1$ represents the average change in the outcome associated with one extra hour of exercise.
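Continuing the same hypothetical treatment/placebo sketch, the fitted non-intercept coefficient is exactly the difference between the two group means:
```r
coef(fit_bin)[2]                                       # estimated beta_1
mean(outcome[treat == 1]) - mean(outcome[treat == 0])  # difference in group means -- identical
```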
### Interpreting under log transforms
When the outcome is on the logarithmic scale, the exponentiated non-intercept coefficients become *multiplicative (percent) changes* in the outcome for a unit increase in the predictor:
$
\frac{E(Y \mid X_j = x + 1, \mathbf{X})}{E(Y \mid X_j = x, \mathbf{X})} \approx \frac{C \cdot \exp(\beta_j (x + 1))}{C \cdot \exp(\beta_j x)} = \exp(\beta_j)
$
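As a sketch (reusing the simulated `x` and `n` from the estimation example, with a made-up outcome that is linear on the log scale), exponentiating the fitted coefficient recovers the multiplicative change per unit increase:
```r
# Hypothetical outcome whose logarithm is linear in x
y_log <- exp(0.5 + 0.05 * x + rnorm(n, sd = 0.1))

fit_log <- lm(log(y_log) ~ x)
exp(coef(fit_log)[2])  # multiplicative change per +1 in x (about 1.05, i.e. ~5%)
```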
When *both* the outcome and the predictor are on the logarithmic scale, the non-intercept coefficients become *power changes* in the outcome for a factor-of-$k$ increase in the predictor:
$
\frac{E(Y \mid X_j = kx, \mathbf{X})}{E(Y \mid X_j = x, \mathbf{X})}
\approx \frac{\exp(\beta_j \log(kx))}{\exp(\beta_j \log(x))} = k^{\beta_j}
$
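As a worked example with hypothetical numbers, if $\beta_j = 0.3$ then doubling the predictor ($k = 2$) multiplies the expected outcome by
$
2^{0.3} = \exp(0.3 \log 2) \approx 1.23,
$
i.e. roughly a 23% increase.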
## Code implementation
- [[Linear regression in R]]
## Potential Problems
- [[High multicollinearity leads to high variance for the OLS estimates]]
- [[Unobserved covariates can drastically change estimated coefficients]]
---
# References
[[Applied Linear Regression#2. Simple Linear Regression]]