# McNemar's Test
The typical hypothesis test for two-sample binary data is the [[Chi-squared test of independence]]. Sometimes, however, the data are *paired*: in the categorical case, each person (or observational unit) produces two binary responses, one under each condition.
In this case, McNemar's test is the better alternative.
## Example
A medical device company is testing a new diagnostic test for detecting breast cancer. The gold standard for identifying breast cancer is a biopsy. The company gathers a group of women and has each of them do two things:
1. Get a biopsy, and
2. Use the new diagnostic test.
Each woman will get a result from each method, so the data produced here is paired. The corresponding contingency table is:
| | Dx Test Cancer | Dx Test No Cancer | Row Total |
| ---------------- | -------------- | ----------------- | --------- |
| Biopsy Cancer | 9 | 2 | 11 |
| Biopsy No Cancer | 3 | 7 | 10 |
| Column Total     | 12             | 9                 | 21        |
Notice that the rows and columns correspond to the two methods rather than to an independent and a dependent variable. The top-left and bottom-right cells count the women on whom the two methods agree (concordant pairs); the other two cells count the women on whom they disagree (discordant pairs).
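As a minimal sketch (not from the source), the table can be assembled in Python from per-woman paired results; the result lists below are hypothetical stand-ins chosen to reproduce the counts above.
```python
import numpy as np

# Hypothetical paired results, one tuple per woman: (biopsy, diagnostic test),
# coded 1 = cancer detected, 0 = no cancer detected.
pairs = (
    [(1, 1)] * 9 +   # both methods detect cancer             (n11)
    [(1, 0)] * 2 +   # biopsy detects, diagnostic test misses (n12)
    [(0, 1)] * 3 +   # biopsy misses, diagnostic test detects (n21)
    [(0, 0)] * 7     # neither method detects cancer          (n22)
)

# 2x2 table: rows = biopsy (cancer, no cancer),
# columns = diagnostic test (cancer, no cancer).
table = np.zeros((2, 2), dtype=int)
for biopsy, dx in pairs:
    table[1 - biopsy, 1 - dx] += 1

print(table)
# [[9 2]
#  [3 7]]
```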
## Chi-squared version
McNemar's test is actually a test on the *marginal probabilities*. In the example, we want to know if the probability of detecting cancer using biopsy ($\pi_{1.}$) is the same as the probability of detecting cancer using the new test ($\pi_{.1}$). This gives a null hypothesis:
$
H_0: \pi_{1.} = \pi_{.1}
$
From the perspective of the contingency table, the marginal count of the first column should be similar to the marginal count of the first row; these counts represent each method's ability to detect cancer, which is what we want to compare. Since $\pi_{1.} = \pi_{11} + \pi_{12}$ and $\pi_{.1} = \pi_{11} + \pi_{21}$, the null hypothesis is equivalent to $\pi_{12} = \pi_{21}$, so only the *off-diagonal* (discordant) cells carry information about it.
The test statistic for McNemar's test is a function of the off-diagonal counts:
$
T = \frac{(n_{12} - n_{21})^2}{(n_{12} + n_{21})}
$
Under the null, this statistic has a chi-squared distribution with 1 degree of freedom.
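As a quick sketch of the calculation (assuming `numpy` and `scipy` are available), using the discordant counts from the example table:
```python
import numpy as np
from scipy.stats import chi2

table = np.array([[9, 2],
                  [3, 7]])

n12, n21 = table[0, 1], table[1, 0]   # discordant (off-diagonal) counts

# McNemar's chi-squared statistic, no continuity correction
T = (n12 - n21) ** 2 / (n12 + n21)
p_value = chi2.sf(T, df=1)            # upper tail of chi-squared with 1 df

print(T, p_value)                     # T = 0.2, p ≈ 0.65
```
For real analyses, `statsmodels.stats.contingency_tables.mcnemar` implements the same test, with options for a continuity correction and an exact binomial version.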
## Aside: Wald-like test
We can reframe McNemar's test as a Wald-like test.
$
H_0: \pi_{1.} = \pi_{.1} \implies H_0: \pi_{1.} - \pi_{.1} = 0
$
If the sample size is large, we can approximate the test using a Normal distribution. The natural estimate is the difference in estimated marginal proportions, and since the diagonal count $n_{11}$ appears in both marginals, it cancels:
$
\hat{\pi}_{1.} - \hat{\pi}_{.1} = \frac{n_{11} + n_{12}}{n} - \frac{n_{11} + n_{21}}{n} = \frac{n_{12} - n_{21}}{n}
$
Under the null hypothesis ($\pi_{12} = \pi_{21}$), the variance of this difference depends only on the discordant cell probabilities:
$
\text{Var}(\hat{\pi}_{1.} - \hat{\pi}_{.1}) = \frac{\hat{\pi}_{12} + \hat{\pi}_{21}}{n}
$
So the resulting z-statistic is:
$
z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}
$
Squaring this z-statistic recovers the chi-squared statistic above: $z^2 = T$.
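A short sketch tying the two forms together (same example table, `scipy` assumed): the squared z-statistic reproduces the chi-squared statistic.
```python
import numpy as np
from scipy.stats import norm

n12, n21 = 2, 3                  # discordant counts from the example table

z = (n12 - n21) / np.sqrt(n12 + n21)
p_value = 2 * norm.sf(abs(z))    # two-sided p-value from the Normal approximation

print(z, z**2, p_value)          # z ≈ -0.447, z² = 0.2 (= T), p ≈ 0.65
```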
---
# References
[[Categorical Data Analysis#Chapter 2 - Describing Contingency Tables]]