In order to test if the population is in Hardy-Weinberg equilibrium (HWE) we need to determine the observed vs. expected distribution of the different genotypes given the observed allele frequencies. Above we estimated the frequences of A1,A2,A3. Let’s recall our allele frequencies for A1 = p1, A2 = p2, and A3 = p3.
print(p1)
[1] 0.3375
print(p2)
[1] 0.45
print(p3)
[1] 0.2125
Now, let’s work out the expected frequencies of each of expected genotype:
A1A1 = g1
g1 <- p1*p1*200
print(g1)
[1] 22.78125
A2A2 = g2
g2 <- p2*p2*200
print(g2)
[1] 40.5
A3A3 = g3
g3 <- p3*p3*200
3
print(g3)
[1] 9.03125
A1A2 = g4
g4 <- 2*p1*p2*200
print(g4)
[1] 60.75
A1A3 = g5
g5 <- 2*p1*p3*200
print(g5)
[1] 28.6875
A2A3 = g6
g6 <- 2*p2*p3*200
print(g6)
[1] 38.25
Now let’s summarize the observed vs. expectated genotype counts:
Genotype | A1A1 | A2A2 | A3A3 | A1A2 | A1A3 | A2A3 | Total |
---|---|---|---|---|---|---|---|
Observed | 23 | 61 | 28 | 39 | 41 | 8 | 200 |
Expected | 22.78 | 60.75 | 28.69 | 40.5 | 38.25 | 9.03 | - |
The observed vs. expected look pretty close! However, we need to use a significance test to decide how consistent these observations are with the expected values. We will use a $\chi$2-test, with the basic form of:
$\chi$2 = $\sum$ $\frac{(O_i-E_i)^2}{E_i}$
Let’s create a variable for each of the expected genotype classes; we’ll create a e equivalent for all g values above:
e1 <- 23; e2 <- 39; e3 <- 8; e4 <- 61; e5 <- 28; e6 <- 41;
Now let’s generate our observed $\chi$2 value, c
c <- ((g1-e1)^2/e1) + ((g2-e2)^2/e2) + ((g3-e3)^2/e3) + ((g4-e4)^2/e4) + ((g5-e5)^2/e5) + ((g6-e6)^2/e6)
print(c)
[1] 0.3950638
Our degrees of freedom for a HWE test is based upon the number of genotype classes (6) minus the number of parameters needed to quantify our expected values. We need 2 allele classes (the third is estimable from the first two) and the popoulation size (N). So we end up with 6-2-1 = 3df. There is a standard $\chi$2-test in R but it doesn’t allow us to specify the degrees of freedom (df) (it’s really setup for contingency tables). As found here, we can use the function pchisq to make our test:
pchisq(c,df=3,lower.tail=FALSE)
0.9412603061593
Given our value is much greater than 0.05, we can accept our null hypothesis that the population is probably in HWE!
Written on April 2nd , 2021 by P. Fields