Peter D. Fields Academia, Research, and whatnot

Caballero QG Chp. 2, Problem set Q1.b (R)

(b) Is the population in Hardy-Weinberg equilibrium for that locus?

In order to test if the population is in Hardy-Weinberg equilibrium (HWE) we need to determine the observed vs. expected distribution of the different genotypes given the observed allele frequencies. Above we estimated the frequences of A1,A2,A3. Let’s recall our allele frequencies for A1 = p1, A2 = p2, and A3 = p3.

print(p1)
[1] 0.3375
print(p2)
[1] 0.45
print(p3)
[1] 0.2125

Now, let’s work out the expected frequencies of each of expected genotype:

A1A1 = g1

g1 <- p1*p1*200
print(g1)
[1] 22.78125

A2A2 = g2

g2 <- p2*p2*200
print(g2)
[1] 40.5

A3A3 = g3

g3 <- p3*p3*200
3
print(g3)
[1] 9.03125

A1A2 = g4

g4 <- 2*p1*p2*200
print(g4)
[1] 60.75

A1A3 = g5

g5 <- 2*p1*p3*200
print(g5)
[1] 28.6875

A2A3 = g6

g6 <- 2*p2*p3*200
print(g6)
[1] 38.25

Now let’s summarize the observed vs. expectated genotype counts:

Genotype A1A1 A2A2 A3A3 A1A2 A1A3 A2A3 Total
Observed 23 61 28 39 41 8 200
Expected 22.78 60.75 28.69 40.5 38.25 9.03 -

The observed vs. expected look pretty close! However, we need to use a significance test to decide how consistent these observations are with the expected values. We will use a $\chi$2-test, with the basic form of:

$\chi$2 = $\sum$ $\frac{(O_i-E_i)^2}{E_i}$

Let’s create a variable for each of the expected genotype classes; we’ll create a e equivalent for all g values above:

e1 <- 23; e2 <- 39; e3 <- 8; e4 <- 61; e5 <- 28; e6 <- 41;

Now let’s generate our observed $\chi$2 value, c

c <- ((g1-e1)^2/e1) + ((g2-e2)^2/e2) + ((g3-e3)^2/e3) + ((g4-e4)^2/e4) + ((g5-e5)^2/e5) + ((g6-e6)^2/e6)
print(c)
[1] 0.3950638

Our degrees of freedom for a HWE test is based upon the number of genotype classes (6) minus the number of parameters needed to quantify our expected values. We need 2 allele classes (the third is estimable from the first two) and the popoulation size (N). So we end up with 6-2-1 = 3df. There is a standard $\chi$2-test in R but it doesn’t allow us to specify the degrees of freedom (df) (it’s really setup for contingency tables). As found here, we can use the function pchisq to make our test:

pchisq(c,df=3,lower.tail=FALSE)
0.9412603061593

Given our value is much greater than 0.05, we can accept our null hypothesis that the population is probably in HWE!