Caballero QG Chp. 2, Problem set Q2 (R)

In an analysis of a sample of 200 individuals, haplotypes were found for two SNPs with alleles A and a, and B and b, respectively, as shown in the following table:


Haplotype	AB	Ab	aB	ab	Total
Number	29	37	88	46	200

(a) Calculate the linkage disequilibrium (D), its value relative to the maximum possible (D’) and the value of the squared correlation between the allele frequencies of the loci, r².

We see in equation 2.4 that D can be quantified from the gamete (haplotype) frequencies, specifically $D = P_{AB}P_{ab} - P_{Ab}P_{aB}$.

PAB <- 29/200
Pab <- 46/200
PAb <- 37/200
PaB <- 88/200
D <- (PAB*Pab)-(PAb*PaB)
print(D)
[1] -0.04805
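As a quick cross-check, D can also be written as the observed AB haplotype frequency minus its expectation under independence, D = P_AB - p_Ap_B. The sketch below compares the two forms; the names `n`, `P`, `D_gametes`, `D_alt`, `pA_check` and `pB_check` are introduced here only for illustration.

```r
# Haplotype counts from the table above
n <- c(AB = 29, Ab = 37, aB = 88, ab = 46)
P <- n / sum(n)                      # haplotype frequencies

# Definition used above: P_AB * P_ab - P_Ab * P_aB
D_gametes <- P[["AB"]] * P[["ab"]] - P[["Ab"]] * P[["aB"]]

# Alternative form: observed AB frequency minus expectation under independence
pA_check <- P[["AB"]] + P[["Ab"]]
pB_check <- P[["AB"]] + P[["aB"]]
D_alt <- P[["AB"]] - pA_check * pB_check

print(c(D_gametes, D_alt))
print(isTRUE(all.equal(D_gametes, D_alt)))
```

Both forms should give the same D, since the four haplotype frequencies sum to 1.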

For D’ we first need estimates of the individual allele frequencies in order to calculate D_max. Because our D is negative, D_max is the smaller of p_Ap_B and p_ap_b, and then D’ = $\frac{D}{D_{max}}$. (If the computed D were greater than 0, D_max would instead be the smaller of p_Ap_b and p_ap_B.)

pA <- (29+37)/200
pB <- (29+88)/200
pa <- (88+46)/200
pb <- (37+46)/200
pApB <- pA*pB
print(pApB)
papb <- pa*pb
print(papb)
[1] 0.19305
[1] 0.27805

The smaller value here is p_Ap_B, so D_max = p_Ap_B and we can get D’ with:

Dprime <- D/pApB
print(Dprime)
[1] -0.2488992
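The choice of D_max depends on the sign of D, so it can be handy to wrap the rule in a small function. This is only a sketch: `d_prime` is a hypothetical helper name, not something from the book, and it keeps the sign of D in the result as done above.

```r
# Hypothetical helper: D' from D and the frequencies of alleles A and B
d_prime <- function(D, pA, pB) {
  pa <- 1 - pA
  pb <- 1 - pB
  # D < 0: D_max = min(pA*pB, pa*pb); otherwise: D_max = min(pA*pb, pa*pB)
  D_max <- if (D < 0) min(pA * pB, pa * pb) else min(pA * pb, pa * pB)
  D / D_max
}

# Values from this problem
print(d_prime(D = -0.04805, pA = 66 / 200, pB = 117 / 200))
```

With the values from this problem the function should reproduce the D’ computed above.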

The final calculation we need to make is r², which can be found from equation 2.7: $r^2 = \frac{D^2}{p_Ap_ap_Bp_b}$.

rsquared <- D^2/(pA*pa*pB*pb)
print(rsquared)
[1] 0.04301244
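To see why r² is described as a squared correlation, we can expand the counts into one record per haplotype, code the presence of A and of B as 0/1 indicators, and ask R for the ordinary Pearson correlation. A minimal sketch, with `counts`, `has_A` and `has_B` as illustrative names:

```r
# Expand the haplotype counts into one entry per sampled haplotype
counts <- c(AB = 29, Ab = 37, aB = 88, ab = 46)
has_A <- rep(c(1, 1, 0, 0), times = counts)   # 1 if the haplotype carries A
has_B <- rep(c(1, 0, 1, 0), times = counts)   # 1 if the haplotype carries B

r <- cor(has_A, has_B)   # Pearson correlation between the two indicators
print(r)
print(r^2)
```

The squared correlation should match the r² from equation 2.7, and the sign of r should match the sign of D.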

(b) Is the observed disequilibrium significant?

We see on page 22 that we can use a $\chi$² test to determine whether the degree of LD represents a significant deviation from the allele associations expected under linkage equilibrium. We could calculate this from the observed and expected frequencies of the different haplotypes, in a similar manner to testing for HWE in question 2.1. Alternatively, Caballero suggests multiplying r² by the sample size (here 200) to get a test statistic that is compared to a $\chi$² distribution with 1 degree of freedom for biallelic loci.

tstat <- rsquared*200
print(tstat)
pchisq(tstat,df=1,lower.tail=FALSE)
[1] 8.602488
[1] 0.003357041

With a p-value of about 0.003, well below 0.05, we reject the null hypothesis of linkage equilibrium: the observed disequilibrium is significant!
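The same statistic can be obtained from the haplotype counts directly, by treating them as a 2 x 2 contingency table and running a Pearson $\chi$² test without continuity correction. This is a sketch using base R's `chisq.test`; the object names `hap_table` and `test` are illustrative.

```r
# 2 x 2 table of haplotype counts: rows = allele at first SNP, columns = allele at second SNP
hap_table <- matrix(c(29, 37,
                      88, 46),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(locusA = c("A", "a"), locusB = c("B", "b")))

# correct = FALSE turns off the Yates continuity correction
test <- chisq.test(hap_table, correct = FALSE)
print(test$statistic)
print(test$p.value)
```

For a 2 x 2 table the uncorrected Pearson statistic equals the sample size times r², so this should agree with the value computed above.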

(c) If the recombination frequency between the two SNPs is 0.05, how many generations would it take for the disequilibrium to be reduced by half?

We see on page 21, and specifically in equation 2.5, that LD decay from some starting point to some future point can be calculated using $D_t = D_0(1-c)^t$. Setting $D_t = D_0/2$ and solving for t gives $t = \frac{\ln(1/2)}{\ln(1-c)}$; as outlined in the last paragraph on page 21, for small c this is approximately t $\approx$ [ln(2)]/c, where c is the recombination frequency.

t <- log(2)/0.05
print(t)
[1] 13.86294

So it would take approximately 14 generations to reduce LD by half!
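Since ln(2)/c is an approximation for small c, it is worth comparing it with the exact solution of (1-c)^t = 1/2. A short sketch, with `c_rec`, `t_exact` and `t_approx` as illustrative names:

```r
c_rec <- 0.05                            # recombination frequency from the problem

t_exact  <- log(0.5) / log(1 - c_rec)    # exact solution of (1 - c)^t = 1/2
t_approx <- log(2) / c_rec               # small-c approximation used above

print(c(exact = t_exact, approx = t_approx))

# Fraction of the initial D remaining after 13 and 14 whole generations
print((1 - c_rec)^c(13, 14))
```

The exact value is slightly smaller than the approximation, but either way D first falls below half of its starting value in generation 14.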

Written on April 16th, 2021 by P. Fields
