Peter D. Fields Academia, Research, and whatnot

Caballero QG Chp. 2, Problem set Q2 (R)

In an analysis of a sample of 200 individuals, haplotypes were found for two SNPs with alleles A and a, and B and b, respectively, as shown in the following table:

Haplotype   AB   Ab   aB   ab   Total
Number      29   37   88   46   200

(a) Calculate the linkage disequilibrium (D), its value relative to the maximum possible (D’), and the value of the squared correlation between the allele frequencies of the loci, $r^2$.

We see in equation 2.4 that D can be quantified from gamete frequencies, specifically $D = P_{AB}P_{ab} - P_{Ab}P_{aB}$.

PAB <- 29/200
Pab <- 46/200
PAb <- 37/200
PaB <- 88/200
D <- (PAB*Pab)-(PAb*PaB)
print(D)
[1] -0.04805
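As a quick cross-check, D can also be written as the difference between the observed AB frequency and its expectation under independence, $D = P_{AB} - p_Ap_B$. The following is a minimal sketch that recomputes the frequencies from the table counts; the variable names (`counts`, `freqs`, `D_gametes`, `D_expected`) are illustrative and not taken from Caballero.

```r
# Haplotype counts from the table above
counts <- c(AB = 29, Ab = 37, aB = 88, ab = 46)
freqs <- counts / sum(counts)

# Form 1: product of coupling minus product of repulsion gamete frequencies
D_gametes <- freqs[["AB"]] * freqs[["ab"]] - freqs[["Ab"]] * freqs[["aB"]]

# Form 2: observed AB frequency minus its expectation under independence
pA <- freqs[["AB"]] + freqs[["Ab"]]
pB <- freqs[["AB"]] + freqs[["aB"]]
D_expected <- freqs[["AB"]] - pA * pB

print(c(D_gametes, D_expected))
```

Both expressions should return the same value of D as the calculation above.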

For D’ we need estimates of the individual allele frequencies in order to calculate $D_{max}$. Because our D is negative, we need $p_Ap_B$ and $p_ap_b$; the smaller of the two is what we define as $D_{max}$, and then D’ = $\frac{D}{D_{max}}$. (If the computed D were greater than 0, $D_{max}$ would instead be the smaller of $p_Ap_b$ and $p_ap_B$.)

pA <- (29+37)/200
pB <- (29+88)/200
pa <- (88+46)/200
pb <- (37+46)/200
pApB <- pA*pB
print(pApB)
papb <- pa*pb
print(papb)
[1] 0.19305
[1] 0.27805

The smaller value here is $p_Ap_B$, so we can get D’ with:

Dprime <- D/pApB
print(Dprime)
[1] -0.2488992
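Since the choice of $D_{max}$ depends on the sign of D, it can be convenient to wrap the rule in a small function. This is only a sketch: `d_prime` is a hypothetical helper name, and it follows the sign convention used here, where D’ keeps the sign of D.

```r
# Hypothetical helper (not from Caballero): picks D_max from the sign of D
d_prime <- function(D, pA, pB) {
  pa <- 1 - pA
  pb <- 1 - pB
  if (D < 0) {
    # Negative D: limited by the smaller of pA*pB and pa*pb
    D_max <- min(pA * pB, pa * pb)
  } else {
    # Positive D: limited by the smaller of pA*pb and pa*pB
    D_max <- min(pA * pb, pa * pB)
  }
  D / D_max
}

# Values from this problem
print(d_prime(D = -0.04805, pA = 66 / 200, pB = 117 / 200))
```

With the values from this problem, the function should reproduce the D’ computed above.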

The final calculation we need to make is $r^2$, which can be found from equation 2.7, $r^2 = \frac{D^2}{p_Ap_ap_Bp_b}$.

rsquared <- D^2/(pA*pa*pB*pb)
print(rsquared)
[1] 0.04301244
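To see why this quantity is called a squared correlation, we can expand the table into one record per sampled haplotype, code the presence of A and of B as 0/1 indicators, and take the ordinary Pearson correlation. This is a sketch for illustration only; `has_A`, `has_B`, and `r2_from_cor` are made-up names.

```r
# Haplotype counts from the table, in the order AB, Ab, aB, ab
counts <- c(AB = 29, Ab = 37, aB = 88, ab = 46)

# One entry per sampled haplotype: 1 if it carries A (or B), 0 otherwise
has_A <- rep(c(1, 1, 0, 0), times = counts)
has_B <- rep(c(1, 0, 1, 0), times = counts)

# Squared Pearson correlation between the two indicators
r2_from_cor <- cor(has_A, has_B)^2
print(r2_from_cor)
```

The result should match the value obtained from equation 2.7.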

(b) Is the observed disequilibrium significant?

We see on page 22 that we can use a $\chi^2$ test to determine whether the degree of LD represents a significant deviation from the allele associations expected under linkage equilibrium. We can calculate this from the observed counts of the different haplotypes, in a similar manner to testing for HWE in question 2.1. Alternatively, Caballero suggests multiplying $r^2$ by the sample size to obtain the test statistic, which we compare against a $\chi^2$ distribution with 1 degree of freedom for biallelic loci.

tstat <- rsquared*200
print(tstat)
pchisq(tstat,df=1,lower.tail=FALSE)
[1] 8.602488
[1] 0.003357041

The p-value of about 0.0034 is well below 0.05, so we reject the null hypothesis of linkage equilibrium: the observed disequilibrium is significant.
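The haplotype-count route mentioned above can be sketched with base R's `chisq.test()` on the 2 x 2 table of counts. The continuity correction is switched off here on the assumption that we want the plain Pearson statistic, which for a 2 x 2 table equals the sample size times $r^2$; `hap_table` and `test` are illustrative names.

```r
# 2 x 2 table of haplotype counts: rows are alleles at the first SNP,
# columns are alleles at the second SNP
hap_table <- matrix(c(29, 37,
                      88, 46),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(locusA = c("A", "a"),
                                    locusB = c("B", "b")))

# correct = FALSE turns off the Yates continuity correction,
# giving the uncorrected Pearson chi-squared statistic
test <- chisq.test(hap_table, correct = FALSE)
print(test$statistic)
print(test$p.value)
```

The statistic and p-value should agree with the $r^2$-based calculation above.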

(c) If the recombination frequency between the two SNPs is 0.05, how many generations would it take for the disequilibrium to be reduced by half?

We see on page 21, and specifically equation 2.5, that LD decay from some starting point to some future point can be calculated using $D_t = D_0(1-c)^t$. As outlined in the last paragraph on page 21, setting $D_t/D_0 = 1/2$ and solving for t gives $t = \frac{\ln(1/2)}{\ln(1-c)}$, which for small c is approximately $t \approx \frac{\ln(2)}{c}$, where c is the recombination frequency.

t <- log(2)/0.05
print(t)
[1] 13.86294

So it would take approximately 14 generations to reduce LD by half!
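As a check on the approximation, we can also solve $(1-c)^t = 1/2$ exactly and look at how much of the initial disequilibrium remains after 13 and 14 whole generations. This is a minimal sketch; `c_rec`, `t_exact`, `t_approx`, and `remaining` are illustrative names.

```r
c_rec <- 0.05  # recombination frequency given in the problem

# Exact solution of (1 - c)^t = 1/2
t_exact <- log(0.5) / log(1 - c_rec)

# Small-c approximation, ln(2) / c
t_approx <- log(2) / c_rec

print(c(exact = t_exact, approx = t_approx))

# Fraction of the initial disequilibrium remaining after 13 and 14 generations
remaining <- (1 - c_rec)^c(13, 14)
print(remaining)
```

The exact solution is slightly smaller than the approximation (about 13.5 generations), and the remaining fraction first drops below one half at generation 14, so the conclusion is unchanged.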