
  • Update to Long and Kittles's "Human Genetic Diversity and the Nonexistence of Biological Races" (2003): Fixation on an Index
  • Jeffrey C. Long
Keywords

FST, Short Tandem Repeats (STRs), Single Nucleotide Polymorphisms (SNPs), Chimpanzees, Hierarchical Models, Nested Models, Genetic Diversity, Racial Typing

Sewall Wright's fixation index FST measured among samples of world populations is often 0.15 or less when computed as an average over many alleles or loci. To many, this result indicates that the genetic similarities among human populations far outweigh the differences. For example, a finding like this led Richard Lewontin to claim that human races have no genetic or taxonomic significance (Lewontin 1972). Despite the far-reaching proclamations that researchers make from FST, few have questioned the validity of how it is applied or interpreted.

Earlier in this decade, Rick Kittles and I took an unusually critical look at FST (Long and Kittles 2003). We analyzed a unique data set composed of short tandem repeat (STR) allele frequencies for eight loci genotyped in both humans and chimpanzees (Deka et al. 1995). These data made it possible to see how FST played out when no one could dispute taxonomic and genetic significance. The answer surprised us. FST was pretty close to the canonical 0.15 shown so many times for human populations. In our analysis, FST was 0.12 for humans, but for humans and chimpanzees together, FST rose only to 0.18. Indeed, we found one locus, D13S122, where the size range of human and chimpanzee alleles hardly overlapped, yet FST equaled 0.15 (Figure 1). We ultimately found that the genetic and statistical model underlying FST fits human populations poorly. Specifically, human population structure strongly biases the outcome of analyses by violating two assumptions: first, that expected genetic diversity is the same in every population; and second, that divergence between all pairs of populations is equal and independent. These assumptions are explicit and clear in the major statistical papers on estimating FST (Cockerham 1969; Weir and Cockerham 1984; Weir and Hill 2002), but most researchers ignore them. More important, Kittles and I introduced a way to relax these assumptions by using generalized hierarchical models that nest smaller units, such as genes, into larger units, such as individuals, populations, and geographic regions. In our approach, it is possible to restate many hierarchical models as either expansions or reductions of each other, and by comparing a sequence of nested models, we are able to identify those human demographic features that have the greatest effect on the distribution of genetic variation.


Figure 1.

Dinucleotide repeat allele size distributions for the D13S122 locus in chimpanzees and humans. The frequencies for humans are unweighted averages from eight groups representing diverse worldwide localities (Deka et al. 1995). Notice that humans and chimpanzees are clearly distinct, yet FST is only 0.15.
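
A toy calculation may help show how two groups with no alleles in common can still yield a small FST. The sketch below is not the analysis in Long and Kittles (2003) and does not use the Deka et al. (1995) data. It assumes a simple heterozygosity-based definition, FST = (HT - HS) / HT, with equal weights for each group, and every allele frequency in it is hypothetical. The function names `heterozygosity` and `fst` are illustrative only.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pops):
    """Heterozygosity-based FST = (HT - HS) / HT, weighting each group equally.

    This is a simplifying assumption, not the Weir-Cockerham estimator.
    """
    n = len(pops)
    # HS: mean within-group expected heterozygosity.
    hs = sum(heterozygosity(p) for p in pops) / n
    # HT: expected heterozygosity of the pooled (averaged) allele frequencies.
    alleles = set().union(*pops)
    pooled = {a: sum(p.get(a, 0.0) for p in pops) / n for a in alleles}
    ht = heterozygosity(pooled)
    return (ht - hs) / ht


# Hypothetical STR-like locus: four equally common alleles per group,
# with completely non-overlapping repeat sizes.
group_a = {size: 0.25 for size in (10, 11, 12, 13)}
group_b = {size: 0.25 for size in (20, 21, 22, 23)}

# Hypothetical SNP-like locus: each group fixed for a different allele.
snp_a = {"A": 1.0}
snp_b = {"G": 1.0}

print(round(fst([group_a, group_b]), 3))  # 0.143
print(round(fst([snp_a, snp_b]), 3))      # 1.0
```

Under these assumptions, both loci separate the two groups perfectly, yet the index differs sevenfold. The high diversity within each group at the multiallelic locus is what keeps its value low.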

Continuing Lapses in Using FST Critically

A recent review notes that different kinds of genetic markers give different estimates of FST (Holsinger and Weir 2009). For example, FST estimated from STRs is 0.05, but FST estimated from single nucleotide polymorphisms (SNPs) is 0.09 (Li et al. 2008; Rosenberg et al. 2002). This discrepancy should be no surprise because FST depends on allele frequencies; it is inversely proportional to the variation within populations, and STRs are more variable than SNPs. Kittles and I demonstrated this with algebra, but the finding was not novel. Wright had pointed to it in his major synthesis (Wright 1978), and Phillip Hedrick had shown the same result for a slightly different statistic (Hedrick 1999). In 2009, my colleagues and I showed that patterns of variation in STRs and segregating sites in DNA sequences are concordant after standardizing variance components and fixation indexes (such as FST) relative to their theoretical maxima (Long et al. 2009).
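
The dependence on within-population variation can be made concrete with a short worked bound. This is a sketch, not the algebra from Long and Kittles (2003). It assumes the common heterozygosity-based form of the index, where the symbols H_S (mean expected heterozygosity within populations) and H_T (expected heterozygosity of the pooled population) are introduced here for illustration.

```latex
F_{ST} \;=\; \frac{H_T - H_S}{H_T} \;=\; 1 - \frac{H_S}{H_T},
\qquad
0 < H_T \le 1
\;\Longrightarrow\;
F_{ST} \;\le\; 1 - H_S .
```

With hypothetical values, a highly variable STR-like locus with H_S = 0.8 cannot exceed FST = 0.2, whereas a SNP-like locus with H_S = 0.3 could in principle reach 0.7. Under this form, more variable markers are confined to lower values of the index regardless of how distinct the populations are.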

Holsinger and Weir (2009) correctly note that the differences between data types make it hard to estimate population genetic parameters such as effective migration rates (Nem) from FST. However, they fail to warn readers about...
