Effect of Sociocultural Cleavage on Genetic Differentiation: A Study from North India
Indian populations possess an exclusive genetic profile primarily due to the many migratory events, which caused an extensive range of genetic diversity, and also due to stringent and austere sociocultural barriers that structure these populations into different endogamous groups. In the present study we attempt to explore the genetic relationships between various endogamous North Indian populations and to determine the effect of stringent social regulations on their gene pool. Twenty STR markers were genotyped in 1,800 random North Indians from 9 endogamous populations belonging to upper-caste and middle-caste Hindus and Muslims. All nine populations had high allelic diversity (176 alleles) and average observed heterozygosity (0.742 ± 0.06), suggesting strong intrapopulation diversity. The average FST value over all loci was as low as 0.0084. However, within-group FST and genetic distance analysis showed that populations of the same group were genetically closer to each other. The genetic distance of Muslims from middle castes (FST = 0.0090; DA = 0.0266) was significantly higher than that of Muslims from upper castes (FST = 0.0050; DA = 0.0148). Phylogenetic trees (neighbor-joining and maximum-likelihood) show the basal cluster pattern of three clusters corresponding to Muslims, upper-caste, and middle-caste populations, with Muslims clustered with upper-caste populations. Based on the results, we conclude that the extensive gene flow through a series of migrations and invasions has created an enormous amount of genetic diversity. The interpopulation differences are minimal but have a definite pattern, in which populations of different socioreligious groups have more genetic similarity within the same group and are genetically more distant from populations of other groups. Finally, North Indian Muslims show a differential genetic relationship with upper- and middle-caste populations.
Short Tandem Repeats (STRs), North India, Castes, Muslims, Hindus, Phylogenetic Reconstruction, Neighbor-Joining Tree, Surname Endogamy, FGA, D5S818, D7S820, D11S2010, D13S767, D9S926, D2S1328, D18S848, D14S306, D3S1358, ACPP, TPO, THO1, VWA, FES, F13A1, D16S310, DHFRP2, HPRT, D4S243
One endeavor of the analysis of human genetic variation is to determine the amount, pattern, distribution, and structuring of genetic diversity across different geo-ethnic, sociocultural, and linguistic human groups (Cavalli-Sforza 2005; Bamshad et al. 2003). In this context, the genetic structure, affinity, and diversity of Indian populations are often contested and postulated to hold an important key to the range of factors that shaped the contemporary pattern of genetic variation.
The Indian subcontinent is an assemblage of more than 1 billion individuals, constituting about one-fifth of the total world population. A long and in-depth period of human survival, colossal gene flow from different corners of the world, and unique sociocultural practices have created immense ethnic, morphological, linguistic, and religious diversity among the contemporary human groups of India. Two Neolithic episodes of migration of the speakers of Elamo-Dravidian (approximately 10,000–15,000 years ago) and Indo-European (approximately 4,000 years ago) have led to massive gene flow into the Indian subcontinent, leaving a definite mark on the genetic imprint of the Indian population (Quintana-Murci 2001). At present, the Indian population is culturally stratified into tribal groups (8.08% of the total population) and nontribal groups (about 92% of the total population) (Terreros et al. 2007; Roychoudhury et al. 2001).
Most contemporary nontribal populations are Hindus, who live in hierarchically arranged social classes known as castes. Indian society therefore predominantly revolves around the concept of the caste, which is a conglomerate of various sociocultural customs, traditions, and barriers that have created a large number of hierarchically arranged endogamous groups (Karve 1961, 1968). This social hierarchy system is unique, as birth of an individual in this system governs and decides most of the proceedings of his or her life, including the choice of a mating partner. The caste system was established for social and economic organization, in which four castes were ranked in status from low to high: Shudra (menial labor class), Vysya (business class), Kshatriya (warrior class), and Brahmin (priestly class). It has been postulated that the Indo-Aryans established this Hindu caste hierarchy to legitimize and maintain their power over the native Dravidian-speaking populations (Poliakov 1974; Puppala and Crawford 1996). It is also plausible that these Caucasian immigrants appointed themselves predominantly to castes of higher ranks.
The population structuring triggered by the caste system is also supplemented by an additional level of endogamy called surname endogamy, in which individuals marry others with the same surname as their own. Among Brahmins for instance, Bhargavas and Chaturvedis are two prominent subgroups that practice surname endogamy. Both groups have marriages only within their own surname and not with other Brahmins (Agrawal et al. 2003). In addition, Muslims, the second largest nontribal population of India (about 12% of the total Indian population), do not follow the Hindu caste system but still prefer to marry within their own sects, that is, Sunnis and Shia (Terreros et al. 2007). Both sects also practice a high level of consanguinity.
Overall, the many migratory events caused an extensive range of genetic diversity, and stringent and austere sociocultural barriers structured the genetic diversity into different endogamous groups. These processes endowed an exclusive genetic profile to Indian populations. Earlier, we compared the maternal and paternal lineages of different North Indian groups and showed that at the mtDNA haplogroup level, North Indian Muslim populations (both Sunni and Shia) are more similar to each other and to other Hindu caste groups (Terreros et al. 2007). In contrast, Y-chromosome analysis depicted a substantial level of African/Middle Eastern YAP lineage in the Shia population (Agrawal et al. 2005). In the present study we use a battery of STR polymorphisms to explore the genetic relationship of various endogamous North Indian populations.
The nine endogamous populations analyzed in the present study belong to three different socioreligious groups: upper-caste Hindus (Bhargavas, Chaturvedis, and Brahmins), middle-caste Hindus (Kayastha, Mathur, Rastogi, and Vaish), and Muslims (Shia and Sunni). We aim to determine the effects of the stringent social regulations on the gene pool of contemporary Indian populations and thereby to offer a fresh interpretation of the pattern, distribution, and structuring of genetic variation in North Indian populations.
Materials and Methods
We first genotyped 20 tetranucleotide STR markers among 1,800 North Indian individuals from 9 endogamous populations: Bhargavas, Chaturvedis, Brahmins, Kayastha, Mathurs, Rastogies, Vaish, Shiites, and Sunnis. The generated genotypic profile was used to calculate average observed heterozygosity for quantifying intrapopulation genetic variation, and Wright’s F statistic was used to infer interpopulation (or intergroup) genetic variation. To provide a more comprehensive picture of the genetic similarity of and differences between different North Indian populations and sociocultural groups, we calculated genetic distances on the basis of the allelic profile of 20 STR markers and we approached the phylogenetic assessment using two algorithms: neighbor-joining (distance-based) and maximum-likelihood (allele-frequency-based).
The nine populations selected in the present study included three endogamous Hindu upper-caste populations, four middle-caste populations, and two consanguineous sects of Muslims. The three upper-caste populations include Brahmins and their two subsects, Bhargavas and Chaturvedis. All three groups are exceedingly stringent in their marital patterns. Bhargavas and Chaturvedis practice surname endogamy, which prevents them from marrying outside their own surnames. The middle-caste populations include Kayasthas, Mathurs, Rastogies, and Vaish. Mathurs are one of the subsects of Kayasthas and practice surname endogamy. Similarly, Rastogies are historically considered to have originated from the Vaish. Among the two Muslim groups selected, one is the minority Muslim sect (Shia) and the other is the predominant Muslim group (Sunni), whose descendants have ruled the Indian subcontinent for several hundred years. Both Muslim sects practice a high degree of consanguinity. All the populations are ethnically Caucasians and speakers of Indo-European languages (Hindi and Urdu).
A total of 1,800 randomly selected individuals belonging to these nine populations were recruited from different regions of the province of Uttar Pradesh. The collection sites included districts of Lucknow, Kanpur, Raebareilly, Barabanki, Faizabad, Agra, Jhansi, Gonda, and Basti. Two hundred samples were collected from each of the nine populations. All the study subjects were adults (mean age, 38.8 ± 3.4 years) whose families have been residents of Uttar Pradesh for the last three generations. Before the samples were collected, regional addresses and detailed computerized lists of the populations were prepared. Random numbers were generated with the help of a computer, and adult individuals living in different parts of the province were questioned about their ethnicity, caste affiliation, surname, and birthplaces of their parents. Only unrelated subjects were considered eligible to participate in the study. The demographic profile and other ethnic and familial information were filed in a detailed consent form. Three-generation pedigree charts were prepared to ensure unrelatedness of all the subjects.
Whole blood was obtained by venipuncture, and about 5 ml of blood was collected in vacutainer tubes containing EDTA. The study was performed with the approval of the institutional ethical reviewing committee of the Sanjay Gandhi Post Graduate Institute of Medical Sciences (SGPGIMS), Lucknow.
Genomic DNA Extraction
High-molecular-weight genomic DNA was extracted using the salting-out method with phenol-chloroform, as described by Comey et al. (1993), and was purified through ethanol precipitation.
A panel of 20 STR markers (FGA, D5S818, D7S820, D11S2010, D13S767, D9S926, D2S1328, D18S848, D14S306, D3S1358, ACPP, TPO, THO1, VWA, FES, F13A1, D16S310, DHFRP2, HPRT, and D4S243) was genotyped using PCR-based locus-specific amplification, as previously described (Perez-Lezaun et al. 1997). We have successfully used the same panel of 20 STR markers to evaluate the genetic relationships of North Indian populations with other world and Indian populations (Khan et al. 2007). One of the primers for each marker was labeled with a fluorochrome. Size fractionation of the fluorochrome-labeled amplicons was carried out by means of capillary electrophoresis in an ABI-310 automated fragment size genetic analyzer (Applied Biosystems, Foster City, California). Size calling of the alleles at individual loci was done with GeneScan, version 3.1.2, and Genotyper, version 2.5.2, using a 500-ROX size standard (Applied Biosystems).
Allele frequencies at each of the markers were obtained by means of the direct counting method. Deviation from Hardy-Weinberg equilibrium in genotypic frequencies for all markers was estimated using Fisher’s exact test based on 1,000 Markov chain algorithm steps in Arlequin, version 2. A Bonferroni correction to the p value was applied. Two parameters (gene diversity and observed heterozygosity) were calculated to infer intrapopulation diversity. Interpopulation (and intragroup) genetic variation was assessed using the measure of partitioning genetic diversity, FST. Arlequin, version 2, Popgene, version 32, and Cervus, version 1, software were used for the calculations.
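The direct-counting and diversity calculations described above can be illustrated with a minimal Python sketch. It is not the Arlequin, Popgene, or Cervus implementation; the function names and the five genotypes are hypothetical, and gene diversity is assumed here to mean Nei's unbiased expected heterozygosity.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Direct counting: each diploid genotype contributes two alleles."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def gene_diversity(genotypes):
    """Unbiased expected heterozygosity (assumed definition of gene diversity)."""
    n_genes = 2 * len(genotypes)  # number of gene copies sampled
    homozygosity = sum(p * p for p in allele_frequencies(genotypes).values())
    return n_genes / (n_genes - 1) * (1 - homozygosity)

# Hypothetical genotypes (repeat counts) at one STR locus, for illustration only.
sample = [(8, 9), (9, 9), (8, 10), (9, 10), (8, 8)]
freqs = allele_frequencies(sample)
h_obs = observed_heterozygosity(sample)
h_exp = gene_diversity(sample)
```

Averaging the per-locus observed heterozygosity over all loci would give a population-level figure of the kind reported in this study.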
Two different genetic distances, Nei’s DA and FST, based on the allele frequency distribution of 20 STRs, were calculated using the GenDist option in PHYLIP, version 3.5c (Felsenstein 1993), to assess the genetic relationship of the nine North Indian populations. Phylogenetic analysis was carried out using two unrooted radial phylograms: neighbor-joining and maximum-likelihood phylogenetic trees. The neighbor-joining algorithm was used to construct the branching array from a matrix of Nei’s DA genetic distances using the Neighbor option in PHYLIP. The maximum-likelihood algorithm was used on the allele frequency distribution of 20 STR loci in the studied population using the CONTML option in PHYLIP. In both the neighbor-joining and maximum-likelihood methods, statistical bootstrap involving 1,000 replicates was carried out using the SeqBoot option in PHYLIP. Finally, a consensus of 1,000 trees (both neighbor-joining and maximum-likelihood) was drawn using the ConSense option of PHYLIP.
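For readers unfamiliar with Nei’s DA, the following Python sketch shows the published formula, DA = 1 − (1/r) Σ over loci Σ over alleles √(x·y), where r is the number of loci and x and y are the frequencies of the same allele in the two populations. It is an illustration only and not PHYLIP's code; the function name and the two-locus frequency tables are hypothetical.

```python
import math

def nei_da(pop_x, pop_y):
    """Nei's DA distance between two populations.

    Each population is a dict: locus name -> {allele: frequency}.
    Both populations are assumed to be typed at the same loci.
    """
    total = 0.0
    for locus in pop_x:
        fx, fy = pop_x[locus], pop_y[locus]
        # Alleles absent from either population contribute sqrt(0) = 0.
        shared = set(fx) & set(fy)
        total += sum(math.sqrt(fx[a] * fy[a]) for a in shared)
    return 1.0 - total / len(pop_x)

# Hypothetical allele frequencies at two loci, for illustration only.
pop_a = {"locus1": {8: 0.5, 9: 0.5}, "locus2": {10: 1.0}}
pop_b = {"locus1": {8: 0.5, 9: 0.5}, "locus2": {11: 1.0}}

d_same = nei_da(pop_a, pop_a)  # identical frequencies
d_diff = nei_da(pop_a, pop_b)  # identical at locus1, no shared allele at locus2
```

A pairwise matrix of such distances is the kind of input a neighbor-joining procedure takes.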
Allele Frequency Distribution
We observed 176 alleles at 20 STR loci. The observed number of alleles ranged from 7 to 9 at 16 STR loci (THO1, TPO, FES, VWA, D4S243, DHFRP2, FGA, D7S820, D5S818, D11S2010, D2S1328, ACPP, D9S926, D13S767, D14S306, and D18S848), whereas 10–13 alleles were observed in the remaining STRs (D3S1358, D16S310, HPRT, and F13A1). The average number of alleles observed was 8.8, indicating a high level of polymorphism across these STR loci in North Indian populations. The maximum number of 164 alleles was observed in Shiites, and a minimum of 146 alleles was observed in the Vaish. Multilocus genotype frequencies for all nine populations revealed no significant departures from Hardy-Weinberg equilibrium when a Bonferroni correction was applied to the p values.
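As a rough illustration of the Bonferroni correction mentioned above, the sketch below divides a nominal significance level by the number of tests. The 0.05 level, the count of 180 tests (one per locus per population), and the four p values are assumptions made for illustration; the study does not state them.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def passes_bonferroni(p_values, alpha=0.05):
    """Flag each p value that remains significant after the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Assumed for illustration: one Hardy-Weinberg test per locus per population.
n_tests = 20 * 9
threshold = bonferroni_threshold(0.05, n_tests)

# Hypothetical p values from four tests.
flags = passes_bonferroni([0.01, 0.0001, 0.5, 0.04])
```

Under these assumptions a single test would need a p value below roughly 0.00028 to count as a significant departure.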
Intrapopulation Genetic Variation
Average observed heterozygosity was estimated as the measure of intrapopulation genetic diversity. All nine populations had a high value of average observed heterozygosity (0.742 ± 0.06). Shia Muslims were most heterozygous (0.754 ± 0.04), and Bhargavas were the least heterozygous (0.735 ± 0.08). Locus-wise average observed heterozygosity in the studied nine populations is shown in Table 1.
Interpopulation (Intergroup) Genetic Variation
Interpopulation genetic variation corresponds to an analysis of population differentiation. In the present study, a measure of partitioning genetic diversity (FST) was used for the interpopulation genetic variation analysis. The average FST value over all loci was as low as 0.0084, suggesting little differentiation among the studied populations (Table 2). However, when the analysis was carried out between populations of each group (upper-caste, middle-caste, and Muslim groups), the FST value between the two Muslim sects (0.0033) or between the three upper-caste Brahmin populations (0.0038) was significantly lower than the FST value for the four middle-caste populations (0.0058). Furthermore, the level of differentiation (FST) increased when we included the populations belonging to different sociocultural groups together; the FST value for the upper- and middle-caste populations together was 0.0060, and for all nine populations it further increased to 0.0084.
Average Observed Heterozygosity Calculated for Nine North Indian Populations Based on the Genotype Data for 20 STR Loci
Analysis of FST Based on 20 STR Loci
|Group (a)||Number of Populations (n)||FST Value|
|Upper- and middle-caste populations||7||0.0060|
|Upper-caste and Muslim populations||5||0.0050|
|Middle-caste and Muslim populations||6||0.0092|
|Upper-caste, middle-caste, and Muslim populations||9||0.0084|
a. Upper-caste populations: Bhargavas, Chaturvedis, and Brahmins; middle-caste populations: Kayasthas, Mathurs, Rastogies, and Vaish; Muslims: Sunnis and Shiites.
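The idea behind the FST comparisons above can be sketched for a single locus as FST = (HT − HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled allele frequencies. This is a simplified sketch with equally weighted subpopulations, not the estimator implemented in Arlequin; the function names and frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def fst_single_locus(subpop_freqs):
    """(H_T - H_S) / H_T for one locus, weighting subpopulations equally."""
    k = len(subpop_freqs)
    alleles = set().union(*subpop_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in subpop_freqs) / k for a in alleles}
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / k
    h_t = expected_heterozygosity(mean_freqs)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, so there is nothing to partition
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two subpopulations, for illustration only.
fst_identical = fst_single_locus([{8: 0.5, 9: 0.5}, {8: 0.5, 9: 0.5}])
fst_fixed = fst_single_locus([{8: 1.0}, {9: 1.0}])
fst_mild = fst_single_locus([{8: 0.6, 9: 0.4}, {8: 0.4, 9: 0.6}])
```

Values near zero, like those reported in this study, indicate that almost all of the diversity lies within populations rather than between them.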
Estimation of Genetic Distances
To assess the genetic relationship of the nine North Indian populations, we calculated pairwise genetic distances based on the multilocus genotypic data of 20 STR markers using Nei’s DA and an FST-based distance approach. The distance matrices generated from these two methods (Table 3) depict a similar picture. The two Muslim populations are genetically more similar to each other (FST = 0.0065; DA = 0.0186) than to other populations. Among middle-caste populations, the Kayasthas and Mathurs (FST = 0.0051; DA = 0.0149) and the Rastogies and Vaish (FST = 0.0040; DA = 0.0113) are genetically closer to each other. Among the three Brahmin populations, Chaturvedis are more distant from Bhargavas (FST = 0.0064; DA = 0.0197) and closer to Brahmins (FST = 0.0051; DA = 0.0148). Interestingly, the genetic distance between the middle castes and Muslims (FST = 0.0090; DA = 0.0266) was significantly higher than the distance between Muslims and upper-caste Brahmin subgroups (FST = 0.0050; DA = 0.0148), as shown in Table 4. Among Muslims, Shiites show a significantly higher distance from all caste populations compared to Sunni Muslims.
The phylogenetic analysis is depicted by two unrooted radial phylograms (neighbor-joining and maximum-likelihood), as shown in Figure 1. The scores next to the nodes characterize the number of bootstrap replicates (out of 1,000) exhibiting these specific bifurcations.
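The logic of bootstrap support can be conveyed with a deliberately simplified Python sketch. Instead of rebuilding a tree for every replicate, as the PHYLIP workflow does, it resamples loci with replacement and counts how often a given pair of populations remains the closest pair. The function name, the populations A, B, and C, and the per-locus distances are all hypothetical.

```python
import random

def bootstrap_pair_support(per_locus_dist, pair, n_replicates=1000, seed=0):
    """Share of locus-resampled replicates in which `pair` is the closest pair.

    per_locus_dist is a list with one dict per locus, mapping each
    population pair to its distance at that locus.
    """
    rng = random.Random(seed)
    n_loci = len(per_locus_dist)
    hits = 0
    for _ in range(n_replicates):
        # Resample loci with replacement, as a bootstrap over markers does.
        sampled = [per_locus_dist[rng.randrange(n_loci)] for _ in range(n_loci)]
        totals = {p: sum(locus[p] for locus in sampled) for p in sampled[0]}
        if min(totals, key=totals.get) == pair:
            hits += 1
    return hits / n_replicates

# Hypothetical per-locus distances among three populations, for illustration only.
per_locus = [
    {("A", "B"): 0.010, ("A", "C"): 0.030, ("B", "C"): 0.040},
    {("A", "B"): 0.020, ("A", "C"): 0.025, ("B", "C"): 0.030},
    {("A", "B"): 0.005, ("A", "C"): 0.020, ("B", "C"): 0.015},
    {("A", "B"): 0.015, ("A", "C"): 0.040, ("B", "C"): 0.035},
]
support = bootstrap_pair_support(per_locus, ("A", "B"))
```

A support of 0.977 in this scheme would correspond to the 97.7% style of score shown next to a node.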
Both phylogenetic trees (DA-based neighbor-joining and maximum-likelihood trees) show similar basal cluster patterns in which three clusters, corresponding to Muslims and upper- and middle-caste groups, can be recognized. However, the clusters of Muslims and upper-caste populations arise from the same branch. It can be deduced from the phylogenetic analysis that Muslims have more genetic similarity with the upper-caste populations, a finding that is supported by more than 90% bootstrap values in both of the phylograms. The middle-caste population cluster is further bifurcated into two branches: one carrying Kayasthas and Mathurs and the other carrying Rastogies and the Vaish. The Rastogi-Vaish branch is also supported by a strong bootstrap value (97.7% in the neighbor-joining phylogram; 96.5% in the maximum-likelihood phylogram).
Genetic Distance Matrix (Nei’s DA Below the Diagonal and FST Above the Diagonal) for Different North Indian Populations and Sociocultural Groups
|Upper Caste||Middle Caste||Muslims|
Short tandem repeats (STRs) are some of the most polymorphic markers reported to date (Destro-Bisol et al. 2000), and together they are considered the most powerful genetic system to infer and detect the pattern and distribution of genetic diversity in human populations, because of their exceptionally high mutation rates and high level of polymorphism (Jorde and Wooding 2004). The present study, based on the high-resolution analysis of a group of 20 STR markers in nine populations, reveals vital information about the amount, pattern, and distribution of genetic diversity among the population of the Northern Indian state of Uttar Pradesh. The study also provides valuable data to understand the effect of sociocultural barriers on the genetic makeup of North Indians. A distinctive historical, demographic, and sociocultural profile makes Indian populations an ideal candidate for the study of genetic variation and differentiation. To test the hypothesis of social cleavage resulting in genetic structuring within a confined geographic area, we estimated four parameters: intrapopulation genetic variation, interpopulation genetic differentiation, genetic distances between different populations and sociocultural groups, and phylogenetic assessments based on the allele frequency data of 20 STR markers.
Amount and Pattern of Genetic Variation
The data generated in the present study show a high degree of polymorphism and high rate of heterozygosity. We observed 176 alleles at all 20 STR loci. A worldwide panel of 16 populations analyzed for the same set of 20 STRs has shown the presence of 215 alleles in Africans, 208 alleles in Europeans, and 165, 154, and 148 alleles in East Asians, South Americans, and population groups of the Pacific, respectively (Perez-Lezaun et al. 1997). The large range of different allelic states at each of the loci analyzed is indicative of enormous genetic diversity present in North Indian populations.
The observed heterozygosity was measured as an index of diversity. Most of the STR markers are known to possess a heterozygosity level of more than 60%, which makes them highly suitable to study the apportionment of genetic diversity across different human groups (Barbujani et al. 1997; Rosenberg et al. 2002). Most of the studies carried out in the last decade, based on different numbers of autosomal STR loci, have shown high levels of genetic diversity among African populations (Calafell et al. 1998; Tishkoff and Kidd 2004). Furthermore, it has been shown that non-Africans carry only a fraction of the diversity found in Africans (Calafell et al. 1998), with the notable exception of Indian populations, which are reported to harbor more genetic diversity than any contemporary population other than Africans (Roychoudhury et al. 2001; Khan et al. 2003, 2007; Agrawal and Khan 2005). All nine North Indian populations in the present study have high heterozygosity, ranging from 73.5% in Bhargavas to 75.4% in Shia Muslims (see Table 1). Another study based on 15 STR markers has also shown that geographically North Indians and ethnically Caucasians exhibit a maximum average observed heterozygosity (Kashyap et al. 2004).
The existence of high genetic diversity implies that the concerned population is an ancestral population that has maintained a larger effective population size and has had a long existence that allowed mutation and recombination to increase the level of heterozygosity (Kidd et al. 2004). On the other hand, a second possibility is that the colossal gene flow from different corners of the world has created immense genetic diversity in this population (Majumder 1998). Among contemporary world populations, the high degree of genetic diversity in Africans could be the result of being the oldest human population (Cavalli-Sforza and Feldman 2003). However, a long and in-depth period of human survival and constant gene flow from different parts of Asia and Europe in the recent past could be a possible explanation for the high level of genetic diversity among the nine studied populations. Interestingly, similar values of heterozygosity in all nine populations suggest uniform evolution and constant gene flow between the populations, despite the stern social norms governing marital choice.
Role of Sociocultural Barriers on Genetic Differentiation
The hypothesis of social cleavage resulting in genetic structuring was tested using Wright’s FST estimates. The selected populations in the present study represent three different barriers to gene flow: caste system, surname endogamy, and consanguinity. Historically, caste has been an important determinant of access to education, occupational opportunity, and marital choice. Marriage between partners of equal status is preferred, and reproduction in the caste system is largely endogamous (Heinz 1999). Both ethnographic and genetic evidence show that Hindu castes have been highly endogamous for a considerable length of time (Bamshad et al. 2001; Karve 1968; Misra 2001). Although the level of genetic differentiation between castes is relatively small, genetic distances observed in several studies suggest that gene flow is limited (Bamshad et al. 2001; Bhattacharyya et al. 1999; Dutta et al. 2002; Lakshmi et al. 2002). Furthermore, surname endogamy is a more stringent barrier that acts within a caste and is rigorously practiced by the highest caste group (Brahmins). Last, Muslims of India, like Muslims elsewhere (including the Middle East and Pakistan) and like some South Indian caste groups, practice a high level of consanguinity; that is, they marry within their own family, although not between siblings.
Wright’s FST-based analysis, which is often regarded as the best method to deduce genetic differentiation between different population groups, further consolidates the findings of the heterozygosity estimates. The low value of FST, 0.0084, indicates a lesser degree of genetic differentiation between the nine studied populations, either because of the presence of the same recent common ancestor or because of heavy gene flow between the population groups in the past. The results are in congruence with other published reports on populations from Orissa (Sahoo and Kashyap 2005), Karnataka (Rajkumar and Kashyap 2004), and Bihar (Ashma and Kashyap 2003). However, it has been shown that the FST value can be affected by the nature of the polymorphism studied; high heterozygosity estimates at multi-allelic STR loci often result in diminution of the FST values (Tishkoff and Kidd 2004). Interestingly, some other studies have elucidated that FST variation also depends on how the human populations are divided. For example, when three Old World populations (sub-Saharan Africans, Europeans, and East Asians) were analyzed, the within-group component of FST was 13–14% at 60 STR loci, but when South Indian populations were added as the fourth group, the FST decreased to 10% (Jorde and Wooding 2004). This demonstrates the presence of an admixed genetic profile of Indian populations.
Our findings suggest that the amount of interpopulation genetic variation in a socioreligious group is minimal compared to variance between groups. Therefore the three groups differ from each other at the genetic level because of sociocultural structuring. Still, the genetic profile of all nine populations included in the analysis exhibits extensive genetic overlap, either because the populations share the same common recent ancestor or because the caste system is quite recent (3,000–4,000 years old) (Balakrishnan 1978; Roychoudhury et al. 2001) and the time period is too short to create substantial genetic differentiation. Furthermore, the spread of Muslim population groups in India is ascribed to heavy admixture with local caste populations (Mukherjee et al. 2001). The most distinct finding was observed when we structured the FST analysis in accordance with the sociocultural status of the populations. The analysis revealed that, despite high genetic overlap observed in all nine populations, populations of the same group (upper-caste Hindus, middle-caste Hindus, or Muslims) are genetically less differentiated from other populations in their own group, and when we include populations from different sociocultural groups, the value of FST increases (see Table 2).
Genetic Relationship of Populations from Three Sociocultural Strata
The calculation of two genetic distance matrices and phylogenetic assessment based on two unrooted radial phylograms (neighbor-joining and maximum-likelihood) corroborated the findings of the genetic differentiation analysis. Two genetic distance methods (Nei’s DA and FST) have been used to overcome the ascertainment bias resulting from hypermutability of STR loci (Bowcock et al. 1994). The common evolutionary cause attributable to genetic fission of two populations is random genetic drift, which is expected to result in high frequencies of a haplotype or an allele or nucleotide motifs in the daughter populations that were infrequent in the parental populations. However, in the STR-based analysis, mutation is another important factor because of the high germline mutation rates of autosomal STR loci (3 × 10⁻³) (Weber and Wong 1993). One of the distance calculating methods, Nei’s DA, includes both mutation- and drift-based assumptions, whereas FST is based exclusively on the change of frequency profile resulting from genetic drift.
The most distinct observation that emerged from the genetic distance estimation was the high genetic similarity of upper-caste Brahmin populations to Muslim sects (see Tables 3 and 4). Both distance estimates revealed a genetic distance between middle-caste Hindus and Muslims (FST = 0.0090; DA = 0.0266) that was significantly higher than that between Muslims and upper-caste Brahmin subgroups (FST = 0.0050; DA = 0.0148). The greater affinity of the Muslim groups with the upper castes of northern India points to the common source of origin. The argument is more logical in light of the belief that when pastoral nomads from Central Asia reached India 3,000–8,000 years ago, they brought with them the Indo-European language family (Renfrew 1989; Quintana-Murci et al. 2001) and established the Hindu caste hierarchy to legitimize their power by appointing themselves to higher caste ranks (Poliakov 1974; Cavalli-Sforza 1997). On the other hand, the possible place of origin of Indian Muslims also lies in the Caucasus, ranging from the Middle East (Saudi Arabia, Syria, Iraq) to northwest Asia (Turkey), central-west Asia (Afghanistan and Iran), and some parts of eastern Europe (Uzbekistan) (Farah 2003).
Islamic settlements in India have been attributed to at least three different movements that began in different geographic regions (Farah 2003): (1) an Arab invasion that led to the creation of the Sind state in the Indus valley in A.D. 711 (Keay 2000); (2) multiple invasions from Central Asian (Turkic) Muslims into the northwest province of Punjab between A.D. 997 and 1027; and (3) the arrival of Afghan and Persian Muslims to North India between 1300 and 1400, which later spread throughout India (Wolpert 1991). Furthermore, the evolution of these groups to the contemporary Muslim population may have been the result of various distinct cultural routes, including cultural diffusion, colonization, and elite dominance through military expansions. These distinct modes may have contributed to the varying levels of genetic admixture with the indigenous Indian groups. Alternatively, it is also possible that the high-caste Hindus had a much greater opportunity to admix with Muslim foreigners during their expansion across the Indian subcontinent. It is also interesting to note that in our recent report we showed that, when different caste groups of North India were compared to Eurasians and proto-Asian populations (mainly East Asians), the affinity with Eurasians was proportionate to caste rank, with the upper-caste Hindus and Muslims being genetically more similar to the Eurasians than the middle-caste populations were (Khan et al. 2007).
Another interesting finding is the differential genetic relationship of the two Muslim population groups with the rest of the caste populations. In both distance matrices, Shia Muslims show nearly double the genetic distance from the seven caste populations that Sunni Muslims show. Sunni Muslims constitute the major sect of Muslims in India and make up 87% of the total Muslim population. They have ruled different parts of India for about 900 years, and during their reign, they expanded within the subcontinent. Despite practicing consanguinity, they married outside their religion. This is supported by various studies showing that Indian Muslims have high genetic similarity with other Hindu caste populations (Mukherjee et al. 2001; Rajkumar and Kashyap 2004). In contrast, Shia Muslims, because of their smaller numbers, may have remained more culturally and genetically isolated within their communities.
We did not observe an effect of surname endogamy on the genetic relationship of populations, as both Bhargavas and Chaturvedis were genetically closer to the parent Brahmin populations. Similarly, among the middle-caste populations, the surname endogamy practicing population of Mathurs shows significant genetic similarity to the Kayastha major group. We reported similar findings based on another set of 24 STR markers (Agrawal et al. 2003) and HLA class II loci (Agrawal et al. 2001), finding that surname endogamy did not have a significant effect on population differentiation. Among the four middle-caste populations, Rastogies and the Vaish showed the least genetic distance, whereas Mathurs and Kayasthas showed nearly double the genetic distance from the Vaish and Rastogi populations in comparison to each other. The findings support social structuring of these populations. Traditionally, the Kayastha have been divided into 12 endogamous subcastes. One of these 12 subcastes is the Mathurs (Karve 1968). Similarly, it has been claimed that the Rastogies originated from either the Vaish or the Rajput population (Balakrishnan 1978). The phylogenetic analysis based on two approaches [genetic distance (neighbor-joining) and maximum-likelihood model] (see Figure 1) also reveals a patristic separation of Muslim, upper-caste, and middle-caste populations supported by high bootstrap values. Muslims and Brahmin populations clustered on the same branch, revealing more genetic similarity to each other than to the four middle-caste populations. The middle-caste cluster was bifurcated into Kayastha-Mathur and Rastogi-Vaish subclusters, bolstering the view that a significant genetic differentiation among middle-caste populations occurred in comparison to the upper-caste Hindu or Muslim groups.
In conclusion, the analysis of 20 STR loci in 9 endogamous populations of North India has yielded a set of data about the genetic profile of North Indians and the role of different sociocultural factors in their structuring and differentiation. First, extensive gene flow through a series of migrations and invasions created an enormous amount of genetic diversity, which is reflected in a high level of intrapopulation genetic variation. Second, although interpopulation differences were minimal, a slight pattern of genetic variation distribution occurred in which different populations structured into socioreligious groups are genetically more similar to populations of the same group and are genetically more distant from populations of other groups [e.g., FST between Shiites and Sunnis is 0.0033 but that between Muslims as a single group and upper-caste populations (FST = 0.005) or middle-caste populations (FST = 0.009) is higher]. Third, there is a differential genetic relationship of North Indian Muslims with upper- and middle-caste populations.
Overall, the study shows that the marriage regulatory caste system has a minimal effect on controlling gene flow across socioreligious boundaries; probably the institution of the caste system is still too young to create significant genetic differentiation among different populations. However, if the caste system persists and is not affected by modernization and urbanization, the signatures of endogamy may appear after a few thousand years. Still, the genetic configuration of Indians is as complex as the history of the Indian subcontinent, interwoven with numerous threads of unknown facts. More genetic data are necessary to unravel the structure of Indian genetic composition and to ascertain that the outcome of the present study is general rather than exclusive.
2 Faculty of Medicine, University of Calgary, Calgary, AB T2N4N1, Canada.
We are thankful to the Indian Council of Medical Research (ICMR), New Delhi, and the Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, for providing various laboratory facilities and other assistance for conducting the present study.