
Indian populations possess an exclusive genetic profile primarily due to the many migratory events, which caused an extensive range of genetic diversity, and also due to stringent and austere sociocultural barriers that structure these populations into different endogamous groups. In the present study we attempt to explore the genetic relationships between various endogamous North Indian populations and to determine the effect of stringent social regulations on their gene pool. Twenty STR markers were genotyped in 1,800 random North Indians from 9 endogamous populations belonging to upper-caste and middle-caste Hindus and Muslims. All nine populations had high allelic diversity (176 alleles) and average observed heterozygosity (0.742 ± 0.06), suggesting strong intrapopulation diversity. The average FST value over all loci was as low as 0.0084. However, within-group FST and genetic distance analysis showed that populations of the same group were genetically closer to each other. The genetic distance of Muslims from middle castes (FST = 0.0090; DA = 0.0266) was significantly higher than that of Muslims from upper castes (FST = 0.0050; DA = 0.0148). Phylogenetic trees (neighbor-joining and maximum-likelihood) show the basal cluster pattern of three clusters corresponding to Muslims, upper-caste, and middle-caste populations, with Muslims clustered with upper-caste populations. Based on the results, we conclude that the extensive gene flow through a series of migrations and invasions has created an enormous amount of genetic diversity. The interpopulation differences are minimal but have a definite pattern, in which populations of different socioreligious groups have more genetic similarity within the same group and are genetically more distant from populations of other groups. Finally, North Indian Muslims show a differential genetic relationship with upper- and middle-caste populations.


Short Tandem Repeats (STRs), North India, Castes, Muslims, Hindus, Phylogenetic Reconstruction, Neighbor-Joining Tree, Surname Endogamy, FGA, D5S818, D7S820, D11S2010, D13S767, D9S926, D2S1328, D18S848, D14S306, D3S1358, ACPP, TPO, THO1, VWA, FES, F13A1, D16S310, DHFRP2, HPRT, D4S243

One aim of the analysis of human genetic variation is to determine the amount, pattern, distribution, and structuring of genetic diversity across different geo-ethnic, sociocultural, and linguistic human groups (Cavalli-Sforza 2005; Bamshad et al. 2003). In this context, the genetic structure, affinity, and diversity of Indian populations are often contested and are postulated to hold an important key to the range of factors that shaped the contemporary pattern of genetic variation.

The Indian subcontinent is home to more than 1 billion individuals, constituting about one-fifth of the total world population. A long period of human habitation, extensive gene flow from different corners of the world, and unique sociocultural practices have created immense ethnic, morphological, linguistic, and religious diversity among the contemporary human groups of India. Two Neolithic episodes of migration, of the speakers of Elamo-Dravidian (approximately 10,000–15,000 years ago) and of Indo-European (approximately 4,000 years ago), led to massive gene flow into the Indian subcontinent, leaving a definite mark on the genetic makeup of the Indian population (Quintana-Murci 2001). At present, the Indian population is culturally stratified into tribal groups (8.08% of the total population) and nontribal groups (about 92% of the total population) (Terreros et al. 2007; Roychoudhury et al. 2001).

Most contemporary nontribal populations are Hindus, who live in hierarchically arranged social classes known as castes. Indian society therefore predominantly revolves around the concept of the caste, which is a conglomerate of various sociocultural customs, traditions, and barriers that have created a large number of hierarchically arranged endogamous groups (Karve 1961, 1968). This social hierarchy system is unique, as birth of an individual in this system governs and decides most of the proceedings of his or her life, including the choice of a mating partner. The caste system was established for social and economic organization, in which four castes were ranked in status from low to high: Shudra (menial labor class), Vysya (business class), Kshatriya (warrior class), and Brahmin (priestly class). It has been postulated that the Indo-Aryans established this Hindu caste hierarchy to legitimize and maintain their power over the native Dravidian-speaking populations (Poliakov 1974; Puppala and Crawford 1996). It is also plausible that these Caucasian immigrants appointed themselves predominantly to castes of higher ranks.

The population structuring triggered by the caste system is also supplemented by an additional level of endogamy called surname endogamy, in which individuals marry others with the same surname as their own. Among Brahmins for instance, Bhargavas and Chaturvedis are two prominent subgroups that practice surname endogamy. Both groups have marriages only within their own surname and not with other Brahmins (Agrawal et al. 2003). In addition, Muslims, the second largest nontribal population of India (about 12% of the total Indian population), do not follow the Hindu caste system but still prefer to marry within their own sects, that is, Sunnis and Shia (Terreros et al. 2007). Both sects also practice a high level of consanguinity.

Overall, the many migratory events caused an extensive range of genetic diversity, and stringent and austere sociocultural barriers structured the genetic diversity into different endogamous groups. These processes gave Indian populations a unique genetic profile. Earlier, we compared the maternal and paternal lineages of different North Indian groups and showed that at the mtDNA haplogroup level, North Indian Muslim populations (both Sunni and Shia) are more similar to each other and to other Hindu caste groups (Terreros et al. 2007). In contrast, Y-chromosome analysis revealed a substantial level of African/Middle Eastern YAP lineage in the Shia population (Agrawal et al. 2005). In the present study we use a battery of STR polymorphisms to explore the genetic relationship of various endogamous North Indian populations.

The nine endogamous populations analyzed in the present study belong to three different socioreligious groups: upper-caste Hindus (Bhargavas, Chaturvedis, and Brahmins), middle-caste Hindus (Kayastha, Mathur, Rastogi, and Vaish), and Muslims (Shia and Sunni). We aim to determine the effects of the stringent social regulations on the gene pool of contemporary Indian populations and thereby to offer a fresh interpretation of the pattern, distribution, and structuring of genetic variation in North Indian populations.

Materials and Methods

Research Design

We first genotyped 20 tetranucleotide STR markers among 1,800 North Indian individuals from 9 endogamous populations: Bhargavas, Chaturvedis, Brahmins, Kayastha, Mathurs, Rastogies, Vaish, Shiites, and Sunnis. The generated genotypic profile was used to calculate average observed heterozygosity for quantifying intrapopulation genetic variation, and Wright’s F statistic was used to infer interpopulation (or intergroup) genetic variation. To provide a more comprehensive picture of the genetic similarity of and differences between different North Indian populations and sociocultural groups, we calculated genetic distances on the basis of the allelic profile of 20 STR markers and we approached the phylogenetic assessment using two algorithms: neighbor-joining (distance-based) and maximum-likelihood (allele-frequency-based).


The nine populations selected in the present study included three endogamous Hindu upper-caste populations, four middle-caste populations, and two consanguineous sects of Muslims. The three upper-caste populations include Brahmins and their two subsects, Bhargavas and Chaturvedis. All three groups are exceedingly stringent in their marital patterns. Bhargavas and Chaturvedis practice surname endogamy, which prevents them from marrying outside their own surnames. The middle-caste populations include Kayasthas, Mathurs, Rastogies, and Vaish. Mathurs are one of the subsects of Kayasthas and practice surname endogamy. Similarly, Rastogies are historically considered to have originated from the Vaish. Among the two Muslim groups selected, one is the minority Muslim sect (Shia) and the other is the predominant Muslim group (Sunni), whose descendants have ruled the Indian subcontinent for several hundred years. Both Muslim sects practice a high degree of consanguinity. All the populations are ethnically Caucasians and speakers of Indo-European languages (Hindi and Urdu).

A total of 1,800 randomly selected individuals belonging to these nine populations were collected from different regions of the province of Uttar Pradesh. The collection sites included districts of Lucknow, Kanpur, Raebareilly, Barabanki, Faizabad, Agra, Jhansi, Gonda, and Basti. Two hundred samples were collected from each of the nine populations. All the study subjects were adults (mean age, 38.8 ± 3.4 years) whose families have been residents of Uttar Pradesh for the last three generations. Before the samples were collected, regional addresses and detailed computerized lists of the populations were prepared. Random numbers were generated with the help of a computer, and adult individuals living in different parts of the province were questioned about their ethnicity, caste affiliation, surname, and birthplaces of their parents. Only unrelated subjects were considered eligible to participate in the study. The demographic profile and other ethnic and familial information were filed in a detailed consent form. Three-generation pedigree charts were prepared to ensure unrelatedness of all the subjects.

Whole blood was obtained by venipuncture, and about 5 ml of blood was collected in vacutainer tubes containing EDTA. The study was performed with the approval of the institutional ethical reviewing committee of the Sanjay Gandhi Post Graduate Institute of Medical Sciences (SGPGIMS), Lucknow.

Genomic DNA Extraction

High-molecular-weight genomic DNA was extracted using the salting-out method with phenol-chloroform, as described by Comey et al. (1993), and was purified through ethanol precipitation.

STR Genotyping

A panel of 20 STR markers (FGA, D5S818, D7S820, D11S2010, D13S767, D9S926, D2S1328, D18S848, D14S306, D3S1358, ACPP, TPO, THO1, VWA, FES, F13A1, D16S310, DHFRP2, HPRT, and D4S243) was genotyped using PCR-based locus-specific amplification, as previously described (Perez-Lezaun et al. 1997). We have successfully used the same panel of 20 STR markers to evaluate the genetic relationships of North Indian populations with other world and Indian populations (Khan et al. 2007). One of the primers for each marker was labeled with a fluorochrome. Size fractionation of the fluorochrome-labeled amplicons was carried out by means of capillary electrophoresis in an ABI-310 automated fragment size genetic analyzer (Applied Biosystems, Foster City, California). Size calling of the alleles at individual loci was done with GeneScan, version 3.1.2, and Genotyper, version 2.5.2, using a 500-ROX size standard (Applied Biosystems).

Statistical Analysis

Allele frequencies at each of the markers were obtained by means of the direct counting method. Deviation from Hardy-Weinberg equilibrium in genotypic frequencies for all markers was estimated using Fisher’s exact test based on 1,000 Markov chain algorithm steps in Arlequin, version 2. A Bonferroni correction to the p value was applied. Two parameters (gene diversity and observed heterozygosity) were calculated to infer intrapopulation diversity. Interpopulation (and intragroup) genetic variation was assessed using the measure of partitioning genetic diversity, FST. Arlequin, version 2, Popgene, version 32, and Cervus, version 1, software were used for the calculations.
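The intrapopulation quantities named in this subsection can be illustrated for a single locus with a short Python sketch. This is only a minimal illustration and not the Arlequin, Popgene, or Cervus implementation used in the study; the function names, the toy genotypes, and the choice of Nei’s unbiased estimator for gene diversity are assumptions made for the example.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Direct counting: every diploid genotype contributes two allele copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)

def gene_diversity(genotypes):
    """Expected heterozygosity, 2n/(2n-1) * (1 - sum p_i^2), n = individuals.

    The small-sample correction is an assumption of this sketch.
    """
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    return (2 * n / (2 * n - 1)) * (1.0 - sum(p * p for p in freqs.values()))

# Hypothetical genotypes (repeat numbers) for four individuals at one STR locus.
toy_locus = [(7, 9), (9, 9), (7, 8), (8, 9)]

freqs = allele_frequencies(toy_locus)
h_obs = observed_heterozygosity(toy_locus)
h_exp = gene_diversity(toy_locus)
```

Averaging the per-locus observed heterozygosity over all loci in a population gives the kind of population-level average reported in the Results.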

Two different genetic distances, Nei’s DA and FST, based on the allele frequency distribution of 20 STRs, were calculated using the GenDist option in PHYLIP, version 3.5c (Felsenstein 1993), to assess the genetic relationship of the nine North Indian populations. Phylogenetic analysis was carried out using two unrooted radial phylograms, neighbor-joining and maximum-likelihood phylogenetic trees. The neighbor-joining algorithm was used to construct the branching array from a matrix of Nei’s DA genetic distances using the Neighbor option in PHYLIP. The maximum-likelihood algorithm was used on the allele frequency distribution of 20 STR loci in the studied population using the CONTML option in PHYLIP. In both the neighbor-joining and maximum-likelihood methods, statistical bootstrap involving 1,000 replicates was carried out using the SeqBoot option in PHYLIP. Finally, a consensus of 1,000 trees (both neighbor-joining and maximum-likelihood) was drawn using the ConSense option of PHYLIP.
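To make the allele-frequency-based distance concrete, the sketch below computes a DA-type distance between two populations. It assumes that DA is defined as one minus the average, over loci, of the sum over alleles of the square root of the product of the two populations’ allele frequencies; the paper does not print the formula, so this definition, the function name, and the toy frequencies are assumptions of the example and not a reproduction of the PHYLIP run.

```python
import math

def nei_da(pop_x, pop_y):
    """DA-type distance between two populations.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Alleles absent from one population contribute zero to the sum.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("both populations need the same number of loci")
    similarity = 0.0
    for fx, fy in zip(pop_x, pop_y):
        shared = set(fx) & set(fy)
        similarity += sum(math.sqrt(fx[a] * fy[a]) for a in shared)
    return 1.0 - similarity / len(pop_x)

# Hypothetical single-locus allele frequencies for three populations.
pop_a = [{7: 0.5, 8: 0.5}]
pop_b = [{7: 0.5, 8: 0.5}]   # identical to pop_a
pop_c = [{9: 1.0}]           # shares no allele with pop_a

d_same = nei_da(pop_a, pop_b)
d_disjoint = nei_da(pop_a, pop_c)
```

Under this definition the distance is 0 for identical frequency profiles and 1 when no allele is shared, so the small values reported for the nine populations indicate largely overlapping allele frequency distributions.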


Results

Allele Frequency Distribution

We observed 176 alleles at 20 STR loci. The observed number of alleles ranged from 7 to 9 at 16 STR loci (THO1, TPO, FES, VWA, D4S243, DHFRP2, FGA, D7S820, D5S818, D11S2010, D2S1328, ACPP, D9S926, D13S767, D14S306, and D18S848), whereas 10–13 alleles were observed in the remaining STRs (D3S1358, D16S310, HPRT, and F13A1). The average number of alleles observed was 8.8, indicating a high level of polymorphism across these STR loci in North Indian populations. The maximum number of 164 alleles was observed in Shiites, and a minimum of 146 alleles was observed in the Vaish. Multilocus genotype frequencies for all nine populations revealed no significant departures from Hardy-Weinberg equilibrium when a Bonferroni correction was applied to the p values.

Intrapopulation Genetic Variation

Average observed heterozygosity was estimated as the measure of intrapopulation genetic diversity. All nine populations had a high value of average observed heterozygosity (0.742 ± 0.06). Shia Muslims were most heterozygous (0.754 ± 0.04), and Bhargavas were the least heterozygous (0.735 ± 0.08). Locus-wise average observed heterozygosity in the studied nine populations is shown in Table 1.

Interpopulation (Intergroup) Genetic Variation

Interpopulation genetic variation corresponds to an analysis of population differentiation. In the present study, a measure of partitioning genetic diversity (FST) was used for the interpopulation genetic variation analysis. The average FST value over all loci was as low as 0.0084, suggesting little differentiation among the studied populations (Table 2). However, when the analysis was carried out between populations of each group (upper-caste, middle-caste, and Muslim groups), the FST value between the two Muslim sects (0.0033) or between the three upper-caste Brahmin populations (0.0038) was significantly lower than the FST value for the four middle-caste populations (0.0058). Furthermore, the level of differentiation (FST) increased when we included the populations belonging to different sociocultural groups together; the FST value was 0.0050 for upper-caste Brahmins and Muslims together and 0.0060 for upper- and middle-caste populations together, and it was 0.0084 for all nine populations.

Table 1

Average Observed Heterozygosity Calculated for Nine North Indian Populations Based on the Genotype Data for 20 STR Loci

Locus     Bhargavas  Chaturvedis  Brahmins  Shiites  Sunnis  Rastogies  Vaish  Kayasthas  Mathurs
HPRT      0.70       0.69         0.72      0.72     0.72    0.79       0.74   0.74       0.76
THO1      0.71       0.76         0.74      0.78     0.72    0.78       0.74   0.74       0.77
D3S1358   0.79       0.82         0.80      0.84     0.81    0.82       0.82   0.82       0.83
D16S310   0.78       0.77         0.75      0.75     0.74    0.77       0.78   0.79       0.82
F13A1     0.60       0.65         0.66      0.79     0.70    0.76       0.69   0.78       0.76
TPO       0.74       0.75         0.74      0.76     0.78    0.76       0.78   0.76       0.79
FES       0.80       0.80         0.82      0.81     0.82    0.83       0.84   0.82       0.80
VWA       0.81       0.80         0.79      0.81     0.80    0.78       0.79   0.79       0.77
D4S243    0.73       0.72         0.76      0.75     0.73    0.77       0.78   0.77       0.68
DHFRP2    0.69       0.67         0.69      0.70     0.72    0.78       0.68   0.70       0.72
FGA       0.85       0.75         0.81      0.87     0.87    0.89       0.85   0.85       0.84
D7S820    0.79       0.76         0.73      0.80     0.73    0.76       0.76   0.81       0.84
D5S818    0.69       0.85         0.78      0.75     0.77    0.68       0.78   0.75       0.77
D11S2010  0.80       0.74         0.80      0.79     0.78    0.73       0.74   0.77       0.73
D2S1328   0.76       0.79         0.73      0.74     0.75    0.70       0.72   0.73       0.69
ACPP      0.73       0.69         0.74      0.80     0.68    0.68       0.62   0.59       0.63
D9S926    0.77       0.75         0.76      0.76     0.75    0.73       0.70   0.70       0.70
D13S767   0.67       0.65         0.59      0.63     0.68    0.67       0.68   0.65       0.66
D14S306   0.64       0.66         0.76      0.62     0.67    0.64       0.67   0.62       0.65
D18S848   0.66       0.66         0.66      0.61     0.66    0.65       0.68   0.64       0.64
Average   0.735      0.736        0.740     0.754    0.744   0.748      0.743  0.741      0.742

Table 2

Analysis of FST Based on 20 STR Loci

Group (a)                                            Number of Populations (n)   FST Value
Upper-caste populations                              3                           0.0038
Middle-caste populations                             4                           0.0058
Muslim populations                                   2                           0.0033
Upper- and middle-caste populations                  7                           0.0060
Upper-caste and Muslim populations                   5                           0.0050
Middle-caste and Muslim populations                  6                           0.0092
Upper-caste, middle-caste, and Muslim populations    9                           0.0084

a. Upper-caste populations: Bhargavas, Chaturvedis, and Brahmins; middle-caste populations: Kayasthas, Mathurs, Rastogies, and Vaish; Muslims: Sunnis and Shiites.
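The idea behind FST as a partition of diversity can be sketched for one locus as FST = (HT − HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled allele frequencies. The Python sketch below uses this simplified textbook form with equally weighted subpopulations; it is an assumption made for illustration and not the estimator implemented in Arlequin, and the toy frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def fst_single_locus(pop_freqs):
    """Simplified FST = (HT - HS) / HT for equally weighted subpopulations.

    pop_freqs: list of dicts (one per subpopulation), allele -> frequency.
    """
    k = len(pop_freqs)
    # HS: average within-subpopulation expected heterozygosity.
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # HT: expected heterozygosity of the averaged allele frequencies.
    alleles = set().union(*pop_freqs)
    pooled = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    ht = expected_heterozygosity(pooled)
    return (ht - hs) / ht if ht > 0 else 0.0

# Hypothetical two-allele locus in two subpopulations.
similar = [{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]
fixed_apart = [{"A": 1.0}, {"B": 1.0}]

fst_similar = fst_single_locus(similar)
fst_fixed = fst_single_locus(fixed_apart)
```

Values near 0, such as those in Table 2, mean that almost all of the diversity is found within populations; a value of 1 would mean that the subpopulations share no alleles.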

Estimation of Genetic Distances

To assess the genetic relationship of the nine North Indian populations, we calculated pairwise genetic distances based on the multilocus genotypic data of 20 STR markers using Nei’s DA and an FST-based distance approach. The distance matrices generated from these two methods (Table 3) depict a similar picture. The two Muslim populations are genetically more similar to each other (FST = 0.0065; DA = 0.0186) than to other populations. Among middle-caste populations, the Kayasthas and Mathurs (FST = 0.0051; DA = 0.0149) and the Rastogies and Vaish (FST = 0.0040; DA = 0.0113) are genetically closer to each other. Among the three Brahmin populations, Chaturvedis are more distant from Bhargavas (FST = 0.0064; DA = 0.0197) and closer to Brahmins (FST = 0.0051; DA = 0.0148). Interestingly, the genetic distance between the middle castes and Muslims (FST = 0.0090; DA = 0.0266) was significantly higher than the distance between Muslims and upper-caste Brahmin subgroups (FST = 0.0050; DA = 0.0148), as shown in Table 4. Among Muslims, Shiites show a significantly higher distance from all caste populations compared to Sunni Muslims.
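The group-level pattern can also be checked directly against the pairwise FST values in Table 3. The Python sketch below takes unweighted means of the pairwise values between the two Muslim sects and the caste populations. This averaging is an assumption made for illustration; it is not the procedure behind Table 4, whose group-level values differ, so only the direction of the comparison is meaningful.

```python
from statistics import mean

UPPER = ["Bhargavas", "Chaturvedis", "Brahmins"]
MIDDLE = ["Kayasthas", "Mathurs", "Rastogies", "Vaish"]
MUSLIM = ["Sunnis", "Shiites"]

# Pairwise FST values copied from Table 3 (above the diagonal).
FST = {
    ("Sunnis", "Bhargavas"): 0.0068, ("Sunnis", "Chaturvedis"): 0.0067,
    ("Sunnis", "Brahmins"): 0.0066, ("Sunnis", "Kayasthas"): 0.0104,
    ("Sunnis", "Mathurs"): 0.0110, ("Sunnis", "Rastogies"): 0.0091,
    ("Sunnis", "Vaish"): 0.0092,
    ("Shiites", "Bhargavas"): 0.0094, ("Shiites", "Chaturvedis"): 0.0100,
    ("Shiites", "Brahmins"): 0.0114, ("Shiites", "Kayasthas"): 0.0155,
    ("Shiites", "Mathurs"): 0.0149, ("Shiites", "Rastogies"): 0.0150,
    ("Shiites", "Vaish"): 0.0142,
}

def mean_pairwise(group_a, group_b, table):
    """Unweighted mean of the pairwise distances between two groups."""
    return mean(table[(a, b)] for a in group_a for b in group_b)

muslim_vs_upper = mean_pairwise(MUSLIM, UPPER, FST)
muslim_vs_middle = mean_pairwise(MUSLIM, MIDDLE, FST)
sunni_vs_castes = mean_pairwise(["Sunnis"], UPPER + MIDDLE, FST)
shiite_vs_castes = mean_pairwise(["Shiites"], UPPER + MIDDLE, FST)
```

With these inputs the mean Muslim to middle-caste distance exceeds the mean Muslim to upper-caste distance, and the Shiite mean exceeds the Sunni mean, which is the same ordering described in the text.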

Phylogenetic Reconstruction

The phylogenetic analysis is depicted by two unrooted radial phylograms (neighbor-joining and maximum-likelihood), as shown in Figure 1. The scores next to the nodes give the number of bootstrap replicates (out of 1,000) exhibiting these specific bifurcations.

Both phylogenetic trees (DA-based neighbor-joining and maximum-likelihood trees) show very similar basal cluster patterns in which three clusters, corresponding to Muslims and upper- and middle-caste groups, can be recognized. However, the clusters of Muslims and upper-caste populations arise from the same branch. It can be deduced from the phylogenetic analysis that Muslims have more genetic similarity with the upper-caste populations, a finding that is supported by more than 90% bootstrap values in both of the phylograms. The middle-caste population cluster is further bifurcated into two branches: one carrying Kayasthas and Mathurs and the other carrying Rastogies and the Vaish. The Rastogy-Vaish branch is also supported by a strong bootstrap value (97.7% in the neighbor-joining phylogram; 96.5% in the maximum-likelihood phylogram).

Table 3

Genetic Distance Matrix (Nei’s DA Below the Diagonal and FST Above the Diagonal) for Different North Indian Populations and Sociocultural Groups

             Sunnis      Shiites     Bhargavas   Chaturvedis Brahmins    Kayasthas   Mathurs     Rastogies   Vaish
Sunnis       -           0.0065      0.0068      0.0067      0.0066      0.0104      0.0110      0.0091      0.0092
Shiites      0.0186      -           0.0094      0.0100      0.0114      0.0155      0.0149      0.0150      0.0142
Bhargavas    0.0197      0.0285      -           0.0064      0.0055      0.0088      0.0079      0.0080      0.0086
Chaturvedis  0.0194      0.0304      0.0197      -           0.0051      0.0093      0.0094      0.0101      0.0073
Brahmins     0.0192      0.0337      0.0161      0.0148      -           0.0091      0.0088      0.0079      0.0083
Kayasthas    0.0300      0.0462      0.0258      0.0275      0.0264      -           0.0051      0.0062      0.0056
Mathurs      0.0322      0.0453      0.0237      0.0286      0.0257      0.0149      -           0.0045      0.0042
Rastogies    0.0261      0.0445      0.0233      0.0299      0.0225      0.0179      0.0128      -           0.0040
Vaish        0.0265      0.0424      0.0253      0.0215      0.0239      0.0161      0.0123      0.0113      -

Table 4

Genetic Distance Matrix (Nei’s DA Below the Diagonal and FST Above the Diagonal) for Different North Indian Populations and Sociocultural Groups

              Upper Caste   Middle Caste  Muslims
Upper caste   -             0.0059        0.0050
Middle caste  0.0144        -             0.0090
Muslims       0.0148        0.0266        -


Figure 1. Phylogenetic trees depicting clustering of the nine North Indian populations: (a) neighbor-joining phylogram based on Nei’s DA genetic distances; (b) maximum-likelihood phylogram based on 20 STR markers.

Discussion

Short tandem repeats (STRs) are some of the most polymorphic markers reported to date (Destro-Bisol et al. 2000), and together they are considered the most powerful genetic system to infer and detect the pattern and distribution of genetic diversity in human populations because of their exceptionally high mutation rates and high level of polymorphism (Jorde and Wooding 2004). The present study, based on the high-resolution analysis of a group of 20 STR markers in nine populations, reveals vital information about the amount, pattern, and distribution of genetic diversity among the population of the Northern Indian state of Uttar Pradesh. The study also provides valuable data to understand the effect of sociocultural barriers on the genetic makeup of North Indians. A distinctive historical, demographic, and sociocultural profile makes Indian populations ideal candidates for the study of genetic variation and differentiation. To test the hypothesis of social cleavage resulting in genetic structuring within a confined geographic area, we estimated four parameters: intrapopulation genetic variation, interpopulation genetic differentiation, genetic distances between different populations and sociocultural groups, and phylogenetic assessments based on the allele frequency data of 20 STR markers.

Amount and Pattern of Genetic Variation

The data generated in the present study show a high degree of polymorphism and high rate of heterozygosity. We observed 176 alleles at all 20 STR loci. A worldwide panel of 16 populations analyzed for the same set of 20 STRs has shown the presence of 215 alleles in Africans, 208 alleles in Europeans, and 165, 154, and 148 alleles in East Asians, South Americans, and population groups of the Pacific, respectively (Perez-Lezaun et al. 1997). The large range of different allelic states at each of the loci analyzed is indicative of enormous genetic diversity present in North Indian populations.

The observed heterozygosity was measured as an index of diversity. Most of the STR markers are known to possess a heterozygosity level of more than 60%, which makes them highly suitable to study the apportionment of genetic diversity across different human groups (Barbujani et al. 1997; Rosenberg et al. 2002). Most of the studies carried out in the last decade, based on different numbers of autosomal STR loci, have shown high levels of genetic diversity among African populations (Calafell et al. 1998; Tishkoff and Kidd 2004). Furthermore, it has been shown that non-Africans carry only a fraction of the diversity found in Africans (Calafell et al. 1998), with the notable exception of Indian populations, which are reported to harbor more genetic diversity than any contemporary population other than Africans (Roychoudhury et al. 2001; Khan et al. 2003, 2007; Agrawal and Khan 2005). All nine North Indian populations in the present study have high heterozygosity, ranging from 73.5% in Bhargavas to 75.4% in Shia Muslims (see Table 1). Another study based on 15 STR markers has also shown that geographically North Indians and ethnically Caucasians exhibit a maximum average observed heterozygosity (Kashyap et al. 2004).

The existence of high genetic diversity implies that the concerned population is an ancestral population that has maintained a larger effective population size and has had a long existence that allowed mutation and recombination to increase the level of heterozygosity (Kidd et al. 2004). A second possibility is that the colossal gene flow from different corners of the world has created immense genetic diversity in this population (Majumder 1998). Among contemporary world populations, the high degree of genetic diversity in Africans could be the result of being the oldest human population (Cavalli-Sforza and Feldman 2003). However, a long period of human habitation and constant gene flow from different parts of Asia and Europe in the recent past could be a possible explanation for the high level of genetic diversity among the nine studied populations. Interestingly, similar values of heterozygosity in all nine populations suggest uniform evolution and constant gene flow between the populations, despite the stern social norms governing marital choice.

Role of Sociocultural Barriers on Genetic Differentiation

The hypothesis of social cleavage resulting in genetic structuring was tested using Wright’s FST estimates. The selected populations in the present study represent three different barriers to gene flow: caste system, surname endogamy, and consanguinity. Historically, caste has been an important determinant of access to education, occupational opportunity, and marital choice. Marriage between partners of equal status is preferred, and reproduction in the caste system is largely endogamous (Heinz 1999). Both ethnographic and genetic evidence show that Hindu castes have been highly endogamous for a considerable length of time (Bamshad et al. 2001; Karve 1968; Misra 2001). Although the level of genetic differentiation between castes is relatively small, genetic distances observed in several studies suggest that gene flow is limited (Bamshad et al. 2001; Bhattacharyya et al. 1999; Dutta et al. 2002; Lakshmi et al. 2002). Furthermore, surname endogamy is a more stringent barrier that acts within a caste and is strictly practiced by the upper-caste group (Brahmins). Last, Muslims of India, like Muslims elsewhere (including the Middle East and Pakistan) and like some South Indian caste groups, practice a high level of consanguinity; that is, they marry within their own family, excluding siblings.

Wright’s FST-based analysis, which is often regarded as the best method to deduce genetic differentiation between different population groups, further consolidates the findings of the heterozygosity estimates. The low value of FST, 0.0084, indicates a low degree of genetic differentiation between the nine studied populations, either because of the presence of the same recent common ancestor or because of heavy gene flow between the population groups in the past. The results are in congruence with other published reports on populations from Orissa (Sahoo and Kashyap 2005), Karnataka (Rajkumar and Kashyap 2004), and Bihar (Ashma and Kashyap 2003). However, it has been shown that the FST value can be affected by the nature of the polymorphism studied; high heterozygosity estimates at multi-allelic STR loci often result in diminution of the FST values (Tishkoff and Kidd 2004). Interestingly, some other studies have shown that FST also depends on how the human populations are divided. For example, when three Old World populations (sub-Saharan Africans, Europeans, and East Asians) were analyzed, the FST was 13–14% at 60 STR loci, but when South Indian populations were added as the fourth group, the FST decreased to 10% (Jorde and Wooding 2004). This demonstrates the presence of an admixed genetic profile of Indian populations.

Our findings suggest that the amount of interpopulation genetic variation within a socioreligious group is small compared to the variation between groups. Therefore the three groups differ from each other at the genetic level because of sociocultural structuring. Still, the genetic profile of all nine populations included in the analysis exhibits extensive genetic overlap, either because the populations share the same common recent ancestor or because the caste system is quite recent (3,000–4,000 years old) (Balakrishnan 1978; Roychoudhury et al. 2001) and the time period is too short to create the genetic differentiation. Furthermore, the spread of Muslim population groups in India is ascribed to heavy admixture with local caste populations (Mukherjee et al. 2001). The most distinct finding was observed when we structured the FST analysis in accordance with the sociocultural status of the populations. The analysis showed that, despite high genetic overlap observed in all nine populations, populations of the same group (upper-caste Hindus or middle-caste Hindus or Muslims) are genetically less differentiated from other populations in their own group, and when we include populations from different sociocultural groups, the value of FST increases (see Table 2).

Genetic Relationship of Populations from Three Sociocultural Strata

The calculation of two genetic distance matrices and phylogenetic assessment based on two unrooted radial phylograms (neighbor-joining and maximum-likelihood) corroborated the findings of the genetic differentiation analysis. Two genetic distance methods (Nei’s DA and FST) have been used to overcome the ascertainment bias resulting from hypermutability of STR loci (Bowcock et al. 1994). The common evolutionary cause attributable to genetic fission of two populations is random genetic drift, which is expected to result in high frequencies of a haplotype or an allele or nucleotide motifs in the daughter populations that were infrequent in the parental populations. However, in the STR-based analysis, mutation is another important factor because of the high germline mutation rates of autosomal STR loci (3 × 10⁻³) (Weber and Wong 1993). One of the distance calculating methods, Nei’s DA, includes both mutation- and drift-based assumptions, whereas FST is based exclusively on the change of frequency profile resulting from genetic drift.

The most distinct observation that emerged from the genetic distance estimation was the high genetic similarity of upper-caste Brahmin populations to Muslim sects (see Tables 3 and 4). Both distance estimates revealed a genetic distance between middle-caste Hindus and Muslims (FST = 0.0090; DA = 0.0266) that was significantly higher than that between Muslims and upper-caste Brahmin subgroups (FST = 0.0050; DA = 0.0148). The greater affinity of the Muslim groups with the upper castes of northern India points to the common source of origin. The argument is more logical in light of the belief that when pastoral nomads from Central Asia reached India 3,000–8,000 years ago, they brought with them the Indo-European language family (Renfrew 1989; Quintana-Murci et al. 2001) and established the Hindu caste hierarchy to legitimize their power by appointing themselves to higher caste ranks (Poliakov 1974; Cavalli-Sforza 1997). On the other hand, the possible place of origin of Indian Muslims also lies in the Caucasus, ranging from the Middle East (Saudi Arabia, Syria, Iraq) to northwest Asia (Turkey), central-west Asia (Afghanistan and Iran), and some parts of eastern Europe (Uzbekistan) (Farah 2003).

Islamic settlements in India have been attributed to at least three different movements that began in different geographic regions (Farah 2003): (1) an Arab invasion that led to the creation of the Sind state in the Indus valley in A.D. 711 (Keay 2000); (2) multiple invasions from Central Asian (Turkic) Muslims into the northwest province of Punjab between A.D. 997 and 1027; and (3) the arrival of Afghan and Persian Muslims to North India between 1300 and 1400, which later spread throughout India (Wolpert 1991). Furthermore, the evolution of these groups to the contemporary Muslim population may have been the result of various distinct cultural routes, including cultural diffusion, colonization, and elite dominance through military expansions. These distinct modes may have contributed to the varying levels of genetic admixture with the indigenous Indian groups. Alternatively, it is also possible that the high-caste Hindus had a much greater opportunity to admix with Muslim foreigners during their expansion across the Indian subcontinent. It is also interesting to note that in our recent report we showed that, when different caste groups of North India were compared to Eurasians and proto-Asian populations (mainly East Asians), the affinity with Eurasians was proportionate to caste rank, with the upper-caste Hindus and Muslims being genetically more similar to the Eurasians than the middle-caste populations were (Khan et al. 2007).

Another interesting finding is the differential genetic relationship of the two Muslim population groups with the rest of the caste populations. Shia Muslims show nearly double the genetic distance compared to the seven caste populations and to Sunni Muslims in both distance matrices. Sunni Muslims constitute the major sect of Muslims in India and make up 87% of the total Muslim population. They have ruled different parts of India for about 900 years, and during their reign, they expanded within the subcontinent. Despite practicing consanguinity, they married outside their religion. This is supported by various studies showing that Indian Muslims have high genetic similarity with other Hindu caste populations (Mukherjee et al. 2001; Rajkumar and Kashyap 2004). On the contrary, Shia Muslims, because of their smaller numbers, may have remained more culturally and genetically isolated within their communities.

We did not observe an effect of surname endogamy on the genetic relationship of populations, as both Bhargavas and Chaturvedis were genetically closer to the parent Brahmin populations. Similarly, among the middle-caste populations, the surname endogamy practicing population of Mathurs shows significant genetic similarity to the Kayastha major group. We reported similar findings based on another set of 24 STR markers (Agrawal et al. 2003) and HLA class II loci (Agrawal et al. 2001), finding that surname endogamy did not have a significant effect on population differentiation. Among the four middle-caste populations, Rastogies and the Vaish showed the least genetic distance, whereas Mathurs and Kayasthas showed nearly double the genetic distance from the Vaish and Rastogy populations in comparison to each other. The findings support social structuring of these populations. Traditionally, the Kayastha have been divided into 12 endogamous subcastes. One of these 12 subcastes is the Mathurs (Karve 1968). Similarly, it has been claimed that the Rastogies originated from either the Vaish or the Rajput population (Balakrishnan 1978). The phylogenetic analysis based on two approaches [genetic distance (neighbor-joining) and maximum-likelihood model] (see Figure 1) also reveals a patristic separation of Muslim, upper-caste, and middle-caste populations supported by high bootstrap values. Muslims and Brahmin populations clustered on the same branch, revealing more genetic similarity to each other than to the four middle-caste populations. The middle-caste cluster was bifurcated into Kayastha-Mathur and Rastogy-Vaish subclusters, bolstering the view that a significant genetic differentiation among middle-caste populations occurred in comparison to the upper-caste Hindu or Muslim groups.

In conclusion, the analysis of 20 STR loci in 9 endogamous populations of North India has yielded data on the genetic profile of North Indians and the role of different sociocultural factors in their structuring and differentiation. First, extensive gene flow through a series of migrations and invasions created an enormous amount of genetic diversity, reflected in a high level of intrapopulation genetic variation. Second, although interpopulation differences were minimal, a slight pattern of genetic variation distribution occurred in which different populations structured into socioreligious groups are genetically more similar to populations of the same group and are genetically more distant from populations of other groups [e.g., FST between Shiites and Sunnis is 0.0033 but that between Muslims as a single group and upper-caste populations (FST = 0.005) or middle-caste populations (FST = 0.009) is higher]. Third, there is a differential genetic relationship of North Indian Muslims with upper- and middle-caste populations.
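
The comparisons behind the second and third points can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the FST and DA values quoted in the text; the variable names are illustrative and not part of the original analysis, and the ratios are purely descriptive (they say nothing about statistical significance).

```python
# FST and DA values as quoted in the text; dictionary keys are illustrative labels.
fst = {
    "Shia vs Sunni": 0.0033,
    "Muslims vs upper castes": 0.0050,
    "Muslims vs middle castes": 0.0090,
}
da = {
    "Muslims vs upper castes": 0.0148,
    "Muslims vs middle castes": 0.0266,
}

# Second point: differentiation within the Muslim group is smaller than
# differentiation between Muslims and either caste group.
within = fst["Shia vs Sunni"]
between = [v for k, v in fst.items() if k != "Shia vs Sunni"]
within_is_smallest = all(within < v for v in between)

# Third point: how much farther Muslims are from middle castes than from upper castes.
fst_ratio = fst["Muslims vs middle castes"] / fst["Muslims vs upper castes"]
da_ratio = da["Muslims vs middle castes"] / da["Muslims vs upper castes"]

print(f"within-group FST is smallest: {within_is_smallest}")
print(f"FST ratio (middle/upper): {fst_ratio:.2f}")
print(f"DA ratio (middle/upper): {da_ratio:.2f}")
```

Both ratios come out at about 1.8, so the two distance measures agree on the relative position of Muslims with respect to the upper- and middle-caste groups.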

Overall, the study shows that the marriage regulatory caste system has a minimal effect on controlling gene flow across socioreligious boundaries; probably the institution of the caste system is still too young to create significant genetic differentiation among different populations. However, if the caste system persists and is not affected by modernization and urbanization, the signatures of endogamy may appear after a few thousand years. Still, the genetic configuration of Indians is as complex as the history of the Indian subcontinent, interwoven with numerous threads of unknown facts. More genetic data are necessary to unravel the structure of Indian genetic composition and to ascertain that the outcome of the present study is general rather than exclusive.

Faisal Khan
1 Department of Medical Genetics, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Raebareli Road, Lucknow, Uttar Pradesh, 226014 India.
2 Faculty of Medicine, University of Calgary, Calgary, AB T2N4N1, Canada.
Received 1 August 2007
revision received 14 March 2008


We are thankful to the Indian Council of Medical Research (ICMR), New Delhi, and the Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, for providing various laboratory facilities and other assistance for conducting the present study.

Literature Cited

Agrawal, S., S. Bhatnagar, U. Bhardwaj et al. 2001. Distribution of HLA class II antigens in three North Indian populations. Int. J. Hum. Genet. 1:283–291.
Agrawal, S., and F. Khan. 2005. Reconstructing recent human phylogenies with forensic STR loci: A statistical approach. BMC Genet. 6:47.
Agrawal, S., F. Khan, A. Pandey et al. 2005. YAP, signature of an African–Middle Eastern migration into northern India. Curr. Sci. 88:1977–1980.
Agrawal, S., B. Muller, U. Bharadwaj et al. 2003. Microsatellite variation at 24 STR loci in three endogamous groups of Uttar Pradesh, India. Hum. Biol. 75:97–104.
Ashma, R., and V. K. Kashyap. 2003. Genetic profile based upon 15 microsatellites of four caste groups of the eastern Indian state, Bihar. Ann. Hum. Biol. 30(5):570–578.
Balakrishnan, V. 1978. A preliminary study of genetic distances among some populations of the Indian subcontinent. J. Hum. Evol. 7:67–75.
Bamshad, M., T. Kivisild, W. S. Watkins et al. 2001. Genetic evidence on the origins of Indian caste populations. Genome Res. 11:994–1004.
Bamshad, M. J., S. Wooding, W. S. Watkins et al. 2003. Human population genetic structure and inference of group membership. Am. J. Hum. Genet. 72:578–589.
Barbujani, G., A. Magagni, E. Minch et al. 1997. An apportionment of human DNA diversity. Proc. Natl. Acad. Sci. USA 94:4516–4519.
Bhattacharyya, N. P., P. Basu, M. Das et al. 1999. Negligible male gene flow across ethnic boundaries in India, revealed by analysis of Y-chromosomal DNA polymorphisms. Genome Res. 9:711–719.
Bowcock, A. M., J. Ruiz-Linares, E. Tomfohrde et al. 1994. High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455–457.
Calafell, F., A. Shuster, W. C. Speed et al. 1998. Short tandem repeat polymorphism evolution in humans. Eur. J. Hum. Genet. 6:38–49.
Cavalli-Sforza, L. L. 1997. Genes, peoples, and languages. Proc. Natl. Acad. Sci. USA 94:7719–7724.
Cavalli-Sforza, L. L. 2005. The Human Genome Diversity Project: Past, present, and future. Nat. Rev. Genet. 6:333–340.
Cavalli-Sforza, L. L., and M. W. Feldman. 2003. The application of molecular genetic approaches to the study of human evolution. Nat. Genet. 33(suppl.):266–275.
Comey, C. T., B. Budowle, D. E. Adams et al. 1993. PCR amplification and typing of the HLA DQ alpha gene in forensic samples. J. Forensic Sci. 38:239–249.
Destro-Bisol, G., G. Spedini, and V. L. Pascali. 2000. Application of different genetic distance methods to microsatellite data. Hum. Genet. 106(1):130–132.
Dutta, R., B. M. Reddy, P. Chattopadhyay et al. 2002. Patterns of genetic diversity at the nine forensically approved STR loci in the Indian populations. Hum. Biol. 74:33–49.
Farah, C. E. 2003. Islam. New York: Barron’s Educational Series.
Felsenstein, J. 1993. Phylogeny Inference Package (PHYLIP), Version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle.
Heinz, C. B. 1999. Asian Cultural Traditions. Prospect Heights, IL: Waveland Press.
Jorde, L. B., and S. P. Wooding. 2004. Genetic variation, classification, and “race.” Nat. Genet. 36(11, suppl.):S28–S33.
Karve, I. 1961. Hindu Society: An Interpretation. Poona, India: Deshmukh Prakashan.
Karve, I. 1968. Kinship Organization in India. Bombay: Asia Publishing House.
Kashyap, V. K., R. Ashma, S. Gaikwad et al. 2004. Deciphering diversity in populations of various linguistic and ethnic affiliations of different geographical regions of India: Analysis based on 15 microsatellite markers. J. Genet. 83(1):49–65.
Keay, J. 2000. India. New York: Grove Press.
Khan, F., A. K. Pandey, M. Tripathi et al. 2007. Genetic affinities between endogamous and inbreeding populations of Uttar Pradesh. BMC Genet. 8(1):12.
Khan, F., S. Talwar, S. Venkataramen et al. 2003. ApoB 3′ HVR polymorphism a genetic variation in Indian subcontinent. Int. J. Hum. Genet. 3:139–145.
Kidd, K. K., A. J. Pakstis, W. C. Speed et al. 2004. Understanding human DNA sequence variation. J. Hered. 95:406–420.
Lakshmi, N., D. A. Demarchi, P. Veerraju et al. 2002. Population structure and genetic differentiation among the substructured Vysya caste population in comparison to the other populations of Andhra Pradesh, India. Ann. Hum. Biol. 29(5):538–549.
Majumder, P. P. 1998. People of India: Biological diversity and affinities. Evol. Anthropol. 6:100–110.
Misra, V. N. 2001. Prehistoric human colonization of India. J. Biosci. 26:491–531.
Mukherjee, N., A. Nebel, A. Oppenheim et al. 2001. High-resolution analysis of Y-chromosomal polymorphisms reveals signatures of population movements from Central Asia and West Asia into India. J. Genet. 80:125–135.
Perez-Lezaun, A., F. Calafell, E. Mateu et al. 1997. Allele frequency for 20 microsatellites in a worldwide population survey. Hum. Hered. 47:189–196.
Poliakov, L. 1974. The Aryan Myth: A History of Racist and Nationalist Ideas in Europe. New York: Basic Books, 190.
Puppala, S., and M. H. Crawford. 1996. Genetic structure and phylogeny of populations of India: Andhra Pradesh. Homo 47:73–84.
Quintana-Murci, L., C. Krausz, T. Zerjal et al. 2001. Y-chromosome lineages trace diffusion of people and languages in southwestern Asia. Am. J. Hum. Genet. 68:537–542.
Rajkumar, R., and V. K. Kashyap. 2004. Genetic structure of four socioculturally diversified caste populations of southwest India and their affinity with related Indian and global groups. BMC Genet. 5:23.
Renfrew, C. 1989. The origins of Indo-European languages. Sci. Am. 261:82–90.
Rosenberg, N. A., J. K. Pritchard, J. L. Weber et al. 2002. Genetic structure of human populations. Science 298:2381–2385.
Roychoudhury, S., S. Roy, A. Basu et al. 2001. Genomic structures and population histories of linguistically distinct tribal groups of India. Hum. Genet. 109:339–350.
Sahoo, S., and V. K. Kashyap. 2005. Influence of language and ancestry on genetic structure of contiguous populations: A microsatellite based study on populations of Orissa. BMC Genet. 6(1):4.
Terreros, M. C., D. Rowold, J. R. Luis et al. 2007. North Indian Muslims: Enclaves of foreign DNA or Hindu converts? Am. J. Phys. Anthropol. 133(3):1004–1012.
Tishkoff, S. A., and K. K. Kidd. 2004. Implications of biogeography of human populations for “race” and medicine. Nat. Genet. 36(11, suppl.):S21–S27.
Weber, J. L., and C. Wong. 1993. Mutation of human short tandem repeats. Hum. Mol. Genet. 2(8):1123–1128.
Wolpert, S. 1991. India. Berkeley: University of California Press.
