Reconstructing the evolution of Indo-European grammar
This study uses phylogenetic methods adopted from computational biology in order to reconstruct features of Proto-Indo-European morphosyntax. We estimate the probability of the presence of typological features in Proto-Indo-European on the assumption that these features change according to a stochastic process governed by evolutionary transition rates between them. We compare these probabilities to previous reconstructions of Proto-Indo-European morphosyntax, which use either the comparative-historical method or implicational typology. We find that our reconstruction yields strong support for a canonical model (synthetic, nominative-accusative, head-final) of the protolanguage and low support for any alternative model. Observing the evolutionary dynamics of features in our data set, we conclude that morphological features have slower rates of change, whereas syntactic traits change faster. Additionally, more frequent, unmarked traits in grammatical hierarchies have slower change rates when compared to less frequent, marked ones, which indicates that universal patterns of economy and frequency impact language change within the family.
Indo-European linguistics, historical linguistics, phylogenetic linguistics, typology, syntactic reconstruction
Supplemental Material: https://muse.jhu.edu/article/810804/pdf
1.1. A century of Indo-European syntactic reconstruction
More than a century has passed since the pioneering work on syntactic reconstruction by the Neogrammarians (Brugmann & Delbrück 1893, 1897, 1900, Wackernagel 1920), dealing with core issues in Indo-European grammar such as case, word order, alignment, agreement, person agreement, position of the verb, and the behavior of clitics. The system reconstructed for Indo-European in these works was fundamentally based on a comparative-historical reconstruction of morphological and syntactic features of ancient Indo-European languages, with a strong focus on a systematic comparison of Old Indo-Aryan, in particular Vedic Sanskrit, with Latin and Greek. This model, which we label 'canonical', used Sanskrit as a template for syntactic reconstruction. In Hirt's words: 'Delbrück's point of departure is Sanskrit. If something is not present in Sanskrit, it does not belong to Indo-European' (Hirt 1934:5, our translation). A new era of syntactic reconstruction, which is reflected in Hirt's skeptical view, began with the decipherment of Hittite in 1915 and the discovery of the Anatolian branch of Indo-European. Due to the old age of Anatolian sources, some linguists considered the complex synthetic structure of Old Indo-Aryan and Greek a secondary development, holding the view that Anatolian reflects a more archaic system. Accordingly, the discovery of Anatolian gave rise to alternative theories of Proto-Indo-European grammar, involving ergative (Uhlenbeck 1901, Vaillant 1936) or isolating (Hirt 1934) structure, and resulted in the concept of Indo-Hittite (Sturtevant 1962). Any grammatical reconstruction postdating Indo-Hittite has to consider the role of Anatolian systems in relation to Proto-Indo-European. Currently, 'Greco-Aryan' and 'Anatolian' models serve as complementary to each other in Indo-European grammar, for example, regarding the reconstruction of the verbal system (Clackson 2007:114–42).
Another important approach to syntactic reconstruction emerged during the 1970s, stemming from research on typological implicational universals (Greenberg 1963, 1978), which was adapted to a model for reconstructing syntax (Lehmann 1974). Even though this model met with skepticism from some comparative-historical scholars (Winter 1984), it had an important continuation in the typological diachronic approach of Nichols (1992, 1995, 1998). This approach has grown in importance in the era of computational typology (Bickel & Nichols 2007, Wichmann 2014), giving birth to several alternative models for explaining typological change (Baker 2011, Croft et al. 2011, Dryer 2011, Dunn et al. 2011, Levy & Daumé 2011, Longobardi & Roberts 2011, Plank 2011, Cathcart et al. 2018).
Since the pioneering work of Greenberg (1963), most syntactic reconstruction has been influenced by implicational typology. With the merging of typology and comparative-historical syntax, several important contributions to syntactic reconstruction have been published in recent decades, targeting the syntax of Indo-European as well as that of other families (Harris & Campbell 1995). This area of research goes under the name 'diachronic typology' (Viti 2015). An important approach continues the active-stative reconstruction model for Proto-Indo-European (Schmidt 1979, Gamkrelidze & Ivanov 1984, Bauer 2000). Other targeted domains have been the reconstruction of an active-stative verbal paradigm (Jasanoff 1978), the collective/count plural in the case system (Melchert 2000), dative subject constructions (Barðdal & Eythórsson 2009), and various aspects of modality, tense, voice, aspect, particles, and gender (Meier-Brügger et al. 2010:374–412). Along with these works, there are a number of excellent overviews on principles of syntactic reconstruction (Roberts 2007, Ferraresi & Goldbach 2008, Barðdal 2014) and monographs and handbooks compiling recent progress in various areas of syntactic reconstruction (Josephson & Söhrman 2008, Kulikov & Lavidas 2015, Viti 2015, Ledgeway & Roberts 2017). In §§3–5, where we evaluate the results of our reconstruction, we discuss this literature in further detail.
1.2. Outline of the current study
Our study analyzes comparative concepts of Indo-European morphosyntax, including the linguistic categories of alignment, verbal morphology, nominal morphology, tense typology, and word order. We analyze data from 125 languages, including ancient, medieval, and modern languages from the Indo-European family (Table 1, Figure 1). Our data set is extracted from the typological subsection of the Diachronic Atlas of Comparative Linguistics (DiACL; Carling et al. 2018, Carling 2019), a collection of linguistic data from languages of Eurasia and other regions. The original data set of 108 binary features has been recoded to yield sixty-five categorical (i.e. nonbinary) features.
We selected a well-known and well-studied family with a long history of scholarship as the basis for our investigation. The aim of the study is twofold: first, we wish to assess the extent to which phylogenetic comparative methods, which can be used to estimate the probability of morphosyntactic features in Proto-Indo-European, agree with the results of previous models of syntactic reconstruction for the Indo-European family. Second, we aim to make inferences about the evolutionary dynamics and variability of different morphosyntactic features during the course of the history of the Indo-European languages. In §2, we describe the model, method, and data forming the basis for the current study. We then evaluate the results of the reconstruction for Proto-Indo-European and envisage further research that could emerge from the data and the model (§3). We give the results of a statistical study in §3.5, where we compare our reconstructed results to three different models of comparative-historical syntax. In §4, we discuss the evolutionary dynamics and variability of the transition of traits. Finally, we discuss our results, both in the light of previous reconstructions of Indo-European grammar and from the perspective of general theories of grammar evolution (§5). Technical descriptions of the methods used in this article can be found in Appendices A–F of the online supplementary material, available at http://muse.jhu.edu/resolve/127; sections S1–S9 of the online supplementary material contain full details of the data employed and results. The raw data set is available open access via the DiACL database (https://diacl.ht.lu.se/). All code, metadata, and data are available at the following links: https://github.com/chundrac/rec-evo-IE-gram, https://zenodo.org/record/4275010.
2. Theory, model, data, coding, method, and analysis
2.1. Comparative-historical, typological, and phylogenetic models of reconstruction
The model of morphosyntactic reconstruction introduced by scholars of Indo-European in the nineteenth century is based primarily on the comparative-historical method, systematizing forms and meanings of morphemes in such a way that sets of paradigms, rules, and syntactic patterns can be reconstructed to a protolanguage. Even though morphemes can be reconstructed as a result of the comparative-historical method, the reconstruction of their syntactic function is nontrivial, due to the uncertainty of regularity and the problem of establishing directionality in syntactic change (Barðdal 2014). Nevertheless, this method of reconstruction is utilized by a number of scholars, even though there is agreement that it should not be applied to properties that are unconstrained by morphology, such as word order (Harris & Campbell 1995, Harris 2008). Proponents of the comparative-historical reconstruction model argue that if a specific pattern, aided by morphological reconstruction, has survived in a majority of languages, then there is reason to reconstruct it to the protolanguage (Campbell & Harris 2002:615). Critics of this model point to the directionality problem: if several daughter languages carry the same pattern, we may reconstruct the pattern to an ancestral state of those languages, but in case of a disagreement we do not know enough about the directionality of syntactic change to reconstruct one variant over another (Roberts 2007, Walkden 2013).
The model of reconstruction used by typologists from the 1960s onward is based upon a different principle: if language-internal implicational dependencies between typological features, so-called universals, can be identified, then these observations can be used as an argument for reconstructing typological properties to a protolanguage. A major obstacle to the adaptation of this model is how to deal with language-internal conflicts between features with respect to assumed dependencies, both in attested and in reconstructed languages. An example is the controversy over Indo-European word order, where reconstruction based on ancient languages does not yield a uniform result with respect to the protolanguage (Lehmann 1974, Friedrich 1975, Watkins 1976, Winter 1984).
In phylogenetic comparative methods, the issue of reconstruction is formulated in probabilistic terms, using phylogenetic computational algorithms originally adapted from biology (Calude & Verkerk 2016, Silva & Tehrani 2016, Jäger 2019). These models assume a specific stochastic process underlying character evolution, which usually involves transition rates that characterize change between values of a linguistic variable (e.g. different main clause word orders) over a phylogeny. These rates are estimated on the basis of a phylogenetic representation, often a tree sample inferred from basic vocabulary patterns or a comparable linguistic feature, and the distribution of the feature among the daughter languages. These rates can be used to reconstruct the probability of a given value at internal nodes of the tree, including the root (i.e. the node ancestral to all others in the tree), as well as infer locations on branches of the tree where change is likely to have taken place (Maurits & Griffiths 2014, Dunn et al. 2017, Widmer et al. 2017, Cathcart et al. 2018, Blasi et al. 2019, Cathcart et al. 2020).
Figure 2 provides a schematic toy diagram of an ancestral-state reconstruction problem in a phylogenetic comparative framework. Given a phylogeny with observed data at the tips of the tree, the procedure has two objectives: (i) to infer transition rates between feature values, and (ii) to estimate values for unobserved internal nodes that are most likely to have preceded the values displayed by (or inferred for) their descendants. In a parsimony framework (a model that minimizes the total number of character-state changes), this often involves restricting the number of parallel changes over the phylogeny. In a likelihood-based framework, including its Bayesian extensions, this also involves inferring evolutionary rates that express the probability of changes between different states over various spans of time represented by the branch lengths of the phylogeny. In general, evolutionary rates are inferred while treating internal states as nuisance factors, which are marginalized out for the sake of efficiency. The rates that are inferred, or their posterior distributions under the Bayesian approach, can then be used to estimate the probabilities of different states at different nodes in the tree, starting at the tips and moving toward the root (Felsenstein 2004, Yang 2014).
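To make the parsimony criterion concrete, the following is a minimal sketch of Fitch's small-parsimony algorithm, which finds the minimum number of character-state changes on a fixed binary tree. It is an illustration only and not the method used in this study, which is likelihood-based and Bayesian. The nested-tuple tree encoding, the function name `fitch`, and the toy word-order values are assumptions made for the example.

```python
from typing import Set, Tuple, Union

# A tree is either a tip (its observed value, a string) or a pair of subtrees.
# This encoding is a hypothetical convenience for the sketch.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(node: Tree) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum number of changes) for a subtree."""
    if isinstance(node, str):
        # A tip: its only candidate state is the observed one, at no cost.
        return {node}, 0
    left, right = node
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        # The children can agree on a state, so no additional change is needed.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_states | right_states, left_cost + right_cost + 1


# Toy example with invented tip values (not data from this study).
toy_tree: Tree = (("SOV", "SOV"), ("SOV", "SVO"))
root_states, changes = fitch(toy_tree)
```

For this toy tree, the only candidate state at the root is SOV, with a single change required somewhere below it. Unlike the likelihood-based approach, this procedure ignores branch lengths entirely.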
2.2. Data: original data set and recoding for the current study
We use a data set of Indo-European languages extracted from the DiACL Typology/Eurasia data set (Carling et al. 2018). The data involve categories of grammar that have been under discussion in both diachronic syntax and general typology for a long time, and are coded according to a hierarchical model designed to represent morphosyntactic features with an adequate level of granularity. Like other similar databases, such as AUTOTYP (Bickel & Nichols 2002) or The world atlas of language structures online (WALS; Dryer & Haspelmath 2013), the data set consists of comparative concepts (Haspelmath 2010), definitions of linguistic features of grammar designed for crosslinguistic comparison. The original binary hierarchical model of DiACL, related to multivariate approaches (Bickel & Nichols 2007), organizes comparative concepts according to levels of increasing detail. We recoded the binary data so that the data set consists of categorical variables (e.g. main clause word order) taking multiple values (e.g. SVO, SOV, VSO, V2), organized within larger morphosyntactic categories (comprising alignment, word order, nominal morphology, verbal morphology, and tense). This results in sixty-five categorical variables, sixty-four of which are 'informative' in that they show variation within Indo-European and are thus suitable for phylogenetic analysis (Figure 3). Figure 3a illustrates the hierarchical principle of organizing linguistic properties into grids, features, variants, and values, which is used in the database DiACL. Figure 3b demonstrates how this hierarchy is mapped into categorical variables, which contain blocks of value combinations, defined as traits.
There are several advantages of using a hierarchical model for typological data, as in the original data set of our DiACL study (Carling et al. 2018) or AUTOTYP (Bickel & Nichols 2002). The most important advantage is the possibility of increasing detail, which enables local adaptations as well as the possibility of testing grammatical relations. Another advantage is the ability to recode the binarized strings of 1 and 0 into new combinations that match a specific research question. A further advantage is the possibility of contrasting features across grammatical categories (see the next section).
The procedure for transforming the hierarchically organized original data into our scheme of recoded categorical variables is shown in Fig. 3, exemplified using alignment. The complete recoded data is given in §S3a in the online supplement. The various coding and recoding strategies are described in their respective sections below, where we discuss and evaluate results (§3).
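As a rough illustration of the recoding idea, the sketch below collapses a block of binary features into a single categorical trait by joining the names of the features coded as present. The feature names, the `recode` function, and the `"none"` label are hypothetical; the actual recoding scheme is the one documented in §S3a.

```python
from typing import Dict, Tuple

# Hypothetical block of binary features belonging to one categorical variable
# (here, main clause word order). These are not the actual DiACL feature names.
WORD_ORDER_BLOCK: Tuple[str, ...] = ("SVO", "SOV", "VSO", "V2")


def recode(binary_row: Dict[str, int], block: Tuple[str, ...] = WORD_ORDER_BLOCK) -> str:
    """Map a row of 1/0 codes to one trait naming the combination of present values."""
    present = [name for name in block if binary_row.get(name, 0) == 1]
    # A language with no value coded as present receives an explicit label.
    return "+".join(present) if present else "none"


# Invented example row: a language coded as both SVO and V2.
example_row = {"SVO": 1, "SOV": 0, "VSO": 0, "V2": 1}
trait = recode(example_row)  # "SVO+V2"
```

Treating each attested combination as its own trait is one simple way to obtain a categorical variable from binary strings; whether missing data should be kept distinct from all-zero rows is a design decision this sketch does not address.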
2.3. Additional data sets
Coding of reconstruction models
Our study compares a probabilistic model of reconstruction with insights from comparative-historical approaches to syntactic change. For this purpose, we have selected a number of well-known approaches to the reconstruction of Indo-European syntax against which to compare our results. There is a rich literature on the reconstruction of Indo-European syntax, the full treatment of which is outside of this article's scope. For the sake of simplicity, we have limited our comparisons to representative publications that address all grammatical categories present in our data. We found three descriptions that were complete enough to enable us to treat the models proposed according to a coding scheme against which we could easily compare our results. These we label canonical (Brugmann & Delbrück 1893, 1897, 1900), isolating (Hirt 1934, 1937), and active-stative (Gamkrelidze & Ivanov 1984, 1995). It is important to remember that the alternative theories (isolating and active-stative) reconstruct a stratified Proto-Indo-European language. At the root, they reconstruct a joint Anatolian and non-Anatolian stage (Indo-Anatolian), which later transforms into a stage that represents the predecessor of the non-Anatolian languages. Substantial portions of the discussion within alternative theories deal with the process of system change from Indo-Anatolian to non-Anatolian branches (Pooth et al. 2018). In order to ensure the comparability of our results with previous theories of grammar reconstruction, we use an Indo-Anatolian reference phylogeny, which represents the consensus view on branching and time depth of the Indo-European family. We take the root of our phylogeny, which serves as the joint ancestral stage of the Anatolian and non-Anatolian subbranches of the family, to represent the unattested Proto-Indo-European language (see further §§3–5 and §S9).
The coding of feature variants of models of the raw DiACL data, including source references, is found in §S2b; recoding of feature variants into our categorical features is found in §S3a.
Coding of grammatical hierarchies in the data
Additionally, we implement a coding of grammatical hierarchies between features in our data (§S7). The issue of grammatical hierarchies is of key importance to the implicational typology model of Greenberg (1966), Comrie (1981), and Croft (1990, 2003) and is implicitly connected to the frequency of grammatical categories as well as markedness theory (Haspelmath 2006). During the course of our analyses, we found that our model displayed asymmetric results for different features, not just between basic categories (e.g. word order, nominal morphology, verbal morphology, alignment, tense) but also within categories, between features differing with respect to categories such as tense (present, past) and word class (noun, pronoun) (§S2, §S3). For this reason, we chose to adopt an additional model of coding in which we identify pairs of features that belong to the same grammatical category but vary with respect to other grammatical categories, and which can therefore be defined as standing in a hierarchical grammatical relation to each other. For the sake of simplicity and comparability, we reduce our grammatical hierarchies to pairs of features, which have been observed in previous literature (where they are often referred to as 'scales'). There is a rich literature on grammatical as well as marking hierarchies in grammar, both from the perspective of individual languages and crosslinguistically (Comrie 1981, Croft 2003, Bornkessel-Schlesewsky et al. 2015, Haspelmath 2015, Mal'čukov 2015). Generally, grammatical hierarchies are based on three different criteria (Croft 1990:92, 2003:156–57):
• Structural criteria, that is, marking in grammars
• Behavioral criteria, that is, the inflectional and distributional patterns in languages
• Frequency, that is, the occurrence in text, both in individual languages and crosslinguistically
Only a handful of the grammatical hierarchies mentioned in the literature recur in our data, and there is also disagreement about the hierarchical organization of some of the categories in our data. One such example is the relation between future and present. Whereas the original hierarchy of Greenberg (1966, 2005) and Croft (1990:92–93) puts these traits in the order present < future, other scholars (Mal'čukov 2015, Witzlack-Makarevich & Seržant 2018) place these properties in the order future < present < past on the basis of existing marking patterns in some languages. The issue is complex: we are aware that many languages reverse general hierarchies in their grammatical systems (Bickel 2008; see also Tiersma 1982).
For this purpose, we use general grammatical hierarchies (Aissen 2003, Haspelmath 2015) as our point of reference, establishing pairwise hierarchical relations which we then implement for selected features in our data set (Table 2). The reason we use pairwise relations and not hierarchical scales (e.g. singular < plural < dual) is that we intend to compare grammatical hierarchies and the transition rates inferred by our model in a systematic fashion. Since our data contain features that are defined according to several categories, features may recur in hierarchical pairs. As an example, the features pronoun, present progressive: nominative-accusative and noun, present progressive: nominative-accusative are in a hierarchical relationship (pronoun < noun), whereas the features pronoun, present progressive: nominative-accusative and pronoun, simple past: nominative-accusative are also in a hierarchical relationship (present < past). A number of features in our data are not involved in any grammatical hierarchy relation, for various reasons. One reason is that they lack a hierarchical grammatical relationship to any other feature in the data. We also chose to consistently mark negative values in a fashion similar to their positive counterparts; for example, no synthetic present progressive and no synthetic future are in a hierarchical relation present < future, just as synthetic present progressive and synthetic future are.
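One way to picture the pairwise comparison is the sketch below, which stores each hierarchical relation as an ordered pair and reports the share of pairs in which the less marked member has the lower rate of change. The rate values are arbitrary placeholders invented to make the example runnable, not results of this study, and summarizing each feature by a single rate is a simplifying assumption; the names `rates`, `pairs`, and `share_consistent` are likewise hypothetical.

```python
from typing import Dict, List, Tuple

PRON_PRES = "pronoun, present progressive: nominative-accusative"
NOUN_PRES = "noun, present progressive: nominative-accusative"
PRON_PAST = "pronoun, simple past: nominative-accusative"

# Placeholder rates of change (arbitrary numbers, NOT values inferred in this study).
rates: Dict[str, float] = {
    PRON_PRES: 0.10,
    NOUN_PRES: 0.20,
    PRON_PAST: 0.15,
}

# Each pair is (less marked feature, more marked feature).
pairs: List[Tuple[str, str]] = [
    (PRON_PRES, NOUN_PRES),  # pronoun < noun
    (PRON_PRES, PRON_PAST),  # present < past
]


def share_consistent(pairs: List[Tuple[str, str]], rates: Dict[str, float]) -> float:
    """Fraction of pairs in which the less marked member changes more slowly."""
    hits = sum(1 for low, high in pairs if rates[low] < rates[high])
    return hits / len(pairs)


consistency = share_consistent(pairs, rates)
```

Because the same feature can appear in several pairs, as the pronoun present progressive feature does here, the pairs are not statistically independent, which a fuller analysis would need to take into account.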
We chose a priori not to code any hierarchies for word order. Even though it is evident that head-final traits (OV, relative-noun, possessor-possessed, etc.) have lower rates of change (§S5), we prefer not to enter into a discussion about possible marking hierarchies or general frequencies in word order (Croft 1990:84–91).
2.4. Methodology: reconstruction with phylogenetic comparative methods
The methodology on which this article relies assumes that linguistic variables evolve under a continuous-time Markov process (for an introduction see Liggett 2010), a stochastic model which assumes that there exist rates of change between values of categorical variables which characterize their evolution over time. Accordingly, our model infers rates of change between values of the categorical variables in our data set, using a tree sample representing genetic relationships between languages of the Indo-European family. Once these transition rates have been inferred, they can be used to estimate the probability of a value for a given variable at phylogenetic nodes where data are unobserved; these internal nodes correspond to reconstructible protolanguages, with the root of the tree corresponding to Proto-Indo-European.
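For intuition about how transition rates translate into probabilities of change over a branch, the sketch below gives the closed-form transition probabilities of a two-state continuous-time Markov chain (feature absent = 0, present = 1). The restriction to two states, the function name, and the parameter names `gain` and `loss` are assumptions made for illustration; the variables in this study can take more than two values, in which case the probabilities are obtained from a matrix exponential of the rate matrix.

```python
import math
from typing import List


def transition_probabilities(gain: float, loss: float, t: float) -> List[List[float]]:
    """Return P(t), where P[i][j] is the probability of state j after time t given state i.

    gain is the rate of change 0 -> 1 and loss the rate of change 1 -> 0;
    both must be positive, and t is a branch length in the same time units.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    # Stationary probabilities of the chain.
    stationary_absent = loss / total
    stationary_present = gain / total
    return [
        [stationary_absent + stationary_present * decay, stationary_present * (1.0 - decay)],
        [stationary_absent * (1.0 - decay), stationary_present + stationary_absent * decay],
    ]


# Example with arbitrary illustrative rates over a branch of length 2.0.
p = transition_probabilities(gain=0.2, loss=0.1, t=2.0)
```

At t = 0 the matrix is the identity (no time, no change), and as t grows each row approaches the stationary distribution, so information about the starting state is gradually lost along long branches.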
Concrete details regarding the generation of the tree sample and the inference procedure can be found in the online supplementary material. Our tree sample (§S9) is generated as follows: we assume a fixed topology that agrees with received philological wisdom, and sample branch lengths from chronologically realistic intervals, yielding a tree with a root age uniformly distributed between 7000 and 6000 years BP. The model is Bayesian; we infer posterior distributions for transition rates, using Felsenstein's pruning algorithm (Felsenstein 1981, 2004) to compute the likelihood of these parameters for trees in the tree sample. We estimate the probability of a value for a given variable at the root of the phylogeny (i.e. for Proto-Indo-European) by randomly drawing evolutionary rates from their respective posterior samples, iteratively sampling a value at the root (Nielsen 2002, Huelsenbeck et al. 2003, Bollback 2006), and normalizing the counts for each sampled state to yield probabilities between 0 and 1. We evaluate these results in §3.
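The core of the pruning computation can be sketched for a binary feature on a toy tree as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: it fixes a single pair of rates instead of drawing them from a posterior sample, uses one tree instead of a tree sample, assumes a uniform prior over root states, and computes the root probabilities exactly instead of by repeated sampling. The tree encoding and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple, Union

# A node is either an observed tip state (0 or 1) or a list of
# (child node, branch length) pairs. Hypothetical encoding for this sketch.
Node = Union[int, List[Tuple["Node", float]]]


def transition_matrix(gain: float, loss: float, t: float) -> List[List[float]]:
    """Two-state transition probabilities over a branch of length t."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi0, pi1 = loss / total, gain / total
    return [[pi0 + pi1 * decay, pi1 * (1.0 - decay)],
            [pi0 * (1.0 - decay), pi1 + pi0 * decay]]


def partial_likelihoods(node: Node, gain: float, loss: float) -> List[float]:
    """Likelihood of the tip data below `node`, conditional on each state at `node`."""
    if isinstance(node, int):
        # Observed tip: the data are certain given the matching state.
        return [1.0 if state == node else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        p = transition_matrix(gain, loss, length)
        below = partial_likelihoods(child, gain, loss)
        for state in (0, 1):
            # Sum over the unobserved child state (internal states are marginalized out).
            result[state] *= sum(p[state][k] * below[k] for k in (0, 1))
    return result


def root_posterior(tree: Node, gain: float, loss: float,
                   prior: Sequence[float] = (0.5, 0.5)) -> List[float]:
    """Posterior probability of each root state under fixed rates."""
    likelihood = partial_likelihoods(tree, gain, loss)
    joint = [prior[s] * likelihood[s] for s in (0, 1)]
    norm = sum(joint)
    return [value / norm for value in joint]


# Toy tree with invented tip states: two sister tips with the feature, one without.
toy_tree: Node = [([(1, 1.0), (1, 1.0)], 0.5), (0, 1.5)]
posterior = root_posterior(toy_tree, gain=0.3, loss=0.3)
```

Averaging such root probabilities over many rate draws and trees would approximate the quantity obtained in the study by normalizing sampled root states, with the caveats listed above.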
In most cases, there is a clear result in which our procedure reconstructs a feature with relative certainty, inferring a high probability for a specific value of a variable and a low probability for the remaining values. This behavior can be seen in a histogram of all reconstruction probabilities (Figure 4); the distribution of these probabilities is U-shaped, in that low (0.0–0.25) and high (0.75–1.0) probability ranges are more frequent than the intermediate ones (0.25–0.75). While a small number of features are reconstructed with high uncertainty, meaning that we cannot say anything concrete about the value most likely to be present in Proto-Indo-European, the evolutionary dynamics of such features are still of interest, since these features may emerge in later phases of Indo-European history. Furthermore, the behavior of such variables helps us diagnose the overall behavior of our model, with an eye to why it reconstructs certain patterns with high certainty. We discuss our model's results in detail below.
3. Results: reconstruction
Using the methodology outlined in the previous section, we reconstructed probability distributions across values for each variable in our data set at the root of the phylogeny. These distributions represent probabilities that particular features were present in Proto-Indo-European, under our model. A complete listing of all results, along with figures providing visualizations of the evolutionary history of all variables in our data set, is found in the supplementary material (§S4, §S8). In the following sections, we provide a detailed assessment of our results, organized thematically according to different domains of morphosyntax that have been discussed at length in the traditional literature on syntactic reconstruction in Indo-European, namely alignment, definiteness, gender, case, verbal morphology, verbal typology, and word order. Finally, we provide a quantitative comparison of our results against received wisdom in the form of models of reconstruction proposed by the different schools or camps of traditional Indo-European syntactic reconstruction described above, which we term the canonical, active-stative, and isolating models.
3.1. Alignment
For variables pertaining to alignment (§S4, A1–30), our results support the reconstruction of nominative-accusative alignment in multiple systems (Table 3). Nominative-accusative alignment is found with nouns as first argument in the present progressive and in the simple past, with pronouns as first argument in the present progressive and in the simple past, and with verbal marking in the present progressive and in the simple past. At the same time, while nominative-accusative alignment is reconstructed with a higher probability than other alignment types are across these systems, the certainty with which it is reconstructed varies. We note that nominative-accusative is more likely in the present progressive than in the simple past, for both nouns and pronouns, and more likely with pronouns as first argument than with nouns. The second most frequent type of alignment is no marking, followed by ergative (both with low probabilities).
This discrepancy is striking. Considering language-internal distributions of the clause and argument types involved, it is clear that nominative-accusative features are reconstructed with higher certainty for grammatical categories of higher crosslinguistic frequency (present, pronoun) as opposed to the more infrequent categories (past, noun). This result is of relevance to discussions of grammatical hierarchies (Croft 2003, Haspelmath 2006) (see §4). We also notice that the ergative appears (at a low probability) only in the simple past. This result is reminiscent of results in the domain of verbal morphology, in which the simple past shows different patterns of change from the present progressive.
The reconstruction of patterns of alignment has a long history of discussion in comparative-historical syntax. In the canonical model of Delbrück (Brugmann & Delbrück 1893, 1897, 1900), the nominative codes the first argument (S/A), independent of the transitivity of the predicate, and the accusative codes the second argument (O) (Meier-Brügger et al. 2010:401–4). However, due to the reconstruction of a case marking of -s for agent and -m for patient, ergative alignment was proposed for Proto-Indo-European at an early date (Uhlenbeck 1901). This theory was later continued by Vaillant (1936) and Soviet scholars of the 1970s (Klimov 1974, Gamkrelidze & Ivanov 1984), who reconstructed an active-stative system, based on the *-os/*-om distinctions in nominative/accusative and a corresponding *-os/*-om distinction between genitives of active and inactive noun classes (Gamkrelidze & Ivanov 1995:233–76). Several scholars have continued the active-stative theory (Schmidt 1979, Bauer 2000), reconstructing the relative chronology of the Indo-European paradigm, as well as reconstructing a continuation of change from an active-stative protolanguage and into subbranches, for example, Italic (Bauer 2000). The source of the active-stative theories is a fundamental marking distinction between animate and inanimate (active-stative), reconstructed from the case marking of nouns and pronouns in Proto-Indo-European (Table 4). The distinction is also reflected in suppletion in the pronominal paradigm (Table 5). The subject case has an unmarked zero-ending, against which the object is marked (Martinet 1962:44–46, Bauer 2000).
Under the active-stative theory, the active alignment is also marked in the two series of verbal endings, the *-mi (active) and *-h₂e (inactive) conjugation, supported by the -mi and -h̬i paradigm in Anatolian (Gamkrelidze & Ivanov 1995:254–76). But as pointed out by other scholars, the formal contrast in Hittite between -mi and -h̬i conjugation is not reflected in any systematic difference in meaning (Jasanoff 2003:1–40), which is a prerequisite for the active-stative theory. At the same time, active-stative interpretations remain important in many theories of explanation of the Indo-European sets of endings (Jasanoff 1978).
The active-stative theory has no support under our reconstruction, which points instead to nominative-accusative prevalence in Proto-Indo-European, both in the case marking on nouns and pronouns and in verbal conjugation, as well as in the present/past distinction (Table 3). The active-stative and ergative theories are not generally supported by all Indo-European scholars (Meier-Brügger et al. 2010:412). However, they remain of great interest to us, since they connect to the reconstruction of the Indo-European gender and case systems, which yields interesting results on the basis of our data, consistent with the reconstruction of nominative-accusative alignment.
3.2. Nominal morphology
Case marking in the NP
Our results from the domain of nominal morphology provide information about the position of case marking within the noun phrase in Proto-Indo-European. In the data set, we code whether languages mark case on adjectives, articles, the first element of the NP, and the head noun (§S4, NM1–8). Our reconstruction provides support for the presence of case marking on head nouns (0.745) and adjectives (0.559), but not on the article, in line with the probable absence of definite articles in Proto-Indo-European. Our system does not provide support for the presence of a rule that case must be marked on the last member of an NP (0.076). Case marking on the noun is not especially controversial: as long as we reconstruct a synthetic case system of a canonical type (see below, Case), we also expect case marking to appear on the nominal head in a noun phrase. However, the relatively lower degree of probability of case marking on the adjective (0.559) is not completely in line with the canonical model, which also reconstructs full case marking, with respect to case and gender on adjectives (note the higher gender agreement value below; Brugmann & Delbrück 1893:402ff.).
On the whole, features pertaining to definiteness are reconstructed with low probabilities, indicating that the presence of definiteness in Proto-Indo-European is unlikely. This is the case for definiteness marked on the adjective, definiteness on the first element of the NP, definiteness on the last element of the NP, a definite article, and a definiteness suffix (§S4, NM9–17). This result is uncontroversial with respect to all models, since it is evident from the historical record that most Indo-European branches developed definiteness marking independently, by means of grammaticalization (Bauer 2007).
Features targeting noun class and nominal gender (§S4, NM18–27) display particularly interesting results. The probabilities of the presence of more than five noun classes (genders) and an animate gender are close to zero. However, the probability of having a masculine/feminine distinction is higher (0.684) than the probability of not having a masculine/feminine distinction (0.316). The probability of a special neuter gender is high (0.855). Furthermore, the probability for a predicative adjective to agree with its nominal head in gender is reasonably high (0.673; see Table 6).
These results for gender are noteworthy and somewhat controversial. At an early date, Delbrück (Brugmann & Delbrück 1893:132–33) was hesitant in reconstructing a Proto-Indo-European three-gender system, equivalent to the system found in archaic Indo-European languages such as Sanskrit or Classical Greek. Considering the formal distribution of endings and the gender syncretism found in later Indo-European branches, he proposes that the three-gender system of Indo-European emerged from a two-gender system, based on an animacy/inanimacy distinction. Hirt (1934:28) reconstructs an Indo-European protolanguage with no gender marking at all on nouns. In later literature, there is consensus around an original two-gender model of Proto-Indo-European, where the feminine is secondary (Szemerényi 1989:164–65, Tichy 1993, Gamkrelidze & Ivanov 1995:242–44, Matasović 2004, Luraghi 2011). The issue of gender/noun class is critical to arguments for reconstructing active-stative or ergative systems for Indo-European, and the animacy vs. inanimacy distinction is interpreted as an active vs. inactive, subject vs. nonsubject distinction (Meier-Brügger et al. 2010:412). There are discussions of how a three-gender system emerged out of a two-gender system, that is, how the animacy category split up into a sexus distinction, the possible distinction concrete vs. abstract and noncollective vs. collective, and the formation of a feminine gender in *-h2, originally an abstract suffix, which was extended to the collective (Tichy 1993, Matasović 2004, Luraghi 2011). Although we reconstruct a masculine/feminine distinction with only moderately high probability, this result goes against the mainstream model in reconstructing a three-gender system for Proto-Indo-European.
There are several possible reasons for this result. By using comparative concepts, that is, features with no particular connection to individual pieces of morphological matter, the coding does not distinguish between the two-gender system of Hittite (which is assumed to be preserved from Proto-Indo-European) and the two-gender system of, for example, Dutch or Swedish (which collapsed from a previous three-gender system). Our model assumes that linguistic features evolve under a continuous-time Markov process, which estimates transition rates between values of a linguistic variable over time. The three-gender system is preserved and stable in many branches of Indo-European, as well as occasionally collapsed in some of the branches (e.g. Romance, Germanic), but not in a consistent way (masculine/feminine vs. common/neuter). Accordingly, the model estimates that it is more likely for Anatolian to have collapsed a Proto-Indo-European three-gender system than to have preserved an ancient two-gender system (see Figures 5–6, which display the most probable trajectories of historical development of these features under our model on a maximum clade credibility (MCC) summary tree, and further discussion in §5).
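To make the continuous-time Markov assumption concrete, the following minimal sketch computes the probability that a single binary feature (absent/present) changes state along a branch of a given length. It uses the standard closed-form solution for a two-state chain. The function name and the rate values are illustrative assumptions and are not the estimates or the implementation used in this study, whose variables may have more than two values.

```python
import math

def transition_probs(gain, loss, t):
    """Return the 2x2 matrix P(t) for a binary feature (0 = absent, 1 = present).

    gain: rate of change absent -> present (changes per 1000 years)
    loss: rate of change present -> absent (changes per 1000 years)
    t:    branch length in units of 1000 years
    """
    total = gain + loss
    decay = math.exp(-total * t)
    pi_present = gain / total  # long-run probability of presence
    pi_absent = loss / total   # long-run probability of absence
    return [
        [pi_absent + pi_present * decay, pi_present * (1 - decay)],
        [pi_absent * (1 - decay), pi_present + pi_absent * decay],
    ]

# Hypothetical rates, chosen only for illustration.
P = transition_probs(gain=0.05, loss=0.20, t=3.0)
for row in P:
    print([round(p, 3) for p in row])
```

Each row of the matrix gives the probabilities of ending in each state after time t, given the starting state. A loss rate that is high relative to the gain rate makes independent loss on several branches more plausible than independent gain, which is the kind of asymmetry that drives the gender result discussed above.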
Results for features pertaining to case show a degree of agreement with the canonical system of reconstruction similar to that of the features discussed in the foregoing sections. In Indo-European studies, the topics of morphosyntactic reconstruction of nominal morphology, the case system and its functionality, and paths of syncretism in various Indo-European subbranches have been the subject of much debate, a full discussion of which is outside of this article's scope. Instead, we use our results as a point of departure in an attempt to assess the extent to which they dovetail with previous theories regarding nominal morphology in comparative-historical syntax.
As far as typological structure is concerned, we code languages for the presence of agglutination for number and case in nouns and pronouns (§S4, NM30–33). All agglutinating traits have low reconstruction probability: somewhat higher for nominal morphology, but close to zero for pronominal morphology (Table 7). Consequently, the model reconstructs absence of agglutination for Proto-Indo-European, which is more evident for pronouns than for nouns.
In comparative-historical syntax, the discussion of the typological structure of the Indo-European case paradigm relates to discussions of alignment, described in the previous section. Agglutination is not a key issue in the canonical model, which bases its reconstruction on the synthetic Old Indo-Aryan paradigm. A variant of the active-stative theory of, for example, Gamkrelidze and Ivanov (1995) is found in Hirt 1934, which reconstructs an uninflected stage of Proto-Indo-European. This stage is preserved in the neuter, which has no marking. In a later stage, the distinction between -s and -m marks 'grammatical cases', that is, alignment cases (see Table 4 above). The genitive also represents a variant of the -s and -m forms. All other cases, 'local cases', are secondarily formed by means of postposing elements. Even though it is not prominent in his text, it is evident that Hirt presupposed an agglutinating stage of Proto-Indo-European (between an assumed isolating and a synthetic stage), at least for the case paradigm. The proposed pathway from an agglutinating stage to a synthetic stage is reminiscent of Bopp's (1816) theory regarding the origin of the Proto-Indo-European verbal endings.
The issue connects to the number and type of Proto-Indo-European cases. There is much discussion on this topic, in particular whether Indo-European had a rich case system and was secondarily syncretic, or whether the high number of cases in Old Indo-Aryan and other ancient languages represents an innovation. An argument is that most Indo-European languages show a decay and an increase in case syncretism rather than a growth in case morphology (Szemerényi 1996:158). An exception is Tocharian, which can partly be explained by its geographic position, surrounded by other agglutinating languages (Schmidt 1982). The evolution of Tocharian is also paralleled in modern Indo-Aryan languages (Carling 2012). Delbrück (Brugmann & Delbrück 1893:180–91) reconstructs a case system that is identical to that of Sanskrit, with a nominative/vocative, accusative, genitive, locative, instrumental, dative, and ablative. This is also the paradigm that scholars following the canonical model reconstruct (Meier-Brügger et al. 2010:398–410).
As for number and types of cases (Table 7), our reconstruction is in line with the canonical model, albeit with a degree of uncertainty. We reconstruct a system with fewer than seven cases in the nominal paradigm (0.584), but with intermediate probability. Beyond that, we find, in the nominal paradigm, an intermediate probability for a dative and a genitive (0.608) but also a low score (0.184) for having neither a genitive nor a dative. The score for having local cases outside of the core (in our coding the core includes cases for A, S, O, dative, and genitive) is high (0.983); the score for an accusative/objective, that is, a case for O different from A, is also high (0.715), as is the score for a vocative case (0.885).
The pronominal paradigm shows similar tendencies, with some important exceptions: the probability of more than seven cases is close to zero (0.031), the probability of an A/O distinction higher (0.934), the probability of a dative moderate (0.429), and that of noncore local cases high (0.922). In addition, the probability of pronominal vocatives is low. As with alignment, we notice that the results of the pronominal paradigm are more distinct; that is, the difference between the preferred and the nonpreferred variant is larger.
In sum, our reconstruction yields medium to high probabilities for a canonical system, with a nominative, accusative, genitive, dative, vocative, and one or several local cases, not exceeding seven total cases. The pronominal paradigm is similar, but with the exception that the inference procedure reconstructs a very low probability for a system of more than seven cases, as well as for the vocative.
3.3. Verbal morphology and tense
The category verbal morphology targets agreement or person concord, that is, the inflectional morphology of verbs with respect to their syntactic environment (Bickel & Nichols 2007:169–71). A basic matrix (cf. Baerman & Brown 2013) includes the variants full agreement (i.e. with reference to person and number), gender agreement (with respect to gender), and no agreement, which are matched against the core constituents S/A, O, and the case of the Recipient (dative). The coding captures the typological variation in syncretism between full and no agreement (see Table 8). Only A agreement has results that are of interest to us here; the probability of dative and O agreement, which is found in branches of Indo-European, is low for the protolanguage (Table 9).
Our results (Table 9) display two tendencies of interest to us: the higher probability of full A agreement (0.657) against syncretic A agreement (0.212) in the present progressive, and the higher probability of syncretic A agreement (0.207) against full A agreement (0.031) in the simple past. Again, we see a pattern in which the more frequent category (present) is reconstructed with higher certainty than the less frequent category (past).
Our data set lacks the more fine-grained distinctions between the various categories (e.g. voice, aspect, modality) that are present in the reconstructed Indo-European system and is therefore not fully comparable with the set of endings reconstructed for Indo-European via the comparative method. Much of the system complexity of ancient Indo-European languages is also lost in several branches of the modern languages (Clackson 2007:114–56), a transformation over time that our data reflect only to a certain degree.
Typological marking of tense
Our data set's category tense takes as its focus the typological marking of present progressive and future, which is linguistically relevant from an areal perspective. In the Indo-European family, tense is historically integrated with the category of aspect: the original tense-aspect system of Proto-Indo-European is thought to have developed in many branches into a system that is mainly tense-based (Hewson & Bubeník 1997). Aspect features are not included in the original data of DiACL, which spans more families than Indo-European (Carling et al. 2018).
Tense (§S4, T1–14) includes two properties designed to capture the typological profile of the forms used to mark present progressive and future. For the present progressive (§S4, T1–4), the data distinguish whether a language uses a synthetic form or an analytic construction (with an auxiliary). The model reconstructs a high probability for a synthetic form (0.922), which is entirely consistent with the canonical model. There is an ongoing discussion in comparative-historical syntax as to whether some of the synthetic constructions of Indo-European, such as infinitives, might have an analytic origin (Meier-Brügger et al. 2010:320–21), but this is not relevant in connection to the present progressive, which makes our result uncontroversial.
The results pertaining to the future (§S4, T5–14) are also uncontroversial. The data distinguish whether a language uses an analytic construction (with auxiliary), a participle, a particle, a synthetic form, or an aspectual form. Our reconstruction yields a low probability for an analytic future formed by an auxiliary (0.372) and even lower probabilities for all other variants. The future in Indo-European is an old issue. Delbrück (Brugmann & Delbrück 1897:242–55) doubted that the future of Sanskrit was derived from a future in Proto-Indo-European, due to its formal similarity with the subjunctive and aorist. In further discussions of the verbal system, even within the canonical (Greco-Aryan) model, there is consensus that the future in Greek and Indo-Aryan is a secondary development (Szemerényi 1989:244–47, Rix & Kümmel 2001:10–30, Meier-Brügger et al. 2010:236–42). Our results cannot contribute to this reconstruction; the probabilities are in general low.
3.4. Word order
Since Greenberg (1966), word order (constituent order, order of meaningful elements) has played a central role in linguistic typological research (Comrie 1981, Dryer 1992, Siewierska 1998). In diachronic syntax, reconstruction of word order remains a controversial issue. At their core, Greenberg's observations targeted implicational relations among word-order types, in later literature defined as order of head and dependent (Lehmann 1973), a concept that also includes typological properties beyond word order, following upon the order types (Nichols 1992, 1995, 1998). In Nichols's model, the various dependency types are seen as stable both geographically and diachronically, something that is indicated by the fact that the types have regional skewing patterns (Nichols 1995). Word-order harmony remains an issue also in computational typology, where the main source of controversy is whether word-order patterns are mainly lineage-specific or areal (Baker 2011, Bickel 2011, Croft et al. 2011, Cysouw 2011, Donohue 2011, Dunn et al. 2011). To avoid confusion and not enter into too much detail in the scientific literature on word order, we use the terms 'head-initial' and 'head-final' to refer to the issue of constituent order in sentences and phrases.
In diachronic syntax, the reconstruction of word order is characterized as beset by methodological difficulties (Roberts 2007:175–98). The source of the uncertainty is the fact that word order in most cases cannot be implicitly connected to any morphosyntactically reconstructible material. Therefore, reconstruction of word order has to be based on, first, actual observation in attested languages, and second, connections to other properties in language, which may or may not be reconstructible. Consistency and harmony, as well as stability in word-order patterns, are central in the model of reconstruction proposed by Lehmann and Nichols (Lehmann 1973, 1974, Nichols 1992), and this is also one of the major sources of criticism against the reconstruction of word order (Watkins 1976, Winter 1984, Lightfoot 2002). For a discussion of these issues and defenses against some of these criticisms, see Harris & Campbell 1995, Campbell & Harris 2002. It is also clear that a diachronic shift from one type to another, for example, from head-final to head-initial, is a complex evolution in which archaic structures are retained and coexist side-by-side with more recent, changed ones (Bauer 1995).
Word order in Proto-Indo-European is the subject of a century-long debate, beginning with Delbrück (2010:38–111) and Wackernagel's (1920) study on the position of clitics. Clitic position remains one of the few unquestioned reconstructed syntactic features of Proto-Indo-European (Clackson 2007:168). In recent decades, two competing positions on Proto-Indo-European word order have emerged, both of which require a consistency approach. Much of the critique of the word-order theories revolves around problems of establishing a default order in ancient languages, which form the basis for a protolanguage reconstruction (Winter 1984). Other researchers highlight the general problems of word-order reconstruction due to the inherent problem of reconstructing variation and change (Lightfoot 2002, Pires & Thomason 2008). The mainstream position, also given by Delbrück, assumes verb-finality (OV) and head-final order for noun phrases (Lehmann 1973, 1974, 1993, 2002, Mallory & Adams 1997:165–71, Clackson 2007, Hock 2013). The competing position, which bases its discussion on problems of Proto-Indo-European relative clauses, assumes VO and head-initial order for Indo-European (Friedrich 1975).
Our data set contains standardized coding of word order in ancient languages, and therefore the results relate to the discussion of word-order reconstruction based on evidence from ancient languages. As a rule, the coding policy aims to capture the dominant word order in a language, but in uncertain cases, the coding system allows for polymorphic coding, that is, coding a value 1 for two or several variants (Carling et al. 2018). Word order is also split into a relatively high level of granularity, for example, distinguishing different clause types (§S4, WO24–28, 42–46).
Considering the results of our reconstruction (§S4, WO1–50), we have to bear in mind that our model does not take into account implicational dependencies between variables (e.g. head-final or head-initial; Pagel & Meade 2006, Dunn et al. 2011, Murawaki 2018). The probability of values of a categorical variable is estimated independently of other variables.
Our model reconstructs SOV order with high probability in main clauses (0.905) as well as subordinate clauses (0.899). Furthermore, our model produces reconstructions of high probability for postpositions (0.849), possessor-noun order (0.585), adjective-noun order (0.870), OV order with participles (0.894), and OV order with infinitives (0.806; see Table 10). These results are compatible with the mainstream view on Proto-Indo-European as a head-final language.
Our results for clitic pronouns (§S4, WO5–19) are less simple to interpret. For clitic pronouns, we distinguish second position, OV, and VO (if the language does not have clitic pronouns, the variable is not applicable to the language), with finite verb, infinitive, and participle. Our model reconstructs distributions of high uncertainty for all categorical features pertaining to clitics. Here, it is obvious that the situation in the languages is too complex, with too many gains and losses at hand, for a clear picture to emerge. This result is problematic, considering the safe reconstruction of the position of clitics in Indo-European (Krisch 1990).
Finally, we have an interesting result: noun-relative clause order is reconstructed with higher probability (0.627) than relative clause-noun order (0.292; see Table 10). The position and construction type of the relative clause were a major source of conflict between Lehmann and Friedrich and have been extensively discussed in Indo-European syntax (Watkins 1976, Hock 2013). In accordance with Greenberg's (1963) consistency theory, continued by Lehmann (1973, 1974), an OV language is more likely to have relative clause-noun order (Harris & Campbell 1995:363–67), which is also the common type of Hittite and Latin. However, Sanskrit and Homeric Greek have the reverse order, noun-relative clause, which is inconsistent with OV (Clackson 2007:171–76). The high probability of noun-relative clause order can only be taken to be provisional; due to the simplified definition of relative clauses in our data (NRel/RelN, which does not distinguish, for example, correlative relative clauses, type of clause relation (i.e. paratactic or hypotactic), or restrictive/nonrestrictive types), our result does not bear fully on the issue of Proto-Indo-European relative clauses in the degree of detail with which they are treated in the comparative-historical literature (Hock 2013).
3.5. Comparison of results with traditional models of reconstruction
As described above, our coding scheme for different models of traditional Indo-European reconstruction, which we term canonical, isolating, and active-stative, is based on three sources: Brugmann-Delbrück (Brugmann & Delbrück 1893, 1897, 1900), Hirt (Hirt 1934), and Gamkrelidze-Ivanov (Gamkrelidze & Ivanov 1984, 1995). We identify the values reconstructed to Proto-Indo-European by the different models for the variables in our data set to the extent that information is available, though not all variables in our data set are addressed in these sources. The different models have somewhat differing conceptualizations of the nature of the Proto-Indo-European language.
Brugmann-Delbrück do not regard Proto-Indo-European as a diachronically stratified language; rather, they reconstruct a uniform language, based on Old Indo-Aryan, Greek, Latin, and other ancient Indo-European languages. Compared to the others, their model of Proto-Indo-European is simpler: they reconstruct a highly synthetic stage, which in all branches of the family becomes simplified and less synthetic, losing a number of categories. The other models have a different take on this issue. Scholars reconstructing active-stative and isolating systems presuppose that Proto-Indo-European was a language with several diachronic layers, which changed from a hypothetical early active-stative or isolating stage to a later synthetic stage, found in all ancient languages except for Anatolian. The principles and reasons for this change are important in both the isolating and active-stative models (Gamkrelidze & Ivanov 1995:270–71, Hirt 1934:29–36), as well as in other publications where alternative models are reconstructed for Proto-Indo-European (Bauer 2000, Pooth et al. 2018). The phylogenetic model we employ does not allow us to explicitly stratify the protolanguage into layers. It allows us to reconstruct probabilities at the root as well as at ancestral nodes of the tree (see Figs. 5–7 above and §S8); we take the root of the phylogenetic tree to represent the earliest layer of the protolanguage in the sources mentioned before, but do not consider any subsequent changes or areal differentiations within Proto-Indo-European. We use a tree sample that is compatible with the Indo-Anatolian hypothesis (§S9); the root represents the earliest layer of Proto-Indo-European in all models, before Anatolian split off and subsequent changes began in the Anatolian and non-Anatolian subbranches.
The values reconstructed to Proto-Indo-European for each model are found in §S4. We assess the extent to which our results agree with the views of each comparative-historical reconstruction model on the basis of the likelihood of each model's reconstructed values for each variable in our data set (where applicable), that is, the probability with which our model reconstructs the value to Proto-Indo-European for the variable in question. These likelihoods are found in Figure 8; higher values indicate greater agreement.
Whereas the active-stative and isolating models differ from our results for some of the domains discussed above, such as nominal and verbal morphology (isolating) or alignment (active-stative), both of these models come close to our reconstruction for other domains such as word order, and in the absence of future tense and definiteness from the reconstructions. At the same time, it is clear that our results show the most agreement with the canonical model of reconstruction (median likelihood = 0.796), followed by the active-stative (median likelihood = 0.657) and isolating (median likelihood = 0.652) models; agreement with the canonical model is significantly higher than with both other models according to a pairwise Wilcoxon signed-rank test for paired samples (p < 0.01 with Benjamini-Hochberg correction for multiple comparisons; variables for which a reliable value was not found for all models are excluded). This indicates that our results most clearly resemble the canonical model of Proto-Indo-European, close to the reconstruction outlined by Brugmann and Delbrück in the nineteenth century.
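The two summary steps in this comparison, taking the median likelihood per model and adjusting pairwise p-values for multiple comparisons, can be sketched as follows. This is a minimal illustration under stated assumptions: the likelihood values and raw p-values are invented placeholders, the function name is hypothetical, and the Wilcoxon signed-rank test itself is not implemented here. Only the Benjamini-Hochberg step-up adjustment is shown.

```python
from statistics import median

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-variable likelihoods for each traditional model.
likelihoods = {
    "canonical": [0.9, 0.7, 0.8, 0.6],
    "active-stative": [0.7, 0.6, 0.8, 0.3],
    "isolating": [0.6, 0.7, 0.7, 0.2],
}
medians = {model: median(vals) for model, vals in likelihoods.items()}

# Hypothetical raw p-values from three pairwise tests.
raw_p = [0.01, 0.04, 0.03]
adjusted_p = benjamini_hochberg(raw_p)
print(medians)
print(adjusted_p)
```

The adjustment multiplies each p-value by the number of tests divided by its rank and then enforces monotonicity, so an adjusted value is never smaller than the raw one.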
4. Results: evolutionary dynamics
4.1. Reconstructed probabilities, feature distributions, and transition rates
Our reconstructions are estimated from transition rates inferred on the basis of our tree sample and the features in our data set; these rates characterize the behavior of pairwise transitions between all values of each variable in our data set. Specifically, a transition rate represents the average number of times that a change from a value x (e.g. SOV main clause word order) to a value y (e.g. SVO main clause word order) occurs within a 1000-year span. In this section, we assess the extent to which the frequencies of individual features, as well as gain and loss rates pertaining to the transition rates of individual features (or, as a proxy, their diachronic stability or instability), influence the reconstructions produced by our phylogenetic model. It may be the case that only highly stable and frequent features have a chance of being reconstructed to the protolanguage with high probability, and less frequent features or features in greater flux are unlikely to be reconstructed. If our model essentially carries out a majority rules–style method of reconstruction, then its utility is severely diminished, as a phylogenetic model is not needed to reconstruct the most frequent pattern. If, however, it picks up on more nuanced patterns of change and incorporates these dynamics into the reconstructions it produces, then the value of methods of this sort is evident. Additionally, an analysis of rates of change alongside reconstruction probabilities provides a better understanding of temporal dynamics within Indo-European.
Does our model simply reconstruct the most frequent feature?
If our model simply reconstructs via a majority-rules approach, then there is no real reason to use a phylogenetic model, since our method attends only to the frequency distributions of features across languages and not to patterns of genetic relatedness between said languages. Furthermore, if this is the case, then certain Anatolian features, if rare within Indo-European, face a natural disadvantage and will not be reconstructed, which leads to a result in line with the canonical model of Indo-European reconstruction.
In order to assess the degree of sensitivity of our reconstructions to the frequency distributions of features in our data set, we computed the relative frequency in our data set for features reconstructed with highest probability by our model. We carried out this procedure for all languages, as well as only the ancestral languages in our sample. Figure 9 shows the probabilities with which 'winning' features (black dots) and 'nonwinning' features (gray dots) are reconstructed, plotted against their relative frequencies in our data set. Visually, it is clear that the relationship between these quantities is noisy; at best, the correlation is moderate (all languages: Spearman's ρ = 0.58, p < 0.001; ancestral languages: ρ = 0.4, p < 0.001; statistical tests include only 'winning features' reconstructed with highest probability in order to ensure that data points are independent). This indicates that our system's reconstructions are somewhat (but not overwhelmingly) sensitive to the distribution of features within our data set and within more archaic ancestral languages. Certain highly frequent features are not necessarily reconstructed, if our system infers that the feature is likely to have come about many times in parallel. Additionally, an infrequent feature may be reconstructed if it is more likely to have survived into certain languages than to have come about in parallel. This can be illustrated using our system's reconstruction of adpositions: prepositions are frequent in our data set, but infrequent among older languages, and are accordingly reconstructed with low probability.
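The rank correlation used in this check can be sketched in a few lines. The following is a minimal illustration of Spearman's ρ, computed as the Pearson correlation of average ranks; the function names and the frequency and probability values are hypothetical and are not drawn from our data set, and no p-value is computed.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical relative frequencies and reconstruction probabilities
# for five 'winning' features.
relative_frequency = [0.90, 0.60, 0.75, 0.30, 0.50]
reconstruction_prob = [0.95, 0.55, 0.60, 0.70, 0.40]
print(round(spearman_rho(relative_frequency, reconstruction_prob), 3))
```

A value near 1 would indicate that the model essentially follows feature frequency; a moderate value, as reported above, indicates that frequency is only one of several influences on the reconstruction.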
A few concrete examples serve to exemplify the consequences of this behavior, with respect to the reconstruction of features found in Anatolian to Proto-Indo-European. Figure 5 above shows that a masculine/feminine gender distinction is reconstructed by our model to Proto-Indo-European, despite the fact that it is absent in the archaic Anatolian subgroup; because the gender distinction is predominant in Nuclear Indo-European (i.e. all non-Anatolian languages) and because it has been lost several times, our model assigns high probability to a scenario where Anatolian lost the gender distinction during its development (our model also assigns a small amount of probability to a scenario in which the gender distinction was lost independently within Anatolian). In Fig. 6, we see that neuter gender is reconstructed to Proto-Indo-European because it is shared by a substantial number of Core Indo-European (i.e. all of Indo-European but Anatolian and Tocharian) languages and by Anatolian, to the exclusion of Tocharian. Figure 7 shows that our system does not reconstruct prepositions to Proto-Indo-European with high probability; incidentally, although prepositions are reconstructed with high probability for Core Indo-European, our model finds it unlikely that prepositions were lost independently in the history of Anatolian as well as Tocharian (but finds it likely that they were lost in some Indo-Iranian lineages). Hence, we see that Anatolian features have a good chance of being reconstructed to Proto-Indo-European if they are reconstructed to the protolanguage of at least one other higher-order or archaic Indo-European branch.
Given the transitions between each pair of values (e.g. ergative → accusative) within a variable, we estimate the overall entry or gain rate and the overall exit or loss rate for individual features according to the formulae given in the online supplementary material, Appendix F. The full list of interfeature transition rates as well as the gain and loss rates for each feature are found in §§S5–6. A scatterplot of gain and loss rates for each feature, organized according to overarching grammatical categories, is found in Figure 10. The size of individual data points indicates the probability with which a given feature is reconstructed to Proto-Indo-European by our model, with larger size indicating higher probability. The plot is divided according to the median gain and loss rates for our variables; this allows us to divide features into the following four classes of features.
(i) High gain rate, high loss rate (upper right quadrant): features of high instability, in frequent flux, gained and lost frequently. These include features pertaining to the presence of case on adjectives, clitics, distinctions between dative and genitive marking, absence of case on nouns, and different alignment systems in the simple past.
(ii) High gain rate, low loss rate (lower right quadrant): features of high stability to which languages are frequently attracted; gained often and rarely lost. These include features pertaining to the presence of case on nouns, case difference between A and O for nouns as well as pronouns, masculine/feminine distinction, noun-relative word order, possessor-noun word order, present progressive by auxiliary, and absence of neuter gender and vocative case.
(iii) Low gain rate, high loss rate (upper left quadrant): 'recessive' features (cf. Nichols 1993) quickly repulsed by languages when they do occur. These include features pertaining to the presence of future tense by participle, future tense by particle, more than seven cases, more than seven pronominal cases, more than five genders, tripartite alignment, ergative alignment in pronouns, active-stative alignment, double oblique alignment, V2 word order, and VSO word order.
(iv) Low gain rate, low loss rate (lower left quadrant): highly stable features that arise infrequently. These include features pertaining to the presence of adjective-noun word order, agglutination for case, agreement on prepositions, case on the last member of an NP, definite articles, definite suffixes on adjectives, definite suffixes on nouns, neuter gender, a noun class for animates, and a synthetic future tense.
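The median split behind these four classes can be sketched as follows. This is a minimal illustration, not the procedure used to produce Figure 10: the function name, the class labels, and the rates are hypothetical, and counting a value equal to the median as 'low' is an arbitrary assumption.

```python
from statistics import median


def classify_by_median(rates):
    """Assign each feature to one of four classes by a median split.

    `rates` maps a feature name to a (gain_rate, loss_rate) pair.
    Values equal to the median count as 'low' (an arbitrary choice).
    """
    gain_median = median(gain for gain, _ in rates.values())
    loss_median = median(loss for _, loss in rates.values())
    classes = {}
    for feature, (gain, loss) in rates.items():
        high_gain = gain > gain_median
        high_loss = loss > loss_median
        if high_gain and high_loss:
            classes[feature] = "i: unstable"           # upper right quadrant
        elif high_gain:
            classes[feature] = "ii: stable attractor"  # lower right quadrant
        elif high_loss:
            classes[feature] = "iii: recessive"        # upper left quadrant
        else:
            classes[feature] = "iv: stable but rare"   # lower left quadrant
    return classes


# Hypothetical rates, invented for illustration only.
toy_rates = {
    "feature_a": (0.9, 0.8),
    "feature_b": (0.7, 0.1),
    "feature_c": (0.1, 0.9),
    "feature_d": (0.2, 0.2),
}

for name, label in classify_by_median(toy_rates).items():
    print(name, label)
```

Because the split is relative to the medians, a feature's class depends on the full set of features in the sample and not on its own rates alone.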
For two of the classes, patterns of reconstruction are highly consistent. Stable features that are frequently gained and infrequently lost (lower right quadrant) are virtually always reconstructed with high certainty. Recessive features that are rarely gained and frequently lost (upper left quadrant) are consistently reconstructed with low probability. For the remaining classes, the behavior of our model is more variable. Infrequently gained but stable features (lower left quadrant) are reconstructed with both high and low probability. For instance, SOV word order (in both main and subordinate clauses), adjective-noun word order, neuter gender, postpositions, and vocative case are reconstructed with high probability, whereas features pertaining to definiteness and agglutination, as well as SVO word order (in both main and subordinate clauses), are reconstructed with low probability. The same variability is found for features that fluctuate (upper right quadrant): features pertaining to nominative-accusative alignment, the genitive/dative distinction, and case on adjectives are reconstructed with a probability greater than 0.5, whereas features pertaining to case mergers, clitic word order, and the presence of ergativity in the simple past tense are reconstructed with low probability.
It is noteworthy that features pertaining to nominal morphology and tense (inflectional typology) tend to exhibit slow change and features pertaining to alignment and verbal morphology (agreement) show more rapid patterns of change. Word order is well represented among swift-changing features. These results partly confirm that traits that are immediately bound by morphology, such as nominal morphology and tense, have slower rates of change, in contrast to traits that are not bound by morphology, such as word order and alignment, which have swifter rates of change. Verbal morphology, that is, agreement patterns, and the most stable and frequent word orders constitute an exception to these tendencies.
All in all, these results show that the reconstructions produced by our model are not simply artifacts of feature distributions across our data set; on the contrary, they reflect multifaceted patterns of change. For certain types of stability or instability, features are reconstructed with either high or low probability, while for other patterns, there is more variability among reconstruction probabilities. These patterns are summarized in Table 11.
4.2. Transition rates and grammatical hierarchies
Earlier, we mentioned the existence of asymmetries in the certainty with which features are reconstructed to Proto-Indo-European that correspond to differences in grammatical hierarchies. This asymmetry can also be found in transition rates pertaining to the features in question. As described in §2.3, we organize our features into hierarchical pairs that belong to the same grammatical category but that vary with respect to other features of the grammar. The categories we identify in our data are restricted to the following features (unmarked/more frequent < marked/less frequent; see Table 2).
• pronoun < noun
• present < past
• agent < object
• agent/object < oblique
• masculine/feminine < neuter
Since loss rates are an inverse measure of the longevity of a given feature (i.e. shorter-lived features are lost at a higher rate), we can measure whether the loss rates differ significantly across the unmarked/marked feature pairs that we identify in our data set. We find that the loss rates of marked traits are higher than those of unmarked traits (V = 851, p < 0.001, according to a one-sided Wilcoxon signed-rank test), indicating that marked traits are lost more frequently than unmarked, more frequent traits (Figure 11).
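A one-sided paired comparison of this kind can be sketched in a few lines. The loss rates below are invented for illustration and are not values from the data set; the function names are hypothetical, and the p-value relies on a plain normal approximation (an assumption made for brevity), not on the exact procedure that a statistics package would apply to small samples.

```python
import math


def signed_rank_statistic(unmarked, marked):
    """Return (V, n) for paired samples.

    V is the sum of the ranks of the positive differences marked - unmarked;
    n is the number of pairs with a nonzero difference.
    """
    diffs = [m - u for u, m in zip(unmarked, marked) if m != u]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    v = sum(rank for rank, diff in zip(ranks, diffs) if diff > 0)
    return v, len(diffs)


def one_sided_p_normal(v, n):
    """Normal approximation to P(V >= v) under the null hypothesis.

    No continuity or tie correction is applied.
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (v - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical loss rates for five unmarked/marked feature pairs.
unmarked_loss = [0.25, 0.5, 0.75, 0.5, 1.0]
marked_loss = [0.75, 1.5, 0.5, 2.5, 2.5]

v, n = signed_rank_statistic(unmarked_loss, marked_loss)
p = one_sided_p_normal(v, n)
print(v, n, p)
```

The alternative hypothesis here is that marked traits have the higher loss rate, so a large V (most of the rank mass on positive differences) yields a small p-value.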
This is an interesting result, but it is not entirely unexpected. The idea of grammatical or marking hierarchies in the traditional sense (Greenberg 1966, Comrie 1981, Croft 1990) is based on the notion that higher-ranking categories as a rule are more frequent in languages. The idea that more frequent and basic categories, both in grammar and lexicon, are more conservative and archaic, due to their everyday use, has a long history in Indo-European linguistics (Meillet 1948:135). Many lexemes that are typically part of Swadesh lists, such as kinship words, body parts, numerals, fire, water, and so forth, as a rule preserve more archaic paradigms, including change in stem consonants (e.g. -r-/-n-, -l-/-n-) or ablauting patterns (qualitative, quantitative) (Meier-Brügger et al. 2010:336–48). The reflexes in daughter languages of the most frequent verbs, such as PIE *h1es- 'to be' or PIE *h1ey- 'to go', are typically irregular, preserving archaic inflection patterns and categories (Rix & Kümmel 2001:232–33, 241–42). By contrast, analogy and other types of changes that harmonize and simplify language structures, making them easier to memorize, are more frequently found among words and categories of lower frequency. By means of phylogenetic methods, we know that there is, at least in the lexicon of basic vocabulary, a correlation between frequency and substitution rates: the most frequent meanings are concepts with generally lower substitution rates (Pagel et al. 2007). Transferred to a scenario of grammatical hierarchies, we expect the unmarked categories, representing the more frequently used categories, to have lower loss rates and longer periods between transitions, whereas we expect the marked categories, representing less frequent categories, to have higher loss rates and shorter periods between transitions.
In the preceding sections, we presented the results of Proto-Indo-European reconstruction using phylogenetic comparative methods and provided a careful analysis of our model's behavior in order to better understand the mechanisms underlying the results it produces. We found broad support for the canonical model of Indo-European syntactic reconstruction, largely because the features reconstructed under alternative models undergo evolutionary dynamics that make them unlikely to survive into the languages that attest them; they are more likely to have been innovated in parallel. The behavior of the model we use rests on the assumption that we can make inferences about the behavior of linguistic features in prehistory on the basis of their behavior during attested history; calibrating these dynamics according to attested patterns of change is made possible by the use of ancestry constraints. The methodology that we use relies on a number of simplifying assumptions about the nature of change. One of these is the assumption of rate uniformity, namely, that rates of change between values of a linguistic variable are the same on all lineages of Indo-European. There are a number of methods that relax this assumption in order to incorporate rate variation or heterotachy (Tuffley & Steel 1998, Heath et al. 2012, Beaulieu & O'Meara 2014), but the utility of these methods may be limited relative to their increased computational complexity; phylogenetic linguistic analyses assuming rate homogeneity dovetail well with independent evidence (Blasi et al. 2019), and incorporating heterotachy produces little or no improvement (Chang et al. 2015, Blasi et al. 2020).
Our data consist of typological variables rooted in comparative concepts that can be easily operationalized. Other approaches to syntactic reconstruction, rooted in particular syntactic theories, have different assumptions about the levels of representation that should be reconstructed to the protolanguage (Hale & Kissock 2015); at the same time, similar work makes use of bioinformatic algorithms to address questions of linguistic prehistory (Longobardi et al. 2013). Furthermore, certain syntactic theories make strong predictions about syntactic change over many generations of first language acquisition (Berwick & Niyogi 1996, Yang 2000), which, if correct, can potentially find support in phylogenetic models. While some of the assumptions we make may be challenged by other scholars, in terms of both methodology and the nature of the data we employ, we believe that our contribution is of great utility to Indo-European studies, as the assumptions of our probabilistic model are explicit, the results we present are replicable, and we provide a means of explicit evaluation against the hypotheses explored in this article.
Our results shed light on interesting crosslinguistic diachronic tendencies. At the same time, our data are confined to one family, Indo-European, and the problem of embracing uniformitarianism within one family leads to the problem of sample diversity pointed out by Levy and Daumé (2011). When investigating differences in diachronic dynamics across hierarchies, we build upon insights from a crosslinguistic sample while assessing results derived from one family only, which is not possible without adapting a uniformity-of-state framework to language evolution: rules that govern language structure are similar in the present and in the past, and all languages reflect some basic universal principles (Croft 2003:233, Roberts 2007:174, Walkden 2019). We are forthcoming in the admission that restricting data to one family gives a limited picture. Some of the observed patterns are obviously uniquely Indo-European, such as the loss of synthetic structure. At the same time, the markedness rubric we employ is derived from crosslinguistic typological observations beyond the scope of Indo-European, allowing us to avoid circular reasoning. On the basis of trends within Indo-European, we confirm broader hypotheses regarding crosslinguistically universal tendencies, namely, that factors such as economy and frequency interfere in the processes of language evolution (Croft 2003, Haspelmath 2015). Furthermore, our finding that features pertaining to unmarked, more frequent categories are lost less frequently than those of marked, less frequent categories dovetails nicely with the common finding that frequently used items are resistant to various types of change (e.g. Diessel 2007).
It is remarkable how closely our comparative phylogenetic reconstruction approaches a canonical reconstruction model of Indo-European syntax (Brugmann & Delbrück 1893, 1897, 1900, Krahe et al. 1972). Despite all of the variation and change in the grammar in the Indo-European family, our model reconstructs a highly synthetic, mainly head-final language, with nominative-accusative alignment, independent of tense and animacy degree of the first argument, case marking on nouns, no definite article, three genders (masculine, feminine, neuter), predicative gender agreement, a nonagglutinating case system with fewer than seven cases but with a nominative, accusative, dative, genitive, and vocative, also in pronouns, a synthetic present, no future, full agreement in the present tense of verbs but not in the past tense, postpositions, OV infinitive word order, SOV in main and subordinate clauses, possessor-noun, adjective-noun, noun-relative clause, OV participle word order, and wh-initial word order. The outcome is striking. We see the structure of a grammatical system that has been retained to a high degree through many branches of Indo-European and that is remarkably constant, despite several millennia of language contact and change, loss of categories, emergence of new categories by grammaticalization, and substantial typological changes, for instance in word-order patterns.
Alternative models (see §1.1, §2.3) assume far-reaching typological changes between Proto-Indo-European and the non-Anatolian subbranches of the tree. These changes are supported by internal reconstruction based on Proto-Indo-European paradigmatic correlations in combination with a comparison with other unrelated language families (see §3). Given a nuanced understanding of Indo-European chronology (Meid 1975, Schlerath 1981, Bouckaert et al. 2012, Chang et al. 2015), as well as both attested and estimated information regarding the timespans characterizing change between typological features of the type that we investigate here (Hock & Joseph 1996:183–84, Croft 2003:252, Haspelmath 2018), it is increasingly clear why there is limited support for the alternative theories. On the basis of what can be inferred about change between alignment systems of languages in our sample, for instance, a relatively rapid development from ergative or active-stative alignment (as assumed by the active-stative model) or via grammaticalization from an isolating system with as yet undeveloped agreement marking (as assumed by the isolating model) to a nominative-accusative system is less likely than the retention of nominative-accusative alignment. All of these models take into account a large amount of data (morphological, typological) that may be connected directly or indirectly to the typological traits investigated in our study.
In the realm of alignment, nominative-accusative alignment is dominant in many contemporary and most historical states of the family (Carling 2019:31–50) and is also reconstructed to the protolanguage (Table 3). Absence of agreement marking (features involving no marking), which implies isolating structure, is reconstructed to the protolanguage with low probability. In conclusion, nominative-accusative alignment is stable, and the only noteworthy trend over time is a higher rate of transition from nominative-accusative to no marking (synthetic > isolating). Developments of other systems (ergative, active-stative, tripartite) from nominative-accusative are marginal and have low transition rates. Similar patterns can be seen across other data types: in general, within Indo-European, there is greater evidence for developments in the directions synthetic > isolating and synthetic > agglutinating. Developments in the opposite direction are not impossible, but are unlikely to have taken place between Proto-Indo-European and its descendants. An exception is word order, where features exhibit varying degrees of stability and instability.
Our results are of key relevance to larger discussions about typological stability, as well as the suitability of typological data for language classification (Dunn et al. 2011, Plank 2011, Dediu & Levinson 2012, Dediu & Cysouw 2013). We observe two overarching feature classes characterized by different patterns of change. In the first, change is slow and overwhelmingly unidirectional, moving from synthetic to isolating, which is found mainly in the paradigmatic categories, that is, nominal morphology and parts of verbal morphology, where a synthetic system is broken down in the direction of an isolating system. This type of change conforms to a model of unidirectional, cyclic typological change, occurring at a slow change rate (Croft 2003:227ff.). Alternatively, the evolutionary trend is oscillating, with high amounts of gains and losses. This occurs in the syntactic (nonparadigmatic) categories, mainly in word order, alignment, and partly in verbal morphology traits. This type generally conforms to a theory of punctuated and nondirectional change, which may take any direction depending on a combination of internal pressure and areality-induced change (Dixon 1997). Given this result, the search for a phylogenetic signal in the evolution of nonhomologous, structural linguistic features is difficult without considering areality and ancient language data.
Finally, we employed a relatively uncontroversial and neutral model of change that has been used in state-of-the-art work in phylogenetic linguistics. Simple models like the continuous-time Markov process assume that a feature can be born and die (or that change between traits of a multistate character can occur) with a given rate, and there are no a priori restrictions on the values taken by these rates, as long as they are positive. These models are appropriate for grammatical and lexical data, and are suitable in situations when the system may leave and return to states, as in the case of variants of word order, agreement, case, or different lexical meanings. More complex models like the stochastic Dollo character (SDC; Nicholls & Gray 2006), which assume that features are born only once in the history of a language family, are useful for features that cannot return to identical states, such as morphological traits bound to specific forms, irreversible outcomes of sound change (e.g. mergers), or features that may undergo grammaticalization. Using an SDC model would likely yield different results for our data, at least for gender. At the same time, many scholars agree that SDC models are unsuitable not only for comparative concepts, which aim to capture features crosslinguistically, independent of linguistic matter, but also for other types of linguistic characters (Chang et al. 2015), though modifications thought to be more appropriate for linguistic data have been proposed (Bouckaert & Robbeets 2017). However, we are confident that a Bayesian approach has the capacity beyond a comparative-historical model to contribute in a meaningful way to the theoretical discussions about trends in diachrony, directionality of syntax, rates of gains and losses, and stability of features and categories, as well as correlations to important aspects of typology such as frequency, economy, hierarchies, and general trends in grammar change.
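For a single binary feature, the continuous-time Markov process described above has a simple closed form, sketched below. This is an illustrative two-state case only, not the multistate model fitted in the study; the function name and the rates are hypothetical.

```python
import math


def presence_probability(gain, loss, t, present_at_start):
    """P(feature present after time t) in a two-state continuous-time Markov process.

    `gain` is the rate absent -> present and `loss` the rate present -> absent;
    both must be positive. Time `t` is in whatever unit the rates are expressed in.
    """
    total = gain + loss
    stationary = gain / total      # long-run probability of presence
    decay = math.exp(-total * t)   # how much of the starting state is still "remembered"
    start = 1.0 if present_at_start else 0.0
    return stationary + (start - stationary) * decay


# Hypothetical rates: a feature that is gained often and lost rarely.
gain_rate, loss_rate = 0.8, 0.2
for t in (0.0, 1.0, 10.0):
    print(t, presence_probability(gain_rate, loss_rate, t, present_at_start=False))
```

Because the feature can be gained again after a loss, the probability of presence converges to gain / (gain + loss) regardless of the starting state. A Dollo-type model, in which a feature arises only once, would not have this property.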
Future work can potentially contrast the results of different evolutionary models in applications like the one undertaken in this article; researchers wishing to argue for a specific evolutionary model over others (along with its concomitant result) may employ posterior predictive checks (Box 1980; see also the online supplementary material, Appendix D) to demonstrate that their model is a better fit to the data than others.
The current article had several foci. We reconstructed the evolutionary history of selected aspects of Indo-European morphosyntax by means of a model that infers patterns of diachronic development of linguistic features over a phylogeny. This allowed us to infer the most probable value of a given linguistic variable in the unattested Proto-Indo-European language. We used a data set of binary coded comparative concepts, recoded as categorical features, which also contained data from extinct and historical Indo-European languages. We focused on five categories of grammar: alignment, nominal morphology, verbal morphology, tense, and word order. We compared the result at the protolanguage state to previous reconstructions of Proto-Indo-European grammar that were based on the comparative-historical method and diachronic typology. The methodology we used allowed us to compare ideas from the traditional comparative-historical linguistic literature with our model's output. We found that phylogenetic reconstruction produced a consistent and coherent system, which corresponds to a highly synthetic, mainly head-final language, with nominative-accusative alignment, independent of tense and animacy degree of the first argument, case marking on nouns, no definite article, three genders (masculine, feminine, neuter), predicative gender agreement, a nonagglutinating case system with fewer than seven cases but with a nominative, accusative, dative, genitive, and vocative, also in pronouns, a synthetic present, no future, full agreement in present tense of verbs but not in the past tense, postpositions, OV infinitive word order, SOV in main and subordinate clauses, possessor-noun, adjective-noun, noun-relative clause, OV participle word order, and wh-initial word order. This reconstruction matches a canonical model of Proto-Indo-European grammar, as described by the Neogrammarians in the nineteenth century.
We also analyzed the inferred interfeature transition rates on which our reconstructions are based. Our analysis sheds light on different tendencies of change across features. In general, traits that were reconstructed to the protolanguage had relatively low loss and gain rates, which implies that the reconstructed typological system is consistent and stable in the family. The most noteworthy tendency is a change from synthetic to isolating structure. In addition, a general tendency is for morphological (paradigmatic) categories (nominal morphology and tense) to have low change rates and for syntactic categories (alignment and word order) to have higher change rates. Verbal morphology opposes this tendency with high change rates. Finally, we divided our grammatical traits (excluding word order) into hierarchical pairs by different members of categories available in our data, such as tense (present, past), word class (noun, pronoun), and gender. We found that the unmarked, more frequent traits are lost less frequently than marked, less frequent traits; this difference was significant.
In sum, our results support the theory that grammar evolution both is divergent, down to the level of highest granularity, and follows general universal principles. Over the 6000–7000-year cycle represented in our data, morphological traits tend to show a unidirectional path of change, fundamentally moving from synthetic to isolating, whereas word order, alignment, and person agreement properties show more nondirectional and unpredictable paths of change, with higher rates of gains and losses. The results are in line with both a cyclic and a punctuated model of change. Our results also indicate that the variability of change in grammar over time is governed by general 'universal' tendencies, such as grammatical hierarchies and frequency.
revision invited 25 April 2019;
revision received 22 May 2019;
revision invited 7 September 2019;
revision received 21 January 2020;
revision invited 15 July 2020;
revision received 16 November 2020;
accepted pending revisions 6 January 2021;
revision received 6 February 2021;
accepted 16 February 2021]
* Equal author contribution. The work was supported by the Marcus and Amalia Wallenberg Foundation, grants MAW 2012.0095 and MAW 2017.0050, both awarded to Gerd Carling. We thank audiences at the 50th annual meeting of the Societas Linguistica Europaea, the 24th International Conference on Historical Linguistics, and the linguistics seminars at Lund, Zurich, and Göttingen Universities for valuable remarks, along with three anonymous referees, Simon Greenhill, and the Language editors. We also thank Filip Larsson, Niklas Erben Johansson, Erich Round, Sandra Cronhamn, and Arthur Holmer for helpful comments on data, study design, and results, and Johan Frid for preparing trial versions of some of the graphs. Special thanks are due to Gerhard Jäger for help on various technical matters as well as for providing the proof included in the online supplementary material.
1. There are a number of terms for data of this type; in biology, it is common to refer to such data as a multistate character (e.g. eye color), which can be realized as one of several traits (e.g. green eyes). We use the terms 'variable' and 'value' for the purpose of terminological and conceptual comparability with other work in quantitative linguistics (rather than biology), at times using the terms 'feature' and 'trait' interchangeably with 'value'.
2. Phylogenetic rate inference and reconstruction require practitioners to define the prior probability of different values of a variable at the root of the phylogeny. A common practice in biology is to use the stationary probability of the continuous-time Markov process, which gives the probability of the system taking a particular value as time approaches infinity. Felsenstein (2004:252) states that this prior is appropriate, but only if we assume that the model of evolution has been operating for a very long time. An alternative approach is to assume the equal prior probability of each value at the root, or to treat the root prior as an unknown parameter to be inferred. The issue of how to treat the root prior is not widely discussed in phylogenetic linguistics (many studies do not mention the issue at all), with some exceptions (Maurits & Griffiths 2014). In our main analyses, we follow other work (Cathcart et al. 2018, Blasi et al. 2019, Cathcart et al. 2020) in employing the stationary probability as the root prior. At the same time, because use of the stationary probability may bias our reconstructions, we run our models under two additional inference regimes, one using a uniform (i.e. equiprobable) root prior, and one where the root prior is treated as a parameter to be inferred. We find that results critical to the evaluation of our model against different traditional models are not affected by this choice (a full analysis of this issue is found in the online supplementary material, Appendix E).
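The difference between a stationary and a uniform root prior can be illustrated with a small sketch. The rate matrix below is invented, the function names are hypothetical, and the fixed-iteration power method is an assumption chosen for brevity; it is not the inference procedure used in the study.

```python
def stationary_distribution(rate_matrix, iterations=1000):
    """Approximate the stationary distribution of a continuous-time Markov process.

    rate_matrix[i][j] is the transition rate from value i to value j (i != j).
    Diagonal entries are ignored. The chain is assumed to be irreducible.
    """
    n = len(rate_matrix)
    exit_rates = [sum(rate_matrix[i][j] for j in range(n) if j != i) for i in range(n)]
    scale = 2 * max(exit_rates)  # uniformization constant; the factor 2 keeps the chain aperiodic
    # One-step transition matrix of the uniformized discrete-time chain.
    step = [
        [rate_matrix[i][j] / scale if i != j else 1 - exit_rates[i] / scale for j in range(n)]
        for i in range(n)
    ]
    pi = [1 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * step[i][j] for i in range(n)) for j in range(n)]
    return pi


def uniform_prior(n):
    """Equiprobable root prior over n values."""
    return [1 / n] * n


# Hypothetical three-valued variable: value 0 is rarely left and often entered.
toy_rates = [
    [0.0, 0.1, 0.1],
    [0.5, 0.0, 0.1],
    [0.5, 0.2, 0.0],
]
print(stationary_distribution(toy_rates))
print(uniform_prior(3))
```

Under the stationary prior, values that are entered often and left rarely receive more prior mass at the root. This is the potential source of bias that motivates rerunning the analysis under the uniform and inferred priors.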