
Suffix Ordering and Morphological Processing
There is a longstanding debate about the principles constraining the combinatorial properties of suffixes. Hay 2002 and Hay & Plag 2004 proposed a model in which suffixes can be ordered along a hierarchy of processing complexity. We show that this model generalizes to a larger set of suffixes, and we provide independent evidence supporting the claim that a higher rank in the ordering correlates with increased productivity. Behavioral data from lexical decision and word naming show, however, that this model has been one-sided in its exclusive focus on the importance of constituent-driven processing, and that it requires supplementation by a second and equally important focus on the role of memory. Finally, using concepts from graph theory, we show that the space of existing suffix combinations can be conceptualized as a directed graph, which, with surprisingly few exceptions, is acyclic. This acyclicity is hypothesized to be functional for lexical processing.
suffix ordering, morphological processing, productivity, lexical strata, graph theory, directed acyclic graph
1. Introduction.
In English and many other languages with derivational morphology, there are severe restrictions on possible combinations of affixes and bases. A given derivational affix attaches only to bases that have certain phonological, morphological, semantic, or syntactic properties. For example, the verbal suffix -ize occurs only on nouns and adjectives that end in an unstressed syllable; see Plag 1999 for details. Similar, or even more complex, restrictions seem to hold for affix-affix combinations. For instance, the word atomic can take the suffix -ity as a nominalizing suffix, whereas the word atomless cannot take -ity, but can take the competing nominalizing suffix -ness (*atomlessity vs. atomlessness).
There has been a long debate about whether there are general principles or mechanisms that constrain the combinatorial properties of affixes. In this debate three approaches can be distinguished. First, there are stratum-oriented models (e.g. Siegel 1974, Allen 1978, Kiparsky 1982, Selkirk 1982, Mohanan 1986, Giegerich 1999) that claim that the lexicon has a layered structure and that this structure largely determines the combinatorial properties of affixes. Second, there are scholars who argue that affix-particular selectional restrictions (of a phonological, morphological, semantic, or syntactic nature) are responsible for possible and impossible combinations of affixes (e.g. Fabb 1988, Plag 1999). Most recently, a third theory has been proposed in Hay 2003, which holds that constraints on the processing of morphological structure control affix combinations.
In Hay & Plag 2004, the three models were empirically tested against the 210 potential two-suffix combinations of fifteen English derivational suffixes (most of them Germanic 'stratum 2' suffixes). It was observed that both selectional restrictions and processing constraints are instrumental in determining possible and impossible suffix combinations. Only well-processable combinations are possible combinations, and this range of possible combinations is further curtailed by selectional restrictions. In essence, the model proposed in Hay & Plag 2004 not only proved empirically superior to stratum-oriented models, but also provides a psycholinguistically attractive solution to the longstanding problems of suffix-suffix combinations.
The present article extends Hay and Plag's research program in two important ways. First, after a review of relevant literature, we investigate a specific suffix combination in Hay and Plag's set of combinations for which their model apparently makes the wrong predictions. We then submit their model to even more serious empirical challenges by doubling the set of English suffixes under investigation, which increases the number of combinations tested from 210 (in Hay & Plag 2004) to 930. Second, if Hay and Plag's theory of COMPLEXITY-BASED ORDERING is indeed psycholinguistically plausible, affix ordering should be predictive for processing complexity. Thus, we investigate whether the ranking of suffixes obtained with our enlarged data set correlates not only with distributional measures such as productivity, but also with experimental estimates of actual lexical processing costs as gauged by latencies in word-naming and visual lexical-decision tasks. Finally, we summarize our findings and discuss their theoretical implications.
2. Complexity-Based Ordering: Selectional and Processing Restrictions.
Until a few years ago the debate on stacking restrictions was characterized by two opposing views. Proponents of stratum-oriented models (such as Siegel 1974, Allen 1978, Selkirk 1982, Kiparsky 1982, Mohanan 1986) assume that most, if not all, combinatorial restrictions among English suffixes can be explained by the fact that these suffixes belong to different lexical strata and that these strata interact phonologically and morphologically in intricate ways. This is known as level ordering, which in turn is part of most models of lexical phonology. According to the level-ordering hypothesis, English suffixes and prefixes belong to the classes or strata in 1 (from Spencer 1991:79).
(1) Class 1 suffixes: +ion, +ity, +y, +al, +ic, +ate, +ous, +ive, +able, +ize
Class 1 prefixes: re+, con+, de+, sub+, pre+, in+, en+, be+
Class 2 suffixes: #ness, #less, #hood, #ful, #ly, #y, #like, #ist, #able, #ize
Class 2 prefixes: re#, sub#, un#, non#, de#, semi#, anti#
The suffixes belonging to one stratum are said to share a number of properties that distinguish them from the suffixes of the other stratum. Stratum 1 suffixes tend to be of foreign origin ('Latinate'), while stratum 2 suffixes are mostly Germanic. Stratum 1 suffixes frequently attach to bound roots and tend to be phonologically and semantically less transparent than stratum 2 suffixes. Stratum 1 suffixes cause stress shifts, resyllabification, and other morphophonological alternations; stratum 2 suffixes do not. Stratum 1 suffixes are less productive than stratum 2 suffixes, and, crucially, stratum 1 suffixes do not occur outside stratum 2 suffixes. Thus, suffixes can only combine in such a way that they attach to suffixes of the same stratum or of a lower stratum. This is perhaps the most important generalization about suffix combinations that emerges from stratum models, since combinations in which a stratum 2 suffix occurs inside a stratum 1 suffix (such as *atom#less + ity) are ruled out on principled grounds.
However, there are serious problems with this approach. One major theoretical weakness of level ordering is that the two strata are not justified on independent grounds. In other words, it is unclear what is behind the distinction between the two strata, and which properties make a suffix end up on a given stratum. The idea that the underlying distinction is one of etymology (borrowed vs. native, e.g. Saciuk 1969) does not explain why speakers can and do master English morphology with little or no etymological knowledge. Others have argued that the stratum distinction is phonological in nature, with differences between different etymological strata being paralleled by phonological differences (see e.g. Booij 2002, van Heuven et al. 1994 for Dutch). Although this would allow speakers to distinguish between the strata on the basis of the segmental and prosodic behavior of derivatives, a phonological explanation of the nature of the strata would weaken the idea of strata considerably. As Raffelsiefen (1999) shows, not even two of the many suffixes of English trigger exactly the same type of morphophonological alternations, so we would need as many substrata as we have suffixes that trigger morphophonological alternations. Thus we end up with a continuum, rather than with a discrete two-level system.
Another serious problem is that a stratum cannot be defined by the set of suffixes it contains either, because many suffixes belong to more than one stratum, given that in certain derivatives they show stratum 1 behavior, whereas in other derivatives they display stratum 2 behavior, with sometimes even doublets occurring (e.g. compárable vs. cómparable). Furthermore, there are a number of unexpected suffix combinations. Thus stress-neutral -ist appears inside stress-shifting -ic, or stress-neutral -ize appears inside stress-shifting -(at)ion. In order for the model not to make wrong predictions, dual membership of affixes (or some other device weakening the overall model) becomes a necessity.
Giegerich (1999) discusses cases of apparent dual membership of affixes in great detail and, as a consequence, proposes a thoroughly revised stratal model, in which the strata are no longer defined by the affixes of that stratum, but by the bases. This base-driven stratification model, which is enriched by many suffix-particular base-driven restrictions, can overcome some inadequacies of earlier stratal models, but at the cost of significantly reducing the overall predictive power of the model. These restrictions are a well-taken step toward eliminating the weakness of not making any predictions about suffix order within strata, which characterized earlier lexical phonology models. Important problems remain, however. For example, Fabb (1988) and Plag (1996, 1999) point out that there are numerous other important (phonological, morphological, semantic, syntactic) restrictions operative in English suffixation. Level ordering does not say anything about these restrictions. For example, Fabb finds that the forty-three suffixes he investigates are attested in only fifty combinations, although stratum restrictions would allow 459 of the 1,849 possible combinations.
In general, for any given affix, its phonological, morphological, semantic, and syntactic properties (or the properties of its derivatives, that is, of the morphological category) must be stated in its lexical entry. Plag (1996, 1999) shows that these diverse properties together are responsible for the combinatorial potential of a given affix. What has been analyzed as would-be stratal behavior thus falls out from the phonological, morphological, and semantic properties of the affix. Since these properties must be stated anyway to account for the particular behavior of a given affix, no further stratal apparatus is necessary.
What is the alternative? Hay & Plag 2004 proposes that selectional restrictions and processing constraints together are responsible for the attested and non-attested patterns of suffix-suffix combinations. Selectional restrictions are understood as affix-particular properties that govern the kinds of combinations that are allowed for that affix. Such restrictions can refer to phonological, morphological, syntactic, or semantic characteristics of the elements to be combined. For example, verb-forming -en (as in blacken) attaches only to monosyllables that end in an obstruent.^{1} This phonological restriction also has immediate consequences for suffix combinations with -en as the outer suffix. Given that -en attaches only to monosyllables, -en may never attach to bases that themselves contain a suffix that creates a new syllable, which would be the case for most adjectival suffixes (such as -al, -ive, -ous, etc.). We can therefore predict that the combinations -al-en, -ive-en, and -ous-en will not occur, due to the selectional restriction of -en. Another example, this time involving a morphological restriction, is the suffix -ize, which selects only -ation as the nominalizing suffix (as in colonization). An instance of a semantic restriction is the verbalizing suffix -ate, which attaches productively only to nouns that refer to chemical substances (such as fluorinate, Plag 1999). Since such substances are usually denoted by nouns, -ate will never follow an adjectival or verbal suffix, or an abstract noun-forming suffix such as -ism.
In addition to selectional restrictions, Hay (2002, 2003) proposed a general processing constraint, named COMPLEXITY-BASED ORDERING by Plag (2002), that limits the number of possible combinations of affixes. The underlying idea is that the morphological separability of affix and base is a graded phenomenon, with far-reaching consequences for affix stacking:
While some affixes basically tolerate no internal structure, others will tolerate structure to some minimum degree. The degree of internal structure tolerated by an affix is . . . determined . . . by how much structure that affix, itself, creates. Phrased in terms of processing, AN AFFIX THAT CAN BE EASILY PARSED OUT SHOULD NOT OCCUR INSIDE AN AFFIX THAT CANNOT.
To understand this better, we must take a closer look at morphological processing. It is often argued that there are two ways of processing a morphologically complex word. Either the word is decomposed, in which case it is produced or understood through its constituents (in- and sane for insane), or it is processed as a whole (insane), as if it were a monomorphemic form. In parallel dual-route models of morphological processing, whole-word access (critically depending on storage) and access through constituents (critically depending on rules) are used simultaneously. Which of the access routes is most efficient for a given word depends on the frequency of the derived word, the frequencies of its constituents, and various other processing parameters relating to word length, lexical competition, and semantic connectivity (Baayen & Schreuder 1999, Baayen & Moscoso del Prado Martín 2005).
Hay (2001) has argued that the preference for one of the routes depends specifically on RELATIVE FREQUENCY, that is, the relation between the frequency of the derived word and that of its base. Hay (2001) shows that the higher the frequency of the derived word in relation to the base word, the less likely is constituent-based processing. Alternatively, the lower the frequency of the derived word in relation to the base word, the more important the role of the constituents becomes. Let us look at an example. The derived word government (British National Corpus (BNC) lemma frequency 66,894) is much more frequent than its base govern (BNC lemma frequency 2,626); hence there is a whole-word bias for government. Note that this whole-word bias is reflected in the pronunciation of government, which involves either the assimilation of base-final /n/ to the suffix-initial /m/, or even the complete loss of the last syllable of the base. Such phonological opacity is typical of lexicalized forms, that is, forms with a whole-word bias in processing. In contrast, discernment (BNC 61) is much less frequent than its base discern (BNC 452), which leads to a strong bias for constituent-driven processing.
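The notion of relative frequency can be made concrete with a small Python sketch. It divides the frequency of the derived word by that of its base, using the BNC lemma frequencies quoted above. The function names and the cut-off of 1.0 are assumptions made purely for illustration; Hay (2001) describes a graded effect, not a binary switch.

```python
def relative_frequency(derived_freq: int, base_freq: int) -> float:
    """Frequency of the derived word divided by the frequency of its base."""
    if base_freq <= 0:
        raise ValueError("base frequency must be positive")
    return derived_freq / base_freq


def processing_bias(derived_freq: int, base_freq: int) -> str:
    """Hypothetical binary reading: a ratio above 1.0 counts as whole-word bias."""
    ratio = relative_frequency(derived_freq, base_freq)
    return "whole-word" if ratio > 1.0 else "constituent-driven"


# BNC lemma frequencies as quoted in the text: (derived word, base word)
examples = {
    "government": (66_894, 2_626),   # base: govern
    "discernment": (61, 452),        # base: discern
}

for word, (derived, base) in examples.items():
    ratio = relative_frequency(derived, base)
    print(f"{word}: ratio {ratio:.2f}, bias {processing_bias(derived, base)}")
```

A ratio well above 1 (government) points to whole-word access, and a ratio well below 1 (discernment) points to access through the constituents.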
Such a view on complex words has serious implications for the notion of morphological complexity. First, it means that the role of the same suffix in lexical access will be different in different words depending on the respective frequencies of base and derivative (e.g. -ment plays a greater role for discernment than for government). Second, suffixes occurring in many words that are less frequent than their bases will tend to be more important in processing than suffixes represented by few words that are less frequent than their bases (compare, for instance, -ness and -ic). Finally, affixes that are stronger in processing will occur outside affixes that are weaker, an idea already expressed in Burzio 1994:354.
Hay and Plag (2004) show that suffixes can indeed be ordered in a hierarchy of juncture strength, such that affixes following an affix A on the hierarchy can be added to words already containing A, but affixes preceding A on the hierarchy cannot freely attach to words containing A. Thus, given a hierarchy of suffixes X-Y-Z-A-B-C-D, possible combinations would, for example, be base-A-B, base-X-A-C, or base-Y-Z-A, whereas *base-A-Z, *base-Y-A-Z, and *base-X-A-Y would be impossible combinations.
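The prediction of the hierarchy can be stated as a simple check: a sequence of suffixes is admissible only if each suffix is ranked later than the one before it. The following Python sketch applies this check to the schematic hierarchy X-Y-Z-A-B-C-D and the six example combinations given above; the function name is an assumption introduced for illustration.

```python
# Schematic hierarchy from the text, from innermost to outermost rank
HIERARCHY = ["X", "Y", "Z", "A", "B", "C", "D"]
RANK = {suffix: position for position, suffix in enumerate(HIERARCHY)}


def respects_hierarchy(suffixes) -> bool:
    """True if every suffix is ranked strictly later than its predecessor."""
    ranks = [RANK[s] for s in suffixes]
    return all(inner < outer for inner, outer in zip(ranks, ranks[1:]))


# Suffix sequences following the base, written inner to outer
for combination in ["AB", "XAC", "YZA", "AZ", "YAZ", "XAY"]:
    mark = "" if respects_hierarchy(combination) else "*"
    print(f"{mark}base-{'-'.join(combination)}")
```

The first three sequences pass the check, and the last three are starred, matching the possible and impossible combinations listed in the text.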
Hay and Plag investigated the relationship between processing constraints and selectional restrictions (grammatical constraints) by looking at attested and non-attested combinations of fifteen English suffixes (-dom, -ee, -en, -er, -ess, -ful (adjectival), -ful (nominal), -hood, -ish, -less, -ling, -ly, -ness, -ship, -th). They checked all possible combinations (N = 210) for attestations in the British National Corpus (BNC, Burnard 1995), the CELEX lexical database (Baayen et al. 1995), and the Oxford English Dictionary (OED 1994), supplemented by some data obtained through internet searches, and found that the attested combinations can be arranged in a strict hierarchy. This hierarchy of suffixes established on the basis of attested combinations correlates with the order of suffixes established on the basis of suffixal processing measures, such as relative frequency and productivity, as computed in Hay & Baayen 2002. This correlation provides evidence for the idea that the hierarchy is largely constrained by processing. In addition to such processing considerations, Hay and Plag show, however, that the non-attested combinations are also largely excluded by selectional restrictions and that the range of combinations allowed by the general restrictions on constituent-driven processing is further curtailed by selectional restrictions. The authors conclude that processing constraints and grammatical constraints (selectional restrictions) work hand in hand, such that the grammar tends to create structures that are better processable. Similar insights can be found in Hawkins 2004 for word order, and in Bybee 1985 for affix order from a typological perspective.
The model proposed in Hay & Plag 2004 raises some problems and questions. A first problem, originally pointed out in Plag 2002, is that there are suffixes that quite regularly appear inside and outside of each other. For example, adjectival -al may appear inside and outside of nominal -ion, as illustrated in sensational and colonialization. In Hay & Plag 2004, these cases are also considered, and it is shown that in such cases of multiple affixation, a middle suffix such as -ize preferably attaches to -al derivatives with a low level of decomposability. The same would hold for individual formations with a whole-word processing bias, such as business, which could in principle be the base for suffixes that are lower in the hierarchy than -ness. What may be surprising is the fact that very few of such reverse combinations are actually attested. This is understandable, however, in the light of two facts. First, such a reverse order would still have to observe the selectional restrictions of the suffixes involved. This severely restricts the range of possible combinations. Second, outer suffixes are outer suffixes exactly because most of their derivatives are better parsable than inner suffixes, which means that we find relatively few derivatives with outer suffixes that have a whole-word bias. This in turn reduces the probability of new formations based on whole-word-bias forms with outer suffixes.
A related problem concerns the potential combinations of -ness and -less. Both combinations seem to be structurally possible in the sense that they are allowed by the selectional restrictions of these suffixes. Interestingly, Hay and Plag found a number of forms in their data involving the combination -less-ness (as in hopelessness), but did not find the combination -ness-less (as in the invented happinessless). This is all the more surprising because -less is more productive than -ness according to the measures developed in Hay & Baayen 2002, and therefore should preferably occur outside of the less productive -ness. The theory would therefore predict, contrary to fact, that formations such as happinessless should be in actual use, rather than formations such as hopelessness. It is therefore necessary to investigate whether this mismatch between theory and data is due to the limitations of the database used by Hay and Plag.
A second problem of Hay & Plag 2004 that needs to be addressed is that the set of suffixes comprised only fifteen forms, most of which are stratum 2 suffixes. This raises the question of whether their account would also hold for a wider range of suffixes, including both stratum 1 and stratum 2 suffixes.
A third problem concerns the indirect nature of the evidence supporting the relationship between processing costs on the one hand and productivity and relative frequency on the other. The theory would gain weight if the current indirect evidence could be supplemented by more straightforward experimental evidence. Without experimental support, claims about psychological plausibility remain hypothetical. In the following sections, these three problems are addressed in turn.
3. The Inside-Outside Problem: -ness-less vs. -less-ness.
Is the absence of formations ending in -ness-less in the data surveyed by Hay and Plag due to limitations of the size of their database (CELEX, BNC, and OED)? Arguably, the World Wide Web provides the largest current resource of English usage, and we should therefore be able to trace such formations in this resource, if they exist. To this end, we took all 2,466 attested -ness words from the BNC (Burnard 1995), added to them the string less, and carried out a systematic internet search. We used the software made available by Hayes (2001) to search for pertinent words via Google on English-language webpages. The search yielded quite a number of attestations, and the results thus confirm the prediction of the structural analysis that -ness-less should be possible, given the selectional restrictions of these two suffixes. Table 1 lists the ten most frequently attested forms and their frequencies (as of November 17, 2007) for illustration.
A closer look at the individual websites reveals that many websites are counted more than once in Google and that many attestations are quite dubious, because they involve typos, word play, or are coined by non-native speakers. But it is possible for each -ness-less formation listed in Table 1 to find forms that are highly natural and embedded in idiomatic native English. To illustrate this point, we give in 2 citations for the first four words in the list.
(2)
a. Eight Deadly Sins Of Web 2.0 StartUps. [. . .] Happinessless: Your startup has no future if you are not happy
(http://www.slideshare.net/imootee/eightdeadlysinsofweb20startups/)
b. He walked past the sleeping fat man quietly, careful not to disturb something he himself envied. How long had it been since he'd slept, since he'd drifted into such a consciousnessless slumber? He got up and walked the corridors. (http://www.geocities.com/jacory69/SHORTFICTION.html)
c. 'A lot of people are very alarmed, and this is only natural . . . but whether they are a home or a business, they are not going to be left homeless or businessless,' Ed Weirauch, spokesman for the Camden Redevelopment Agency, said in a telephone interview yesterday.
d. I'm trying to figure out if the general, conceptual skeletons of the Latin letterforms (independent of a given design) need to have some information about the stroke thickness (certainly relative, not absolute), to be_useful/make_sense. Normally, if you ask somebody to draw the 'essential' form of a letter (or visualize it in your head), you get a thicknessless diagram.
(https://listserv.heanet.ie/cgibin/wa?A2=ind9904&L=typol&P=12980)
Interestingly, in three of the four examples the -ness derivative on which the -ness-less form is based is mentioned in the preceding discourse, and the previous mention seems to make the -ness form more easily available for further affixation (see e.g. Kastovsky 1986, Baayen & Neijt 1997). This raises the question (Hay, p.c.) of whether the -ness derivatives that take -less are a random selection from all -ness derivatives, or whether they are forms that are especially prone to further affixation, for example due to their frequential properties.
In order to check whether the bases in -ness that take -less are somewhat special (and not randomly chosen from all -ness derivatives), we checked the frequencies and relative frequencies in CELEX for all ten -ness bases of the words in Table 1. According to Hay (2001), we should expect that these bases range among the most frequent -ness words, and that their relative frequencies are also rather high in comparison to all other -ness words. Both of these facts would make these words more prone to further affixation than -ness words that are less frequent or have a smaller relative frequency. In other words, the -ness derivatives in question should not be randomly selected.
Both of the above predictions are borne out by the facts. CELEX contains 1,316 different -ness derivatives, with frequencies ranging from 4,409 to zero (zero pertains to data that were not found in the COBUILD corpus but entered CELEX through dictionaries). Of the six most frequent -ness derivatives in CELEX, five feature as bases in Table 1 (business ranks first,^{2} darkness comes in second, consciousness fourth, weakness fifth, and happiness in sixth position). The least frequent base of the formations in Table 1, randomness, still has rank 218 in CELEX. As expected, the frequencies of these forms correlate with their respective relative frequencies (ρ = 0.95, p < 0.001, Spearman's rank correlation). The relative frequencies of the ten -ness bases occupy ranks between 24 and 451 among the 1,316 -ness derivatives, which means that all forms in question can be found, roughly speaking, in the upper third of all -ness derivatives. In sum, our ten bases are not a random pick from all -ness derivatives. Their frequential properties make them especially prone to further affixation, in line with the assumptions of complexity-based ordering (see also Krott et al. 1999).
Returning to the question of parsability of -ness vs. -less, we find that the suffix -less has higher figures than -ness on four measures of separability (Hay & Plag 2004): productivity: 0.016 vs. 0.008; token-parsing ratio: 0.74 vs. 0.23; type-parsing ratio: 0.86 vs. 0.51; boundary strength (average rank according to the preceding three measures): 13.33 vs. 8.67. Hence, -less should occur to the right of -ness, and we now see that forms instantiating the predicted order do indeed exist.
But we are still left with the problem of why these are obviously less common than the opposite, less easily parsable sequences. In fact, the problem of the two structurally possible combinations (-less-ness and -ness-less) is yet another instance of affix sets that seem to violate the predictions of complexity-based ordering, similar to -al, -ion, and -ize mentioned above.
With our -ness-less and -less-ness derivatives, however, we have a somewhat different situation in that there are only two suffixes involved that can swap positions. This is predictable under the selectional-restriction approach, but unexpected by complexity-based ordering. Similar examples are discussed below. In the general discussion, we return to this problem for complexity-based ordering. In what follows, we first seek to broaden the empirical scope for the hypothesis of complexity-based ordering.
4. Extending the Data Set to Thirty-One Suffixes.
We extended Hay and Plag's set of fifteen suffixes by adding another sixteen suffixes to test whether the predictions of complexity-based ordering are also borne out if more suffixes are taken into consideration, and if more suffixes from stratum 1 are also included. Table 2 lists all suffixes with their selectional restrictions. The last column shows whether the suffix was part of Hay and Plag's data set, or is newly introduced in this article. The selectional restrictions are derived from pertinent reference works on this question, such as Jespersen 1942, Marchand 1969, Bauer 1983, Adams 2001, Plag 2003, but also more specialized treatments on individual suffixes, such as Ljung 1970, Malkiel 1977, Riddle 1985, Barker 1998, Plag 1999, Ryder 1999, Dalton-Puffer & Plag 2000.
Overall, our data set amounts to 930 potential combinations, of which we found 160 attested in CELEX, the BNC, or the OED. To this we added the combination of -ness followed by -less that we documented above. The attested 161 combinations and pertinent examples are documented in the appendix. Figure 1 shows the attested combinations, with the suffixes listed in alphabetical order. Note that the matrix shows a 1 even if there was only a single derivative of its kind attested in our vast database. We deliberately did not impose a threshold on the number of attested forms, in order to increase the chance of falsifying the hierarchy hypothesis. The figure can be read as follows: a 1 indicates that a combination is attested, with the suffix on the left margin as the inner suffix and the suffix on the upper margin as the outer suffix. In graph theory (see e.g. Jungnickel 2007), Fig. 1 is referred to as the adjacency matrix of a directed graph.
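The size of the search space and the layout of the adjacency matrix can be illustrated with a short Python sketch. It assumes that a potential combination is an ordered pair of two distinct suffixes, which reproduces the figures of 210 and 930 given in the text. The toy matrix contains only four combinations mentioned in the text and is not a reproduction of Fig. 1; the function names are introduced here for illustration.

```python
def potential_combinations(n_suffixes: int) -> int:
    """Number of ordered (inner, outer) pairs of two distinct suffixes."""
    return n_suffixes * (n_suffixes - 1)


def adjacency_matrix(suffixes, attested_pairs):
    """Rows are inner suffixes, columns are outer suffixes; 1 = attested."""
    index = {suffix: i for i, suffix in enumerate(suffixes)}
    matrix = [[0] * len(suffixes) for _ in suffixes]
    for inner, outer in attested_pairs:
        matrix[index[inner]][index[outer]] = 1
    return matrix


print(potential_combinations(15))  # 210 (Hay & Plag 2004)
print(potential_combinations(31))  # 930 (present data set)

# Toy example in alphabetical order, using combinations named in the text
toy_suffixes = sorted(["en", "er", "ess", "less", "ness", "th"])
toy_pairs = [("th", "en"), ("er", "ess"), ("less", "ness"), ("ness", "less")]
toy_matrix = adjacency_matrix(toy_suffixes, toy_pairs)
for suffix, row in zip(toy_suffixes, toy_matrix):
    print(f"{suffix:>5}", row)
```

With 161 attested combinations out of 930, roughly 17 percent of the cells in the full matrix contain a 1.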
We can see a great many zero cells, the vast majority of which are naturally explained by the work of selectional restrictions. Let us take a look at the first combinations in the next-to-last row for illustration. The suffix in the left margin, -th, can combine with -en, because it has monosyllabic derivatives (like length) and also meets the other requirements imposed by -en on its bases. The suffix -th may, however, not precede -or, -ment, or -ive, since these suffixes normally attach to verbs, and -th creates nouns. The combination -th-aryN is ruled out by the semantic restriction of -aryN, which takes as bases only nouns that denote persons, objects, or locations. Note also that the selectional restrictions are sometimes not clear enough to exclude certain combinations. For example, the combination -th-ster is not ruled out by any of the known selectional restrictions on bases of -ster, as listed in Table 2. In the vast majority of cases, however, an empty cell indicates that not only is this combination not attested, but that it is also structurally impossible. This finding is in line with those of Hay and Plag (2004), who also show for their data set that the suffix-particular selectional restrictions rule out the vast majority of combinations.
Returning to complexity-based ordering, its prediction now is that the suffixes can be reordered in this table in such a way that there are no affix combinations below the main diagonal of the adjacency matrix. In graph theory, this prediction is equivalent to the hypothesis that the adjacency matrix defines a directed acyclic graph. To understand this prediction and its implications, consider -ess. Figure 1 shows us that this suffix attaches to words ending in, for instance, -er, -ist, and -ian. We should therefore rank these three suffixes before -ess in the ordering hierarchy. The prediction now is that none of the suffixes that can attach to words ending in -ess, such as -less, -dom, or -ship, can precede -er, -ist, and -ian in other derived words. Since -less, -dom, and -ship can attach to words ending in -ess, they have to be ordered after -ess. If a suffix preceding -ess in the hierarchy were nevertheless to follow -dom, -ship, or -less in one or more complex words, then these affix combinations would show up below the main diagonal of the adjacency matrix. The left panel in Figure 2 illustrates the fifteen combinations that are ruled out by complexity-based ordering given the six suffix combinations with -ess marked as attested. The ordering within the triads -er, -ist, -ian and -dom, -ship, -less is arbitrary. The right panel shows a directed acyclic graph that is equivalent to the information in the table to its left. Attested combinations are connected by 'edges', with arrowheads specifying the order of the affixes. If a derived word with -ship preceding -ist were to be discovered, it would introduce a 1 below the diagonal, and an upward-pointing arrow in the graph. This upward arrow would introduce a cycle into the graph, and change it from an acyclic into a cyclic graph.
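The -ess example can be reproduced with a brief Python sketch. It encodes the six attested combinations as directed edges, derives one admissible ordering with the standard-library class `graphlib.TopologicalSorter` (Python 3.9 or later), and counts the reverse combinations that the ordering excludes. The helper name `followers` is an assumption for illustration, and the sketch is not the procedure used to produce Figure 2.

```python
from graphlib import TopologicalSorter

# Six attested combinations with -ess, as (inner suffix, outer suffix)
attested = [("er", "ess"), ("ist", "ess"), ("ian", "ess"),
            ("ess", "dom"), ("ess", "ship"), ("ess", "less")]

# TopologicalSorter expects a mapping from each node to its predecessors
predecessors = {}
for inner, outer in attested:
    predecessors.setdefault(inner, set())
    predecessors.setdefault(outer, set()).add(inner)

# static_order() raises graphlib.CycleError if the graph contains a cycle
order = list(TopologicalSorter(predecessors).static_order())
print(order)


def followers(start, edges):
    """Suffixes reachable from `start` via a chain of attested combinations."""
    successors = {}
    for inner, outer in edges:
        successors.setdefault(inner, set()).add(outer)
    seen, stack = set(), [start]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# A combination (a, b) is ruled out if b must precede a in the hierarchy
suffixes = sorted(predecessors)
ruled_out = [(later, earlier) for earlier in suffixes
             for later in followers(earlier, attested)]
print(len(ruled_out))  # 15
```

The count of fifteen consists of the six reversals of the attested pairs plus the nine combinations in which -dom, -ship, or -less precedes -er, -ist, or -ian.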
The question that we now have to address is whether Fig. 1 can indeed be rearranged such that no suffix combinations are left below the diagonal. Or, framed in terms of graph theory, is the graph defined by Fig. 1 indeed a directed acyclic graph that can be drawn such that all its edges point downward? Affix pairs for which both orders are attested, such as ness and less, immediately tell us that our graph cannot be acyclic. The question therefore becomes to what extent our graph is surprisingly close to being acyclic.
Figure 3 presents a diagram for our directed suffix graph using the implementation of the algorithm of Gansner et al. 1993 as available in the R (R Development Core Team 2007) package Rgraphviz (which is part of the bioconductor project; see http://www.bioconductor.org).
In this figure, the overwhelming majority of arrows (151 of 161, shown in gray) point downward. But we find ten upward-pointing arrows, highlighted in black. Seven of these give rise to cycles with two nodes (-aryA-ment, -ist-aryN, -ish-ian, -ist-er, -ness-less, -ness-wise, -ery-ist) and three introduce larger cycles (-ship-ment, -ish-ment, -lyAJ-ish). To the best of our knowledge, these ten exceptions cannot be eliminated; we can at best find a reordering that again has ten exceptions. These ten exceptions tend to be the same across reorderings, but need not be fully identical. Figure 4 below presents a rearrangement of the rows and columns of the adjacency matrix in Fig. 1, with the same ten exceptions appearing below the main diagonal. The small number of exceptions suggests informally that the English suffixes under consideration closely approximate the predictions of the complexity-based-ordering hypothesis.
At this point, we need to consider how good this approximation is. A problem that arises here is that Fig. 3 presents only one of many possible visualizations, and Fig. 4 represents only one of many possible rearrangements of the adjacency matrix with ten exceptions below the diagonal. The problem of minimizing the number of upward-pointing arrows (or equivalently, the number of observations below the diagonal in the adjacency matrix), a problem known as the 'feedback arc set' problem, is NP-complete, which means that no algorithm is known that solves it exactly in polynomial time, so that in practice one has to rely on heuristics (Gansner et al. 1993). To evaluate the extent to which surprise is justified for the observed number of exceptions, we proceeded as follows.
As a first step, we considered the likelihood of observing as few as ten feedback arcs in a graph with thirty-one vertices and 161 randomly assigned edges. To estimate this likelihood, we carried out 10,000 simulation runs in which 161 edges were randomly assigned, and calculated for each run the number of feedback arcs and loops (edges connecting a vertex to itself). In these simulations, edges were assigned randomly, thereby creating random suffix combinations that are, in general, not attested in English. For the 10,000 runs, the mean number of edges violating acyclicity was 83.15, and the smallest number of such edges (observed once only) was sixty-three. This shows that it is extremely unlikely (p < 1/10,000) that the actually observed number of edges violating acyclicity (ten) is due to chance.
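A simulation of this kind can be sketched as follows. The sketch assumes that the 161 edges are drawn without replacement from all 31 x 31 cells of the adjacency matrix and that violations are counted against the fixed row order; the exact sampling scheme of the original simulations is not spelled out in the text, and all names below are illustrative.

```python
import random

N_SUFFIXES = 31   # vertices, as in the text
N_EDGES = 161     # edges, as in the text


def count_violations(edges):
    """Count edges on or below the main diagonal (feedback arcs plus loops)."""
    return sum(1 for row, col in edges if row >= col)


def simulate(n_runs, seed=0):
    """Draw N_EDGES distinct cells at random per run; return violation counts."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(N_SUFFIXES) for c in range(N_SUFFIXES)]
    return [count_violations(rng.sample(cells, N_EDGES)) for _ in range(n_runs)]


# Under this assumed scheme, 496 of the 961 cells lie on or below the
# diagonal, so the expected count is 161 * 496 / 961, roughly 83.1.
EXPECTED = N_EDGES * (N_SUFFIXES * (N_SUFFIXES + 1) // 2) / N_SUFFIXES ** 2
```

Under these assumptions the expected number of violations per run is close to the mean of 83.15 reported above, and a count as low as ten lies far outside the range that such random graphs produce.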
As a second step, we considered to what extent it is possible to reorder the rows and columns of the simulated adjacency matrices such that as many violations of acyclicity as possible are removed. Using the algorithms described in Gansner et al. 1993 and implemented in the Rgraphviz package for the R statistical programming environment, we calculated for each simulation run the number of edges violating acyclicity that could not be eliminated. The mean number of such edges was 70.73, and the minimum was 47. This provides further evidence that the actually observed number of violating edges (ten) is much smaller than expected under random conditions.
As a third step, we proceeded to consider the number of violations observed when rows and columns of the empirical adjacency matrix are randomly reordered, while respecting the existing affix combinations. Across 10,000 simulation runs, the mean number of violations was 78.96, with a minimum of forty (observed once only). This provides further confirmation that the observed number of violations (ten) is unlikely to be due to chance.
For each of this second set of 10,000 simulated data sets, we also considered to what extent the number of violations might be reduced by application of the algorithm of Gansner et al. 1993. The mean number of exceptions obtained in this way was 21.04, with a minimum of ten (observed in 21/10,000 = 0.0021 of the simulation runs). Apparently, the extent to which the algorithm of Gansner et al. 1993 is successful depends on the initial order of rows and columns in its input adjacency matrix. Importantly, ten violations emerged as our best estimate of the minimum number of violations that can be obtained by reordering rows and columns such that the actual affix combinations are respected.
In summary, in an unconstrained, 'parallel-world' scenario, the likelihood of observing only ten exceptions to acyclicity when 161 edges are assigned randomly in a graph with thirty-one vertices is vanishingly small, even when exceptions are eliminated to the best extent possible by reordering the suffixes. In a more constrained, 'real-world' scenario in which the real affix combinations of English are inspected and left unchanged in the simulations, the number of violations observed under random reordering of rows and columns remains much larger than ten. In short, under random conditions, the observed number of violations remains extremely unlikely. Finally, reorderings guided by the algorithm of Gansner et al. 1993 show that ten is our best estimate of the minimum number of violations possible. Fig. 4 shows one of the suffix rankings for an adjacency matrix with this minimum number of violations.
Let us now consider the exceptional suffix combinations, as listed below the diagonal of Fig. 4 and represented with upward arrows in Fig. 3, in order to assess their theoretical weight. We first checked whether these problematic combinations generate a high number of types by checking their type frequencies in the CELEX lexical database (c. eighteen million tokens) and in the written subcorpus of the British National Corpus (c. ninety million tokens). Types with these combinations are not attested in either of the two corpora. Figure 5 illustrates this point by listing all combinations (with their type frequencies) that can be found in these two sources.^{3} This means that none of the exceptions come from the two corpora; they are all from the OED.
Having observed that the exceptional combinations are extremely rare, we now consider their linguistic properties. We first consider the seven instances where we have doubleheaded arrows in Fig. 3 and where in the adjacency matrix (Fig. 4) the order below the diagonal has a mirror image above the diagonal, the latter in accordance with complexitybased ordering, and the former providing a counterexample. Consider the pairs in 3, in which the form found below the diagonal is printed in bold.
(3)
a. -aryA and -ment: militaryment, complementary
b. -ist and -er: alchemister, consumerist
c. -ish and -ian: Irishian, christianish
d. -ist and -aryN: evangelistary, voluntaryist
e. -ery and -ist: artillerist, dentistry
f. -ness and -less: happinessless, hopelessness
g. -wise and -ness: otherwiseness, businesswise
Let us discuss each pair in turn. Of the pair militaryment and complementary in 3a, the first violates complexity-based ordering. Given that the form also violates the selectional restrictions of -ment, which takes either bound roots or verbs as bases, militaryment can be regarded as truly idiosyncratic. The combination alchemister in 3b, an earlier form of alchemist, has its last citation in the OED in 1586. In 3c, the (according to the OED) 'nonce-word' Irishian, 'an expert in the Irish language' (singularly attested for 1834), is obviously coined in analogy to Grecian 'an expert in the Greek language' and not an instance of some regular suffixation pattern. The four remaining cases (3d-g), evangelistary and voluntaryist, artillerist and dentistry, happinessless and hopelessness, and otherwiseness and businesswise, are all in accordance with the selectional restrictions of the respective suffixes. These suffix sequences suggest that complexity-based ordering imposes a very strong but nevertheless probabilistic constraint on suffix ordering. The last four exceptional formations all have very small probability in the language, as evidenced by their nonoccurrence even in large corpora, but they are possible and available for use.
The remaining three cases, which are represented with upward arrows introducing cycles with more than two nodes into the graph, are woollyish, courtshipment, and foolishment. The latter two forms violate the selectional restrictions of -ment, which is a deverbal suffix, and are thus idiosyncratic forms (note that courtshipment is singularly attested in 1649, while foolishment is a twentieth-century formation). The derivative woollyish is a somewhat controversial case, since it could well be argued that this formation does not involve the adjectival suffix -ly, as in fatherly, but adjectival -y, as in hairy. The latter analysis is given in the OED, while CELEX provides both possibilities, but it is unclear how one could prove either analysis. We therefore decided to be conservative and treat woollyish as a formation involving adjectival -ly, and therefore as an exception to strict hierarchical ordering. Under the analysis of the OED, this formation would not even count as a counterexample to complexity-based ordering. In any case, the formation is in complete accordance with the selectional restrictions of -ly and -ish. In Hay & Plag 2004, the suffixes were in fact ranked in this order, and it is only the attested combinations with suffixes from the new set of suffixes that forced us to rerank -ish before -ly. The combination as attested in woollyish, if indeed taken as an instance of -ly and not of -y, would provide yet another illustration of the fact that complexity-based ordering is subject to minor leakage when the selectional restrictions offer opportunities for word formation below the diagonal in the adjacency matrix (cf. Fig. 4). To summarize our discussion of the counterexamples, we can say that there are indeed such exceptions. They are, however, very rare; that is, they do not occur in the corpora we investigated, but are only attested in the OED. Of these forms, many are very old and probably not in use.
Hence, complexity-based ordering emerges as a robust generalization with only a little leakage.
The interesting question now is why a suffix hierarchy should exist in the first place. Why should selectional restrictions impose a partial ordering that, with a few exceptions, can be represented as a directed acyclic graph? This is a pure accident under the selectionalrestriction hypothesis. Hay & Plag 2004 argues that it follows naturally from complexitybased ordering. According to complexitybased ordering, the position of a given suffix in the hierarchy reflects the degree to which that suffix is processed independently of its base. Suffixes higher in the hierarchy, that is, those that are more to the right in the adjacency matrix, should be more easily separable from their bases than those suffixes that are lower in the hierarchy, that is, more to the left in the adjacency matrix. Overall, the most easily separable suffix should be at the right end of the hierarchy, and the least separable suffix at the left end. In order to test whether this is true, one would have to check whether the rank of a given suffix in the hierarchy correlates with the rank of that suffix with regard to independent measures of constituentdriven processing. Recently, Hay and Baayen (2002) have proposed such measures, and we follow Hay and Plag's (2004) footsteps in employing them to test complexitybased ordering.
On the basis of the analysis of eighty English affixes, Hay and Baayen show that parsing ratio and productivity are strongly correlated. In general, those affixes that are easily separable from their bases in parsing are also those affixes that are most productive. This fact is in line with the observation that productive processes are semantically and phonologically transparent. To determine the type- and token-parsing ratios, Hay and Baayen calculate, for any given affix, in what proportion of words the affix is likely to be parsed, based on the frequency characteristics of the affix and the words that contain it. Using our example from above again, -ment is probably parsed out in discernment (because discern is much more frequent than discernment), whereas it is probably not parsed out in government (because government is more frequent than govern). Hay and Baayen also calculate the proportion of tokens containing the affix that are likely to be parsed. The resulting parsing ratios therefore indicate the proportion of types (the type-parsing ratio) or tokens (the token-parsing ratio) containing an affix that are likely to be parsed. For example, if an affix were represented only by words that are unlikely to be parsed, the parsing ratios would be zero. If it were represented only by words that are likely to be parsed, the parsing ratios would be one. The higher the type- (or token-)parsing ratio, the greater the proportion of types (or tokens) that are prone to parsing.
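The two ratios can be illustrated with a minimal Python sketch. It assumes the simplified criterion used in the examples above, namely that a word counts as parsed when its base is more frequent than the derived form; the criterion actually used by Hay and Baayen may be more refined. The frequencies are invented for illustration.

```python
# Hypothetical (derived word, derived frequency, base frequency) triples
# for one affix; the numbers are invented for illustration only.
WORDS = [
    ("discernment", 20, 300),    # base more frequent: counted as parsed
    ("government", 9000, 600),   # derived form more frequent: not parsed
    ("amazement", 150, 400),     # base more frequent: counted as parsed
]


def parsing_ratios(words):
    """Return (type-parsing ratio, token-parsing ratio) for one affix.

    Assumption: a word is 'parsed' when base frequency > derived frequency.
    """
    parsed = [(w, derived, base) for w, derived, base in words if base > derived]
    type_ratio = len(parsed) / len(words)
    token_ratio = (sum(derived for _, derived, _ in parsed)
                   / sum(derived for _, derived, _ in words))
    return type_ratio, token_ratio
```

With these invented counts, two of three types are parsed, but they account for only 170 of 9,170 tokens, which shows how the two ratios can diverge for the same affix.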
For the computation of productivity, Hay and Baayen (2002) used the corpus-based productivity measure P (Baayen & Renouf 1996), which is the number of hapax legomena (that is, the words that occur only once in the corpus with a given affix) divided by the number of tokens with that affix. This productivity measure is based on the following reasoning (see, for example, Plag 1999:Ch. 5 for more detailed discussion). Assuming that productivity is defined as the possibility of creating a new word, it should in principle be possible to estimate or quantify the probability of the occurrence of newly created words of a given morphological category. By definition, newly coined words have not been used before; they are low-frequency words and do not have an entry in our mental lexicon. But how can we understand these new words, if we don't know them? We can understand them in those cases where an available word-formation rule allows us to decompose the word into its constituent morphemes and compute the meaning on the basis of the meaning of the parts. The word-formation rule in the mental lexicon guarantees that even complex words with extremely low frequency can be understood. If, in contrast, words of a morphological category are all highly frequent, these words will tend to be stored in the mental lexicon, and a word-formation pattern will be less readily available for the perception and production of newly coined forms. This means that unproductive morphological categories will be characterized by a preponderance of words with rather high frequencies and by a small number of words with low frequencies. With regard to productive processes, we expect the opposite, namely large numbers of low-frequency words and small numbers of high-frequency words.
The crucial point now is that the number of hapax legomena of a given morphological category correlates with the number of neologisms of that category, so that the number of hapaxes divided by the number of tokens of that category can be seen as an estimate of the probability of new formations, and as such as an indicator of productivity (see Baayen & Renouf 1996, Plag 2003:Ch. 3, 2006 for further illustration and discussion).
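The measure P, hapax legomena divided by tokens, can be computed as in the following sketch. The function name and the toy token list are assumptions made for illustration; a real computation would run over all corpus tokens containing a given affix.

```python
from collections import Counter


def productivity_p(tokens):
    """P = n1 / N: number of hapax legomena divided by number of tokens.

    `tokens` is the list of all corpus tokens containing one affix.
    """
    if not tokens:
        raise ValueError("P is undefined for an affix with no tokens")
    counts = Counter(tokens)
    hapaxes = sum(1 for count in counts.values() if count == 1)
    return hapaxes / len(tokens)


# Toy example: four tokens, two of which are types that occur once only.
TOY_TOKENS = ["hopelessness", "hopelessness", "atomlessness", "joblessness"]
```

For TOY_TOKENS the function returns 2/4 = 0.5. A category dominated by a few high-frequency words would yield a value near zero, in line with the reasoning above.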
What is important for the present article is that Hay & Baayen 2002 computed the productivity measure P with eighty affixes, twentyfive of which are in the set of suffixes under investigation. We could therefore exploit Hay and Baayen's results, which were obtained for purposes entirely different from the ones of this article.
The information from Hay & Baayen 2002 provides us with the possibility of checking whether our 'nearly acyclic' directed graph can provide us with an affix ranking that is correlated with measures of constituentdriven processing and productivity. The problem that we have to face here is that there are many different orderings with ten exceptions, and many different diagrams such as that shown in Fig. 3 can be obtained depending on the specific initial order of rows and columns of the input adjacency matrix. To obtain a ranking that is independent of any of these many particular orderings, we ran a large number of simulation runs, which eventually provided us with 4,073 adjacency matrices with ten exceptions. We averaged the ranks of the affixes in these matrices. These averaged ranks are listed in Table 3.
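The averaging step can be sketched in Python as follows. The two toy orderings are invented; in the analysis described above, the input would be the suffix orders of the 4,073 adjacency matrices with ten exceptions. The function name is an assumption of this sketch.

```python
def mean_ranks(orderings):
    """Average the rank (1 = leftmost) of each suffix across orderings.

    Every ordering is assumed to contain the same set of suffixes.
    """
    totals = {}
    for order in orderings:
        for rank, suffix in enumerate(order, start=1):
            totals[suffix] = totals.get(suffix, 0) + rank
    return {suffix: total / len(orderings) for suffix, total in totals.items()}


# Two hypothetical optimal orderings that differ in the order of -th and -en.
TOY_ORDERINGS = [["th", "en", "ness"], ["en", "th", "ness"]]
```

Here -th and -en each receive a mean rank of 1.5 and -ness a mean rank of 3.0, so suffixes whose relative order is indeterminate end up with tied or nearly tied mean ranks.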
The averaged rank turned out to be correlated only with the log-transformed P measure, as shown in Figure 6. The regression line (estimated slope 0.079, t(20) = 3.164, p = 0.0049) was obtained after removal of three influential outliers (shown in gray), as revealed by standard regression diagnostics: the dfbetas, the dffits, the covariance ratios, Cook's distance, or the diagonal elements of the hat matrix (see, for example, Chatterjee et al. 2000). Two of these outliers, -ling and -dom, have a P value of zero, which we had changed into 0.0001 before the log-transform.
This means that for the vast majority of suffixes (namely those with a nonzero degree of productivity), complexity-based-ordering rank and productivity P are correlated (R^{2} = 0.33). The main pattern in the data is that more productive suffixes, that is, suffixes for which the likelihood that a newly sampled token with that suffix represents an unseen type is higher, tend to have higher mean ranks. Thus we find -th and -en to the left in the hierarchy with low ranks, and -ness and -less to the right in the hierarchy with high ranks.
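The log transform with its floor for zero values, and an ordinary least-squares fit with its R^{2}, can be sketched as follows. The sketch assumes that log P is regressed on mean rank; the function names are illustrative, and the outlier diagnostics mentioned above are not reproduced.

```python
import math


def log_p(p, floor=0.0001):
    """Log-transform P, replacing a value of zero by `floor` first."""
    return math.log(p if p > 0 else floor)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

A call such as fit_line(ranks, [log_p(p) for p in p_values]), with ranks and p_values standing for hypothetical lists of mean ranks and P values, returns the slope, the intercept, and the proportion of variance explained.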
In summary, three facts are noteworthy. First, sequences of English suffixes define a directed graph that is remarkably close to being acyclic. The evidence that this is not due to chance is very strong. Second, there is a substantial amount of indeterminacy in the exact order of the suffixes. Even if the graph were acyclic, the number of different orderings (called 'topological sorts' in graph theory) could be large. We therefore sampled from the many optimal orders (the orders with a minimum of exceptions) to obtain mean ranks that define an objective linear ordering. Third, the evidence that these mean ranks directly reflect processing complexity is weaker than for the smaller sample of suffixes in Hay & Plag 2004. The mean ranks are correlated with only one measure, P, instead of three, as in the study in Hay & Plag 2004, and two suffixes with zero productivity P behave exceptionally. In the next section, we therefore address the question of whether independent experimental evidence can strengthen the potential of complexity-based ordering as a principle of the grammar.
5. Complexity-Based Ordering and Lexical Processing.
5.1. The Data.
Is the rank of a suffix in the ordering hierarchy correlated not only with a distributional productivity measure such as P, but also with experimental estimates of actual lexical processing costs? And if so, do we expect to find a positive or a negative correlation between mean complexity-ordering (CO) rank and processing cost? If we assume that a suffix that is easily parsed out in comprehension requires less processing time than a suffix that is difficult to parse out, then the prediction follows that suffixes with larger mean CO-rank will reveal shorter processing latencies. By contrast, race models of morphological processing generally assume that look-up in memory is less costly than decompositional processing. Combined with the much larger effect of full-form frequency compared to base frequency observed in Baayen, Wurm, & Aycock 2008, this leads us to expect that suffixes with small mean CO-rank will enjoy a processing benefit compared to suffixes with large values for mean CO-rank.
In order to explore possible answers, we have made use of the behavioral processing measures available in the English Lexicon Project at http://elexicon.wustl.edu/ (Balota et al. 2007). This database provides lexical-decision reaction times and word-naming latencies for some 40,000 English words.
Lexicaldecision and wordnaming tasks have been used extensively by psychologists to study lexical processing of both monomorphemic words (see e.g. Balota et al. 2004, Baayen et al. 2006) and morphologically complex words (see e.g. Taft & Forster 1976, Taft 1979, 1988, 2004, Burani & Caramazza 1987, Wurm 1997, 2000, Bertram, Baayen, & Schreuder 2000, Bertram, Schreuder, & Baayen 2000, Wurm & Aycock 2003, de Vaan et al. 2007). Response latencies elicited in these tasks have afforded considerable insight into a series of factors that have emerged as relevant for understanding lexical processing in language comprehension, although word naming must also have a production component that is not well understood at present. In what follows we employ these comprehension data to probe the balance of computation and storage. In the general discussion, we briefly address the question of whether the results generalize to speech production.
To make use of the data from the English Lexicon Project, we compiled a list of all bimorphemic words studied in Hay & Baayen 2002 and extracted all matching entries from the English Lexicon Project website. This resulted in a data set with 2,529 bimorphemic words covering twentyeight suffixes (data for adjectival ary, adjectival ful, and adverbial wise were not available in the English Lexicon Project), with the mean wordnaming latency for each word and the mean lexicaldecision latency for each word, averaged over subjects.
To this database we added, for each word, information about a wide range of lexical and distributional variables. The properties gauged by these variables range from a word's form (its length and the density of its similarity neighborhood) to the complexity of its inflectional (Kostić et al. 2003, Moscoso del Prado Martín, Kostić, & Baayen 2004) and derivational paradigms (Schreuder & Baayen 1997, Moscoso del Prado Martín, Bertram, et al. 2004), and from phonetic properties of the initial segment (which may cause measurement error for the voicekey device registering naming latencies) to frequency measures for the derived word and its base.
Many of the predictor variables that we included in the statistical models reported below are not of interest to the goals of this study when considered in isolation. Nevertheless, we discuss them briefly, for two reasons. First, their inclusion in our statistical models substantially reduces the likelihood that our critical morphological variables are confounded by other hidden processing factors. In other words, we need to make sure that the morphological variables that we are really interested in remain significant predictors in a model that also includes other correlated variables. For instance, word length is negatively correlated with word frequency; hence, frequency effects for derived words and their base words can be assessed properly only once their lengths have been taken into account. Second, it turns out that the joint effect of all variables is important for understanding complexity-based ordering.
We fitted linear mixed-effects models (Pinheiro & Bates 2000, Bates 2005, Bates & Sarkar 2005b, Baayen, Davidson, & Bates 2008) to these data, using the lme4 package (Bates & Sarkar 2005a) in the R statistical programming environment (R Development Core Team 2005), with Affix as random effect and a range of fixed-effect lexical predictors. We first consider the word-naming latencies, and then the reaction times in visual lexical decision.
5.2. Word Naming.
Table 4 lists the fixed-effects coefficients and their associated statistics for all predictors that emerged as significant in a stepwise regression analysis for the word-naming latencies.^{4}
Inspection of the residuals of our initial model revealed marked departure from normality, which was alleviated substantially by removing data points with outlier residuals (defined as absolute standardized residuals exceeding 2.6). Table 4 lists the estimates of the coefficients for this final trimmed model. We evaluate significance with the help of 10,000 samples from the posterior distributions of the coefficients using Markov chain Monte Carlo (MCMC) sampling (see e.g. Baayen, Davidson, & Bates 2008). From these samples, we obtained the 95 percent HIGHEST POSTERIOR DENSITY (HPD) confidence intervals, and the corresponding two-tailed p-values. Since the HPD interval never contains zero, and all p-values are small, we may conclude that all predictors are statistically significant.^{5}
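The trimming criterion can be illustrated with a short Python sketch. It assumes that residuals are standardized by centering them and dividing by their standard deviation; the mixed-effects model itself (fitted with lme4 in R) is not reproduced, and the function name is illustrative.

```python
import statistics


def trim_outliers(residuals, cutoff=2.6):
    """Return the indices of data points to keep.

    A point is kept when its absolute standardized residual does not
    exceed `cutoff`. Assumes the residuals are not all identical.
    """
    mean = statistics.mean(residuals)
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals)
            if abs((r - mean) / sd) <= cutoff]


# Toy residuals: twenty small values and one extreme value.
TOY_RESIDUALS = [0.1, -0.1] * 10 + [5.0]
```

For TOY_RESIDUALS only the final, extreme data point is dropped. In the procedure described above, the model is then refitted to the retained data points.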
Let us go through Table 4 in more detail, starting with section A. To control for voice-key artefacts in word naming, we included factors specifying whether the first phoneme was voiced, and whether the first phoneme was a vowel or a consonant. Only the voicing of the first phoneme (Voiced) turned out to be a significant predictor for the naming latencies: voiced initial segments triggered the voice key more quickly and effectively than unvoiced initial segments.
Section B of Table 4 lists the effects of two measures for the words' lengths. Longer words elicited longer naming latencies, as expected, both for length evaluated in letters (Length) and for length in syllables (NSyll).
Section C reports variables gauging the role of orthographic neighbors. A word's neighbors are often defined as those words that differ in one letter or phoneme (house/mouse). The total number of such neighbors is often considered a measure of lexical competition in visual and auditory comprehension (Coltheart et al. 1977, Luce 1985, Pisoni et al. 1985, Balota et al. 2004) and speech production (Vitevitch 2002, Scarborough 2004, Wright 2004, Vitevitch & Stamer 2006). For the present data we did not observe significant effects of the total count of orthographic neighbors (for a similar result for monomorphemic words, see Baayen et al. 2006). But N1, the count of neighbors of the base (evaluated for lemmas in CELEX) that differ only with respect to the initial phoneme (e.g. deal and veal), was predictive: the greater the number of neighbors sharing the remainder of the base word, the shorter the naming latency was. Other measures effectively gauging phonological coactivation are Shannon entropies calculated for the cohort competitors at the first (H_{1}) and the third (H_{3}) segment of the word (evaluated against the wordforms in the CELEX lexical database). These entropy measures indicate that the target entered into a process of competition with words sharing the words' initial segments (see Baayen 2007 for further discussion of these measures).
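The two kinds of measures can be sketched in Python. The sketch treats letters as stand-ins for phonemes and weights cohort members by frequency; both are simplifying assumptions, as are the function names and the toy lexicon.

```python
import math

# Hypothetical lexicon mapping words to frequencies (invented values).
TOY_LEXICON = {"deal": 1, "dear": 1, "veal": 1, "dog": 1}


def n1(base, lexicon):
    """Count words that differ from `base` only in the initial segment."""
    return sum(1 for word in lexicon
               if word != base
               and len(word) == len(base)
               and word[1:] == base[1:])


def cohort_entropy(target, lexicon, position):
    """Shannon entropy (in bits) over the words that share the first
    `position` segments with `target`, weighted by frequency."""
    prefix = target[:position]
    freqs = [f for word, f in lexicon.items()
             if word[:position] == prefix and f > 0]
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs)
```

In the toy lexicon, deal has one N1 neighbor (veal). Its cohort at the first segment contains three equiprobable words, giving an entropy of log2(3), whereas at the third segment only deal and dear remain, giving an entropy of 1 bit.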
The predictor labeled BNCd represents the frequency of the derived word in the demographic subcorpus of the British National Corpus. This subcorpus samples spontaneous spoken English. We included this variable as a correction of the written frequencies (obtained from CELEX), particularly because it captures to a considerable extent differences in age of acquisition (Baayen 2005, Baayen et al. 2006). By bringing BNCd into the model, we reduce the risk of confusing effects of frequency of exposure with age of acquisition, a variable that by itself may have more explanatory power than word frequency (Brysbaert et al. 2000, Morrison & Ellis 2000, Lewis et al. 2001) but that generally is not available for complex words.
The variables in sections A to D of Table 4 were included, as mentioned above, to bring under statistical control a range of nonmorphological variables that are known to play a role during lexical processing. Sections E and F of Table 4 bring us to the morphological variables that are of primary interest to us here. Section E discusses measures relating to properties of the derived words and their base words; section F discusses measures representing properties of the suffix.
Section E begins with listing the coefficients for the frequency of the derived word (collapsing inflectional variants), denoted by DerFreq, and the frequency of its base word (again collapsing over inflectional variants), denoted by BaseFreq. These two frequency measures have often been interpreted as measures of memorybased and ruledriven processing (see e.g. Pinker 1991, 1997, 1999, Baayen et al. 1997). It is more likely, however, that derived frequency taps into procedural memory traces for complex words (i.e. past experience with parsing and producing the complex word), and that base frequency taps into the general availability of the base word (see Taft 2004, Baayen 2007, Balling & Baayen 2007, Baayen, Wurm, & Aycock 2008).
We modeled the nonlinear effects of derived frequency and base frequency by means of quadratic polynomials.^{6} The functional relation between these frequency measures and naming latency is now represented by part of a parabola, instead of by a line segment. The mathematical equation for a parabola is y = a + bx + cx^{2}, where a is the intercept, b is the LINEAR COEFFICIENT, and c is the QUADRATIC COEFFICIENT. Straight lines can be viewed as a special case with c = 0, in which case b is the coefficient for the slope of the line. In Table 4 the linear coefficients of the two polynomials are referenced as DerFreq and BaseFreq and the corresponding quadratic coefficients as DerFreq^{2} and BaseFreq^{2}. As the individual coefficients of a parabola are somewhat less straightforwardly interpretable, their joint effect is visualized in the two upper panels of Figure 7. (The lower panels show the corresponding effects in visual lexical decision, to be discussed below.)
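The shape of such a quadratic effect can be explored with a small Python sketch. The coefficients are invented for illustration and are not those of Table 4; the function names are likewise assumptions of this sketch.

```python
def parabola(x, a, b, c):
    """Evaluate y = a + b*x + c*x**2."""
    return a + b * x + c * x ** 2


def turning_point(b, c):
    """Value of x at which the parabola changes direction: -b / (2c).

    Requires c != 0; for c == 0 the curve is a straight line.
    """
    return -b / (2 * c)


# Hypothetical coefficients: a negative linear term (facilitation) and a
# small positive quadratic term (the facilitation levels off).
A, B, C = 6.5, -0.03, 0.002
```

With these invented values the predicted latency decreases up to x = 7.5 log frequency units and stops decreasing thereafter, which is the kind of leveling-off that a negative linear coefficient combined with a positive quadratic coefficient produces.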
What Fig. 7 tells us is that in word naming the facilitatory effects of frequency level off for higher frequencies, and more so for base frequency than for derived frequency. In fact, the facilitation obtained from base frequency is restricted to two thirds of the frequency range. We return to discussing these frequency effects and their theoretical consequences for relative frequency (Hay 2001) and complexitybased ordering below. Here we restrict ourselves to pointing out that the frequency of the derived word is a much stronger predictor than the frequency of the base word. A change of one log derivedfrequency unit in the left panel corresponds to a much larger reduction in log naming latency than a unit change in base frequency. This differential effect is indicated numerically by the difference between the two linear coefficients in Table 4 (0.0297 for derived frequency vs. 0.0068 for base frequency). This highlights the importance of past experience with complex words.
The last two measures in section E of Table 4 gauge the role of the number of meanings carried by the derived word and by its base. In a fully decompositional system with storage being reserved for irregular forms, the set of words closely related semantically to the base should be far more important for predicting lexical processing than the set of words related semantically to the derived words. The more the balance of storage and computation shifts toward storage, the greater we expect the relative contribution of the set of words related to the derived word itself to be.
We estimated numbers of meanings by means of the synonym sets (synsets) in WordNet (Miller 1990, Fellbaum 1998). For each derived word and each base word, we counted the number of synsets that they are listed in (SynWord and SynBase respectively). For both counts, we observe facilitation. This finding contrasts with the results obtained in Baayen et al. 2006 for monomorphemic monosyllabic words in English. For such words, the synset measures were predictive for visual lexical decision, but not for word naming. The present results show that when morphologically complex words have to be named, derivatives that themselves have denser semantic networks, and also derivatives with base words with denser networks, have a small advantage in word naming as well.
Two explanations suggest themselves for this difference. Complex words might lead to deeper semantic processing than do simplex words. But a linguistically less interesting explanation calls attention to a difference in the makeup of the experimental lists used in the English Lexicon Project. For monomorphemic words, participants were exposed to monomorphemic words only. This may have led to shallow semantic processing. For the derived words, however, participants were also exposed to many complex words. This may have favored deeper semantic processing. Further research is required to evaluate the merits of these two alternatives. Interestingly, even though the range of number of synsets is smaller for the derived word than for its base, the facilitatory effect for the derived count is larger (cf. the coefficients of 0.013 for SynWord as against 0.009 for SynBase). This is consistent with the greater effect of derived frequency compared to base frequency, and supports the hypothesis that memory traces for complex words play an important role in lexical processing.
The final section of Table 4 turns to measures relating to the suffix. The predictor labeled LogBigFreq denotes the logarithmic transform of the frequency of the orthographic bigram spanning the boundary between base word and suffix (e.g. df in handful). A greater frequency of the morphologically critical bigram correlates with longer naming latencies. A higher bigram frequency has been noted to render a word less decomposable (e.g. Seidenberg 1987, Andrews 1992, Hay 2003). Here we replicate the finding that decreased decomposability leads to delayed processing.
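The boundary bigram can be illustrated with a minimal Python sketch. The function names and the toy counts are hypothetical and serve only to show how a predictor like LogBigFreq could be derived from a word and its suffix; adding 1 before taking the log is likewise an assumption made here to avoid log(0), not a detail taken from the analysis.

```python
import math

def boundary_bigram(derived: str, suffix: str) -> str:
    """Return the two letters that straddle the base-suffix boundary."""
    if not derived.endswith(suffix) or len(derived) <= len(suffix):
        raise ValueError(f"{derived!r} is not a base followed by {suffix!r}")
    cut = len(derived) - len(suffix)  # index of the first suffix letter
    return derived[cut - 1:cut + 1]   # last base letter + first suffix letter

# Hypothetical bigram counts, invented for illustration (not corpus data).
toy_counts = {"df": 120, "ml": 54000}

def log_bigram_freq(derived: str, suffix: str, counts: dict) -> float:
    """Log of the boundary-bigram count; unseen bigrams count as zero."""
    bigram = boundary_bigram(derived, suffix)
    return math.log(counts.get(bigram, 0) + 1)

print(boundary_bigram("handful", "ful"))   # the example from the text
print(log_bigram_freq("handful", "ful", toy_counts))
```

Taking the bigram from the derived word rather than from the bare base keeps the sketch correct for spellings in which the base changes before the suffix.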
Surprisingly, affix productivity, as gauged by the number of types as listed in Hay & Baayen 2002 for bimorphemic derived words in English, here denoted by V, is inhibitory for word naming. A possible explanation for this unexpected inhibition is that a productive suffix might activate a larger range of base words, and that these base words subsequently compete with the base that is to be named. If this explanation contains a grain of truth, we should find that in lexical decision, a task that does not require the selection of a base for articulation, no inhibitory effect of V should be present. We return to this issue below.
The very last predictor in Table 4, labeled log(above), is the log of the number of types above the parsing line, one of the productivity measures mentioned previously that were introduced in Hay & Baayen 2002. Somewhat to our surprise, given the crude way in which Hay and Baayen estimated the parsing line, on the basis of a single suffix (-ness) only and using a very simple computational model, the number of types above the parsing line, log(above), is a solidly significant predictor of naming latencies. Exactly as expected, a greater number of types above the parsing line correlates with decreased naming latencies. Other things being equal, suffixes that occur in large numbers of derivatives that have a high likelihood of being parsed themselves afford faster lexical processing, at least as gauged by word naming, for ANY derivative in which they occur. If this result generalizes to visual lexical decision, the present evidence that parsing speeds lexical processing would be strengthened.
We conclude our analysis of the naming data by discussing the random-effects structure of our model. In our analyses, Suffix, a factor with twenty-seven levels, was incorporated as a random effect since there are many suffixes that we have not included in our sample. As a consequence, the coefficients of our models listed in Table 4 allow us to predict processing costs for the average unseen suffix. For the suffixes that actually appear in our sample, more precise predictions can be made, however, by fine-tuning these coefficients to each suffix individually.
In mixed-effects modeling, such fine-tuning is always required for the intercept.^{7} As the intercept is a kind of grand average that expresses the baseline latency required for the task, fine-tuning the intercept for our data therefore amounts to allowing baseline processing costs to vary from suffix to suffix. The estimated standard deviation for the random intercepts for Affix was 0.0265.
For the present data, a likelihood ratio test shows that further fine-tuning is required for the linear coefficient of derived frequency (p < 0.0001). The facilitatory linear effect of derived frequency varied slightly but significantly from suffix to suffix. The standard deviation estimated for the random slopes for derived frequency was 0.0079. A parameter for a correlation of the random slopes and intercepts proved to be superfluous.^{8} We complete the description of our model with the estimate of the standard deviation for the residual error, which was 0.0797, and turn to the lexical-decision data.
5.3. Visual Lexical Decision.
The fixed-effects coefficients and their statistics in the mixed-effects model fitted to the visual lexical-decision latencies are summarized in Table 5. This model was obtained using the same analytical procedures as for the naming latency, and the general pattern of results is fairly similar. There are four relatively minor differences.
First, Length (section B of Table 5) shows a U-shaped relation with reaction time, with initial facilitation followed by inhibition, instead of the straightforward linear inhibition observed for word naming. This means that very short words are slower to recognize than medium-length words, but beyond a certain point, additional length makes words slower to recognize. As for the frequency variables in word naming, we modeled this nonlinearity by means of a quadratic polynomial. The linear term is denoted by Length in Table 5, and the quadratic term by Length^{2}. The present nonlinear relation between word length and decision latency replicates the results reported in New et al. 2006.
Second, the positional entropy measures H_{1} and H_{3} were not predictive for lexical decision (section C of Table 5). This supports the hypothesis of Baayen 2007 that these measures specifically gauge cohort-like competition during phonological articulation (see also van Son & Pols 2003, van Son & van Santen 2005).
Third, the effect of base frequency (section E of Table 5) is smaller compared to the effect of derived frequency, and linear rather than nonlinear, as shown in the lower panels of Fig. 7.
Fourth, recall that we observed an inhibitory effect in word naming of V, the number of bimorphemic words in which the suffix occurs. We hypothesized that a productive suffix that occurs in many other derived words coactivates these derived words, thereby creating additional competition for the base of the derived word that is to be articulated. In visual lexical decision, the V measure again had a positive coefficient, but it failed to reach significance (p > 0.3). Since visual lexical decisions can be based on global lexical activation and do not require selecting a specific base for articulation, the absence of inhibition in this task is in line with our explanation for the naming data. Possibly, the null effect of V in visual lexical decision is due to the increased competition for the base being canceled by the advantage of increased general lexical activation, which in general affords faster lexical-decision latencies.
Finally, we note that the log-transformed count of types above the parsing line, log(above), was significant (section F of Table 5), just as in word naming.
We complete the summary of the mixed-effects model for the visual lexical-decision latencies with the random-effects parameters: a standard deviation of 0.0294 for the random intercepts, a standard deviation of 0.0062 for random slopes for derived frequency, a standard deviation of 0.0068 for random slopes for length, and a standard deviation of 0.0932 for the residual error. Both random slopes were supported by likelihood ratio tests (all p < 0.02). Parameters for correlations between random intercepts and random slopes did not have explanatory value and were therefore not incorporated into the model.
Having completed the detailed description of the model for the lexical-decision data, we now summarize the differences and similarities between the two models. Overall, the differences are minor. In lexical decision we find no entropy effects, the base-frequency effect is linear instead of nonlinear, and there is no effect of type frequency (and, trivially, no voice-key effect). In the naming latencies, the effect of length was linear rather than nonlinear. The commonalities outweigh these differences by far. In both tasks we see effects of length, of the number of syllables, of lexical neighbors, of base frequency, of derived frequency, of the number of synonyms, of the bigram frequencies, and of the number of formations above the parsing line. Most important for the present discussion is the absence of an effect of the rank in the hierarchy in both tasks, which means that this rank is not predictive for the costs of processing of individual words in comprehension.
We note in addition that there is no role for a straightforward measure of RELATIVE frequency in these models: it was never predictive. Relative frequency is defined simply as the ratio of base frequency to derived frequency. In our models, however, the two frequency measures have very different coefficients, that is, their own regression weights, and different functional shapes (linear vs. nonlinear).^{9}
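The contrast between a ratio and two separate regression weights can be spelled out in a short derivation. This is a sketch under the assumption that relative frequency would enter a model on the log scale, like the other frequency measures; the symbol \beta_R is introduced here for illustration only and does not appear in the tables.

$$
\log\frac{f_{\mathrm{base}}}{f_{\mathrm{derived}}} = \log f_{\mathrm{base}} - \log f_{\mathrm{derived}}
\quad\Longrightarrow\quad
\beta_R \log\frac{f_{\mathrm{base}}}{f_{\mathrm{derived}}} = \beta_R \log f_{\mathrm{base}} - \beta_R \log f_{\mathrm{derived}}
$$

A single relative-frequency term therefore forces the two log frequencies to carry weights of equal magnitude and opposite sign. In the naming model both frequency effects are facilitatory and their linear coefficients differ considerably in size (0.0297 for derived frequency vs. 0.0068 for base frequency), a pattern that a single ratio term cannot represent.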
Furthermore, the random-effects structure of our models shows that the item-specific balance of rote and rule is modulated by the suffix-specific coefficients for intercept and derived frequency in both tasks. We conclude that the base frequency and derived frequency are both highly relevant, but that their relative contributions at the item level are more intricate than can be captured by a simple ratio, at least as gauged by the visual lexical-decision and word-naming tasks.
5.4. Mean CO-Rank and Average By-Suffix Processing Costs.
In the analyses described above, mean CO-rank never emerged as a significant predictor variable; that is, it failed to be predictive for the response latencies at the fine-grained level of the individual items in word naming and visual lexical decision. But it turns out that this mean CO-rank is a relevant predictor for processing complexity at a higher aggregation level, that of the suffix itself, that is, when we average the latencies over all items with a given suffix. This predictivity emerges from an inspection of the median processing latencies for the suffixes. We opted for the median rather than the mean because distributions of reaction times tend to be skewed. For skewed distributions, the median is a better characteristic of the most typical value than the mean. We calculated the median latencies on the basis of the fitted (log-transformed) latencies, which provide our best theoretically informed estimates of suffixal processing costs. By using the median we removed the by-observation noise, and by using the fitted latencies (instead of the observed ones) we factored in all the processing factors discussed in the previous two subsections.
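The aggregation step can be sketched in a few lines of Python. The latencies below are invented for illustration and are not values from the data set; the sketch only shows how a by-suffix median is obtained and why it is less sensitive than the mean to a single long latency.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical fitted log latencies per item (suffix, value); invented numbers.
fitted = [
    ("ness", 6.40), ("ness", 6.42), ("ness", 6.44), ("ness", 6.90),
    ("ish", 6.50), ("ish", 6.52), ("ish", 6.55),
]

def by_suffix_summary(rows):
    """Group item-level values by suffix and return median and mean per suffix."""
    groups = defaultdict(list)
    for suffix, value in rows:
        groups[suffix].append(value)
    return {s: {"median": median(v), "mean": mean(v)} for s, v in groups.items()}

summary = by_suffix_summary(fitted)
for suffix, stats in summary.items():
    print(suffix, round(stats["median"], 3), round(stats["mean"], 3))
```

For the toy suffix with one long latency, the mean is pulled upward while the median stays close to the bulk of the items, which is the property the by-suffix analysis relies on.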
Apart from mean CO-rank as predictor variable, we included in our analysis two other predictor variables that are of potential interest. First, we wanted to know whether potential results would be robust across the two kinds of experiments, and we therefore pooled the data from the naming and lexical-decision experiments, and included Experiment as a predictor, with 'lexical decision' and 'naming' as factor levels. Furthermore, we tested for potential effects of Stratum ('Latinate' vs. 'Germanic') in order to rule out the possibility that the curvature in the graphs of Fig. 8 is due exclusively to Latinate affixes simply having higher processing costs than Germanic affixes, due to, for instance, more complex consonant clusters or later age of acquisition. The suffixes -age, -ary, -ette, -ian, -ism, -ist, -ive, -ment, -or, -ous were classified as Latinate, and the suffixes -dom, -ee, -en, -er, -ery, -ess, -fold, -ful, -hood, -ish, -less, -ling, -ly, -ness, -ship, -ster, -th as Germanic. The classification of affixes into one or the other stratum is sometimes controversial; see, for example, the discussion in Giegerich 1999. We have used the standard criteria as summarized in §2, that is, etymology, possible attachment to bound roots, phonological integration, and productivity, ending up with a classification similar, though not identical, to that of previous authors (e.g. Spencer 1991:79, Hay 2002).
Based on these considerations, we fitted a linear mixed-effects model to the joint data from the naming and lexical-decision experiments, with suffix as a random effect and mean CO-rank, Stratum, and Experiment as fixed effects. Table 6 summarizes the coefficients of the fixed effects in the resulting model.
Figure 8 visualizes the relation between median latency and CO-rank for word naming (left panel) and visual lexical decision (right panel). Each panel shows two inverse-U-shaped regression curves. The upper curves represent the Latinate suffixes, the lower curves the Germanic suffixes.
In Fig. 8, Latinate suffixes are shown in uppercase letters. Among these, -age is atypical in that its median latency is at least 100 ms shorter than the median latency of any other Latinate suffix. We have no explanation for why -age shows this exceptional behavior. Since it exerted undue leverage in our statistical models, we removed it from our data set. The regression lines shown in Fig. 8 as well as Table 6 are all based on the data with -age excluded, and illustrate the main patterns in our data.
The main effect of Stratum is as expected: words with Latinate suffixes elicited longer latencies. The interaction of Stratum by Experiment indicates that this effect was more pronounced for the naming latencies, probably due to increased phonological complexity leading to delays in articulation.
Finally, we observe a nonlinear effect of mean CO-rank that was the same across experiments and strata. In order to check that the effect of mean CO-rank is not confounded with the productivity measure P (with which it is correlated), nor with the measure quantifying the number of types parsed (which was predictive at the level of individual words), we included these predictors in additional models. We verified that in these models (which run the risk of overfitting the data due to too large a number of predictor variables given the number of data points) the mean CO-rank remained significant. Apparently, the joint properties of the suffixes, of the base words to which these suffixes attach, and of the resultant complex words all conspire to produce a pattern in which the words with suffixes characterized by EXTREME mean CO-rank have a processing advantage.
Recall that we formulated two contradictory hypotheses about the relation between mean CO-rank and processing costs. The formulation of Hay 2003, that outer affixes are more easily parsed, suggests a negative correlation between mean CO-rank and processing costs. We observe such a negative correlation, but only for the suffixes with the very largest values of mean CO-rank. Conversely, since retrieval from memory is relatively cheap compared to morphological parsing, one might expect that suffixes with smaller values of mean CO-rank have a processing advantage. This expectation is borne out for the majority of suffixes. In other words, both predictions are correct, but hold for different ranges of the complexity-ordering scale. Apparently, two opposing forces are at issue, one favoring rote and the other favoring rule. Suffixes with a low mean CO-rank enjoy the advantages of storage. Suffixes with a high mean CO-rank enjoy the advantages of efficient parsing. Storage has its own disadvantages. Although human memory capacity is very large (Landauer 1986), the advantages of compositionality would be lost in a system that depended exclusively on memory retrieval. But parsing is a complex operation that also comes with its own disadvantages. As we move to the left in the complexity ordering, the complexity of this operation increases due to phonotactic and frequential properties becoming increasingly similar to those of simplex words.
We can make the opposing forces of rule and rote more explicit with the help of a mathematical model that is related but not identical to the mixed-effects model reported in Table 6. This regression model approximated the inverse-U-shaped relation between mean CO-rank and median by-affix latency in Fig. 8 by means of a quadratic polynomial. Expressed in the standard mathematical form of an intercept, a linear term, and a quadratic term, with x representing mean CO-rank, we have for median latency y the value in 4.
There are many other functional forms that might be considered and that might provide a good fit to the data. In order to highlight the opposing forces of memory and computation, we consider a function that is very similar to the polynomial in 4, with two changes. First, we constrain the coefficients b and c to have the same absolute value. Second, we rescale the predictor x to a value x' in an interval in (0, 1) representing the balance of storage and computation. At x' = 0.5, storage and computation are fully balanced. An equation is given in 5 for median latency y as a function of the storage-computation parameter x'.
This equation highlights that increasing x' to favor computation (1 - x' smaller, so shorter latencies y) goes hand in hand with a commensurate storage penalty (x' greater, so larger latencies y), and that decreasing x' to favor memory-driven processing likewise is accompanied by a cost in parsing efficiency. For x' = 0.5, where the two forces are balanced, processing costs are maximal.
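In symbols, the two functional forms described here can be sketched as follows. The letters a, b, and c follow the text; a' and c' stand for the intercept and the single coefficient of the balance model, and the sketch assumes that the balance model consists of an intercept plus c' times x'(1 - x'). No fitted numerical values are implied.

$$
y = a + b\,x + c\,x^{2}
$$

$$
y = a' + c'\,x'\,(1 - x') = a' + c'\,x' - c'\,x'^{2}, \qquad 0 < x' < 1
$$

Expanding the product shows why the linear and quadratic coefficients have the same absolute value and opposite sign. With c' > 0, the derivative c'(1 - 2x') vanishes at x' = 0.5, so the predicted latency peaks exactly where storage and computation are balanced and falls off toward both ends of the interval.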
In order to fit the model in 5 to the data, we transformed the ranks x as given in Table 3 to values x' in a subinterval (r_{min}, r_{max}) in (0, 1). This subinterval was determined by means of a grid search that optimized the R^{2} for a linear mixed-effects model with Experiment, Stratum, the interaction of Experiment and Stratum, and (1 - x') * x' as predictors for complexity-based ordering. The minimum value (r_{min}) for the transformed rank was estimated at 0.2889, and the maximum value (r_{max}) at 0.5556. The estimate of the coefficient c' for the transformed CO-rank was 1.83 (p < 0.002 across 1,000 Markov chain Monte Carlo samples of the posterior distribution of the parameters). The log likelihood of this new model (108.6) was slightly higher than the log likelihood of the model with the quadratic polynomial (94.9). We note, however, that this small advantage in the log likelihood comes with the cost of an extra parameter: instead of a linear and a quadratic term for x, we now have a single parameter c' for the term x' (1 - x'), but r_{min} and r_{max} add two further parameters. For the present purposes, we regard the original model and the new model with the transformed ranks as equivalent. We call attention to the model with transformed ranks because this model facilitates understanding the pattern of results in terms of a balance between storage and computation.
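The rank transformation can be illustrated with a small Python sketch. The values of r_min and r_max are the estimates reported here; everything else is an assumption made for illustration: the mapping is taken to be linear, the rank range of 1 to 31 is hypothetical (the mean CO-ranks of Table 3 are not reproduced), and the function names are invented.

```python
R_MIN, R_MAX = 0.2889, 0.5556  # estimates reported in the text

def rescale(rank, lo, hi, r_min=R_MIN, r_max=R_MAX):
    """Map a rank in [lo, hi] linearly onto [r_min, r_max] (linearity is assumed)."""
    return r_min + (rank - lo) * (r_max - r_min) / (hi - lo)

def balance_term(x):
    """The predictor x'(1 - x'), which is largest at x' = 0.5."""
    return x * (1.0 - x)

# Hypothetical rank range and sample ranks, for illustration only.
lo, hi = 1.0, 31.0
ranks = [1.0, 8.0, 16.0, 24.0, 31.0]
terms = [balance_term(rescale(r, lo, hi)) for r in ranks]

# Rank at which the transformed value equals 0.5, i.e. where the term peaks.
peak_rank = lo + (0.5 - R_MIN) * (hi - lo) / (R_MAX - R_MIN)

for r, t in zip(ranks, terms):
    print(r, round(rescale(r, lo, hi), 4), round(t, 4))
print("peak at rank", round(peak_rank, 2))
```

Because 0.5 lies much closer to r_max than to r_min, the peak falls in the upper part of the rank range under these assumptions, which is one way to picture the asymmetry of the fitted curve.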
Figure 9 shows the fit of the model in 5 to the median naming latencies, which are shown on the vertical axis. (The pattern for the lexical-decision latencies is very similar, as Fig. 8 showed, and is not repeated here.) The original mean CO-ranks are shown at the top of the graph; the transformed ranks are shown at the bottom. It is easy to see that the maximum naming latency is reached when the balance parameter x' equals 0.5 (highlighted by the gray vertical line). When the balance parameter is increased beyond 0.5, favoring computation, the more productive suffixes receive a processing advantage. When it is decreased, favoring storage, the less productive suffixes receive a processing advantage. The asymmetrical location of the maximum is noteworthy: overall, memory-based processing apparently offers greater advantages than decomposition-driven processing.
Two final remarks are in order. First, the nonlinearity visualized in Fig. 9 depends crucially on the adjacency matrix being as complete as possible. When using only the data from CELEX and the BNC, the nonlinearity is no longer significant, and the relation between mean CO-rank and average latency is strictly linear, with positive slope. Apparently, we need to take into account the full range of possibilities offered by English morphology in order to be able to discern the (modest) processing advantage offered by parsing.
Second, we note that for the full data set (see Fig. 1), base valency (the number of nonzero entries in a suffix row in the adjacency matrix, that is, the number of suffixes that can follow it) and derived valency (the number of nonzero entries in its column, that is, the number of suffixes that it can follow) add up to a constant, modulo by-observation noise. This follows from an ordinary least-squares regression model in which base valency is regressed on derived valency, as in 6 (F(1,29) = 4.59, p = 0.041).
(6) base valency + 0.38 * derived valency = 7.17
In other words, a consequence of complexity-based ordering, given an adjacency matrix that is not too sparse, is that less productive suffixes are more productive as input for further word formation. This result was previously obtained, albeit in a completely different way, in Krott et al. 1999.
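The two valency counts can be made concrete with a short Python sketch. The four-suffix adjacency matrix is a toy example built from combinations mentioned in this article (for example -ful-ness, -less-ness, -ness-less, -ish-ness), not the full matrix of Fig. 1, and the function names are hypothetical. The last function simply rearranges the regression line in 6.

```python
suffixes = ["ful", "less", "ness", "ish"]

# adj[i][j] = 1 if suffix j is attested directly after suffix i.
# Toy matrix for illustration only; the diagonal is left at zero.
adj = [
    [0, 0, 1, 0],  # ful  -> ness
    [0, 0, 1, 0],  # less -> ness
    [0, 1, 0, 0],  # ness -> less
    [0, 0, 1, 0],  # ish  -> ness
]

def base_valency(matrix, i):
    """Number of suffixes that can follow suffix i (nonzero entries in its row)."""
    return sum(1 for v in matrix[i] if v)

def derived_valency(matrix, j):
    """Number of suffixes that suffix j can follow (nonzero entries in its column)."""
    return sum(1 for row in matrix if row[j])

def predicted_base_valency(derived):
    """Regression line in 6, solved for base valency."""
    return 7.17 - 0.38 * derived

for k, name in enumerate(suffixes):
    print(name, base_valency(adj, k), derived_valency(adj, k))
```

The negative slope in the last function expresses the trade-off stated in the text: the more suffixes a given suffix can follow, the fewer suffixes are expected to follow it.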
From this perspective, the claim advanced by Hay (2002:527-28) that AN AFFIX THAT CAN BE EASILY PARSED OUT SHOULD NOT OCCUR INSIDE AN AFFIX THAT CANNOT can be made more precise. A priori, it is unclear why a more parsable suffix should not be free to occur first or second. Any difficulty in parsing has to be met at some point in time, and why not earlier than later? To the best of our knowledge, there are no proposals in syntax to the effect that it would be advantageous for processing if parsing difficulties are encountered earlier rather than later in the sentence. The crucial point is that morphological complexity is like a coral reef, with upper layers building upon preexisting lower layers, which depend more and more on memory, and for which parsing becomes more and more costly.
6. General Discussion.
Hay & Plag 2004 addressed the hypothesis of complexity-based ordering developed in Hay 2002, 2003. It showed, for a sample of fifteen suffixes, that a partial ordering of these suffixes could be established such that the rank of a suffix in this partial ordering correlated with its productivity, as gauged by measures assessing processing complexity. It also showed that selectional restrictions were in accordance with the complexity-based ranking, and tightened this ranking by providing further constraints on affix ordering.
The present study extends this research in several ways. First, we have broadened the empirical basis for English by extending the ordering to thirty-one suffixes. This extended set of suffixes showed the same kind of hierarchical behavior, with the ranks in the hierarchy correlating with the respective suffixes' productivity P.
Second, mixed-effects models fitted to word-naming latencies and visual lexical-decision reaction times documented significant effects for suffix-related distributional measures, including a productivity measure introduced in Hay & Baayen 2002, the number of types above the parsing line. At the level of the individual word and its processing latency, however, complexity-based rank failed to reach significance. This suggests that it is unlikely that complexity-based ordering should be interpreted as a causal factor in lexical processing, that is, in the processing of individual words.
Third, however, we were able to show for both word naming and visual lexical decision that at the aggregate level of by-suffix median processing costs, the mean CO-rank is predictive. Interestingly, suffixes with extreme ranks in the complexity-ordering hierarchy enjoy the lower processing costs. We have shown how the inverse-U-shaped effect of mean CO-rank can be understood as resulting from the opposing forces of storage and computation. As we move down the hierarchy to the lowest values of mean CO-rank, complex words become more like simplex words, and retrieval from memory becomes less and less costly. As we move up the hierarchy, junctural and frequential properties allow parsing to operate more effectively, and for the suffixes with extreme ranks, the advantages of parsing become visible in the form of a negative relation between mean CO-rank and median processing latency.
The intermediately ranked suffixes have the greatest processing costs. They do not enjoy the full advantages of parsing, nor the full advantages of storage. This situation may have further adverse effects. Suppose, for ease of exposition, that the two access routes, rule and rote, are roughly equally fast for intermediately ranked suffixes. Given that derived words often carry idiosyncratic shades of meaning, and that by definition rule-based processing will lead to fully compositional meanings, intermediately ranked suffixes are more likely to suffer from the semantic ambiguity caused by the simultaneous availability of the opaque and transparent readings (cf. Schreuder et al. 2003). This might lead to delays in lexical processing. Another complication may be the well-known tension between phonological and morphological optimality. In the spirit of Burzio 2002, we note that the words with properties that make them eminently parsable, thanks to very low transitional bigram frequencies, may also be the words that are more difficult to pronounce due to the presence of more complex consonant clusters. In the light of these considerations, we believe that the balance model introduced in 5 represents multiple constraints simultaneously: values of the balance parameter greater than 0.5 bear witness to enhanced constituent-driven processing and greater semantic transparency; values less than 0.5 testify to less-marked phonological structure, increased semantic idiosyncrasy, and greater accessibility in lexical memory for phonological sequences.
We have seen that there are more suffixes with a storage-computation balance parameter less than 0.5 than suffixes with a balance greater than 0.5. This suggests that storage in memory has an a priori advantage over decompositional computation. This may help explain the general trend for affixes to gravitate toward the unproductive over time. Combined with some principle of least effort (Zipf 1949), we can predict that processing load tends to be reduced by simply remembering complex derived words rather than reconstructing them from first principles (using word-formation rules, analogy, or other conceivable mechanisms).
We also note that the asymmetrical location of the maximum of the storage-computation curve in Fig. 8 does not come as a surprise in exemplar-based approaches to morphological processing (Bybee 2001, Baayen 2007). Morphological rules, when viewed as generalizations (in lexical memory) over exemplars (in lexical memory), exist thanks to the availability of storage, so it is only natural that storage should have a head start vis-à-vis computation.
One question raised earlier in this article (see §5.1) is whether our results, which were based on comprehension data, generalize to speech production. In Levelt's model (Levelt et al. 1999), which is still the most comprehensive and most widely accepted model of speech production to date, speech production is fully decompositional in nature, whole-word access to morphologically complex words is ruled out a priori, and productivity does not play a role. Viewed from this perspective, storage-computation balance is a nonissue. We have to note, however, that Hay's arguments for graded decomposability and complexity-based ordering are based in part on evidence obtained from the acoustic analysis of what speakers actually say. For example, Hay (2001) shows that the stem-final [t] in words such as swiftly (decomposition bias) and softly (whole-word bias) is more likely to be present in the acoustic signal when the frequency of the derived form is relatively small compared to the frequency of the stem. In other words, constituent-driven processing favors the presence of [t], whereas words that are more independent of their constituents are more likely to be simplified phonologically (see also our discussion above about words like government). Facts such as these show that there must be a trade-off of storage and computation in speech production as well.
Given the present results, it is interesting to revisit the status of the key exceptions to the complexity-ordering hierarchy. In §3 we raised the problem of how to deal with suffix pairs that occur in both orders, as witnessed by pairs such as -ness-less (weaknessless) and -less-ness (aimlessness). Interestingly, the exceptional order -ness-less is attested not only for sequences of two suffixes, but also for sequences of three suffixes, as shown in the examples in 7.^{10}
(7)
a. Thursday, May 1, 2008. Eventfulnessless. Guys, I was serious about my running out of ideas. If you wish to see a post up everyday, as I know many of you do and I thank you for your continued interest in my writing, SUBMIT IDEAS SO I HAVE STUFF TO WRITE ABOUT!
(http://littlestsun.blogspot.com/2008/05/eventfulnessless.html)
b. JSTOR: A Study of Svatantrika . . . tantric influence and repeatedly introduces tantric concepts such as primordial mind (semsgnasma), mindfulnessless (dranpa medpa), and so forth.
The base words for these -less formations have zero frequency in CELEX. This goes against the hypothesis of complexity-based ordering, since under this hypothesis the base word would have to be more dependent on memory and in that sense less easily parsable. These formations constitute genuine counterexamples to approaches to complexity-based ordering that do not allow leakage.
Similar conclusions can be reached by looking at the exceptional formations with the suffix pair -ly_{AJ}-ish (cf. Fig. 4).
(8)
a. Join us at our monthlyish visit to the cinema! We're off to see Enchanted TWO weeks before it's official release! It's this Sunday, 2nd Dec . . .
(http://eventful.com/events/enchantedatthebarnetodeon/E00010073599464)
b. I'mA Poet, Do You Know It? Upright and righteous, godlyish people, We worshipped God in a place with a steeple. Humble and gentle, our motto was honesty, . . . (http://myhome.spu.edu/lydia/poetry/)
c. I have a request . . .Zelda Universe Forums He is wearing a knightlyish brown thick tunic with patches here and there. His expression is slightly happy, but serious. He is armed with a sword with a . . .
(http://www.zeldauniverse.net/forums/artwork/49313ihaverequest.html)
The phonotactics of the two suffixes predicts that the sequence with the consonant-initial suffix, that is, BASE-ly, will generally be more parsable than the sequence with the phonologically more integrated vowel-initial suffix -ish (cf. Hay 2003). The sequence -ly_{AJ}-ish is therefore nonoptimal not only in terms of complexity-based ordering, but also in terms of phonotactics. Nevertheless, the nonoptimal order is admitted by the selectional restrictions and it is in use.
A final and as yet unmentioned kind of exception involves repetition of the same suffix, which introduces the smallest possible cycle into the suffixal directed graph. In general, we disregarded combinations of the same suffix in the above because it has been claimed (e.g. in Hay & Plag 2004, n. 3) that such formations are normally semantically ill-formed. There is, however, at least one suffix for which this claim does not hold: -ish. The examples in 9 illustrate this.
(9)
a. What's up with Stephan Bonnar? I heard some random podcast where some guy (sorry Ican't remeber who but had a jewishish east coast accent) said something about bonnar treating h is . . .
(http://ninjashoes.net/forum/showthread.php?p=233454)
b. Paradise Place It's played with mischief, with fun but this is good fun, Theodosii has a boyishishness about him, maybe he's reliving an old memory, but now not as the boy. . .
(http://www.paradiseplaceproductions.com/webpages/spassov_event.htm)
For ish, the reduplication is fully in line with its selectional restrictions, and the second example shows in addition how the reduplicated suffix serves as input to further suffixation with ness.
What all of these examples show is that although the graph of English suffix sequences tends to a surprising degree toward acyclicity, there are various well-formed exceptions. This raises the question of how to understand the nature of these cycles from a theoretical point of view. Do the cycles arise from stochastic fluctuations in the rankings of suffixes on the hierarchy so that, for example, ness is sometimes ranked higher and sometimes lower than less, allowing both suffix orders to be realized? That is unlikely because then we would expect there to be many more cycles in our graphs, since any two affixes should undergo similar random reorderings, contrary to fact.
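The notion of a cycle in the suffix graph can be made concrete with a small sketch. The edge list below is only an illustrative sample built from words quoted in this article (it is not the data set behind Fig. 4), and the function names `build_graph` and `find_cycle` are our own hypothetical helpers. The sketch uses a standard depth-first search; it is not the ranking algorithm used to construct the hierarchy.

```python
from collections import defaultdict

# Illustrative sample only: each pair (inner, outer) means that some word
# quoted in this article has `outer` attached directly outside `inner`.
ATTESTED_PAIRS = [
    ("less", "ness"),  # aimlessness
    ("ness", "less"),  # weaknessless
    ("ish", "ness"),   # amateurishness
    ("ish", "ish"),    # boyishish(ness), a self-loop
    ("lyAJ", "ish"),   # woollyish
    ("er", "less"),    # leaderless
    ("th", "less"),    # depthless
]


def build_graph(pairs):
    """Map each suffix to the set of suffixes attested directly after it."""
    graph = defaultdict(set)
    for inner, outer in pairs:
        graph[inner].add(outer)
        graph[outer]  # register suffixes that only occur in outer position
    return graph


def find_cycle(pairs):
    """Return one cycle as a list (first element repeated at the end), or None."""
    graph = build_graph(pairs)
    unvisited, active, done = 0, 1, 2
    state = {node: unvisited for node in graph}
    path = []

    def visit(node):
        state[node] = active
        path.append(node)
        for nxt in sorted(graph[node]):
            if state[nxt] == active:  # edge back into the current path
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == unvisited:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = done
        return None

    for node in sorted(graph):
        if state[node] == unvisited:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle(ATTESTED_PAIRS))  # ['less', 'ness', 'less']
```

Removing the two exceptional edges (ness followed by less, and ish followed by ish) from this sample leaves a graph in which `find_cycle` returns `None`, that is, a graph that can be laid out as a strict hierarchy.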
As an alternative, one could think that the cycles emerge as the inevitable consequence of a fundamental property of human language, namely having recursive structures. Although potentially interesting from a theoretical point of view, this does not seem to be a convincing explanation. We first have to distinguish between cyclicity and recursion. While an upward-pointing arrow is a sign of cyclicity, recursion does require a full loop back to the first suffix in a single formation. In English morphology it seems that the potential for recursion is there, but it is hardly ever realized. Examples from the web show that truly recursive formations do exist but seem to be carefully crafted and are increasingly hard to process.^{11} Consider the quotation in 10 from an email signature.
(10) We must be fearless
We must have fearlessness
We must not be fearlessnessless
We must not have fearlessnesslessness
We must be fearlessnesslessnessless
This example shows that, as in syntactic processing, multiple recursion quickly leads to prohibitive comprehension difficulties. We therefore judge it to be unlikely that the cycles in our graphs exist specifically to subserve recursion. Instead, we submit that the cycles exist simply because the selectional restrictions of the suffixes involved fall out in this way.
Finally, it is noteworthy that the evidence for the directed graph for English suffixes being nearly acyclic is much stronger than the evidence for a correlation of mean CO-rank with suffix productivity, which, although significant, is not extremely strong (R^{2} = 0.33). Likewise, the evidence for processing consequences of the mean CO-rank is much stronger than the evidence for its correlation with suffix productivity. This raises the question of what the results might be when even larger numbers of suffixes are taken into consideration. We expect the evidence for acyclicity to remain strong. It is conceivable, however, that further research will show that for larger numbers of suffixes the correlation between productivity and mean CO-rank breaks down; recall that in the study of Hay & Plag 2004 with only fifteen suffixes there was more support for correlational structure involving CO-rank and measures of constituent-driven processing. This possibility requires further reflection on the status of acyclicity as an independent processing principle in the grammar.
What other processing mechanism might drive the acyclicity that constrains the combinatorial possibilities of suffixes? An important aspect of human cognition in general, and of language processing in particular, is the ability to anticipate or plan for upcoming constituents (e.g. Hawkins & Blakeslee 2004). When the listener has heard the first suffix of restlessness, and has deduced from the fine phonetic detail in the speech signal that another suffix is following (Kemps, Ernestus, et al. 2005, Kemps, Wurm, et al. 2005), conditional probabilities may come into play. In the case of restlessness, the conditional probability of ness given less, as given in 11, describes the likelihood that the listener will correctly anticipate the second suffix, given the first one.
(11) Pr(suffix_{2} | suffix_{1})
In speech production, 11 captures the likelihood that the speaker will select the target suffix from the set of possible suffixes that might follow the first suffix.
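As a rough illustration of how the conditional probability in 11 might be estimated, the sketch below divides the count for one suffix pair by the summed counts of all pairs sharing the same first suffix. The function name `conditional_probabilities` and, in particular, the counts are hypothetical: the numbers are invented for illustration and are not taken from our data.

```python
def conditional_probabilities(pair_counts, first):
    """Estimate Pr(suffix2 | suffix1 = first) from counts of suffix pairs.

    pair_counts maps (suffix1, suffix2) to a nonnegative count, for example
    the number of different words instantiating that combination.
    """
    following = {
        s2: count
        for (s1, s2), count in pair_counts.items()
        if s1 == first and count > 0
    }
    total = sum(following.values())
    if total == 0:
        return {}  # no attested continuation for this suffix
    return {s2: count / total for s2, count in following.items()}


# Invented counts for illustration only (not from CELEX, the OED, or the BNC).
hypothetical_counts = {
    ("less", "ness"): 90,
    ("less", "ly"): 9,
    ("less", "wise"): 1,
    ("ish", "ness"): 40,
}

probs = conditional_probabilities(hypothetical_counts, "less")
for suffix, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"Pr({suffix} | less) = {p:.2f}")
```

The same function can be fed token frequencies instead of type counts, which gives a differently weighted estimate of the same quantity.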
We now note that this conditional probability can be much higher when the graph of suffix combinations is acyclic. In an acyclic graph, the number of suffixes that might potentially follow a given suffix decreases as one moves from lower-ranked to higher-ranked suffixes. In the ordered adjacency matrix shown in Fig. 4, this is easily seen: as we move from th down to wise, the number of affixes above the diagonal decreases. In other words, given that we know that the rank of the first suffix of the combination is, for example, 10, we know that the largest possible set of suffixes that might follow is constrained to the twenty-one suffixes to its right in the hierarchy. When the rank is 20, we know the continuation set contains only eleven candidates. The actual sets of possible continuation suffixes are, of course, further constrained by selectional restrictions, and contain only those suffixes that have a nonzero entry on the relevant rows in the ordered adjacency matrix (cf. Fig. 4). To obtain more precise estimates of the conditional probabilities as given in 11, we should not only take the number of attested affix combinations into account, but also weight these combinations for the number of different words instantiating these combinations, as well as the token frequencies of these words. Crucially, graphs that have very many cycles hardly constrain the set of possible continuation suffixes. In such graphs, it is impossible to restrict the set of suffixes that might follow to those that appear to the right in the hierarchy. In short, acyclic graphs afford enhanced anticipation (in comprehension) and enhanced planning (in production) compared to graphs with many cycles.
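The arithmetic behind these continuation sets can be sketched as follows. The total of 31 ranked suffixes is inferred from the figures just given (twenty-one followers at rank 10) and is treated here as an assumption; the function names are hypothetical, and the uniform baseline deliberately ignores selectional restrictions and frequency weighting.

```python
N_SUFFIXES = 31  # assumption: inferred from "rank 10 leaves twenty-one followers"


def max_continuations(rank, n_suffixes=N_SUFFIXES):
    """Largest possible number of followers for a suffix of the given
    1-based rank when the hierarchy is strictly acyclic."""
    if not 1 <= rank <= n_suffixes:
        raise ValueError("rank must lie between 1 and n_suffixes")
    return n_suffixes - rank


def uniform_baseline(rank, n_suffixes=N_SUFFIXES):
    """Chance of guessing the next suffix if every permitted follower were
    equally likely; None for the last suffix, which has no followers."""
    k = max_continuations(rank, n_suffixes)
    return 1.0 / k if k > 0 else None


for rank in (10, 20, 30):
    print(rank, max_continuations(rank), uniform_baseline(rank))
```

In a graph with many cycles no such bound is available: any of the other suffixes might follow, whatever the position of the first suffix.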
Interestingly, acyclicity may also link up with suffix productivity in a new way. Suffixes that occur to the left in the hierarchy (e.g. th) are suffixes that provide the least effective conditional probabilities. For such suffixes, there are many possible continuation suffixes, and hence the probabilities of the individual suffixes all remain small. For suffixes to the right in the hierarchy (e.g. ness), there are only few continuation suffixes, and as a consequence the conditional probabilities of these suffixes will tend to be much larger. In other words, the suffixes to the left in the hierarchy, the less productive suffixes, have a processing disadvantage compared to the suffixes to the right of the hierarchy, the more productive suffixes. This change in processing complexity as we move from the left to the right of the hierarchy is, by logical necessity, fully correlated with the rank in the hierarchy.
If this understanding of why acyclicity is favored is correct, we also have an explanation of why the correlation with the P measure of productivity is not particularly strong (R^{2} = 0.33). The rank in the hierarchy captures (via conditional probabilities) the likelihood of the transition of a given suffix to a second suffix, whereas the P measure estimates the overall likelihood of observing a new formation ending in the first suffix. The two likelihoods are correlated, but not identical.
A different question that we have not yet touched upon is to what extent the present findings might generalize to other languages with more productive morphology, such as Russian or Italian. We expect that in such languages the balance of storage and computation would shift dramatically in favor of parsing, and that storage would play a minor role at best. Given that in such languages the token frequencies of individual word forms are much reduced compared to English (since a much larger set of forms compete for roughly the same usage space), the role of memory must be substantially reduced. Under these circumstances, CO-rank might well emerge as an independent principle uncorrelated with measures of constituent-driven processing.
Returning to English, a recurring question that we have encountered is whether the very low-frequency forms that we have discussed are at all comprehensible. We have been told by native speakers (and professional linguists) that words such as weaknessless or boyishishness are uninterpretable. These words are used by other native speakers, however, and they are potential words of English in the sense that they do not violate any selectional restrictions. Why then do some native speakers have problems with these forms? We believe these problems arise from two sources. First, they involve the repetition of identical phonological/phonetic material across morphological boundaries, which is generally avoided in English (Stemberger 1981, Plag 1998, Raffelsiefen 1999). This avoidance phenomenon is known as morphological haplology. Haplological restrictions are common in English derivational morphology and may also have an influence here. Second, unease with these formations may arise due to processing difficulties. Speakers of English are used to encountering complex words that they have seen before. For weaknessless or boyishishness, which they have not seen before, speakers of English have to fall back on parsing, which, as we have seen, plays a subordinate role in this language (as against languages with a rich morphology), and which is all the more difficult the more suffixes are to be processed.
Another issue that deserves mentioning is the effect of acoustic reduction on constituent-driven processing. Forms that undergo reduction may become less parsable during speech comprehension, especially when the suffix itself is substantially reduced. Interestingly, reduction affects higher-frequency words (Jurafsky et al. 2001, Bell et al. 2003) and affixes in higher-frequency words (Pluymaekers et al. 2005) most strongly, and may therefore be expected to be most pronounced for words with less productive affixes (Keune et al. 2005). In other words, reduction processes may foster entrenchment in memory notably at the lower end of the productivity spectrum, and at the same time render parsing processes more difficult. As a consequence, the present results, which are based on the written record and on processing measures involving reading skills, may underestimate the role of memory in auditory lexical processing and speech production.
To summarize, our article offers three key findings. First, suffixes can be ordered along a hierarchy according to which suffixes that are closer to the stem tend very strongly not to occur outside suffixes that occur not so close to the stem. This hierarchy was shown to correlate with the productivity of the suffixes. The findings of Hay & Plag 2004 stand the test of doubling the number of suffixes.
Second, behavioral data from lexical decision and word naming have shown that constituent-driven processing is not necessarily the most time-efficient way of processing. In particular, constituent-driven processing (as gauged in terms of rank in the hierarchy) does not stand in a linear relationship with processing costs. Rather, suffixes with extreme ranks in the complexity-ordering hierarchy have lower processing costs than suffixes of medium rank. This can be interpreted as an effect of the opposing forces of storage and computation, with low-ranked suffixes profiting from whole-word storage and high-ranked suffixes from segmentation. This means that the model of complexity-based ordering has been one-sided in its exclusive focus on the importance of constituent-driven processing and that it requires supplementation by a second and equally important focus on the role of memory.
Third, we have shown that the hierarchy of complexity-based ordering implies that the graph of suffix combinations is acyclic, and that the empirically observable near-acyclicity is very unlikely to be due to chance. Our hypothesis is that this acyclicity is functional for lexical processing since it allows for better estimates of transitional likelihoods from the first to the second suffix. These transitional likelihoods also link up with increasing productivity along the hierarchy. Future research has to investigate whether this new aspect of morphological productivity can be substantiated by behavioral and electrophysiological measures of lexical processing.
Universität Siegen
English Linguistics, Fachbereich 3
Adolf-Reichwein-Str. 2
D-57068 Siegen, Germany
[plag@anglistik.unisiegen.de]
University of Alberta
Department of Linguistics
426 Assiniboia Hall
Edmonton, T6G 2E5, Canada
[baayen@ualberta.ca]
Appendix
Examples of attested English derived words with two-suffix combinations (from the OED, if not indicated otherwise).
Suffixes studied in Hay & Plag 2004: lengthen, depthless, flattener (internet), flattenee (internet), preacherling, breweress, loverly, printerdom, loverhood, controllership, robberish, leaderless, tumblerful (BNC), saplinghood, ducklingship, seedlingless, refugeeess (internet), employeehood, assigneeship, princessly (internet), princessdom, priestesshood, governessship, governessless, knightlyhood, woollyish, kingdomless, kingdomful, courtliness, childhoodless (internet), censorshipless (internet), kinshipful, amateurishness, aimlessness, carefulness
Suffixes added in this study: meteorette, protectorian, factorage, detectorist, traitorous, actorism, scissorwise, rhymsterette, spinsterian, hucksterage, spinsterous, hucksterism, activist, activism, parliamentary (adjective), parliamentary (noun), developmentist, garmentry (BNC), medicamentous, sacramentism, pedimentwise, librarian, voluntaryist, apothecariry, contrarious, secretaryism, contrariwise, briquettage, novelettist, suffragettism, physicianary, guardianage, Europeanist, sylvanry, ruffianous, Europeanism, Christianwise, manifoldwise, militaryment, artillerist, voluntaryism, plumagery, umbrageous, percentagewise, dentistry, effronterist, mysterious, adaptively, adventureously, complementarily
Suffixes of Hay & Plag 2004 combined with suffixes added in the present study: depthwise, lengthenment, sleeperette, farmerage, consumerist, loverwise, boxerism, grovellingwise, absenteeism, huntresswise, beastlywise, orderlyism, foolishment, Irishian, Yiddishist, Scottishry, foolishwise, Britishism, carelesswise, despiteful wise, courtshipment, businesswise (BNC), neighbourhoodism (BNC), despitefully, babyishly, blushlessly
Suffixes added in the present study combined with suffixes of Hay & Plag 2004: tenfoldness, professorling, editoress, sailorly, protectordom, traitorhood, administratorship, spectatorish, connectorless, traitorful, spinsterly, gangsterdom, spinsterhood, hucksteress, tapstership, spinsterish, addictiveness, governmentship, documenter, experimentee, secretaryhood, justiciaryship, dictionaryless, noveletteish, musicianer, barbarianess, guardianly, Christiandom, ruffianhood, musicianship, christianish, guardianless, victorianness, villagedom, villagehood, villageship, carriageless, foolageness, carriageful, elementariness, alchemister, artistess, artistly, artistdom, evangelistship, nurserydom, ministryship, otherwiseness
Acknowledgment
The authors are listed in reverse alphabetical order. The authors are indebted to Jen Hay, the anonymous referees, and the editors for their constructive criticism of earlier versions of this article, as well as to the audiences at colloquia and workshops in Nijmegen, Mainz, Paris, Provo, and Vienna for stimulating discussion.
Footnotes
1. If defined over roots, this restriction even holds for the parasynthetic formations in which en occurs, for example, enlighten, embolden.
2. The status of business as a derivative of the suffix ness may be questionable, since the form is semantically and phonologically opaque. The suffix can still be phonologically discerned, however, and the word has some properties that are typical of ness formations, that is, it is an abstract noun. As discussed in the previous section, the strength of the morphological boundary between a given affix and the different stems it attaches to can vary a great deal and is, among other things, dependent on the frequencies of base and derivative. Needless to say, business has a strong whole-word bias, which, as in the case of government, goes together with its morphological opacity.
3. Note that we have not attempted to maintain the ranking of Fig. 4 for Fig. 5, which was obtained by independent application of the algorithm for minimizing the number of exceptions described above.
4. The coefficient for a numeric predictor specifies the unit increase (or decrease, if negative) in the (log-transformed) latency corresponding to a unit increase in the value of that predictor, when all other predictors in the model are held constant.
5. Quadratic terms were included in the lexical-decision and naming-latency models whenever they improved the fit of the models significantly. Likelihood ratio tests showed significant differences between models with the quadratic terms as against ones without these terms (base frequency quadratic term: χ^{2}_{(1)} = 4.293, p = 0.0383 for lexical decision, χ^{2}_{(1)} = 8.417, p = 0.0037 for the naming latency; derived frequency: χ^{2}_{(1)} = 4.035, p = 0.0446 for lexical decision; length: χ^{2}_{(1)} = 13.75, p = 0.0002 for lexical decision). It is only the quadratic term of derived frequency in naming latency that receives less support from the likelihood ratio test (χ^{2}_{(1)} = 3.036, p = 0.0815). In this particular case we decided to nevertheless leave the term in the model, due to its being both highly significant in the model and of prime theoretical interest. We add the likelihood statistics for completeness only and refer the reader to Pinheiro & Bates 2000:Ch. 2 for critical discussion of the use of likelihood ratio tests for comparing models with different fixed effects. These authors recommend the use of conditional tests as documented in our Tables 4 and 5.
6. See Balota et al. 2004 and Baayen et al. 2006 for nonlinear functions relating word frequency and response latencies for monomorphemic monosyllabic words, and Harrell 2001 for in-depth discussion of the modeling of nonlinear functions in multiple regression.
7. For details on random effects in mixed-effects models, the reader is referred to Pinheiro & Bates 2000 and Faraway 2006, and for nontechnical introductions to Baayen 2008 and Baayen, Davidson, & Bates 2008.
8. In order to avoid spurious correlations in the random-effects structure of the model, derived frequency, base frequency, and length were centered; see, for example, Pinheiro & Bates 2000:30-37 and Baayen 2008:254-56.
9. More formally, note that relative frequency enters into a regression model as a single term that receives a single coefficient. Since frequency effects are logarithmic in nature, it makes sense to consider relative frequency on a logarithmic scale as well, that is, as a log odds ratio. Since this log odds ratio is modeled with a single coefficient, say β, and denoting the two frequencies by ƒ_{1} and ƒ_{2} respectively, we have that β log(ƒ_{1}/ƒ_{2}) = β log ƒ_{1} - β log ƒ_{2}. Note that in this equation the two frequencies receive beta weights of exactly the same magnitude (with opposite signs), and that both are conceptualized as linear effects. This contrasts with the empirical data in which the two frequencies have very different weights and may even differ in their functional form.
10. We also found examples of adjectival ly preceding the combination nessless.

i. The eternal knock against Quitely has been his complete timelinessless; monthly books become biennial with this man. DC's addressed this issue by making ASS...
(http://mesmerizationeclipse.blogspot.com/2007/01/thismonthinsupermen.html)

ii. I can bring home the bacon...I dunno, smellinessless might be a good power if the people you're up against have acute sense of smell and therefore your lack of stink would mean they...
Interestingly, these all involved cases in which the final suffix less creates words that are used as nouns, instead of adjectives. Although this peculiarity may well have to do with processing difficulties (whereby the suffix may have lost its transpositional power), we do not have a real explanation to offer for this phenomenon.
11. This can also be seen with prefixes like great, as in great-great-great-great-grandfather, where one quickly loses track of the number of generations, or with formations such as anti-anti-missile-missile-missile. (Thanks to Brian Joseph for these examples.)
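The p-values attached to the likelihood ratio statistics in footnote 5 can be checked with a short sketch. It relies on the standard identity that a chi-squared variable with one degree of freedom is a squared standard normal, so that its upper-tail probability equals erfc(sqrt(x/2)). The function name `chi2_sf_df1` is our own hypothetical helper, and this check is not part of the original analysis.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared statistic with 1 degree of
    freedom, using P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    if x < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


# Statistics and p-values as reported in footnote 5.
reported = {
    "base frequency, lexical decision": (4.293, 0.0383),
    "base frequency, naming": (8.417, 0.0037),
    "derived frequency, lexical decision": (4.035, 0.0446),
    "length, lexical decision": (13.75, 0.0002),
    "derived frequency, naming": (3.036, 0.0815),
}

for label, (statistic, p_reported) in reported.items():
    p = chi2_sf_df1(statistic)
    print(f"{label}: computed p = {p:.4f}, reported p = {p_reported}")
```

Small discrepancies in the last digit are to be expected, since the reported statistics are themselves rounded.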