Morphological Organization: The Low Conditional Entropy Conjecture


Crosslinguistically, inflectional morphology exhibits a spectacular range of complexity in both the structure of individual words and the organization of the systems that words participate in. We distinguish two dimensions in the analysis of morphological complexity. Enumerative complexity (E-complexity) reflects the number of morphosyntactic distinctions that languages make and the strategies employed to encode them, concerning either the internal composition of words or the arrangement of classes of words into inflection classes. E-complexity, we argue, is constrained by integrative complexity (I-complexity). The I-complexity of an inflectional system reflects, in information-theoretic terms, the difficulty that a paradigmatic system poses for language users (rather than lexicographers). The difference becomes clear once average paradigm entropy is distinguished from average conditional entropy. The average entropy of a paradigm is the uncertainty in guessing the realization for a particular cell of the paradigm of a particular lexeme (given knowledge of the possible exponents). This gives one a measure of the complexity of a morphological system (systems with more exponents and more inflection classes will in general have higher average paradigm entropy), but it presupposes a problem that adult native speakers will never encounter. In order to know that a lexeme exists, the speaker must have heard at least one word form, so in the worst case a speaker will be faced with predicting a word form based on knowledge of one other word form of that lexeme. Thus, a better measure of morphological complexity is the average conditional entropy: the average uncertainty in guessing the realization of one randomly selected cell in the paradigm of a lexeme given the realization of one other randomly selected cell. This is the I-complexity of paradigm organization.
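The contrast between the two measures can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four inflection classes, the cell labels, and the exponents are hypothetical, classes are treated as equiprobable, and cell pairs are weighted uniformly. It is not the computation used in the study itself, which may weight classes and cells differently.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Hypothetical toy system: one row per inflection class, one column per cell.
# Classes are assumed equiprobable (an illustrative simplification).
CLASSES = {
    "I":   {"nom.sg": "-a", "gen.sg": "-i", "nom.pl": "-y"},
    "II":  {"nom.sg": "-a", "gen.sg": "-u", "nom.pl": "-y"},
    "III": {"nom.sg": "-o", "gen.sg": "-u", "nom.pl": "-a"},
    "IV":  {"nom.sg": "-o", "gen.sg": "-i", "nom.pl": "-a"},
}


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def cell_entropy(classes, cell):
    """Uncertainty about the realization of one cell, with nothing else known."""
    return entropy(Counter(c[cell] for c in classes.values()))


def conditional_entropy(classes, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    joint = Counter((c[target], c[given]) for c in classes.values())
    return entropy(joint) - cell_entropy(classes, given)


def average_paradigm_entropy(classes):
    """Mean of the per-cell entropies."""
    cells = list(next(iter(classes.values())))
    return sum(cell_entropy(classes, c) for c in cells) / len(cells)


def average_conditional_entropy(classes):
    """Mean of H(target | given) over all ordered pairs of distinct cells."""
    cells = list(next(iter(classes.values())))
    pairs = list(permutations(cells, 2))
    return sum(conditional_entropy(classes, t, g) for t, g in pairs) / len(pairs)


if __name__ == "__main__":
    print(average_paradigm_entropy(CLASSES))
    print(average_conditional_entropy(CLASSES))
```

In this toy system every cell has two equally likely exponents, so the average paradigm entropy is 1 bit. Knowing one other form lowers the average uncertainty to about 0.67 bits, because the hypothetical nom.sg and nom.pl exponents fully predict each other while gen.sg is independent of both.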
Viewed from this information-theoretic perspective, languages that appear to differ greatly in their E-complexity—the number of exponents, inflectional classes, and principal parts—can actually be quite similar in terms of the challenge they pose for a language user who already knows how the system works. We adduce evidence for this hypothesis from three sources: a comparison between languages of varying degrees of E-complexity, a case study from the particularly challenging conjugational system of Chiquihuitlán Mazatec, and a Monte Carlo simulation modeling the encoding of morphosyntactic properties into formal expressions. The results of these analyses provide evidence for the crucial status of words and paradigms for understanding morphological organization.