The Complexity of Phonology
András Kornai

1 Introduction

Two decades after the introduction of complexity-theoretic methods for the study of natural language grammars, there are still many technically incorrect arguments purporting to show the intractability of various formalisms. Here I analyze two such arguments, one old and one recent, both aimed at demonstrating that specific versions of phonological theory, as formulated in computational and theoretical work, are incapable of characterizing human phonology inasmuch as they lead to NP-hard computational problems that are beyond the reach of any known fast algorithm.1 These demonstrations break down because they apply a mathematical technique, asymptotic analysis, whose major premises are not met. An alternative approach, Kolmogorov complexity, is proposed.
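For reference (a standard definition, not specific to this article): the Kolmogorov complexity of a string x is the length of the shortest program that outputs it,

K(x) = min { |p| : U(p) = x },

for a fixed universal machine U. Unlike asymptotic worst-case bounds, it measures the description size of an individual, fixed object, which is why it remains meaningful even when problem instances cannot be scaled arbitrarily.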

Barton (1986) studied the complexity of the analysis and generation problems of two-level phonology and morphology (TWOL) by means of encoding classical NP-hard problems (SAT and 3SAT) using the finite state transducer mechanism introduced independently in Koskenniemi 1983 and, with a somewhat different model, in Kaplan and Kay 1982 (eventually published 1994). He concluded that TWOL systems are incapable of characterizing the human morphological ability because humans clearly perform the analysis and generation tasks in real time, while the TWOL computational machinery must draw on exponentially growing resources as the problem size grows. Given that TWOL systems were at the time already the leading model of computational phonology and morphology, it is not surprising that the work drew the heated response "if it ain't broke, don't fix it" (Koskenniemi and Church 1988).
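To make the shape of Barton's argument concrete, here is a minimal sketch (illustrative code of mine, not Barton's actual transducer construction; the function and variable names are hypothetical) of why assimilation-style agreement can express SAT: every occurrence of a harmonizing segment must carry the same value everywhere, and each clause span must realize at least one of its literals as true, so deciding whether any consistent surface form exists is exactly Boolean satisfiability.

```python
from itertools import product

# Schematic illustration of Barton's (1986) reduction idea: occurrences of
# the same "harmonizing" segment must agree in value (assimilation), and
# each clause span must contain at least one literal realized as true.
# Recognizing whether a consistent surface realization exists is then SAT.

def satisfiable(clauses, variables):
    """Brute-force search over value assignments, which is what a recognizer
    is implicitly committed to in the worst case: exponential in the number
    of variables."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # Agreement is built in: each variable has one value everywhere.
        if all(any(assignment[v] == sign for v, sign in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3),
# with literals encoded as (variable, polarity) pairs.
clauses = [[("x1", True), ("x2", False)],
           [("x2", True), ("x3", True)],
           [("x1", False), ("x3", False)]]
print(satisfiable(clauses, ["x1", "x2", "x3"]))
```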

That such a response was not entirely unjustified is clear in hindsight. After two decades, TWOL remains the leading computational model of phonology and morphology (see, e.g., Karlsson 2006), with working phonological/morphological grammars for hundreds of languages. If TWOL had the propensity to bump up against the complexity limit set by NP-hardness, it would be nothing short of miraculous that we can still use this framework without much trouble. It helps that processing power nearly doubles every two years (Moore 1965), but the main effect of Moore's Law has been to reduce the hardware required from the high-end workstations of the 1980s to any cheap laptop today; the computations themselves have not grown notably more complex with increased lexical coverage and more detailed analyses.
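A quick back-of-the-envelope check of that point: doubling every two years over the twenty years separating Barton 1986 from Karlsson 2006 amounts to a speedup of roughly 2^10 ≈ 1,000. A factor of a thousand comfortably absorbs the constant-factor growth that comes with bigger lexicons and finer analyses, but it would be no match for a genuine NP-hardness bottleneck, where each added variable can double the work all over again.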

Idsardi (2006a), building on earlier work (Ellison 1994, Eisner 1997, 2000), studied the complexity of Optimality Theory (OT) by means of encoding another classical NP-hard problem, directed Hamiltonian path search (DHPS), using the Gen/Eval mechanism of OT, and reached essentially the same conclusion about the suitability of OT for characterizing human phonology. Though Barton bases his reduction proof on assimilation rules while Idsardi uses dissimilation constraints to arrive at NP-hard problems, the two works have much in common, and here they are analyzed together because they exhibit the same fundamental flaw: both rely on a mathematical method, asymptotic analysis, that depends in an essential fashion on the ability to create arbitrarily large problem instances. To some extent, Barton, Berwick, and Ristad (1987:sec. 1.4.1) anticipated this criticism:
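For comparison, a minimal sketch of the problem Idsardi encodes (the code solves DHPS itself by brute force on assumed toy data; it does not reproduce his OT constraint system): finding a directed Hamiltonian path by testing vertex orderings takes factorial time, the kind of candidate-set explosion the asymptotic argument turns on.

```python
from itertools import permutations

# Brute-force directed Hamiltonian path search (DHPS). Checking all vertex
# orderings is factorial in the number of vertices -- a sketch of the
# combinatorial problem, not of Idsardi's Gen/Eval encoding of it.

def hamiltonian_path(vertices, edges):
    for order in permutations(vertices):          # n! candidate paths
        if all((a, b) in edges for a, b in zip(order, order[1:])):
            return order
    return None

# A toy four-vertex digraph (assumed example data).
vertices = ["a", "b", "c", "d"]
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")}
print(hamiltonian_path(vertices, edges))  # ('a', 'b', 'c', 'd')
```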

Aren’t then the complexity results irrelevant because they apply only to problems with arbitrarily long sentences and arbitrarily large dictionaries, while natural languages all deal with finite-sized problems?

It is comforting to see that this argument explodes on complexity-theoretic grounds just as it does in introductory linguistics classes. . . . Our goal is to determine the form and content of linguistic knowledge. . . . Once we have identified the principles that seem to govern the construction of sentences of reasonable length, there doesn’t seem to be any natural bound on the operation of these principles. . . . If humans had more memory, greater lung capacity, and longer lifespan—so the standard response goes—then the apparent bound on the length of sentences would be removed.

As we will see, domain length is indeed one of the parameters that delimit the complexity of combinatorial problems that can be encoded in phonology. But even if we grant arbitrary domain size, and even if we grant arbitrarily large dictionaries (a step that actually has considerable statistical support; see Kornai 2002), there are other limiting factors...
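One way to see the force of the bounded-parameter point is a small hypothetical sketch (the bound K, the toy alphabet, and the recognize routine are all assumptions for illustration): once every input a recognizer will ever see is bounded in length, even a worst-case-expensive analysis can be compiled into a finite lookup table, so asymptotic lower bounds stop constraining the actual computation.

```python
from itertools import product

K = 4                      # assumed fixed bound on domain length
ALPHABET = "ab"            # toy alphabet, for illustration only

def recognize(s):          # placeholder for any expensive analysis routine
    return s == s[::-1]    # e.g., accept palindromes

# Precompute the answer for every possible bounded-length input.
TABLE = {"".join(s): recognize("".join(s))
         for n in range(K + 1)
         for s in product(ALPHABET, repeat=n)}

def recognize_fast(s):     # constant-time lookup once the table is built
    return TABLE[s]

print(recognize_fast("abba"), recognize_fast("abab"))  # True False
```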
