
Reviewed by Natalie Sciarini-Gourianova
Beyond grammar: An experience-based theory of language. By Rens Bod. Stanford: CSLI Publications, 1998. Pp. 168. Paper $19.25.

Written by a researcher and lecturer in computational linguistics, this book presents a new approach to language processing known as ‘Data-Oriented Parsing’ (DOP). The approach rests on the assumption that human language comprehension and production work with representations of concrete past language experiences rather than with abstract grammatical rules.

The main idea of the book can be compressed into two short statements. First, a language user tends to produce the most probable utterance for a given meaning on the basis of frequencies of previous utterance representations. Second, a comprehender tends to perceive the most probable analysis of a new utterance on the basis of frequencies of previously perceived utterance analyses. Such an account of natural language production and comprehension moves formal language theory into a new domain and renders the predominant notion of universal grammar obsolete. Well, why not? A similar account has by now found its way into translation theory and psychology, giving us reason to study linguistic competence through measures of language comprehension. Grammarians, for their part, still prefer to stick to good old transformational grammar, whose bibliography is immense. That is why the importance of the new approach can hardly be denied. It may explain a phenomenon that cannot be understood within the framework of universal grammar: why does any natural language store not only grammatically correct sentences but also irregularities and ‘exceptions’? Could it be that these survive because the statistical ensemble of language experiences changes slightly every time a new utterance is perceived or produced?
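The two statements above amount to choosing, from remembered experience, the representation with the highest frequency. A minimal sketch of this tendency for an ambiguous string, with invented counts chosen purely for illustration:

```python
from collections import Counter

# Frequencies of analyses previously perceived for an ambiguous string.
# The counts and bracket notation are invented for illustration only.
past_analyses = Counter({
    "VP(saw NP(the man with a telescope))": 7,     # attachment inside the NP
    "VP(saw NP(the man) PP(with a telescope))": 3,  # attachment to the VP
})

# On the experience-based account, the comprehender tends toward the
# analysis that was most frequent in past experience.
preferred, count = past_analyses.most_common(1)[0]
print(preferred)
```

The same argmax-over-frequencies step, applied in the other direction, models the speaker's choice of the most probable utterance for a given meaning.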

Those who have already read the book might agree that this is a provocative question. It remains to be answered in further research, research that can undoubtedly be inspired by this book. Bod is quite right in his belief that a cultural gap still needs to be bridged between natural language technology and theory. He argues that there is no such thing as statistical linguistics without a theory of linguistic representation, and that there can be no adequate linguistic theory without statistical enrichment. This was the idea to be proved, and proved it is: its theoretical foundations are well supported with passionate, convincing, and comprehensive arguments based on experimental data.

In the first few chapters B motivates a probabilistic approach to linguistic characterization from both a psychological and a technological point of view. His DOP model produces and analyzes new utterances by combining subtrees from a corpus, using the frequencies of structures in a corpus of experienced utterance analyses to predict the analyses of new utterances. In this respect, the use of probability theory is a genuine step forward, since it models the notion of frequency of occurrence in a mathematically precise way. B proposes an objective method for evaluating DOP models: the blind testing method combined with an exact-match accuracy metric. This method dictates that a manually analyzed language corpus be randomly divided into a training set and a test set. The degree to which the most probable analyses generated by the system match the test-set analyses is a measure of the system's accuracy. This form of evaluation was chosen to keep the experimental outcomes from influencing the appropriateness judgments; the statistical approach then provides a best guess in cases of uncertainty.
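The blind-testing procedure described here can be sketched in a few lines. This is a toy illustration, not Bod's implementation: the "corpus" is a handful of invented sentence–analysis pairs, and the "parser" simply returns the most frequent training analysis for a known sentence.

```python
import random
from collections import Counter

# Toy manually analyzed corpus: (sentence, hand-assigned analysis) pairs.
# Sentences and analysis labels are invented for illustration.
corpus = [
    ("she saw the man", "S(NP VP(V NP))"),
    ("she saw the man", "S(NP VP(V NP))"),
    ("she saw the man with a telescope", "S(NP VP(V NP PP))"),
    ("time flies", "S(NP VP)"),
    ("time flies", "S(NP VP)"),
    ("fruit flies like a banana", "S(NP VP(V NP))"),
]

# Blind testing: randomly divide the corpus into a training and a test set.
random.seed(0)
random.shuffle(corpus)
split = len(corpus) // 2
train, test = corpus[:split], corpus[split:]

# Stand-in "parser": most frequent training analysis for a known sentence.
freq = {}
for sent, analysis in train:
    freq.setdefault(sent, Counter())[analysis] += 1

def most_probable_analysis(sentence):
    if sentence in freq:
        return freq[sentence].most_common(1)[0][0]
    return None  # unseen sentence: no prediction

# Exact-match accuracy: a prediction counts only if it is identical
# to the held-out test analysis.
matches = sum(most_probable_analysis(s) == a for s, a in test)
accuracy = matches / len(test)
print(f"exact-match accuracy: {accuracy:.2f}")
```

The point of the random split is exactly the one the review notes: the analyses used to score the system are never available to it during training, so the evaluation cannot be shaped by the experimenter's judgments.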

Further chapters deal with a DOP model based on simple phrase-structure trees. The probabilities of different analyses are estimated from the occurrence frequencies of the subtrees involved in their derivations. In the course of the discussion, B shows how this model compares with other probabilistic language models in the context of formal stochastic language theory. B also deals with the problem of computing the most probable parse of a sentence and reports on a series of experiments in which several strategies for restricting the set of corpus subtrees are investigated. Having shown that an...
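The frequency-based estimation described in this chapter can be sketched as relative frequencies of subtrees, with a derivation scored as the product of its subtree probabilities. The counts and the normalization by root category below are a common way to set up such a model, used here only as an illustrative fragment, not as Bod's exact extraction procedure.

```python
from collections import Counter

# Toy subtree counts extracted from a corpus, keyed by (root, subtree).
# All counts and labels are invented for illustration.
subtree_counts = Counter({
    ("S",  "S(NP VP)"):        10,
    ("S",  "S(NP VP(V NP))"):   5,
    ("NP", "NP(det n)"):       12,
    ("NP", "NP(n)"):            3,
    ("VP", "VP(V NP)"):         8,
})

# Total occurrences of subtrees per root category, for normalization.
root_totals = Counter()
for (root, _), n in subtree_counts.items():
    root_totals[root] += n

def subtree_prob(root, subtree):
    # Relative frequency: count of this subtree divided by the count of
    # all corpus subtrees sharing the same root category.
    return subtree_counts[(root, subtree)] / root_totals[root]

def derivation_prob(subtrees):
    # A derivation's probability is the product of the probabilities
    # of the subtrees it combines.
    p = 1.0
    for root, st in subtrees:
        p *= subtree_prob(root, st)
    return p

p = derivation_prob([("S", "S(NP VP)"), ("NP", "NP(det n)")])
print(f"derivation probability: {p:.4f}")
```

Because every subtree seen in the corpus, large or small, contributes counts, restricting the set of corpus subtrees (as in the experiments the review mentions) directly changes these estimates.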
