Word sense disambiguation: The case for combinations of knowledge sources. By Mark Stevenson. (CSLI studies in computational linguistics.) Stanford: CSLI Publications, 2003. Pp. 175. ISBN 1575863901. $25.

Disambiguating words is easy for human beings but difficult for computers. Computational linguistics has developed methods to reliably find the correct part of speech for the large majority of words in running text, but the disambiguation of polysemous words and homonyms (bat as an animal, a piece of sports equipment, or the blink of an eye) is a more complex task. This difference is due mainly to the lack of sufficiently complete and formalized data about word senses. Stevenson shows how progress can be achieved by reusing existing lexical databases and combining them in an optimal way.

Ch. 1 introduces the problem of polysemy and points out the potential areas of application for word sense disambiguation (WSD). Ch. 2 gives some historical background on the area, intended for readers unfamiliar with the field. Ch. 3 discusses lexicographic problems associated with polysemous words and attempts at arriving at suitable databases (such as WordNet). As it does not seem likely that machines can take over any significant part of the lexicographic work involved in the production of semantic databases, the reuse of machine-readable dictionaries appears to be the only viable solution for the immediate future. S refutes a number of criticisms that have been made against the use of machine-readable dictionaries for WSD, mainly on the grounds that no better alternative exists, and proposes methods for at least partially remedying their known shortcomings.

Ch. 4 describes the knowledge sources that can be used for WSD, that is, syntactic, semantic, and pragmatic information, and the conditions needed to combine them. WordNet and the Longman dictionary of contemporary English (LDOCE) are identified as two potentially useful on-line lexicographic databases. In Ch. 5, the computational similarities and differences of part-of-speech tagging and WSD are explained.

In Ch. 6, S explains how his system combining the various knowledge sources was implemented: the preprocessing stage filters out proper names, tokenizes the input text, and identifies the part of speech of each word (using a Brill-type tagger). This is followed by a shallow syntactic analysis and finally a lexical look-up stage. At the disambiguation stage, the part-of-speech tags are used to filter out any syntactically incompatible senses before a number of partial (semantic) taggers are brought into play. The first of these uses LDOCE senses, with subsenses grouped where possible; the second uses categories of synonyms; and the third uses selectional restrictions. Known collocations are also taken into account. The implementation uses a memory-based machine-learning system that was first trained on annotated data and then used to combine all the knowledge sources for WSD.
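To make this architecture concrete, the following Python sketch shows a pipeline of the same general shape. Everything in it is an invented stand-in for illustration: the toy sense inventory replaces LDOCE/WordNet data, the Lesk-style overlap tagger and the category tagger crudely approximate two of the partial taggers, and simple voting replaces S's trained memory-based learner. It is not Stevenson's code.

from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Sense:
    label: str                                         # e.g. "bat/animal"
    pos: str                                           # part of speech required by this sense
    definition: set[str] = field(default_factory=set)  # words from the dictionary gloss
    category: str = ""                                 # coarse semantic category (stand-in for synonym sets)

# Toy sense inventory for the ambiguous word "bat" (illustrative only).
SENSES = {
    "bat": [
        Sense("bat/animal", "NN", {"nocturnal", "flying", "mammal"}, "animal"),
        Sense("bat/sports", "NN", {"club", "hit", "ball", "game"}, "artifact"),
        Sense("bat/blink", "VB", {"blink", "eye", "flutter"}, "motion"),
    ]
}

def pos_filter(senses, pos_tag):
    """Discard senses whose part of speech is incompatible with the tag."""
    return [s for s in senses if s.pos == pos_tag]

def overlap_tagger(senses, context):
    """Dictionary-based partial tagger: vote for the sense whose gloss
    shares the most words with the surrounding context (Lesk-style)."""
    scored = [(len(s.definition & context), s.label) for s in senses]
    best = max(scored)
    return best[1] if best[0] > 0 else None

def category_tagger(senses, expected_category):
    """Synonym-category partial tagger: vote for a sense matching the
    semantic category suggested by nearby words (a crude stand-in)."""
    for s in senses:
        if s.category == expected_category:
            return s.label
    return None

def disambiguate(word, pos_tag, context, expected_category):
    candidates = pos_filter(SENSES[word], pos_tag)
    if not candidates:
        return None
    if len(candidates) == 1:          # the POS tag alone may settle it
        return candidates[0].label
    # Combine the partial taggers by simple voting; S instead trains a
    # memory-based learner on annotated data to weigh the knowledge sources.
    votes = Counter()
    for vote in (overlap_tagger(candidates, context),
                 category_tagger(candidates, expected_category)):
        if vote is not None:
            votes[vote] += 1
    return votes.most_common(1)[0][0] if votes else candidates[0].label

if __name__ == "__main__":
    context = {"cave", "nocturnal", "flying"}
    print(disambiguate("bat", "NN", context, "animal"))   # -> bat/animal
    print(disambiguate("bat", "VB", set(), "motion"))     # -> bat/blink

The point of the design, on S's account, is that each knowledge source is individually weak and only partial; it is their combination, learned from data rather than hand-weighted as in this naive voting scheme, that yields reliable disambiguation.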

Chs. 7 and 8 deal with the evaluation of the author’s system and of other known WSD systems. S illustrates the unsatisfactory state of evaluation tools and procedures in the area of WSD before demonstrating that his own system achieves better results thanks to its combination of a number of available lexical resources.

The book is a readable introduction to the problems WSD poses for computational linguistics and makes a clear case for a hybrid approach that uses knowledge-based and corpus-based sources of information to identify the senses of ambiguous words.

Cornelia Tschichold
University of Wales Swansea, Great Britain