In lieu of an abstract, here is a brief excerpt of the content:

Categorizing Dictionary Information for a Lexical Database of Proper Names¹

Muhammad Asadur Rahman, University of West Georgia
Martha Evens, Illinois Institute of Technology

Dictionaries: Journal of the Dictionary Society of North America 27 (2006), 36-82

Introduction

This paper describes the design and development of a lexical database containing about 55,000 records of information about proper nouns and their semantic relationships and other features obtained mainly from the machine-readable version of the Collins English Dictionary. This database is designed to be used by parsing and text generation programs, especially question-answering systems. Our focus is on the classification of these entries. Since entries of different types contain very different kinds of information, these categories are fundamental to the design of the lexical database and the information extraction parser that analyzes the dictionary entries and builds the database records as it goes. This classification also helps the parsers and question-answering systems using the database interpret the input text and the user questions and retrieve the relevant data. We include an evaluation of the database and a discussion of its limitations.

¹ The authors would like to thank Collins Publishers and Patrick Hanks for giving their permission to use the Collins English Dictionary for research. Thanks also to the ACL Data Collection Initiative (DCI) for providing the researchers with the machine-readable version of the dictionary. We are also grateful for the valuable suggestions of an anonymous referee for this journal, who clearly invested much time and thought in this effort.

Initial Motivation

Texts cannot be understood without recognizing and understanding the proper names they contain. Each proper name is associated with some information that characterizes it.
A computer needs this information in any kind of natural language application: text generation, text understanding, information retrieval, and especially question answering (Walker 1989). A lexical database can store this information in a suitable format for a computer. When we set out to build a lexical database in order to store proper name information, we chose the Collins English Dictionary (hereafter, CED) because it had more information about proper names than any other machine-readable dictionary available at that point. We discovered that the information associated with different types of proper names varies tremendously, such that we needed different tables for different types of names. What is more, the question-answering systems that are the most immediate users of our database need relational information, and different types of proper nouns are involved in very different relationships (people have parents and professions, while countries have capital cities and languages spoken). Thus, categorization of proper names is a prerequisite to both database construction and question answering.

Consider, by way of illustration, the following newswire text from the Associated Press:

    Tokyo close sharply higher, dollar lower
    By Associated Press, 4/22/2002 03:25

    TOKYO (AP) Tokyo stocks rose sharply Monday, supported by Friday's advance on Wall Street. The dollar was lower against the yen. Meanwhile, Japanese Finance Minister Masajuro Shiokawa said in New York that the yen is likely to stay strong over the dollar. Shiokawa was among the finance ministers from the Group of Seven who attended a meeting in Washington. The G-7 includes the United States, Japan, Britain, France, Germany, Italy, and Canada.

About 30% of the words in the above news article are proper nouns or their derivatives.
Table 1 displays a list of the proper nouns in this article:

Table 1. List of Proper Names in the Newswire Article

Name                 Description
Tokyo                name of a city (capital)
Monday               time related (day)
Friday               time related (day)
Wall Street          stock market (NYSE)
Japanese             a group of people (n, adj)
Finance Minister     a title
Masajuro Shiokawa    person name
New York             name of a city
Group of Seven       an economic forum
Washington           name of a city (capital)
G-7                  an acronym
United States        country name
Japan                country name
Britain              country name
France               country name
Germany              country name
Italy                country name
Canada               country name

Note that several of the items listed in Table 1 are phrases. Proper name phrases are sequences of proper names with or without interleaved conjunctions, prepositions, or articles. They comprise a significant portion of written...
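
The point made above, that different types of proper names need different tables because they enter into different relationships, can be illustrated with a small sketch. The record types, field names, and the lookup helper below are hypothetical and chosen only for illustration; they are not the schema of the database described in the paper. The sample values come from the newswire text, plus the well-known facts that Tokyo is the capital of Japan and Japanese is spoken there.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PersonRecord:
    """Hypothetical record for a person name: people have parents and professions."""
    name: str
    professions: List[str] = field(default_factory=list)
    parents: List[str] = field(default_factory=list)


@dataclass
class CountryRecord:
    """Hypothetical record for a country name: countries have capitals and languages."""
    name: str
    capital: Optional[str] = None
    languages: List[str] = field(default_factory=list)


# One table per category, each keyed by the proper name (an assumed layout).
tables: Dict[str, dict] = {"person": {}, "country": {}}


def add_record(category: str, record) -> None:
    """Store a record in the table that matches its category."""
    tables[category][record.name] = record


def lookup(category: str, name: str, attribute: str):
    """Return one attribute of a named entry, or None if the entry is unknown."""
    record = tables[category].get(name)
    if record is None:
        return None
    return getattr(record, attribute, None)


add_record("person", PersonRecord("Masajuro Shiokawa", professions=["finance minister"]))
add_record("country", CountryRecord("Japan", capital="Tokyo", languages=["Japanese"]))
```

Under these assumptions, a question such as "What is the capital of Japan?" reduces to `lookup("country", "Japan", "capital")`, while a question about a person's profession is answered from a different table with different fields. The category has to be known before the right table and relationship can be chosen, which is why categorization comes first.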
