
Reversing a One-Way Bilingual Dictionary*

Leonard Newmark

"One day we will go back to Kosov[a]. That's our land."
Ramada Shaqiri, 30 March 1999

*This paper is a reworked and expanded version of the paper I presented at the 8th EURALEX International Congress (4-8 August 1998) and published in the proceedings of that congress.

After completing a ten-year project to write an Albanian dictionary, published in March 1998 by Oxford University Press as the Oxford Albanian-English Dictionary and designed specifically for users who want to read Albanian and whose access language is English, I decided to prepare a companion English-Albanian dictionary for users who want to write Albanian, but I did not want to devote another ten years to that compilation. I wondered whether I could produce a useful bilingual dictionary with the reverse orientation by automatic conversion of the entries in the data files from which the first dictionary was generated. This paper is a report on the degree to which the attempt succeeded and the degree to which human intervention was required. Examples are provided to illustrate some rather surprising results, and a general conclusion is drawn for bilingual lexicography.

For languages of limited worldwide commercial importance, like Albanian, it seems particularly important to use computational techniques to derive new dictionaries from lexical data files compiled for some other purpose, especially if those files are extensive and have information otherwise difficult to come by in machine-readable form. The richness of the lexical data files from which my Albanian-English dictionary was generated is evidenced by that dictionary's 75,000 entries and subentries, more than are found in any other dictionary of Albanian.

Those files already provide a number of features that distinguish this dictionary from many other bilingual dictionaries: 1) inclusion of large numbers of nonstandard items (marked by asterisks) as well as all attested standard stems; 2) marking of morpheme boundaries in Albanian words; 3) inclusion of some 16,700 phrasal expressions, in particular, phrasal names, collocations, idioms, and proverbs; 4) use of large numbers of bipartite definitions with a discursive description of the sense followed, after a colon, by English synonyms exemplifying that sense; 5) inclusion of large numbers of terms for grasses, flowers, birds, and fish with their scientific definitions; 6) inclusion of a modest amount of encyclopedic information to explain words whose strictly lexical meaning would not make their use in Albanian contexts intelligible; 7) listing of the various stem forms of lexemes as separate entries in their own alphabetical position to enable readers to decipher otherwise mystifying forms encountered in actual texts; 8) indication of the specific limits of variation that leave idiomatic senses intact (e.g., in phrasal expressions, marking a verb that can appear in any of its inflected forms by giving it in citation form with a symbol (·) at the end of the stem); 9) elaborate labeling of Albanian distinctions in domain and register; 10) rendition of phrasal expressions by stylistically similar English expressions, frequently supplemented by literal translations (enclosed in quotation marks) to enable more nuanced understanding.

Each entry in the plain-text, flat data files from which the Albanian-English dictionary was generated is a line consisting of an Albanian word or phrase followed by a definition in English (or by a cross-reference to another line). Embedded in each line are simple, visible two-letter formatting codes (e.g., HW [headword], DF [definition], TK [technical name], CO [collocation]), each immediately preceded by a period (.)
and immediately followed by the text that is to receive the formatting assigned by that code. The easily redefinable codes are later translated by a set of UNIX scripts into formatting instructions in TeX, whose output can go to a printer either directly or indirectly, by translation into PostScript files. The simplicity of such transparent and flexible coding for entering the data, in contrast with elaborate schemes requiring complex coding by experts into predefined structures,1 was initially dictated by limits typical of languages that attract little commercial interest and

1 For example, those used in the architecture described by Willy Martin and Anne Tamm in "OMBI: An editor for constructing reversible lexical databases," EURALEX '96 Proceedings...
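
The flat-file line format described above (a period, a two-letter code, then the text that code governs) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the sample line is invented, only the four codes named in the text are recognized, and the functions parse_entry and naive_reverse are hypothetical names, not the UNIX scripts actually used. The reversal step is deliberately naive; it simply treats each comma-separated English gloss as a new headword, which is the kind of automatic conversion whose limits the paper goes on to assess.

```python
import re

# Only the four codes named in the text; the real data files used more.
CODES = ("HW", "DF", "TK", "CO")
# A field starts at a period immediately followed by one of the known codes.
CODE_RE = re.compile(r"\.(" + "|".join(CODES) + r")")


def parse_entry(line):
    """Split one flat-file line into a list of (code, text) pairs."""
    # With one capturing group, re.split returns
    # [prefix, code1, text1, code2, text2, ...].
    parts = CODE_RE.split(line)
    return [(code, text.strip()) for code, text in zip(parts[1::2], parts[2::2])]


def naive_reverse(fields):
    """Turn an Albanian-English entry into (English, Albanian) pairs.

    Hypothetical and deliberately simplistic: every comma-separated
    gloss in the definition becomes an English headword.
    """
    entry = dict(fields)  # if a code repeats, the last occurrence wins
    headword = entry.get("HW")
    definition = entry.get("DF")
    if not headword or not definition:
        return []
    return [(gloss.strip(), headword)
            for gloss in definition.split(",") if gloss.strip()]


# Invented sample line in the assumed format.
sample = ".HWshtëpi .DFhouse, home"
fields = parse_entry(sample)
reversed_pairs = naive_reverse(fields)
print(fields)
print(reversed_pairs)
```

Matching only a fixed set of known codes keeps ordinary periods inside definitions from being mistaken for field boundaries, at the cost of having to list every code in use.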
