In lieu of an abstract, here is a brief excerpt of the content:

12 PROBLEMS OF MULTI-LINGUALITY

Genevieve Clavel-Merrin

Introduction

Libraries, especially those at the academic or national level, have traditionally held collections representing many languages and scripts. In most cases, however, access to this (mainly) printed material has been at the metadata level (bibliographic records), through single-language indexes (subject or author, controlled vocabulary) which have enabled material published in different languages to be brought together. These vocabularies naturally differ from country to country, but even within one language zone they vary according to library type or specialty: for example, while LCSH (Library of Congress Subject Headings) and MeSH (Medical Subject Headings) have some terminology in common, they are to all intents and purposes two different languages; indeed, those working in the field speak of subject heading languages. Interoperability, or access to different collections in different languages, is therefore already problematic at the metadata level, and the growth in digitized full-text material and in the networking or aggregating of data presents new challenges.

In addition, when planning for a multi-lingual digital library, different levels and interpretations of multi-linguality need to be taken into account: data management and display; interface; controlled vocabularies; full text. The scale of these challenges will vary according to context: at one extreme, a large-scale pan-European initiative such as Europeana involves the management of multiple languages for both interface and searching, whereas in a multi-lingual country such as Switzerland the goal may be to manage the local languages (often plus English). Costing and business planning are therefore very variable, and scalability is difficult. In addition, the field of cross-language access remains one of experimentation, in which no standard solution is available on the market.
In terms of planning, it must be recognized that the variety of possible cases and levels of complexity makes it difficult to estimate the costs and time required. At the same time, multi-lingual access brings benefits to users, who can search in their native language and thus reach a wider range of resources, and to libraries, which can give wider access to their collections and promote their use, gaining a larger potential audience across the globe. The different levels of multi-linguality are discussed below, with a presentation of the state of the art and consideration of future prospects.

Data management and display

At the most basic level, access to digital resources in multiple languages may be technically difficult because of the use of different character sets and scripts. The increasing use of Unicode in standard software, browsers and hardware such as keyboards facilitates data management and display, and most standard database management systems and access systems will be Unicode compliant. Nevertheless, difficulties may occur with scripts or diacritics when searching across multiple collections or when aggregating bibliographic or full-text data from different sources (Clavel, 2006). In addition, keyboards are generally configured for local (national) languages, which hinders the input of special characters.

When planning for access to a multi-lingual digital collection, it is essential to allocate time to test data input, access and display. The time required will depend on the complexity of the languages and scripts present: a collection with documents in multiple scripts will require more testing and more staff expertise. Within The European Library service, for example, a Character Set Group has been set up to check interoperability questions of this type, sharing the test load.
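
One common source of the diacritic problems described above is that the same visible character can be encoded in more than one way in Unicode, so records aggregated from different sources may fail to match even though they look identical. The following Python sketch illustrates the idea using only the standard library; the function names `match_key` and `strip_diacritics` are hypothetical and are not taken from any system mentioned in this chapter.

```python
import unicodedata


def match_key(text: str) -> str:
    """Return a comparison key that ignores case and encoding form.

    Case folding is applied first, then NFC normalization, so that a
    precomposed letter and its base-letter-plus-combining-mark form
    produce the same key.
    """
    return unicodedata.normalize("NFC", text.casefold())


def strip_diacritics(text: str) -> str:
    """Return a looser key with combining marks removed.

    Assumption: dropping diacritics is acceptable for the languages
    involved. This is language dependent and is shown only as an
    illustration.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same name as it might arrive from two different sources.
precomposed = "Dvo\u0159\u00e1k"      # single code points for the accented letters
decomposed = "Dvor\u030ca\u0301k"     # base letters followed by combining marks

print(precomposed == decomposed)                        # False
print(match_key(precomposed) == match_key(decomposed))  # True
print(strip_diacritics(precomposed))                    # dvorak
```

A key of the first kind keeps diacritics significant while removing encoding differences; the second, looser key trades precision for recall and would need per-language rules in practice.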
However, many questions remain, particularly in the field of alphabetical sorting for display, as the following examples show: z should be sorted before t in Estonian; č and ř in Slovak follow c and r, and are not inter-filed; õ, ä, ö and ü in Estonian are filed at the end of the alphabet; ch in Slovak is treated as a single separate character; and ä, ö, ü are treated as ae, oe, ue in German (whereas ä in French is treated the same as “a”). When results are sorted by relevance the problem may be seen as less acute, but it may cause surprise if an alphabetical sort is chosen. An awareness of these questions is necessary when searching and testing systems in a multi-lingual environment, bearing in mind that there are currently no systems available which manage to treat all these questions successfully. Provision of a software extended (virtual) keyboard will facilitate the input of special characters: open-source versions are available, and examples...
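
The language-specific sorting rules listed above can be illustrated with a minimal Python sketch that sorts by position in a language's alphabet rather than by code point. The alphabet string `ESTONIAN_ORDER` and the function `estonian_key` are assumptions made for illustration only; the sketch handles single letters, not digraphs such as the Slovak ch, and a production system would more likely rely on a dedicated collation library.

```python
import unicodedata

# Assumed Estonian alphabet order for illustration: z and ž come
# before t, and õ, ä, ö, ü come after w.
ESTONIAN_ORDER = "abcdefghijklmnopqrsšzžtuvwõäöüxy"
RANK = {letter: position for position, letter in enumerate(ESTONIAN_ORDER)}


def estonian_key(word: str) -> list[int]:
    """Map a word to a list of alphabet positions usable as a sort key.

    Characters that are not in the table are placed after all known
    letters, ordered among themselves by code point.
    """
    normalized = unicodedata.normalize("NFC", word.casefold())
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in normalized]


words = ["tee", "zooloogia", "äri", "õun", "saar"]

# Default sort compares code points, so z follows t and ä precedes õ.
print(sorted(words))
# ['saar', 'tee', 'zooloogia', 'äri', 'õun']

# Sorting with the alphabet table applies the language's own order.
print(sorted(words, key=estonian_key))
# ['saar', 'zooloogia', 'tee', 'õun', 'äri']
```

The two outputs differ for the same input, which is the kind of surprise a user may meet when an alphabetical sort is chosen in a system that does not apply the rules of the language concerned.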

