This is a preprint
In lieu of an abstract, here is a brief excerpt of the content:

The Journal of Chinese Linguistics (Preprint Article) © 2016- by The Journal of Chinese Linguistics. All rights reserved. ISSN 0091-3723/Linking basic lexicon to shared ontology for endangered languages: A linked data approach toward Formosan languages

LINKING BASIC LEXICON TO SHARED ONTOLOGY FOR ENDANGERED LANGUAGES: A LINKED DATA APPROACH TOWARD FORMOSAN LANGUAGES

Chu-Ren Huang (The Hong Kong Polytechnic University)
Shu-Kai Hsieh (National Taiwan University)
Laurent Prévot (Aix-Marseille Université & CNRS, France)
Pei-Yi Hsiao (National Tsing Hua University, Taiwan)
Henry Y. Chang (Academia Sinica, Taiwan)

The authors declare no conflict of interest in publishing this paper in the Journal of Chinese Linguistics.
Chu-Ren Huang (author for correspondence) [churen.huang@polyu.edu.hk]; https://orcid.org/0000-0002-8526-5520
Shu-Kai Hsieh [shukaihsieh@ntu.edu.tw]; https://orcid.org/0000-0001-9674-1249
Laurent Prévot [laurent.prevot@lpl-aix.fr]; https://orcid.org/0000-0002-2463-2382
Pei-Yi Hsiao [hpy0804@gmail.com]; https://orcid.org/0000-0003-2870-7158
Henry Y. Chang [henryylc@gate.sinica.edu.tw]; https://orcid.org/0000-0002-3734-6772

ABSTRACT
This paper proposes an innovative approach that links a basic lexicon (e.g. a Swadesh list) to an upper ontology as the foundation of an OntoLex interface, in order to address the challenge of building language resources for endangered languages in the linked data paradigm. A linked data approach to language resources requires existing, and preferably sizable, language resources. For endangered and other less-resourced languages, however, the scarcity of existing resources limits the possibilities and potential benefits of linking. The challenges, then, are how the construction of language resources for endangered languages can continue to thrive in the linked data paradigm, and how the linked data approach can benefit language resources for endangered languages. Our proposal requires the bare minimum of available data, and we show with examples from Formosan languages (Austronesian or aboriginal languages of Taiwan (Blust 2013, 20)) [i] that 1) this approach is applicable to endangered languages, and that 2) in spite of the restrictions imposed by the scarcity of resources, linked linguistic data consisting of basic lexicon + upper ontology generate important new information. Comparing Swadesh lists from different languages allowed us to build a small shared ontology that reflects direct human experience and can serve as the cross-lingual conceptual core. In addition, these micro-ontologized lexicons can be used as seeds for developing a fully grown and more comprehensive documentation of a linguistically motivated ontology for each language.

i. The term “Formosan languages” conventionally refers to the Austronesian languages, not to the Sinitic languages, spoken in Taiwan (“…it is customary to use ‘Formosan’ to refer to the aboriginal languages of Taiwan. I follow this practice, and use ‘Formosa’ as a geographical designation for the pre-modern period, …” Blust 2013: 20).

KEYWORDS
Endangered languages; Linked Data; Swadesh list; Ontology; SUMO; Formosan languages (Austronesian languages in Taiwan)

1. INTRODUCTION
Language resources have witnessed substantial growth and emergent diversity in recent years. Modeling the heterogeneity and multitude of language resources in an interoperable way has gained much attention in the Language Resources research community (e.g. Stede and Huang 2012, McCrae et al. 2015).
Although earlier work, such as that spearheaded by the Open Language Archives Community (OLAC [ii]; Bird and Simons 2003), focused on the sharability of metadata and the accessibility of resources through a common repository, recent trends in the Linked Data Paradigm (Chiarcos et al. 2012) in the context of the Semantic Web (Berners-Lee 2006, Buitelaar and Cimiano 2004) pose both new opportunities and new challenges. The linked data approach requires that, in addition to the data being accessible, the content of the resources be accessible and interpretable under a shared ontology. This has become a promising approach for cultural heritage and language documentation, as it allows rich representation and preservation of cultural knowledge (Hyvönen 2012); it also poses serious challenges for less-resourced, and especially endangered, languages, as they face the most pressing difficulty

ii. Open Language...
