Living Dictionaries: A Platform for Indigenous and Under-Resourced Languages[1]
Due to globalization, cultural assimilation, the long-term impacts of colonization, and official (or de facto) policies hostile to linguistic diversity, many languages of the world are threatened or endangered. Free online technological tools can assist in documentation efforts and revitalization programs, while also providing safe online spaces in which materials can be systematically recorded and shared. Led by community activists and linguists, Living Dictionaries are collaborative multimedia projects that are editable, expandable, and searchable. Using the latest web technologies to facilitate the creation and storage of language data paired with digital images, audio and video recordings of native speakers, the platform is user-friendly and available in fourteen interface languages. Living Dictionaries promote visibility for under-resourced languages, foster connectivity over vast distances, and support an online community of language learners who wish to access a language without proximity to proficient speakers. The Living Dictionaries platform provides meaningful opportunities for Indigenous citizen science through regular training webinars.
Keywords: endangered languages, citizen-linguist, technology, Living Tongues Institute for Endangered Languages, lexicography, digital skills
INTRODUCTION
Today, at least 3,400 languages are at risk of disappearing before 2100.[2] Many of these languages are under-documented in the scientific literature and have few digital, searchable resource materials available for educators and aspiring learners. Communities who are engaged in safeguarding their languages regularly contact our organization, Living Tongues Institute for Endangered Languages, requesting concrete solutions to bring their languages online and make them digitally accessible for use in the home, in schools, and in public spaces. The Living Dictionaries platform[3] is an online tool that assists endangered language communities in both conservation efforts and revitalization programs by providing a free, simple, yet effective way to create systematic digital documentation records that include text entries with accompanying audio, video, and images. The platform is funded by research grants and individual donations and programmed by our core team, a small group of dedicated linguists and in-house web developers. These multimedia dictionaries can be edited continually and improved over time by the citizen-linguists[4] involved in their creation; thus, the dictionaries are "living" because they are ever-evolving and expandable.
The platform is publicly available in any browser, on any device, and does not require any fees to sign up, contribute, or browse. It is a fast-growing, web-based dictionary-builder that currently serves Indigenous, creole and diaspora languages of the United States, as well as hundreds of underrepresented language communities globally. Our platform is inclusive, diverse, and multilingual, and can be used by everyone from seasoned field linguists to emerging language activists and educators. Since 2019, over two hundred citizen-linguists from more than twenty countries have attended our webinars, during which they have enthusiastically voiced their desire to spearhead and maintain their own digital resources. The recordings of these webinars are all available on our YouTube channel, where they have been viewed by hundreds more. As of September 2023, 939 dictionary editors are working on over 500 Living Dictionaries. Roughly half of those are open for public browsing, and half are in private mode (for community use only, or still under construction). The platform contains over 201,300 dictionary entries. Many Living Dictionaries currently contain a few hundred dictionary entries, whereas others hold thousands of entries. All continue to grow over time.
In the United States alone, hundreds of Indigenous, creole and diaspora languages are in need of digital resources not only for the sake of reference materials, but as a basis to create future curricula. Such technological tools can assist learners and teachers who aim to bring a language back from dormancy (Turin and Pine 2019). As Carpenter et al. write:
digital technologies do not, cannot and will not save languages. Speakers keep languages alive. A digital dictionary on its own won't revitalize an endangered language, but speakers might use it to do work that will. At the same time, technology can be as symbolically powerful as it is practically useful, and often carries considerable political weight.
(2016, 4)
Dormant Indigenous languages such as Tutelo-Saponi Monacan (spoken in North Carolina and Virginia) are making a comeback thanks to community revitalization efforts, and heritage language learners need access to comprehensive data with recorded pronunciations. The Tutelo-Saponi Monacan Living Dictionary[5] now plays a helpful role in the revival of the language among aspiring learners.
Figure 1. Homepage map on the Living Dictionaries platform
People who speak creole languages, such as Louisiana Creole and others, also need access to reliable online reference sources. The Louisiana Creole Living Dictionary, launched by community activists in the state of Louisiana, serves this purpose. Furthermore, diaspora communities such as Garifuna speakers—many of whom originated in Central America and immigrated to California and other states—have benefited from the Garifuna Living Dictionary so they can record and share systematic recordings among community stakeholders. Additionally, elders speaking threatened Jewish languages such as Jewish Neo-Aramaic and Judeo-Kashani need online tools to record their voices and promote their languages across the diaspora.
Our platform raises the question: what if everyone had digital resources that could help keep their language alive? Speaking your language is a basic human right (Skutnabb-Kangas 2000), but half of the world's 7,000 languages are at risk, and systematic tools for creating revitalization programs are scarce. Interventions have historically been restricted in regional scope, failing to create sustainable solutions for under-resourced communities to build their own resources such as dictionaries. Our vertically integrative approach to language documentation teaches citizen-linguists transferable digital and scientific research skills. By running online workshops that train citizen-linguists to record and edit words and phrases in their native languages and to create Living Dictionaries, we combine documentation with social empowerment. We also have a partnership with the non-profit organization 7000 Languages to help communities transform their Living Dictionaries data into language-learning curricula using the 7000 Languages platform (to learn more about them, visit https://www.7000.org/).
WHAT'S AT STAKE
Languages reveal the long and complex history of their community of speakers, rich with the metaphors, riddles, songs, jokes, expressions of sympathy, celebration, delight, and tales spanning generations. The Living Dictionaries platform helps underrepresented, minority and Indigenous language communities worldwide successfully claim space in the digital arena. These dictionaries can help facilitate grassroots educational initiatives that prioritize the long-term survival of languages, because state-run programs often serve as de facto vectors of complete assimilation to dominant languages and abandonment of heritage ones (Skutnabb-Kangas 2000, 2023).
Access to digital resources is useful for language communities in the modern age, as information is increasingly consumed and disseminated digitally, specifically through mobile platforms (Anderson and Daigneault 2023). Daily use of minority and underrepresented languages, as well as ease of use in digital arenas, are crucial components in their perceived prestige, transmission, and continuity. Relying on institutional actors (state, academic, juridical) to act in the interests of linguistic minorities and enforce linguistic human rights (not just on paper) has so far proven unsuccessful (Anderson and Daigneault 2023). As linguists, we assert that assisting communities to develop new, accessible resources is a tangible contribution in response to the colonialist underpinnings of the field. The value of a tool made free of cost to communities is vast; it can lead to the creation of literacy materials, increase visibility online, and foster language access for those with connectivity to the web.
The long-term survival of languages comes with big implications: studies in North America and Australia show that language revitalization leads to better mental health, better performance in schools, and expanded economic opportunity in the speakers' communities (Zuckerman 2020, Olko et al. 2022, van Beek 2016). Therefore, pride in ethnolinguistic identity has numerous socio-economic and political benefits. Living Dictionaries, being multilingual searchable tools that provide visibility to under-resourced languages, help promote bilingualism and multilingualism. In addition to their social benefits, bilingualism and multilingualism have positive biological outcomes such as improved cognition and protection against the onset of dementia (Bialystok et al. 2007, Perani and Abutalebi 2015). The Living Dictionaries platform offers the insights and systematicity of linguistic science in a user-friendly context, with no institutional or other administrative roadblocks preventing access, provided one has access to the Internet. The platform highlights and preserves essential ecological, social, and linguistic knowledge that lies at the foundation of cultural survival.
Figure 2. Tepehua de Huehuetla Living Dictionary, Gallery view of plant images
For example, Figure 2 shows the work of a dedicated team of Indigenous Tepehua citizen-linguists in Mexico working in collaboration with researchers in the United States and Canada. The team collected and translated terms related to local plants and uploaded them into the first-ever Tepehua de Huehuetla Living Dictionary, thus creating visibility worldwide for their language, culture, and traditional ecological knowledge.
Paz (2013, 208) observes that human rights scholars "see culture, but they forget to consider who will bear the costs of vindicating language demands (and cultural preferences), and who benefits from the privilege and how." Our platform can help activists overcome financial and technical obstacles and undertake the production of their own systematic language resources without incurring great costs. In this regard, Brás (2019) notes: "the call to change the linguistic makeup of the internet is being taken up by everyday users who are slowly chipping away at online language barriers—one string at a time." Our Living Dictionaries platform obviates this problem of cost by providing a free, readily accessible online tool that can help (re)claim safe spaces for linguistic resilience or create them where they haven't existed before.
TECHNICAL FEATURES AND INNOVATIONS
One important feature that makes our website unique among other digital dictionary software is the live updating option, where data entered by a dictionary editor into our cloud-based storage system is immediately accessible to the public and to other editors, with no need to refresh the browser screen or download any content. The site updates instantly and is lightning-fast for editors and people browsing the dictionaries with stable Internet connections. It has some offline searching capabilities, but no offline editing yet (a feature in development). The streamlined, user-friendly online inputting system (currently available in fourteen languages; see Figure 3) is powerful for activists with limited or emergent digital literacy. Most people with a moderate (or advanced) level of digital literacy require little to no training to get started on adding entries to their dictionaries. Those with a beginner level of digital literacy require one or two hours of training through our webinars, and then can easily work on their own. Our organization also offers regular online workshops for underrepresented communities of citizen-linguists in the United States and globally. These offer training in STEM skills (such as audio and video recording and editing), as well as in phonetic transcription.
Figure 3. Selection of interface languages on the Living Dictionaries platform
At any time, users may switch to another interface language that may suit them better than English. The platform itself is currently fully translated into fourteen languages (English, Mandarin, Hindi, Hebrew, Spanish, Portuguese, French, Vietnamese, Russian, Kiswahili, Bahasa Indonesia, Malay, Bengali, Assamese), and we have plans to add more languages to ensure maximal global outreach. Having full functionality (with accompanying tutorials and FAQ pages) in languages such as Arabic, Thai, Farsi, Japanese, Tok Pisin, Amharic, Hausa, Wolof, and an array of other major languages would help to serve the widest range of Indigenous, creole and diaspora communities possible. Language activists need to be able to use the platform's functionality in the local languages they are comfortable with, so increasing the multilingualism of the site's interface will lead to the construction of many more Living Dictionaries in the long run.
Working on the platform
Aspiring dictionary creators can sign up to the platform within moments, using any email address. Once logged into the Living Dictionaries platform, they can create their own new dictionary project within a few minutes (see Figure 4).
Figure 4. "Create New Dictionary" webpage, https://livingdictionaries.app/create-dictionary
After clicking on the "Create A Dictionary" button, the user is invited to fill out the basic metadata for their project: dictionary name, desired URL, glossing languages, alternate names, GPS coordinates, ISO 639-3 code, and Glottocode (most of these may be edited under Settings later, as needed). Next, they are prompted to answer a short series of mandatory qualitative questions. Their responses allow the team that manages the platform to understand the dictionary creator's background, community involvement, and motivations behind this project. Once completed, the creator can create the empty shell for their dictionary and can immediately begin adding text content and multimedia with no red tape. Their Living Dictionary remains in "private mode" until they contact the site curators requesting that their dictionary be reviewed and switched to "public mode." They can use the "Contact Us" button at any time to make this request or to pose other questions, using a convenient dropdown list of commonly asked questions.
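The metadata step described above lends itself to a small validation sketch. The following is a minimal illustration, not the platform's actual form logic: the interface, its field names, and the slug rule are assumptions made for this example. The only external facts relied on are that ISO 639-3 identifiers are three lowercase letters and that Glottocodes consist of four alphanumeric characters followed by four digits.

```typescript
// Hypothetical shape for the metadata gathered on the "Create New Dictionary" page.
// Field names are illustrative assumptions, not the platform's actual schema.
interface DictionaryMetadata {
  name: string;
  urlSlug: string; // the "desired URL" segment
  glossLanguages: string[]; // at least one glossing language
  alternateNames?: string[];
  coordinates?: { latitude: number; longitude: number };
  iso6393?: string;
  glottocode?: string;
}

// ISO 639-3 identifiers are three lowercase letters.
const ISO_639_3_PATTERN = /^[a-z]{3}$/;
// Glottocodes are four lowercase alphanumeric characters followed by four digits.
const GLOTTOCODE_PATTERN = /^[a-z0-9]{4}[0-9]{4}$/;
// Assumed slug rule: lowercase letters and digits separated by single hyphens.
const SLUG_PATTERN = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Returns human-readable problems; an empty list means nothing was flagged.
function validateMetadata(meta: DictionaryMetadata): string[] {
  const problems: string[] = [];
  if (meta.name.trim().length === 0) {
    problems.push("dictionary name is required");
  }
  if (!SLUG_PATTERN.test(meta.urlSlug)) {
    problems.push("URL may contain only lowercase letters, digits, and hyphens");
  }
  if (meta.glossLanguages.length === 0) {
    problems.push("at least one glossing language is required");
  }
  if (meta.coordinates !== undefined) {
    const { latitude, longitude } = meta.coordinates;
    // Written as negated range checks so that NaN is also rejected.
    if (!(latitude >= -90 && latitude <= 90)) {
      problems.push("latitude must be between -90 and 90");
    }
    if (!(longitude >= -180 && longitude <= 180)) {
      problems.push("longitude must be between -180 and 180");
    }
  }
  if (meta.iso6393 !== undefined && !ISO_639_3_PATTERN.test(meta.iso6393)) {
    problems.push("ISO 639-3 code must be three lowercase letters");
  }
  if (meta.glottocode !== undefined && !GLOTTOCODE_PATTERN.test(meta.glottocode)) {
    problems.push("Glottocode must be four characters followed by four digits");
  }
  return problems;
}
```

Because the optional identifiers are validated only when present, a creator working on a language that has no assigned code yet would not be blocked under this sketch.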
To add content, entries may be added individually or through bulk upload (see below). There is a blue "+Add Entry" button on the bottom right of the screen, which allows the dictionary creator to add a new entry for a word or a phrase. Each new entry page contains a long list of data fields that could be included with the entry: lexeme, phonetic transcription, local orthography, glosses into various languages, part of speech, semantic domain, notes, noun class, plural form, morphology, interlinearization, sources (an unlimited number of links and citations can be included as possible sources for the entries), dialect(s), and scientific name (if the entry pertains to a species). Figures 5–6 show examples of what a multilingual entry may look like, once filled out. Users can see a traditional alphabetical view of dictionary entries in the main List view, and they can access more in-depth lexical information by clicking on an entry. Figure 5 shows an entry for chuspi lachiwa, Quechua Ayacucho for a type of bee, displayed using the Spanish-language interface. Figure 6 presents an entry for ka:xi 'crow' in the Tutelo-Saponi Monacan Living Dictionary, which contains historical reconstruction notes and includes multimedia in the form of a digital image of a crow and an audio recording of the lexeme.
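The list of entry fields above can be pictured as a record type. The sketch below is one possible modelling, offered as an assumption rather than the platform's real data model: the property names, the use of language codes as gloss keys, and the `filledFields` helper are all invented for illustration. Only the field inventory and the ka:xi 'crow' example come from the text.

```typescript
// Hypothetical citation record; an entry may list any number of these.
interface EntrySource {
  citation: string;
  url?: string;
}

// Illustrative entry type mirroring the fields listed in the text.
// Everything except the lexeme and glosses is optional.
interface DictionaryEntry {
  lexeme: string;
  phonetic?: string;
  localOrthography?: string;
  glosses: Record<string, string>; // keyed by glossing language (assumed: language codes)
  partOfSpeech?: string;
  semanticDomains?: string[];
  notes?: string;
  nounClass?: string;
  pluralForm?: string;
  morphology?: string;
  interlinearization?: string;
  sources?: EntrySource[];
  dialects?: string[];
  scientificName?: string; // only when the entry pertains to a species
}

// True when a field holds something worth displaying.
function isFilled(value: unknown): boolean {
  if (value === undefined || value === null) return false;
  if (typeof value === "string") return value.trim().length > 0;
  if (Array.isArray(value)) return value.length > 0;
  if (typeof value === "object") return Object.keys(value as object).length > 0;
  return true;
}

// Lists which fields of an entry have been filled in so far.
function filledFields(entry: DictionaryEntry): string[] {
  const keys = Object.keys(entry) as (keyof DictionaryEntry)[];
  return keys.filter((key) => isFilled(entry[key]));
}

// A minimal entry based on the example in the text: a headword plus one gloss.
const crowEntry: DictionaryEntry = {
  lexeme: "ka:xi",
  glosses: { en: "crow" },
};
```

Keeping nearly every field optional reflects the point that editors can start with a bare headword and gloss and enrich the entry later.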
Figure 5. A multilingual entry in the Atlas Vivo de Mayunmarka (Quechua Ayacucho)
Figure 6. A bilingual entry in the Tutelo-Saponi Monacan Living Dictionary
Figure 7 presents an entry for the word kǝnsodǝndʒi 'dogs' in the Living Dictionary for Sora, a Munda language spoken in India. This entry contains renderings in multiple scripts as well as glossing languages, accompanying tags for semantic domains, plus morphology and interlinearization.
Content in individual entries may vary widely. An entry can minimally include a headword and a simple translation into one or more glossing languages, or it can be expanded in scope to include detailed cultural information, multimedia and external links, sample sentences and more. Some dictionaries that contain a lot of detailed information can even serve as cultural encyclopedias; for example, the Birhor Living Dictionary (built by our organization for an underrepresented community in India) contains a wide variety of ethnobotanical and medicinal knowledge in Birhor, Hindi, and English, with accompanying images.
Speakers can record and store audio and video directly to dictionary entries within seconds (see Figure 9). Through recording the voices of those who are still fluent in their languages, the Living Dictionaries platform provides a modern way of accessing and elevating previously underrepresented speech communities, thereby amplifying voices that might not otherwise be heard, and connecting descendants with their own linguistic heritage. This can be valuable for forming new literary traditions, digitization initiatives, and multimedia narrative projects, allowing these languages to grow in all types of public spaces in the long term. Dictionaries on the website can also serve as companion reference sources for existing language curricula whose creators wish to have a searchable dictionary component online.
Figure 7. A multilingual and multi-scriptal entry in the Sora Living Dictionary
Figure 8. An entry for babat buna, a type of medicinal plant, in the Birhor Living Dictionary
Figure 9. An entry in the Opata Living Dictionary containing an audio recording
Displaying the data
Once a Living Dictionary has been populated with entries, anyone with the link to it may look up a word, phrase, or translation by using the search bar at the top of the page. Visitors may also sort and display specific content by activating filters for parts of speech, semantic domains, custom tags, speaker information, and other metadata. People browsing the dictionary can view the contents displayed in an alphabetized List view, a Table view (which displays the contents as a spreadsheet), or a Gallery view of images. Dictionary editors have private access to a Print view, which displays and generates a PDF of the dictionary contents. Under the "Settings" tab, dictionary editors may activate a checkbox that gives the public printing and downloading capabilities as well. Educators can then use Print view to generate filtered data sets of specific entries and print them for lesson planning. Here is the Print view for the Tepehua de Huehuetla Living Dictionary, which is available for the public to print out.
Figure 10. Print view with images in the Tepehua de Huehuetla Living Dictionary
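The search and filter behavior described in this section can be approximated in a few lines. This is a simplified sketch under stated assumptions: the types, the function names, and the substring-matching strategy are hypothetical and say nothing about how the platform's own search is implemented. The sample lexemes are borrowed from examples in this article, while their part-of-speech and semantic-domain tags are invented for illustration.

```typescript
// Hypothetical, trimmed-down entry shape used only for this example.
interface SearchableEntry {
  lexeme: string;
  glosses: Record<string, string>;
  partOfSpeech?: string;
  semanticDomains?: string[];
}

// Hypothetical filter options: a free-text query plus two metadata filters.
interface EntryFilter {
  query?: string;
  partOfSpeech?: string;
  semanticDomain?: string;
}

function normalizeText(text: string): string {
  return text.toLowerCase().trim();
}

// An entry passes when it satisfies every filter that has been set.
function entryMatches(entry: SearchableEntry, filter: EntryFilter): boolean {
  if (filter.partOfSpeech !== undefined && entry.partOfSpeech !== filter.partOfSpeech) {
    return false;
  }
  if (filter.semanticDomain !== undefined) {
    const domains = entry.semanticDomains || [];
    if (domains.indexOf(filter.semanticDomain) === -1) return false;
  }
  if (filter.query !== undefined && filter.query.trim() !== "") {
    const needle = normalizeText(filter.query);
    // Search the headword and every gloss, regardless of glossing language.
    const texts: string[] = [entry.lexeme];
    for (const lang of Object.keys(entry.glosses)) {
      const gloss = entry.glosses[lang];
      if (gloss !== undefined) texts.push(gloss);
    }
    return texts.some((text) => normalizeText(text).indexOf(needle) !== -1);
  }
  return true;
}

// Returns matching entries in alphabetical order, as in a List view.
function filterEntries(entries: SearchableEntry[], filter: EntryFilter): SearchableEntry[] {
  return entries
    .filter((entry) => entryMatches(entry, filter))
    .sort((a, b) => a.lexeme.localeCompare(b.lexeme));
}

// Sample data: lexemes come from the article, tags are illustrative assumptions.
const sampleEntries: SearchableEntry[] = [
  { lexeme: "ka:xi", glosses: { en: "crow" }, partOfSpeech: "noun", semanticDomains: ["animals"] },
  { lexeme: "chuspi lachiwa", glosses: { en: "a type of bee" }, partOfSpeech: "noun", semanticDomains: ["animals"] },
  { lexeme: "babat buna", glosses: { en: "a type of medicinal plant" }, partOfSpeech: "noun", semanticDomains: ["plants"] },
];
```

With this data, `filterEntries(sampleEntries, { query: "crow" })` returns only the ka:xi entry, and filtering on the `"animals"` domain returns the two animal entries in alphabetical order.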
Assisting Diverse Audiences
The Living Dictionaries platform was engineered to help citizen-linguists record and store user-generated content. We have designed the platform with them in mind, optimizing it for global remote collaboration, ease of use and accessibility on all mobile devices and browsers. Dictionary editors may upload entries individually on their own time, from anywhere in the world, or they may submit batches of data in spreadsheets for us to upload. Batch uploads are reviewed carefully by our core team to check for systematicity and accuracy. Living Dictionaries that are created on the platform are private by default. Our team carefully reviews each dictionary before it is made available to the public; this can require a few hours to a few weeks, depending on whether the project's data needs to be reviewed by linguistic specialists from outside our organization. We research the language or dialect represented in the dictionary by consulting Glottolog, Ethnologue, Wikipedia and ELCat[6] and make sure there are no errors in the dictionary's settings and metadata. We check the text entries, the multimedia content, and the dictionary creator's responses to determine whether a dictionary is ready to be switched from private to public.
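Part of a batch review like the one described above could be supported by an automated pre-check that flags rows for a human to look at. The sketch below is purely illustrative: the column names (`lexeme`, `en_gloss`), the three checks, and the function name are assumptions, and nothing here represents the team's actual review procedure.

```typescript
// A spreadsheet row after parsing: column header -> cell text (assumed layout).
type SpreadsheetRow = Record<string, string>;

interface RowIssue {
  row: number; // spreadsheet row number, counting the header as row 1
  message: string;
}

// Flags rows that a reviewer may want to inspect before a batch upload.
// Repeated headwords are reported as "possible" duplicates because
// homonyms can be legitimate separate entries.
function checkBatch(rows: SpreadsheetRow[], glossColumns: string[]): RowIssue[] {
  const issues: RowIssue[] = [];
  // Prototype-free object so headwords such as "constructor" are handled safely.
  const firstSeen: Record<string, number> = Object.create(null);

  rows.forEach((row, index) => {
    const rowNumber = index + 2; // row 1 holds the column headers
    const lexeme = (row["lexeme"] || "").trim();
    if (lexeme === "") {
      issues.push({ row: rowNumber, message: "missing lexeme" });
      return;
    }
    const hasGloss = glossColumns.some((column) => (row[column] || "").trim() !== "");
    if (!hasGloss) {
      issues.push({ row: rowNumber, message: "no gloss in any glossing language" });
    }
    const earlier = firstSeen[lexeme];
    if (earlier !== undefined) {
      issues.push({ row: rowNumber, message: "possible duplicate of row " + earlier });
    } else {
      firstSeen[lexeme] = rowNumber;
    }
  });
  return issues;
}
```

A check of this kind only narrows down where to look; judgments about accuracy and systematicity still rest with the reviewers.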
Rights, permissions, and licensing
All linguistic data uploaded to the site remains the property of the Living Dictionary authors and the language communities they work with. Our Terms page clearly states:
We do not assert any ownership over your Contributions. You retain all rights and retain full ownership of all your Contributions and any intellectual property rights, or other proprietary rights associated with your Contributions.[7]
In this manner, we make sure that dictionary managers retain ownership of their data. We reserve the right to control the programming of the platform itself. Our code base is available on our institute's GitHub page,[8] and it is released under a source-available, non-commercial license that is also listed on GitHub.
Technical specifications
Our Living Dictionaries platform is designed to be freely available online to anyone on a mobile device or a desktop computer. The language data, audio recordings and images are stored in the cloud, and may be exported by dictionary creators at any time for use in other projects. The code is currently stored in our GitHub repository, and our server-side code runs on Vercel serverless functions. Our platform administrators have access to the back end from anywhere in the world. We accept, store, manage and share user-generated data in a convenient and efficient manner, and employ Google Firebase for many of our backend features including password-less authentication, media storage, and the database. Our content is rendered as pages upon request, so as dictionary editors make changes, website viewers see updated content immediately. Our content management system (CMS) is tailored to the needs of the platform, which was built from scratch using SvelteKit, a framework for building web applications. Svelte, a component-based framework that allows one to build highly efficient and reactive user interfaces, is also one of the most innovative open-source, front-end tools available. Using Svelte results in faster changes and better overall online performance, even in low-bandwidth contexts, which is crucial because a high percentage of our dictionary managers hail from developing countries with slow network connections; they therefore need the fastest experience possible while they are using our platform.
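To make the idea of reactive updates concrete, here is a minimal, dependency-free sketch of the subscribe pattern that Svelte stores expose: a subscriber is called immediately with the current value and again on every change, and subscribing returns a function that unsubscribes. This is a simplified re-implementation written for illustration, not Svelte's own code and not the platform's; in particular, it says nothing about how Firebase delivers changes, and the strict-equality check is a simplification.

```typescript
type StoreSubscriber<T> = (value: T) => void;
type Unsubscribe = () => void;

// Minimal store interface modelled on the Svelte store contract.
interface SimpleStore<T> {
  subscribe(run: StoreSubscriber<T>): Unsubscribe;
  set(value: T): void;
  update(fn: (value: T) => T): void;
}

function createStore<T>(initial: T): SimpleStore<T> {
  let current = initial;
  const subscribers: StoreSubscriber<T>[] = [];

  function set(next: T): void {
    if (next === current) return; // simplification: skip identical values
    current = next;
    // Iterate over a copy so subscribers may unsubscribe while being notified.
    subscribers.slice().forEach((run) => run(current));
  }

  return {
    subscribe(run) {
      subscribers.push(run);
      run(current); // new subscribers receive the current value right away
      return () => {
        const index = subscribers.indexOf(run);
        if (index !== -1) subscribers.splice(index, 1);
      };
    },
    set,
    update(fn) {
      set(fn(current));
    },
  };
}

// Usage: a "viewer" watches the entry list while an "editor" adds headwords.
const entryList = createStore<string[]>([]);
const countsSeenByViewer: number[] = [];
const unsubscribeViewer = entryList.subscribe((list) => countsSeenByViewer.push(list.length));

entryList.update((list) => list.concat("ka:xi"));
entryList.update((list) => list.concat("babat buna"));
unsubscribeViewer();
entryList.update((list) => list.concat("chuspi lachiwa")); // the viewer is no longer notified
// countsSeenByViewer is now [0, 1, 2]
```

The viewer never polls or refreshes; it is simply notified whenever the shared value changes, which is the behavior described as live updating.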
Our team uses many tools to focus on scalability and modularity. We try to have a tight feedback loop with users and make changes easy to deploy using GitHub Actions for continuous integration and deployment. We keep the main packages of the software, as well as its third-party libraries, up to date as the software evolves. The development team follows the latest developments on both the hardware and software sides. It is important to us that we detect any failures or problems early and address them in a meaningful way across all the platform's interface languages. We want to ensure that the features of the Living Dictionaries platform meet the needs of the multilingual communities we are serving. Of course, maintaining and developing the site incurs costs: web development, data management personnel, and technical hosting costs. Our organization is therefore always in fundraising mode to sustain this platform. We will continue to undertake public fundraising initiatives, request support from private donors, and seek grant awards to meet these needs, to ensure that we can keep this valuable resource available free of charge.
DEVELOPMENT ROADMAP
At Living Tongues Institute for Endangered Languages, our strategic development roadmap includes improving speed optimization in low-bandwidth contexts, offline mode functionality for browsing and editing Living Dictionaries offline, audio recording format toggling capabilities (.mp3 and .WAV modes), visualizing editing history, geo-tagging entries, integrating Praat open-source tools for audio analysis on the platform, and improving import/export features. We currently import dictionary data from spreadsheets, .CSV files and FLEx files (Standard Format), and we have made strides in importing all current and legacy formats from other dictionary software (such as Toolbox, Shoebox, Lexique Pro, TshwaneLex and others). We also offer multimedia file export capabilities for images, audio and video files, as well as text data export to .CSV and .PDF directly from the platform. We plan to expand that functionality by offering .JSON, .XML (LIFT format), and other formats that may serve communities who need to use their data offline. For communities that have requested it, we have also begun creating companion dictionaries for existing pedagogical courses on the 7000 Languages online platform.
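As a rough illustration of what text export to .CSV and .JSON involves, the sketch below serializes a list of flat rows. It is an assumption-laden example rather than the platform's exporter: the row shape and column names are hypothetical, and the quoting rules follow the common CSV convention (fields containing commas, quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled).

```typescript
// One exported row: column header -> cell text (hypothetical flat layout).
type ExportRow = Record<string, string>;

// Quotes a field only when it contains a comma, a double quote, or a line break.
function escapeCsvField(field: string): string {
  if (/[",\r\n]/.test(field)) {
    return '"' + field.replace(/"/g, '""') + '"';
  }
  return field;
}

// Serializes rows to CSV with a header line; missing cells become empty fields.
function toCsv(rows: ExportRow[], columns: string[]): string {
  const header = columns.map(escapeCsvField).join(",");
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvField(row[column] || "")).join(",")
  );
  return [header].concat(lines).join("\r\n");
}

// Serializes the same rows to indented JSON for tools that prefer structured data.
function toJson(rows: ExportRow[]): string {
  return JSON.stringify(rows, null, 2);
}

// Example rows; the glosses are shortened from examples in the article.
const exportRows: ExportRow[] = [
  { lexeme: "ka:xi", en_gloss: "crow" },
  { lexeme: "babat buna", en_gloss: "plant, medicinal" },
];
const csvText = toCsv(exportRows, ["lexeme", "en_gloss"]);
```

The second gloss contains a comma, so it is emitted as a quoted field; without that step, a spreadsheet program would split it into two columns.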
CONCLUSION
Underrepresented languages need online resources to thrive in the digital era because citizen-linguists need to be able to easily store, reference, and share content in their languages. A Living Dictionary is an online tool built with the latest web technologies to help increase the availability of language resources for many under-served languages in the world. It is an interactive online tool that digitally preserves words and phrases, and it allows the user to hear high-quality audio and video recordings of their language, as well as edit, record and upload new content anytime.
annaluisa@livingtongues.org
gdsa@livingtongues.org
Anna Luisa Daigneault is a linguistic anthropologist, writer, and musician. She is Program Director for Living Tongues Institute for Endangered Languages, a US-based nonprofit. She holds a Master of Science in Ethnolinguistics from Université de Montréal in Canada, and her interests focus on the documentation and revitalization of the endangered languages of the Americas as well as the creation of technological tools that can assist language activists. She helps manage the Living Dictionaries platform, an online dictionary-builder for under-represented languages. Daigneault collaborates with speakers of Indigenous, diaspora, and creole languages around the world and has contributed to the publication of dozens of Living Dictionaries. She has conducted ethnolinguistic fieldwork and taught digital skills workshops for Indigenous community members in Peru, Paraguay, Guatemala, Chile, Papua New Guinea, Canada, and the US. Her articles about language loss and reclamation have been published by The Dominion, Global Voices, Birds Connect Seattle, and SAPIENS.
Gregory D. S. Anderson is the founder and president of Living Tongues Institute for Endangered Languages. He has degrees in Linguistics from Harvard (AB 1989) and the University of Chicago (PhD 2000) and has published widely in the fields of historical linguistics, descriptive grammar, morphology, verb typology, language contact, and the linguistics of Munda, Salishan, and Ogonoid languages. Anderson has conducted extensive fieldwork among Altai-Sayan groups as well as Gta', Ho, Gutob, Remo, and Sora tribal communities in India. In 2010–2013, he was a National Geographic Society Fellow. He has worked in Nigeria on Eleme, on Tibeto-Burman languages such as Koro Aka and Hruso in Arunachal Pradesh, in Bolivia on Kallawaya and Chipaya, on nearly a dozen languages belonging to eight different families in Papua New Guinea, and in Oregon on Siletz Dee-Ni. Anderson has mentored many Indigenous collaborators who have gone on to become linguists as well.
REFERENCES
Footnotes
1. This article appears in Indigenous Lexicography, a special issue of Dictionaries: The Journal of the Dictionary Society of North America (44:2, 2023) edited by Christine Schreyer and Mark Turin. It is open access under a Creative Commons CC BY-NC-ND license (https://creativecommons.org/).
In the print version, all illustrations are rendered in grayscale. Any color illustrations can be found in the open-access online version at Project Muse: http://muse.jhu.edu/resolve/213
2. As of June 6, 2023, 3,464 languages are currently under threat according to http://www.endangeredlanguages.com. Other counts exist that suggest this number could be even higher.
3. The Living Dictionaries platform is publicly available at: https://livingdictionaries.app
4. In our view, a citizen-linguist is a person who is actively engaged in their speech community, believes in safeguarding their native (or heritage) language and works towards transmitting it to future generations, even if their language may be understudied by scholars and undervalued by their own community and the wider public. They are people who fulfill the multifaceted roles of documentarian, language activist, and digital content creator, whether they have formal training in linguistics or not.
5. This Living Dictionary is publicly available: https://livingdictionaries.app/tutelo-saponi/entries/list
6. The Endangered Languages Catalogue is available at: https://endangeredlanguages.com/
7. The Living Dictionaries Terms and Conditions page: https://livingdictionaries.app/terms
8. The Living Tongues GitHub page is at: https://github.com/livingtongues