Text+ User Story

Endangered Linguistic Diversity

Frank Seifart (Leibniz ZAS)

DFG subject area: 104 Linguistics

Text+ data domain: Collections

Motivation

General and Comparative Linguistics, Typology, Non-European Languages (subject area 104-01) unsurprisingly has a high demand for natural language recordings from as many different languages as possible, in order to undertake comparative studies of morphological, syntactic, or prosodic systems; see e.g. http://doreco.info/. The language data usually come from language documentation projects which, in addition to collecting data for linguistic research, also aim to document linguistically manifested cultural heritage. To be useful for linguistic research, these data should be transcribed, translated, and morphologically analyzed with interlinear morpheme glosses. Furthermore, annotations and audio files should be accessible for research without major hurdles (e.g. individual access requests). The corresponding data are currently archived by their creators in various repositories such as The Language Archive (TLA) or the Language Archive Cologne (LAC). However, the underlying infrastructure is fragile, and its maintenance is resource-intensive. Moreover, it is currently managed by institutions that have only limited long-term funding, so tasks that serve permanent archiving are often carried out by short-term, third-party-funded projects. This is particularly alarming because a large part of these data was collected in extensive field research on languages that are often threatened by extinction, i.e. these data usually cannot be collected again in the same form.

Objectives

  1. Data that have already been collected should be made accessible for linguistic research. This means that those elements of language documentation collections that can be used in research, i.e., linguistic corpora, will be identified as such and will meet certain minimum standards, e.g., consistent transcription and annotation and disclosure of the conventions applied (e.g., lists of abbreviations, orthographic conventions).
  2. Accordingly, guidelines for future data collection will be developed. 
  3. Language archives need support. The development of central components of the archiving infrastructure should be better coordinated to create synergies.