Text+ User Story

BBAW/SAW-Thesaurus Linguae Aegyptiae,  
BBAW-Digitales Zettelarchiv

Daniel Werning (Berlin-Brandenburgische Akademie der Wissenschaften) 

DFG subject area: 101 Ancient Cultures

Text+ data domain: Collections

Motivation

Egyptian-Coptic is the human language with the longest documented history, spanning approx. 4,500 years. Its vocabulary and its texts reflect the knowledge of one of the formative cultures of the ancient world. The “Thesaurus Linguae Aegyptiae” (TLA), edited by the Academies’ projects “Altägyptisches Wörterbuch” (1992–2012) and “Strukturen und Transformationen des Wortschatzes der ägyptischen Sprache” (2013–2034) at the BBAW, Berlin, and the SAW, Leipzig, provides the world's largest electronic corpus (1.4 million tokens) of Egyptian texts. The corpus is annotated with translations, commentary, and metadata, and it is consistently lemmatized against a comprehensive lexicon of the Egyptian language across its diachronic phases.

As a text-oriented Egyptologist (DFG 101–05), I regularly consult(ed) the TLA website in different contexts: (i) to look up the meanings of Ancient Egyptian lemmata in German or English; (ii) to look up attested hieroglyphic spellings in the Digitales Zettelarchiv (DZA), which is part of the TLA website; (iii) to investigate the meaning of rare or relatively rare words based on their contextual usage in the texts lemmatized in the TLA and in the DZA; (iv) to look up the translation of texts or text passages in the TLA (as a scholarly translation in its own right); and (v) to research grammatical questions based on the TLA raw data (in JSON or XML), using self-written scripts or third-party XML databases.

A problem that has not yet been solved systematically is the sustainable availability, citability, and, consequently, scholarly verifiability of these research data. Although the data have been updated regularly since 2004, only one data snapshot, from 2018, is available in an online archive (urn:nbn:de:kobv:b4-opus4-29190).

Objectives 

Hosting versioned snapshot copies of the TLA project's annotated text collection and lexica according to the FAIR principles would make it possible to reference and verify scholarly research based on specific versions of the TLA database, including long after the end of the Academies' project in 2034.

Solution

I envision Text+ offering a copy of the TLA data in the form of versioned data files (JSON, TEI-XML) as well as a set of web services (APIs) that return lemmata, text examples, and full texts, either by their permanent IDs or by queries on their metadata and contents. I also envision Text+ hosting a copy of the scanned paper slip archive (Digitales Zettelarchiv) of the Berlin Egyptian dictionary project, with a slip scan browser and an API that returns the scan files by their permanent IDs.
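One way to picture such an API is a URL scheme in which every request names a data snapshot explicitly, so that a citation consisting of (snapshot, record type, permanent ID) always resolves to the same record. The following is a minimal sketch under that assumption only: the base URL, the path layout, the snapshot label "snapshot-2018", the ID "12345", and the query parameter names are all hypothetical and do not describe any existing TLA or Text+ service.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL: no such Text+ endpoint is assumed to exist.
BASE_URL = "https://example.org/tla/api"


def record_url(version: str, kind: str, record_id: str) -> str:
    """URL for one record (kind: e.g. "lemma", "text") in a given data snapshot.

    Pinning the snapshot version in the path is what makes a citation
    reproducible: the same ID may carry different content in later releases.
    """
    # safe="" also escapes "/" so that an ID cannot alter the path structure.
    return f"{BASE_URL}/{quote(version, safe='')}/{quote(kind, safe='')}/{quote(record_id, safe='')}"


def search_url(version: str, **filters: str) -> str:
    """URL for a metadata/content query within one snapshot (filter names are hypothetical)."""
    # Sort the filters so that the same query always yields the same URL string.
    query = urlencode(sorted(filters.items()))
    return f"{BASE_URL}/{quote(version, safe='')}/search?{query}"


if __name__ == "__main__":
    print(record_url("snapshot-2018", "lemma", "12345"))
    print(search_url("snapshot-2018", period="Middle Kingdom", text="nfr"))
```

Putting the snapshot in the path rather than defaulting to "latest" is a design choice: a citation in a publication then keeps pointing at the data the author actually used.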

Challenges 

The TLA project at the BBAW, Berlin, and the SAW, Leipzig, generally supports such a data scenario.

Hosting the TLA data requires support for specific Unicode ranges (Egyptian hieroglyphs, Coptic, Egyptological transliteration, IPA) and must keep up with ongoing developments in this area, e.g., the expansion of the set of Egyptian hieroglyphs and the implementation of the Egyptian hieroglyph format control characters (U+13430–U+1343F).
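A hosting service could check incoming files against the character repertoire it supports and flag anything outside it. The sketch below assumes a hand-maintained table of inclusive code point ranges; the table is illustrative, not an official TLA or Text+ specification, and it would have to be extended whenever Unicode adds new hieroglyphs or control characters. The transliteration entries in particular are an assumption: they cover common characters such as ꜣ (U+A723), ꜥ (U+A725), and ḥ (U+1E25), but not every character Egyptologists use.

```python
from typing import Dict, List, Tuple

# Assumed range table (inclusive bounds); must be updated as Unicode evolves.
RANGES: List[Tuple[str, int, int]] = [
    ("Basic Latin", 0x0000, 0x007F),
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Coptic letters in Greek and Coptic", 0x03E2, 0x03EF),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),  # transliteration, e.g. ḥ, ḫ
    ("Coptic", 0x2C80, 0x2CFF),
    ("Latin Extended-D", 0xA720, 0xA7FF),  # transliteration, e.g. ꜣ, ꜥ
    ("Egyptian Hieroglyphs", 0x13000, 0x1342F),
    ("Egyptian Hieroglyph Format Controls", 0x13430, 0x1343F),  # range as stated above
]


def classify(text: str) -> Dict[str, int]:
    """Count the characters of `text` per supported range; others go to "UNSUPPORTED"."""
    counts: Dict[str, int] = {}
    for ch in text:
        cp = ord(ch)
        # First matching range wins; the ranges in the table do not overlap.
        name = next((n for n, lo, hi in RANGES if lo <= cp <= hi), "UNSUPPORTED")
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    print(classify("ꜥnḫ"))  # transliteration
    print(classify("\U00013000\U00013430\U00013000"))  # two signs joined by a format control
    print(classify("€"))  # outside the assumed repertoire
```

A real service would more likely rely on the Unicode Character Database than on a hard-coded table, but the table makes the maintenance burden described above explicit.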

Review by community 

The TLA project at the BBAW, Berlin, and the SAW, Leipzig, is happy to review a possible implementation.