An Orphaned Lexicographic Database
Torsten Roeder (Leopoldina, Halle/Saale) in collaboration with Gerhard Endreß (Ruhr-Universität Bochum) and Yury Arzhanov (Österreichische Akademie der Wissenschaften)
Text+ data domain: Lexical Resources
The lexicographic database “Glossarium Graeco-Arabicum” records word- and phrase-based translation pairs from ancient Greek texts and their 10th-century Arabic translations, mainly in the fields of medicine and philosophy. The data has been collected over more than 40 years, initially on paper file cards, which were later transferred into an online database with a user frontend and an editor backend.
Currently, the database contains about 70,000 translation pairs. They can be used, for example, to study the transfer of cultural, scientific and philosophical concepts from ancient Greek into Arabic, and, in the long term, to create a comprehensive dictionary. In addition to a classical search interface and links to other lexical resources, the user interface offers visual research tools and provides generic data import and export functions. According to web statistics from recent years, the database has about 120 regular users from all over the world, making it a valuable and indispensable lexical resource. Technically, the database relies on standard technologies such as PHP and MySQL, plus a digital image server (Digilib) provided by one of the Text+ participants.
However, all research projects directly related to the database and its maintenance have now ended. The current hosting institution cannot provide its own resources for maintaining the database. As there is no binding legal agreement, the persistence of the database is at risk.
The database needs a permanent hosting institution that guarantees its stability and availability for the active scientific community. It requires regular security updates as well as interface updates to keep pace with new devices and web standards.
The long-term availability of textual resources is one of Text+’s core tasks within the NFDI framework.
Text+ finds a partner with both a scientific interest in the database and the technical ability to serve as its future, permanent host. That partner integrates the database into its own digital research environment and disseminates the data in other formats to support its reuse. Text+ can also act as a consultant for funding applications.
Alternatively, Text+ provides generic hosting services within the NFDI network. If the database cannot be maintained in the long term, all data is transformed into a standardized format and published.
Regardless of the chosen solution, the data will be incorporated into generic lexicographic and linguistic search engines offered by Text+ partners.
There are non-trivial technical issues concerning database security if future software updates are not applied. Finding an institution that is able to curate the code of the database framework adequately might be a diplomatic challenge, whereas generic services would only provide basic hosting with critical software updates. In this process, it can be argued that curating the database means not only taking care of technical issues but also having a great opportunity to further develop a solid scientific product, e.g. by connecting it to other existing data sources within the same research domain.
Review by community
The former collaborators of the project are ready to support the database migration process and to review the services provided by Text+ during the funding period. The content creators of the database have a strong interest in preserving the collection and are committed to applying for additional funding if required.