Text+ User Story

Bibliotheca Arabica, project at the Saxon Academy of Sciences, Leipzig

Stefanie Brinkmann (Saxon Academy of Sciences, Leipzig)

DFG subject area: 106 Social and Cultural Anthropology, Non-European Cultures, Jewish Studies and Religious Studies

Text+ data domain: Collections

Research on Arabic literatures from 1150 to 1850, based primarily on manuscript metadata (catalogs), bio-bibliographic reference works, and documentary manuscript notes (taken directly from the Arabic manuscripts). PI: Prof. Dr. Verena Klemm

Motivation

For my research in literary history, digital collections are central research tools: primarily manuscript databases and, in part, repositories of personal names and work titles. Our project is building its own database, and we draw on as many of the digital manuscript collections available worldwide as possible.

Our project uses graph database technology.

Objectives and Challenges

(both in developing our own database and in our research experience with other databases)

1. The different languages and writing systems

Search and input options must support not only European languages but also Arabic, Persian, and Ottoman Turkish, i.e. the Arabic alphabet, written from right to left, together with its variants (the additional letters used for Persian and Ottoman). The differing directions of the scripts (left to right, right to left) sometimes cause display problems.

In addition to the Arabic alphabet and its variants, there is the problem of the many transliteration systems used to render Arabic, Persian, or Ottoman Turkish in Latin letters. Several transcription systems are in international use (e.g. DMG, IJMES, EI). Some databases are built to cover (potentially) all or most of these transcriptions, so that a specific term (an author's name, a work title, etc.) can be found quickly. It is also always an advantage if search (and input) is possible without transcription, i.e. in plain Latin letters without diacritical marks.

Example (www.islamic-manuscripts.net): Hafez or Hafiz (without transcription), or Ḥāfiẓ, Ḥāfeẓ, Ḥāfiż (with transcription)
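A minimal sketch of how a search could treat the spellings in the example above as one term: each form is reduced to a plain comparison key by removing diacritical marks. The function name `search_key`, the list of stripped signs, and the optional folding of e/i and o/u are assumptions made for illustration; they are not taken from any of the databases mentioned, and vowel folding in particular can produce false matches.

```python
import unicodedata

# Hypothetical folding of vowels that differ between transcription
# conventions (e.g. "e"/"o" vs. "i"/"u"). An assumption, not a standard.
VOWEL_FOLD = str.maketrans({"e": "i", "o": "u"})

# Signs commonly used for ayn and hamza, plus apostrophe-like characters.
STRIP_SIGNS = "\u02BF\u02BE\u02BB\u02BC'\u2019`"


def search_key(name: str, fold_vowels: bool = True) -> str:
    """Reduce a transliterated name to a plain comparison key."""
    # NFKD splits precomposed letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (macrons, dots above/below) and ayn/hamza signs.
    base = "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in STRIP_SIGNS
    )
    key = base.casefold()
    return key.translate(VOWEL_FOLD) if fold_vowels else key


variants = ["Hafez", "Hafiz", "Ḥāfiẓ", "Ḥāfeẓ", "Ḥāfiż"]
keys = {search_key(v) for v in variants}
print(keys)
```

With vowel folding switched on, all five spellings share a single key, so a query in any one form can retrieve records stored in another. A production system would probably store the key alongside the original form instead of replacing it.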

Not all devices are set up for entering transcription characters, so some databases offer their own on-screen keyboards with which one can type transcription (and also Arabic).

Special case: Flexible search of this kind is a prerequisite above all for Arabic papyri, since the Arabic script is often written there without the diacritical dots that distinguish individual consonants. A single letter shape in the consonant skeleton (rasm) can stand for up to five different consonants, depending on its position in the word. These can only be distinguished by the addition of dots below or above the skeleton. If you have a word in a text without diacritics, you would still like to be able to search the digital database (especially in order to identify the word more precisely). In addition to the “normal” Arabic writing system (with diacritics), a search variant is therefore needed that takes only the consonant skeleton into account. (See the Arabic Papyrology Database, which usually also works with dotted letters.)
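The skeleton-only search described above could be sketched as follows: dotted letters are mapped to a shared dotless shape, so that a dotted query and an undotted source word yield the same key. The mapping tables and the function name `rasm` are simplified assumptions for illustration (single words only, no ligatures or hamza handling) and do not reproduce the method of the Arabic Papyrology Database.

```python
import unicodedata

# Dotted letter -> dotless shape (illustrative, not exhaustive).
SKELETON = {
    "\u0628": "\u066E",  # BEH   -> DOTLESS BEH
    "\u062A": "\u066E",  # TEH   -> DOTLESS BEH
    "\u062B": "\u066E",  # THEH  -> DOTLESS BEH
    "\u062C": "\u062D",  # JEEM  -> HAH
    "\u062E": "\u062D",  # KHAH  -> HAH
    "\u0630": "\u062F",  # THAL  -> DAL
    "\u0632": "\u0631",  # ZAIN  -> REH
    "\u0634": "\u0633",  # SHEEN -> SEEN
    "\u0636": "\u0635",  # DAD   -> SAD
    "\u0638": "\u0637",  # ZAH   -> TAH
    "\u063A": "\u0639",  # GHAIN -> AIN
    "\u0641": "\u06A1",  # FEH   -> DOTLESS FEH
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
}

# Letters whose dotless shape depends on position:
# (shape when another letter follows, shape at the end of the word).
POSITIONAL = {
    "\u0646": ("\u066E", "\u06BA"),  # NOON
    "\u064A": ("\u066E", "\u0649"),  # YEH
    "\u0642": ("\u06A1", "\u066F"),  # QAF
}


def rasm(word: str) -> str:
    """Return a dotless skeleton key for a single Arabic word."""
    # Remove vowel signs and other combining marks (category Mn).
    letters = [ch for ch in word if unicodedata.category(ch) != "Mn"]
    out = []
    for i, ch in enumerate(letters):
        if ch in POSITIONAL:
            non_final, final = POSITIONAL[ch]
            out.append(final if i == len(letters) - 1 else non_final)
        else:
            out.append(SKELETON.get(ch, ch))
    return "".join(out)


# Three different words (b-n-t, th-b-t, b-y-t) share one skeleton.
bnt = "\u0628\u0646\u062A"
thbt = "\u062B\u0628\u062A"
byt = "\u0628\u064A\u062A"
print(rasm(bnt) == rasm(thbt) == rasm(byt))
```

Because several readings collapse into one key, a skeleton search returns candidates that the researcher still has to disambiguate from context, which matches the use case of identifying an undotted word more closely.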

2. Dates/Calendar

A future infrastructure must take different calendar systems into account (in our case the Islamic Hijra year, but also Persian calendar systems, etc.). Since the sources often give differing information about an author's year of death, it must be possible to record several dates, date ranges (between X and Y), etc.
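As a sketch of the requirement above, a Hijra year or year range can be stored as such and converted to a span of Gregorian dates only when needed. The conversion below uses the tabular (arithmetic) Islamic calendar with the civil epoch, which is an assumption: historical dates depended on local observation and may differ by a day or two. The result is in the proleptic Gregorian calendar, whereas dates before 1582 are usually cited in the Julian calendar. The function names are hypothetical.

```python
import math
from datetime import date, timedelta

# Julian day number of the day before 1 Muharram 1 AH (civil epoch).
EPOCH_JDN = 1948439
# Offset between Julian day numbers and Python's proleptic Gregorian ordinals.
JDN_TO_ORDINAL = 1721425


def hijri_to_gregorian(year: int, month: int = 1, day: int = 1) -> date:
    """Convert a tabular Islamic date to a proleptic Gregorian date."""
    jdn = (
        day
        + math.ceil(29.5 * (month - 1))  # days in the preceding months
        + (year - 1) * 354               # common years of 354 days
        + (3 + 11 * year) // 30          # leap days in the 30-year cycle
        + EPOCH_JDN
    )
    return date.fromordinal(jdn - JDN_TO_ORDINAL)


def hijri_year_span(first_year: int, last_year: int) -> tuple[date, date]:
    """Gregorian span covering the Hijra years first_year..last_year."""
    start = hijri_to_gregorian(first_year)
    # Last day of last_year = day before 1 Muharram of the following year.
    end = hijri_to_gregorian(last_year + 1) - timedelta(days=1)
    return start, end


# "Died between X and Y" with illustrative values X = 791, Y = 792 AH.
earliest, latest = hijri_year_span(791, 792)
print(earliest.isoformat(), latest.isoformat())
```

Keeping the original Hijra values and deriving the Gregorian span on demand avoids baking one conversion convention into the stored data, and a range query can then test whether two spans overlap.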

3. Name and title variants

Arabic names are not always given uniformly in reference works and sources. In addition, many works have several titles or title variants (some authors did not assign a final title). If a database records only one variant of a name or work title, with no link to the alternatives, it is sometimes difficult to find information (hits).

The linking of this data with authority files (VIAF, GND …) is also central here.
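One way to picture the linking of variants and authority files is a record that carries a preferred form, any number of variant forms, and identifiers from authority files, together with an index in which every form points to the same record. This is only a sketch: the class `Entity`, the function `build_index`, and the placeholder identifier are hypothetical, and it does not describe the project's actual graph model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A person or work with all known name/title forms."""
    preferred: str
    variants: set[str] = field(default_factory=set)
    # Authority file name -> identifier, e.g. {"GND": "...", "VIAF": "..."}.
    authority_ids: dict[str, str] = field(default_factory=dict)


def build_index(entities: list[Entity]) -> dict[str, list[Entity]]:
    """Map every recorded form (case-insensitively) to its entities."""
    index: dict[str, list[Entity]] = {}
    for entity in entities:
        for form in {entity.preferred, *entity.variants}:
            index.setdefault(form.casefold(), []).append(entity)
    return index


poet = Entity(
    preferred="Ḥāfiẓ",
    variants={"Hafez", "Hafiz", "Ḥāfeẓ"},
    authority_ids={"GND": "PLACEHOLDER-ID"},  # placeholder, not a real identifier
)
index = build_index([poet])
print(index["hafez"][0].preferred)
```

Looking up any variant returns the same record, and the shared authority identifier is what would allow records from different databases to be recognized as referring to the same person or work.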

4. Interfaces, possibilities of linking/networking, export-import of data

5. Long-term perspective, sustainability

Noteworthy and useful projects go offline at some point after the end of the project period and are no longer available to the researcher.

Solution/Comment

Altogether, databases could be improved through a coordination office or an IT network for the disciplines of Oriental studies. Existing tools from outside Oriental studies, as well as newly created databases, should take into account the special requirements regarding writing systems and transcription(s). As far as name variants, dates, etc. are concerned, the development of common technical standards would be desirable (this also includes the problem of authority data). The development and definition of interfaces for data import and export would also appear to be central; this would make it possible to continue maintaining the data from such projects, or to integrate them elsewhere (with reference to the source), so that they are not lost once the projects are offline and no longer maintained.

Overall, it would be desirable to have a consulting and/or coordination office whose goal is both to maintain an overview of completed and ongoing database projects and to provide advice on the application for and development of new databases, so that isolated and incompatible solutions are avoided from the outset.