Text+ User Story

Generic services for digital editions offer great chances for small projects

Torsten Roeder (Leopoldina, Halle/Saale)

DFG subject area: 103 Fine Arts, Music, Theatre and Media Studies

Text+ data domain: Collections

Motivation

As an independent researcher in the field of music and theater studies, I investigate the reception of musical works and their performances. Newspapers and magazines, which have been widely distributed since the 19th century, are an important source of information; they allow the diversity of reception histories to be explored both in scope and in development over time. The critical editing and indexing of the source texts is an important step in this work. As a final product, I would like to publish a text edition as well as a data collection on historical performances based on information from these texts. This will make it possible to shed light on the historical reception of a topic or even a selected work and to show the spectrum of criticism, as far as this can be represented by the texts. I would also like to make the edited texts and the data collection available for reuse in follow-up research.

Objectives 

I see an acute need for improved provision of digitized data, up to and including full-text indexing by means of Optical Character Recognition (OCR) and Named Entity Recognition (NER). Important catalogs and holdings for the field of music culture are often only accessible via paid services (e.g. RIPM). In addition, local newspaper coverage is often only partially available because the holdings are incomplete; to my knowledge, city archives often have limited resources for providing digital copies. Furthermore, retrieval is sometimes hampered by language and national borders, for example in the case of German-language periodicals published in regions that are no longer German-speaking today.

Another problem is the long-term availability of the edition and the corresponding data. While producing TEI-compliant text encoding and setting up a server on a local device are still relatively unproblematic tasks, the demands on a server visible on the Internet are much greater and can hardly be met or responsibly managed by individuals. Because it has not yet been possible to find an institution that could guarantee long-term availability, or because the institutions approached showed insufficient interest, the data is now on GitHub, while the digital edition and the database itself have since been inaccessible to others.

Solution 

Text+ could, in my opinion, initially work towards community-based standards for OCR with consortium partners and cooperating consortia or, if appropriate, offer generic services for OCR and/or NER with the help of the consortium partners themselves. Furthermore, I would expect that at least one consortium partner could offer a suitable platform for long-term hosting of the edition and data collection, possibly in a context appropriate to the topic. 

I consider it a special concern to return the data generated by researchers or in research projects to the providers. Here, the institutions involved in Text+ could, for example, develop “backwards interfaces”, which allow exactly this with full consideration of scientific transparency. 

In my opinion, there is a general interest in basic standards for text sources, which is visible among the consortium partners of Text+ as well as in the wider community. This requires an exchange about concrete research needs and a cross-provider implementation strategy. Furthermore, generic hosting for research data in the field of text editing is urgently needed for smaller and short-term projects to keep editions and metadata available. 

Challenges 

I wonder to what extent generic hosting services can respond to the specific data structures and presentation requirements of individual edition projects. For example, in addition to the edition text, can Text+ also provide analytical tools that may need to be customized for the individual edition? Or must a certain loss of functionality be accepted?

Another question that might only be clarified within the EOSC would be that of cooperation with foreign providers in order to access digitized material or text resources more easily. What I would like to see from the NFDI as a whole, and specifically from Text+, is a strengthening of international connections for research work with texts and text digitization. 

Review by Community 

I am very willing to make my previous projects available as a use case or to participate in the evaluation of the Text+ services with the help of one or more examples.