Humanist Computer Interaction under Scrutiny
Marcel Frey-Endres, Torsten Schenk (Technical University of Darmstadt), Tim Geelhaar, Anna-Lena Körfer (Johannes Gutenberg University Mainz)
Text+ data domain: Collections
The project Humanist Computer Interaction under Scrutiny (short: Humanist) is a joint project of the Johannes Gutenberg University Mainz (JGU), the University of Applied Sciences Mainz (HS) and the Technical University Darmstadt (TU), funded by the BMBF within the funding program VIP+ (duration: 01.10.2017–30.09.2020, https://humanist.hs-mainz.de/). The project investigates the innovative potential of digital methods and tools for research in the humanities. The main focus lies on innovation and analysis: digital workflows within project-specific annotation and analysis procedures, applied to an exemplary humanities research object, as well as the joint testing and evaluation of digital methods and tools in practical workshops.
The project architecture links sub-areas from different disciplines: (I) Ancient History (JGU) covers 101 Ancient Cultures (101–03 Ancient History) and 104 Linguistics (104–03 Historical Linguistics, 104–02 Classical Philology, Medieval Studies). The research interest of the JGU team lies in the historical and philological commentary on the Variae of Cassiodorus, a collection of letters from the first half of the 6th century A.D. comprising letters from the Ostrogothic royal court. (II) The Department of German Studies, Computational Philology and Medieval Studies (TU) covers 409 Computer Science (409–02 Software Engineering and Programming Languages, 409–06 Information Systems, Process and Knowledge Management). The TU team provides the technical infrastructure and the digital working environment for annotating and analysing the letters and supervises data import and export. (III) The Chair of Information Systems and Media Management at the University of Applied Sciences Mainz (HS) covers 112 Economic Sciences (Business Information Systems) and focuses on the acquisition, evaluation and validation of the interaction between researchers and the digital methods and tools they use, by means of user-analytical methods. For our user story, we, the Humanist team, have developed an alternative format by letting users from the information technology (A) and content (B) project core interact with each other. A central aspect of Humanist has always been, and still is, the close interdisciplinary cooperation between the humanities and computer science. This close collaboration has had a formative influence on the user experience and, in our view, is a central challenge and prerequisite for the success of digital research projects in the humanities. It is therefore highly relevant for the support of future research projects by Text+.
(A) My task is to set up a digital workbench that covers the research process from data acquisition and structured annotation to text analysis and provides conceptual connectivity for several project milestones: a) the rapid creation of a critical mass of letters already provided with metadata, annotations and comments, b) the low-threshold communication of digital methods in workshops, and c) the implementation of user analyses based on the workshops and project-internal workflows. In order to design the digital workbench, the specific requirements for indexing the letters had to be determined together with (B), and a framework for workflows had to be implemented that could be validated by user evaluation. The originally planned use of TextGrid in combination with other analysis instruments proved only partially successful: encoding the already extensively tagged data directly in TEI/XML for content indexing and commenting turned out to be extremely error-prone and confusing. In addition, multiple annotations of overlapping text passages, the tagging of variant interpretations of passages, and overlapping internal and external cross-references run up against limits imposed by the hierarchical structure of TEI. We solved this dilemma by developing the annotation tool QAnnotate, which is tailored to the specific needs of the letter collection and focuses on data entry with a minimum of annotation effort. In contrast to XML-based solutions, plain text and individual markup data (independently stored stand-off annotations) are managed strictly separately. A shared Git repository enables collaboration on the research data. The annotated inventory can be evaluated with Python scripts in Jupyter Notebook according to corpus-linguistic features, or converted into structured (CSV) datasets for network and geo-analytical investigations in a Nodegoat instance.
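The stand-off principle described above, with plain text and annotations kept strictly separate and linked only by character offsets, can be sketched as follows. This is an illustrative minimal example, not the actual QAnnotate data model: all field names, the sample letter fragment and the annotation values are invented, and the CSV flattening merely hints at the kind of export used for a Nodegoat instance.

```python
# Hypothetical sketch of stand-off annotation: the plain letter text and
# its annotations live in separate records, linked only by character
# offsets. Names and fields are illustrative, not the QAnnotate model.
import csv
import io

letter_text = "Theodericus rex senatui urbis Romae."

# Stand-off annotations: each refers to a (start, end) span in the text.
# Overlapping spans are unproblematic because no markup is embedded.
annotations = [
    {"start": 0, "end": 15, "type": "person", "value": "Theoderic the Great"},
    {"start": 16, "end": 35, "type": "institution", "value": "Roman Senate"},
    {"start": 24, "end": 35, "type": "place", "value": "Rome"},  # overlaps the previous span
]

def to_csv(text, anns):
    """Flatten stand-off annotations into a CSV table, e.g. for import elsewhere."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["surface", "type", "value"])
    writer.writeheader()
    for a in anns:
        writer.writerow({
            "surface": text[a["start"]:a["end"]],  # surface form recovered via offsets
            "type": a["type"],
            "value": a["value"],
        })
    return buf.getvalue()

print(to_csv(letter_text, annotations))
```

Because the text itself is never touched, arbitrarily many overlapping or conflicting annotation layers can coexist, which is exactly what the hierarchical structure of inline TEI/XML makes difficult.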
(B) My task in the project is the historical and philological investigation of the Variae of Cassiodorus for the workshops as well as for a book publication (e-book and print); it thus belongs to the edition part of the project. The project has lemmatized the Latin letters in TEI with the help of the eHumanities Desktop of the Goethe University Frankfurt. Using QAnnotate, I provide the Latin text with a translation and with comments on persons, places and historical entities as well as on linguistic and literary characteristics, drawing on common research tools. The user interface and the structure of the database were developed in close interaction with (A) and are refined step by step as the work on the data progresses. To simplify data entry and to identify ambiguities or conflicts, we have developed a QAnnotate-specific syntax that is continuously being refined. These challenges also require a high degree of collaboration when transforming the data into print format.
Challenges and Solutions
(A/B) Research infrastructures — There is a growing need for centrally accessible and well-connected “meta”-research infrastructures that provide access to available digital tools and other resources. Permanently funded and up-to-date collections, overviews or search services could make orientation in the DH tool landscape much easier, especially for projects in their initial phase.
Data modelling — An essential prerequisite for shaping collaboration between computer science and the humanities would be solutions that facilitate the exchange of indexing concepts between the humanities disciplines and resource developers, e.g. applications that enable researchers in the humanities to design models at a graphical level that are already machine-readable and can thus be used directly for further programming. This would also make incremental and iterative programming procedures more accessible and comprehensible for researchers in the humanities.
Data transformation — Interfaces for transforming data into other formats or for other purposes also remain desirable. A solution could be provided by low-code platforms that allow simple processing steps for the transformation or extraction of research data to be executed, and that offer possibilities for creating individualized markup vocabularies and syntaxes. This would make it easier to communicate information-technology basics to the users of research data, and thus support the productive knowledge transfer between the methodologies of the humanities and computer science.
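As an illustration of the kind of simple processing step such a platform might offer, the following sketch extracts entities from an inline markup into structured records while recovering the plain text. The {type:value} syntax is invented for this example and is not the actual QAnnotate syntax.

```python
# Illustrative sketch of one small transformation/extraction step:
# a hypothetical inline markup of the form {type:value} is stripped
# from a line of text and collected as structured records.
import re

MARKUP = re.compile(r"\{(\w+):([^}]+)\}")

def extract(annotated_line):
    """Return (plain_text, records) for one line of annotated text."""
    records = []

    def strip_tag(match):
        records.append({"type": match.group(1), "value": match.group(2)})
        return match.group(2)  # keep only the surface form in the plain text

    plain = MARKUP.sub(strip_tag, annotated_line)
    return plain, records

plain, recs = extract("{person:Cassiodorus} wrote to the {institution:senate}.")
```

A chain of such small, inspectable steps, composed graphically rather than programmed, is the kind of low-code workflow envisaged above.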
Sustainability and re-usability — The Text+ initiative enables us to transfer research data, data models and working tools into a sustainable, transparent and open research environment. In this way we increase the visibility and reusability of our data and give others the opportunity for further development. This concerns QAnnotate as a tool for future letter editions as well as the use of our philological-historical annotations for working with other texts. We hope that Text+ will not only facilitate permanent access to research data, but also ensure the interoperability of research data and feedback on their use. Especially after the end of project funding periods, communication platforms are needed to discuss questions of adaptation and re-use.