Organizing collaboration among multiple projects in computational literary studies
Patrick Helling (Universität Köln), Kerstin Jung (Universität Stuttgart), Steffen Pielström (Universität Würzburg)
Text+ data domain: Collections
Computational literary studies, i.e. research on literary texts (DFG-Fachsystematik: 105 Literaturwissenschaft) supported by methods from computational linguistics and computer science, is an emerging field within the digital humanities. Since 2020, the DFG has been funding a priority program in Computational Literary Studies that includes 10 research projects at universities in Germany and Switzerland. Researchers in these projects, though pursuing individual research agendas, naturally share various interests, objectives and obstacles. Our task as coordinators of the program is to foster the exchange of knowledge, tools and data between the individual projects and to reveal opportunities for collaboration.
In this context, the availability of central infrastructure elements for all participating projects is vital. Researchers need to share a variety of digital items, including the texts themselves, but also software and code, annotations and annotation guidelines, literature and bibliographies, training materials and paper manuscripts. A rather recent addition to this list is statistical models, which in times of deep learning methodology can be several gigabytes in size. Another necessity is communication, supported by tools such as mailing lists, wiki systems, calendars, and poll systems.
Furthermore, the program would benefit greatly from a centralized research data infrastructure that allows researchers to share text-related research data, including annotations, guidelines, code and statistical models. Vital requirements in this context are equal availability to all research and education institutions as well as to individual researchers, version control, the possibility to share data non-publicly, and compliance with the institutions’ policies on data protection, which is best achieved, among other things, by hosting on servers located within the European Union.
The program’s basic communication requirements are met by the current national CLARIAH-DE and DFN infrastructures. The national CLARIAH infrastructure provides mailing lists that are used both for internal communication at the program and working group level and for external communication to disseminate activities to interested researchers outside the program. For internal organization purposes, such as project management, documentation, document exchange, and living documents, the program uses a wiki system hosted by the existing DARIAH-DE infrastructure, which provides and maintains the system and takes care of user management. Furthermore, existing DFN services such as polling and video conference rooms are used for internal communication.
For sharing and exchanging research data, however, the projects still have to resort either to institutional solutions, like git repositories run on an institutional server, or to commonly available solutions provided by mostly US-based tech companies like Google or Dropbox. Both types of solutions are often suboptimal, or for some individual users even inapplicable, in the context of such an academic research program: institutional infrastructure is often not designed for cross-institutional collaboration, and the use of corporate solutions can be restricted by individual universities’ policies.
As a solution, the program would need nationally hosted data repositories that are equally convenient to use for all collaborating researchers. Such services include Git repositories for exchanging code and data, a solution for collaborative work on large-scale text collections and corresponding annotations, and a cloud service for sharing statistical models and other large data files. For solutions that are not based on Git, support for collaborative work and version control is likewise a vital requirement.
Beyond these basic communication and data exchange needs, program researchers have also requested a solution for informal, low-threshold communication (inspired by the commercial service “Slack”).