Text+ User Story

DraCor Corpora

Frank Fischer (HSE Moscow), Peer Trilcke (Universität Potsdam), Mathias Göbel, José Calvo Tello, Raisa Barthauer (SUB Göttingen)

DFG subject area: 105 Literary Studies

Text+ data domain: Collections

Motivation

The Drama Corpora Project (DraCor) was published by the Centre for Digital Humanities at HSE Moscow and the University of Potsdam. Earlier projects of this research group were the DLINA project and Ezlinavis (currently still online in an alpha phase). DraCor is edited by Frank Fischer (Higher School of Economics, Moscow), Peer Trilcke (University of Potsdam) and Boris Orekhov (Higher School of Economics, Moscow). It contains 11 corpora in 9 languages with 1,000 plays in total; more texts are in preparation. All texts are complete dramas, which means that fragments are not included. They were all published between 1730 and 1930 and are taken from several sources, mainly the TextGrid Repository. The corpora are encoded in basic TEI and share structural markup as well as structural, semantic and other metadata.
One of the main functions of DraCor is to extract specific data from the texts in order to automatically generate networks that display the co-presence of characters. These networks show which characters appear together in particular acts and what percentage of the spoken text each character accounts for. The network data can be downloaded as CSV or in the Gephi-compatible GEXF format.
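To make the idea of a co-presence network more concrete, the following is a minimal sketch in Python and not DraCor's actual implementation. It assumes that acts are encoded as `<div type="act">` and that each speech is a TEI `<sp>` element whose `who` attribute points to the speaking character(s). The function names, the sample play and the CSV column names are hypothetical and only serve as illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical two-act mini play, used only to exercise the functions below.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="act">
  <sp who="#anna"><speaker>Anna</speaker><p>Good morning to you.</p></sp>
  <sp who="#bert"><speaker>Bert</speaker><p>Good morning.</p></sp>
</div>
<div type="act">
  <sp who="#anna"><p>Where is Carl?</p></sp>
  <sp who="#bert"><p>Here he comes.</p></sp>
  <sp who="#carl"><l>I am here.</l></sp>
</div>
</body></text></TEI>"""


def speakers_of(sp):
    """Return the character IDs in @who (whitespace-separated, '#' removed)."""
    return [ref.lstrip("#") for ref in sp.get("who", "").split()]


def copresence_edges(tei_xml, segment_type="act"):
    """Count in how many segments each pair of characters speaks together."""
    root = ET.fromstring(tei_xml)
    edges = Counter()
    # Assumption: segments are marked as <div type="act">.
    for div in root.iterfind(f".//tei:div[@type='{segment_type}']", TEI_NS):
        present = set()
        for sp in div.iterfind(".//tei:sp", TEI_NS):
            present.update(speakers_of(sp))
        for pair in combinations(sorted(present), 2):
            edges[pair] += 1
    return edges


def speech_shares(tei_xml):
    """Return each character's share of the words spoken in <p> and <l>."""
    root = ET.fromstring(tei_xml)
    words = Counter()
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        count = 0
        for path in (".//tei:p", ".//tei:l"):
            for el in sp.iterfind(path, TEI_NS):
                count += len("".join(el.itertext()).split())
        # A speech shared by several speakers is credited to each of them.
        for speaker in speakers_of(sp):
            words[speaker] += count
    total = sum(words.values())
    return {name: n / total for name, n in words.items()} if total else {}


def edges_to_csv(edges):
    """Serialise the weighted edge list as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


if __name__ == "__main__":
    print(edges_to_csv(copresence_edges(SAMPLE)))
    print(speech_shares(SAMPLE))
```

An edge list of this kind can be loaded into network tools such as Gephi; the edge weight here simply counts the segments two characters share, which is one of several possible weighting choices.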

Objectives

As a project, we see the following needs for the corpora:
The most important need is long-term archiving of the corpora, especially of those that were not taken over from TextGrid. It should also be possible to use the project's tools on other corpora, e.g. texts in other languages, and to apply other tools to the project's text collection. So far, the aim has been to provide a starting point for various digital projects; in combination with other corpora and tools, this function could be extended. All in all, this would integrate the project into a larger context, increasing its scientific reach and promoting collaboration, interdisciplinarity and interaction.

Solution

We see several possible solutions. Regarding the first point (necessary means/innovation), instructions on how to import existing corpora into TextGrid would be helpful; a workshop, a blog or a tutorial would be a suitable format. Solutions or examples for creating the TextGrid objects should be provided. In addition, support for semi-automatically comparing the imported editions and works with those already in TextGrid, in order to map them, would be a good option.
Concerning the point “use of services/tools/components”, it would be good if the texts could be sent to the switchboard, and if the switchboard could accept TEI and extract selected elements. An option to easily add other texts to the tool would be helpful, as would an option to easily use other tools on the texts.
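As an illustration of what "extracting selected elements" from TEI could mean in practice, here is a minimal Python sketch. It does not describe any existing switchboard interface. It only assumes that a downstream tool expects plain text and that the caller chooses which TEI elements (for example `<p>` and `<l>` for spoken text, or `<stage>` for stage directions) are passed on. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample document, used only for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="act">
  <stage>A room.</stage>
  <sp who="#anna"><speaker>Anna</speaker><p>Good   morning.</p></sp>
  <sp who="#bert"><speaker>Bert</speaker><l>Good morning to you.</l></sp>
</div>
</body></text></TEI>"""


def extract_text(tei_xml, element_names=("p", "l")):
    """Return the text of the selected TEI elements in document order.

    Whitespace is normalised. If selected elements are nested inside each
    other, the inner text appears more than once.
    """
    wanted = {f"{{{TEI}}}{name}" for name in element_names}
    root = ET.fromstring(tei_xml)
    result = []
    for el in root.iter():
        if el.tag in wanted:
            text = " ".join("".join(el.itertext()).split())
            if text:
                result.append(text)
    return result


if __name__ == "__main__":
    # Spoken text only (speaker labels and stage directions are skipped).
    print("\n".join(extract_text(SAMPLE_TEI)))
    # Stage directions only.
    print("\n".join(extract_text(SAMPLE_TEI, ("stage",))))
```

Leaving the choice of elements to the caller keeps the same step usable for different tools, for instance one that analyses only spoken text and another that analyses only stage directions.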