Text+ User Story

Integration and access to heterogeneous resources of the Koblenz love letter archive

Canan Hastik (TU Darmstadt)

DFG subject areas: 104 Linguistics, 111 Social Sciences

Text+ data domain: Collections

Motivation

For my research on various aspects of private letter writing, I need a corpus of private family, couple and love letters. Since no institution has an official collecting mandate or archive location for these everyday cultural testimonies of ordinary language users, the corpus must be built up successively from scratch by means of calls for donations, i.e. the material must be digitised, formally indexed, transcribed and annotated. In this context I have to find and build a place and an infrastructure for the physical and digital archive in order to make the research material available according to current standards and to secure it in the long term.

Koblenz University Library has offered to act as a cooperation partner for digitisation and as the repository for the dynamically growing physical archive, which currently holds about 20,000 letters. As my own contribution, I have designed a database in which I record the formal indexing data. Because the rights and licence situation differs for the individual items, which are personal, partly protected by copyright and in some cases from anonymous senders, I have designed an anonymisation concept, collected the personal data separately and stored the digitised images in a secure cloud system. I use the TextGridLab to produce, store and annotate the transcriptions in accordance with material-specific guidelines, including anonymisation.

To sustainably secure the ever-growing heterogeneous resources of the emerging Koblenz Love Letter Archive, a cross-institutional research infrastructure is to be established along the tried and tested documentation workflow. To this end, the original objects will continue to be archived at the Koblenz University Library, while the digital surrogates will be transferred in their entirety to a research repository at the Darmstadt University Library, where they will be stored for the long term and the metadata will be made accessible, for example via external interfaces such as correspSearch.

For this purpose, I have to technically integrate the existing individual solutions along the workflow into the research repository of the ULB Darmstadt, and anonymise and pseudonymise the existing digital material at different levels.

It would be very helpful if I could receive both technical and legal advice and support throughout the integration, up to and including the development of the research repository.

Objectives 

Digitisation and indexing of the letters are largely carried out as an in-house effort by constantly changing staff and students, mainly from non-DH degree programmes, so that documenting the workflow and guidelines is decisive for the continuity and quality of the recording.

The digitised material and metadata not only have to be stored in a secure research repository, but are also constantly being processed further and enriched. The requirements that have emerged from the active and proven research data management of the love letters are: persistent and complete anonymisation and pseudonymisation of the material, enrichment with geographic authority data, versioning, and user-friendly web-based access regulated by a role management system, including the use of low-threshold research tools for transcription and annotation. In order to be able to process larger amounts of text, interfaces to tools for handwriting recognition (e.g. Transkribus) or for converting speech into text should also be considered. For the automatic content indexing of the corpus, linking to dictionaries (with the corpus serving as evidence for language use) or the connection of tools for named-entity recognition would be desirable. Visualisation tools should be available for exploring the collection.
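To make the pseudonymisation requirement more concrete, the following minimal Python sketch shows one possible approach, keyed hashing, in which every occurrence of a known personal name in a transcription is replaced by the same stable placeholder. It is an illustration only and not the anonymisation concept used for the archive; the function names, the `PERSON_` prefix and the example key are hypothetical.

```python
import hashlib
import hmac
import re


def pseudonym(name: str, secret_key: bytes, prefix: str = "PERSON") -> str:
    """Return a stable placeholder for a name (hypothetical helper)."""
    # Normalise whitespace and case so spelling variants map to one placeholder.
    normalised = " ".join(name.split()).casefold()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return f"{prefix}_{digest.hexdigest()[:8]}"


def pseudonymise(text: str, names: list[str], secret_key: bytes) -> str:
    """Replace every listed name in the text with its placeholder."""
    if not names:
        return text
    # Longer names first, so "Anna Maria" is matched before "Anna".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    # A single pass avoids replacing text inside already inserted placeholders.
    return pattern.sub(lambda m: pseudonym(m.group(0), secret_key), text)


if __name__ == "__main__":
    key = b"example-key-kept-outside-the-repository"  # assumption: stored separately
    print(pseudonymise("Dear Anna, yours, Karl", ["Anna", "Karl"], key))
```

Because the placeholder depends on a secret key, the same person receives the same placeholder across all letters, while the link back to the real name can only be re-established by whoever holds the key and the separately stored personal data.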

Solution 

The current status of the tools and infrastructures in use (scryptos and TextGridLab) and of the one developed for the project (LBA Catalogue) is to be surveyed, and the options for transferring them to an infrastructure solution of the ULB are to be evaluated with regard to the established and tested documentation workflow.

Challenges 

Particular challenges of the project are, for example, the integration and support of heterogeneous user groups (established researchers with highly specific and cross-disciplinary questions, e.g. on the sociology of the use of nicknames, as well as young researchers, students and citizen scientists), who need adapted and individual support in the use of the corpus and tools; furthermore, the special dynamics of the archive, which depends on individual letter donations.