We submitted a revised funding proposal for Text+ in the second round of the NFDI and are continuing our plan to develop a research data infrastructure focused on text and language data. To integrate user requirements, we invited researchers to send us user stories from their own fields of research. The goal is to cover a wide range of the participating disciplines, data domains and research questions.
Within a short time, more than 120 user stories were submitted, 115 of which we were able to publish here with the authors' consent. Many thanks to the active community for your valuable contributions and your support! Your feedback is important for jointly shaping what Text+ offers, and we consider these contributions an important form of participation in Text+.
We published the user stories on this website step by step as consistent, referenceable posts. All contributions are clustered according to the Text+ data domains: Collections, Lexical Resources, Editions, plus a fourth, comprehensive category. As a further classification feature, we use the DFG subject area structure at the level of areas 101–113; the subdisciplines become evident from the texts themselves.
The full report was published in September 2021 and is available at the following link: https://doi.org/10.5281/zenodo.5384085
The data for the analysis is published in the DARIAH-DE Repository: http://dx.doi.org/10.20375/0000-000E-67ED-4
User stories were submitted on the basis of this template. For subsequent calls, the template may be modified to reflect the experience gained in the first call.
Find the user stories sorted by DFG subject area here.
Find the user stories sorted by Text+ data domains here.
Overall, the user stories focus on linguistics and literary studies (104–105); only a few relate exclusively to history (102) or to fine arts, music, theatre and media studies (103). Surprisingly, many user stories also came from classical philology (101); social and cultural anthropology, non-European cultures, Jewish studies and religious studies (106); theology (107); and philosophy (108). The social sciences also participated in our call to some extent, and some user stories have a decidedly interdisciplinary focus. Many user stories directly address infrastructural questions or possible Text+ services. Starting from individual research questions, they set out requirements and also propose solutions.
At first sight, the expectations regarding the mission of Text+ and the NFDI and the solutions they should provide differ substantially depending on the context: an individual project, a larger research project or a working group. A common issue throughout many user stories is the accessibility and usability of restricted research data. Another important topic is the reuse, with support from Text+, of valuable but perhaps less prominent data from small languages or individual research projects. Linking distributed resources is also an ongoing concern that has not yet been solved for the community.
A team of researchers from the Text+ consortium read each user story. In total, we analysed 118 user stories submitted by the community and labelled each one with keywords, drawing on 67 unique keywords. Most user stories carry multiple labels: the number per story ranges from one to 13, with a mean of about 6.6 (773 keyword assignments across 118 stories).
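The summary figures in the previous paragraph (unique keywords, total assignments, and the mean, minimum and maximum number of labels per story) can all be derived from a simple mapping of stories to keyword sets. The following Python sketch shows one way to do this. The story IDs and keyword assignments are hypothetical toy data, not the actual annotations published in the DARIAH-DE Repository, and the function name `label_summary` is our own illustration rather than part of the Text+ analysis.

```python
from statistics import mean

# Hypothetical toy data (NOT the real Text+ annotations):
# story ID -> set of keywords assigned by the annotators.
labels = {
    "story-001": {"data producer", "FAIR principles", "multilingual"},
    "story-002": {"interest in further data"},
    "story-003": {"data producer", "corpus-corpus",
                  "lexical resource-corpora linking", "FAIR principles"},
}

def label_summary(labels):
    """Compute summary figures of the kind reported for the keyword analysis."""
    # Number of keywords assigned to each story.
    counts = [len(kws) for kws in labels.values()]
    # All distinct keywords used across all stories.
    unique = set().union(*labels.values())
    return {
        "stories": len(labels),
        "unique_keywords": len(unique),
        "assignments": sum(counts),
        "mean": mean(counts),
        "min": min(counts),
        "max": max(counts),
    }

print(label_summary(labels))
```

Storing each story's keywords as a set means a keyword assigned twice to the same story is counted only once, which matches the idea of labelling each story with a list of distinct keywords.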
The keywords summarize recurring topics in the stories as well as aspects that are particularly important for the requirements placed on the NFDI. Examples include keywords that record whether the user produces or requires data (“data producer”, “interest in further data”), the medium of the data, whether the data is multilingual, the FAIR principles, whether interaction between several resources is required, and which kinds of resources are concerned (“corpus-corpus”, “lexical resource-corpora linking”).
The visualizations are based on all 118 analysed user stories. Each shows, as a horizontal stacked bar chart, the percentages for the twenty most frequently assigned keywords. The first visualization covers the keywords assigned to all stories; further visualizations show the results, each with a short explanation, for the stories of each data domain (Collections, Lexical Resources and Editions) and for the Task Area Infrastructure/Operations.
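The numbers behind such a chart can be computed by counting keyword occurrences and keeping the most frequent ones. The following Python sketch illustrates this with hypothetical toy data; it is not the code used for the Text+ figures. In particular, it assumes that each percentage is relative to the number of stories (the share of stories carrying a keyword). Dividing by the total number of keyword assignments would be an equally plausible reading. Plotting and the per-domain stacking are omitted.

```python
from collections import Counter

# Hypothetical toy data (NOT the real Text+ annotations):
# story ID -> set of keywords.
labels = {
    "s1": {"data producer", "FAIR principles"},
    "s2": {"data producer", "multilingual"},
    "s3": {"FAIR principles", "data producer", "corpus-corpus"},
    "s4": {"interest in further data"},
}

def top_keyword_shares(labels, n=20):
    """Return (keyword, percent of stories) for the n most frequent keywords.

    Assumption: percentages are relative to the number of stories.
    """
    # Count in how many stories each keyword occurs.
    counts = Counter(kw for kws in labels.values() for kw in kws)
    total = len(labels)
    # most_common(n) sorts by descending count; the order of ties is not meaningful here.
    return [(kw, 100 * c / total) for kw, c in counts.most_common(n)]

for kw, pct in top_keyword_shares(labels, n=3):
    print(f"{kw:<25} {pct:5.1f} %")
```

With `n=20` and the full set of annotations, the returned list would contain one entry per bar of a top-twenty chart.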