Text+ User Story

Platform for Annotated Corpus Data for Theoretical, Hypothesis-Driven Research

Jutta Hartmann (Universität Bielefeld)

DFG subject area: 104 Linguistics

Text+ data domain: Collections


Hypothesis-driven theoretical research frequently requires a more fine-grained, qualitative annotation of data retrieved from large corpora. This annotation is frequently done manually and should ideally be carried out by more than one annotator. Because such annotation usually requires a substantial investment of time and money, sharing the data and working on them collaboratively within the research community would be an advantage for all researchers involved.

For the community to be able to share and collaboratively work on such data sets easily, the first step is a standardized platform that allows for citing, sharing, querying and further annotating the same data set.

There are already examples of such data being shared, for example the ZAS database on clause-embedding predicates, the “Datenbank zu zu/dass-Komplementen”, and the interdisciplinary platform TInCAP at the University of Tübingen. A central platform that collects such data should nevertheless improve their visibility across universities and research institutions.


As a collaborative effort, such a platform can collect individual databases in an initial phase and then go on to support 1. sharing annotated data, 2. querying annotated data, and 3. further annotation of existing data sets.
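The three functions named above can be illustrated with a minimal sketch. It is not a proposed design for the platform: the class names, the fields, the layer name "complement_type", the identifier and the two example sentences are all invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    annotator: str  # who assigned the label
    layer: str      # hypothetical annotation layer, e.g. "complement_type"
    value: str      # the label itself


@dataclass
class Item:
    item_id: str
    text: str       # corpus excerpt that is being annotated
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class DataSet:
    identifier: str  # placeholder for a citable identifier, not a real PID
    items: List[Item] = field(default_factory=list)

    def annotate(self, item_id: str, annotator: str, layer: str, value: str) -> Item:
        """Add a further annotation to an existing item (function 3)."""
        for item in self.items:
            if item.item_id == item_id:
                item.annotations.append(Annotation(annotator, layer, value))
                return item
        raise KeyError(item_id)

    def query(self, layer: str, value: str) -> List[Item]:
        """Return all items with at least one matching annotation (function 2)."""
        return [
            item for item in self.items
            if any(a.layer == layer and a.value == value for a in item.annotations)
        ]


# Invented example data; sharing (function 1) would mean exchanging this object
# under its identifier.
demo = DataSet("example-dataset-1", [
    Item("s1", "She promised to leave."),
    Item("s2", "She said that she left."),
])
demo.annotate("s1", "annotator_a", "complement_type", "infinitival")
demo.annotate("s2", "annotator_a", "complement_type", "finite")
demo.annotate("s1", "annotator_b", "complement_type", "infinitival")  # second annotator

hits = demo.query("complement_type", "infinitival")
print([item.item_id for item in hits])
```

Storing the annotator with every label keeps the judgments of several annotators on the same item apart, so that they can later be compared.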

In the long run, it is conceivable that the platform provides a representative data set for different languages that are regularly used for annotation, so that a corpus of high-quality data can be created in a collaborative effort.

Text+ can provide and host a platform that satisfies these needs.