Text+ User Story

Corpus of Novels of the Spanish Silver Age (CoNSSA)

José Calvo Tello (SUB Göttingen)

DFG subject area: 105 Literary Studies
Text+ data domain: Collections

Motivation

Within the CLiGS project at the University of Würzburg (2015-2020), one of the corpora gathered was a medium-sized corpus of Spanish novels. It contains novels by Spanish authors published between 1880 and 1939. Around a third of the corpus is still under copyright, because some of the authors lived as late as the year 2000. The corpus is encoded in XML-TEI. Its teiHeader holds dozens of metadata fields about the plot, the author, and the publication. In addition, the corpus has been linguistically and textually annotated with several tools (narrative, grammatical, and semantic information). The original proposal stated that the data would be made available, but did not specify what to do with texts that are still under copyright.

Objectives

As a researcher, I need support regarding the legal framework for publishing data extracted from texts still in copyright. I also need repositories for archiving these texts so that other researchers can access my data. This can be offered in several ways:

- Original and complete data, after a series of identification steps or registration, for materials that are still protected.
- Extracted features in large spans of texts (frequencies per volume or chapter, sentence).
- Extracted features in shorter spans of texts (paragraphs, sentences, verses).
- Extracted features based on collocations or n-grams.
- Further models to download, such as topic models or word embeddings.
- Linguistic annotation of the entire text.
- Metadata.
- Markup without text (to analyze the structure of the text, such as the number of paragraphs, number of verses, etc.).

Some of these features should be published openly, without any kind of registration (metadata annotated by me, frequencies of markup).
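To make the "extracted features" and "markup without text" options more concrete, here is a minimal sketch of how per-chapter paragraph counts and token frequencies could be derived from a TEI file without exposing the running text. The sample document, the function name `chapter_features`, and the assumption that chapters are encoded as `div` elements with `type="chapter"` are illustrative only and are not taken from the corpus. Whether such frequency tables may be published for protected texts is precisely the legal question raised above.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature TEI document, invented for this sketch.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="chapter" n="1">
      <p>La casa era grande.</p>
      <p>La casa era vieja.</p>
    </div>
    <div type="chapter" n="2">
      <p>El mar era azul.</p>
    </div>
  </body></text>
</TEI>"""


def chapter_features(tei_xml):
    """Return paragraph counts and token frequencies for each chapter.

    Assumes chapters are <div type="chapter"> elements whose direct
    children are <p> elements (an assumption of this sketch).
    """
    root = ET.fromstring(tei_xml)
    features = []
    for div in root.iterfind(".//tei:div[@type='chapter']", TEI_NS):
        paragraphs = div.findall("tei:p", TEI_NS)
        tokens = []
        for p in paragraphs:
            # Lowercase the paragraph text and split it into word tokens.
            text = "".join(p.itertext()).lower()
            tokens.extend(re.findall(r"\w+", text))
        features.append({
            "chapter": div.get("n"),
            "paragraphs": len(paragraphs),    # markup without text
            "tokens": len(tokens),
            "frequencies": Counter(tokens),   # word order is discarded
        })
    return features


if __name__ == "__main__":
    for row in chapter_features(SAMPLE_TEI):
        print(row["chapter"], row["paragraphs"], row["frequencies"].most_common(3))
```

The same loop could aggregate per volume or per paragraph instead of per chapter; the smaller the span, the closer the output comes to the original text, which is why the span size matters for the options listed above.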
Text+ should allow archiving data that is still in copyright without making it available. It should be defined in which year each text becomes free to publish, and this should be done automatically or semi-automatically (the author died in …