Provision of processable textual data in libraries
Thorsten Wübbena (Leibniz-Institut für Europäische Geschichte)
DFG subject areas: 101 Ancient Cultures, 102 History, 103 Fine Arts, Music, Theatre and Media Studies, 104 Linguistics, 105 Literary Studies, 106 Social and Cultural Anthropology, Non-European Cultures, Jewish Studies and Religious Studies, 107 Theology, 108 Philosophy
Text+ data domain: Collections
Academic libraries already hold a large number of digitised items. In some cases, OCR procedures have been applied to offer users more extensive options. Unfortunately, the resulting text is not always of a quality that allows digital processing without additional pre-processing work, which burdens the already scarce resources of research projects.
It would be desirable for Text+ and its participants to address this problem, so that higher-quality, more easily processable data plays a greater role in the services that libraries offer to support researchers who work with digital methods.
Now that providing full text to users is more or less firmly anchored in library portfolios, the next step would be to raise awareness of the need for high data quality (beyond metadata). Despite many good examples of approaches and implementations, further activity is needed to change both the mindset and current practice. Text+ and its relevant participants would be ideal ambassadors for advancing the standards of data provision for digital texts.