Text+ User Story

Testing new ways of accessing licensed content using blockchain technology

Matthias Kaun and Gerrit Gragert (CrossAsia and the FID Asia, Staatsbibliothek zu Berlin)

DFG subject areas: 101 Ancient Cultures, 102 History, 104 Linguistics, 105 Literary Studies, 106 Social and Cultural Anthropology, Non-European Cultures, Jewish Studies and Religious Studies

Text+ data domain: Collections (comprehensive)

Motivation

a. As a research infrastructure facility for Asia-related studies, the Staatsbibliothek zu Berlin (SBB-PK) and the FID Asia, with its platform CrossAsia, currently manage a variety of full texts, image-text objects and image-image objects, among other materials. Much of this content is licensed and therefore accessible only to authorized users. The licensing agreements usually include hosting rights as well as text and data mining rights. Nevertheless, as licensee, SBB-PK must ensure that the licensing terms are observed, especially with regard to the dissemination of data. Because CrossAsia is a nationwide service, its users are located throughout Germany. We therefore need secure and trustworthy solutions that allow the data to be used in national and international networks. Furthermore, solutions need to be developed that support international research projects wishing to use the materials licensed at SBB-PK.

b. Researchers using the CrossAsia infrastructure increasingly require not only traditional read-only access to the licensed content but also the ability to apply their own (partly self-developed) tools to the text data and to conduct their own analyses in their own or other external working environments. As licensee, SBB-PK would like to support this, but it also needs to guarantee secure procedures to the licensors.

c. SBB-PK and the FID Asia with CrossAsia assume that other infrastructure services, especially supra-regional, national and international ones, face similar problems and that various approaches should therefore be tested.

Objectives

a. The aim of the project is to try out new licensing strategies for internationally oriented cooperative research that also allow researchers to work on the licensed texts with external DH tools. To ensure both the data privacy of users and the security of the licensed content in an internationally oriented research environment, the project will explore, partly experimentally, the possibilities of using blockchain technology in the context of the digital sciences.

b. Text+ data domain: language- and text-based collections

The overarching aim is to offer, together and in accordance with the FAIR principles, both the image and text data for which CrossAsia was able to negotiate hosting, indexing and text mining rights and the public domain texts and image data, such as photographs. All of these are permanently stored together with their indexing data in the CrossAsia ITR (Integrated Text Repository), a Fedora-based data storage facility. As of August 2020, the CrossAsia ITR contains the full texts of about 335,000 titles with 53 million pages from 26 different licensed databases, most of them in Chinese and English, as well as public domain texts in Western and Asian languages from the Asia Collection of the SBB-PK Digitized Collections.

Solution 

a. In the area of licensing, new licensing schemes, generally at European level, are already being tested with vendors as well as with libraries and research infrastructures in order to provide an increasingly internationally oriented research community with the necessary resources and materials. The introduction of the CrossAsia licensing model required a secure authentication and authorization structure. Here, however, the requirements of high reliability and maximum user-friendliness are in tension. Reliability is necessary to create and maintain trust, especially among the licensors from Asia; user-friendliness increases the acceptance of the service. Shibboleth is currently used for nationwide authentication. Considering all the requirements arising from the further development of licensing and from the need for comprehensive access to data in the digital sciences, it seems reasonable to explore the possibilities (and limits) of Distributed Ledger Technology (DLT), of which blockchain technology is the best-known form, in a pilot project, taking into account the experience already gained with authentication and authorization for the provision of licensed content.

Challenges

a. There is always a possibility that an international cooperative licensing approach may fail because of problems such as budget constraints or the partners' own standards.

b. Since DLT is currently developing rapidly, it is possible that the evaluation of its capabilities will show that it cannot yet be used, or that the proof of concept will fail.

c. Alternatively, other approaches will be pursued, such as providing licensed content in derived formats like n-grams (https://crossasia.org/service/crossasia-lab/crossasia-n-gram-service/).
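To illustrate what a derived format can look like, the following minimal Python sketch counts overlapping character n-grams in a text, so that only aggregated counts rather than the running text would be passed on. It is an illustrative assumption only: the helper name `char_ngrams` and the sample string are hypothetical, and the sketch does not describe how the CrossAsia n-gram service is actually implemented.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in `text` (hypothetical helper).

    The text is split on whitespace first and n-grams are counted within
    each segment, so no n-gram crosses a space or line break.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    for segment in text.split():
        # A segment shorter than n yields an empty range and contributes nothing.
        counts.update(segment[i:i + n] for i in range(len(segment) - n + 1))
    return counts


if __name__ == "__main__":
    # Hypothetical sample; a real service would process full texts.
    sample = "學而時習之 不亦說乎"
    for gram, count in char_ngrams(sample, 2).most_common(3):
        print(gram, count)
```

Character-level n-grams are used here because Chinese text is usually not segmented into words by spaces; a word-level variant would need a tokenizer, which is outside the scope of this sketch.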

Review by Community 

a. If the described approach proves viable, it will be tested in research projects accompanying the pilot and subsequently rolled out more widely. The aim is to test the approach as a prototype, using CrossAsia, with its large amounts of data, as a sample infrastructure. Later, the approach will need to be tested in and adapted to other structures and infrastructures. CrossAsia has received several enquiries indicating international demand. As a first step, therefore, cooperation is possible at European level and also internationally with partners in the USA and Asia.


References

“ITR and developments” in the CrossAsia Blog: https://blog.crossasia.org/kategorie/itr-und-entwicklungen/

Martina Siebert, Matthias Kaun, Oliver Schöner: CrossAsia-ITR (Integrated Text Repository) — Aims, Structure, Technique. In: ABI Technik, Volume 39, Issue 4, Pages 303–310.