Text+ User Story

Building collections of social media data for religious studies

Frederik Elwert (Ruhr University Bochum)

DFG subject area: 106 Social and Cultural Anthropology, Non-European Cultures, Jewish Studies and Religious Studies

Text+ data domain: Collections

Motivation

In Religious Studies (Fachkollegium 106), religious practices, community building, and communication on the Internet have become a popular field of study. Web data representing religious interaction and knowledge production online complement the historical, philological, and anthropological sources that traditionally form the basis for research in the field.

Religious studies often focus on minority communities and traditions that are underrepresented in canonical collections and common metadata schemas. Understanding them requires careful contextualization of the source data. In general, despite funders' increasing attention to the subject, data sharing and re-use are still rarely practised in Religious Studies.

In a study on religious online forums, we collected data from four major Christian and Muslim forums in English and German (which are also internally diverse, especially since Muslim forums frequently use Arabic and Turkish in addition to the main language). The project used computational text and network analysis methods to investigate the dominant topics and interaction structures of the forums. These data are also valuable resources in their own right, as they capture user-produced text, multi-modal content, and social interaction over a substantial period of time. However, because they include some personal data, they cannot be published freely.

Objectives

Social media data can be understood as collections in the typology of Text+. While some web collections are used in linguistic research, the forum data represent not only language but also specific communities of practice. To allow for secondary analyses of such data, secure archiving and controlled access would be needed, in addition to pseudonymisation.
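
As an illustration of what pseudonymisation could look like for forum data, the following minimal Python sketch replaces usernames with stable pseudonyms derived by keyed hashing (HMAC-SHA256). This is an assumption for illustration, not the procedure used in the project: the function name, the record fields (author, reply_to, text), and the sample posts are all hypothetical. Because the same username always maps to the same pseudonym, the reply structure needed for network analysis is preserved.

```python
import hashlib
import hmac


def pseudonymise(username: str, secret_key: bytes, length: int = 12) -> str:
    """Map a username to a stable pseudonym using HMAC-SHA256.

    The secret key must be stored separately from the archived data;
    without it, the mapping cannot be recomputed from usernames alone.
    """
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:length]


def pseudonymise_posts(posts: list[dict], secret_key: bytes) -> list[dict]:
    """Return copies of the posts with author and reply_to replaced."""
    result = []
    for post in posts:
        cleaned = dict(post)
        cleaned["author"] = pseudonymise(post["author"], secret_key)
        if post.get("reply_to") is not None:
            cleaned["reply_to"] = pseudonymise(post["reply_to"], secret_key)
        result.append(cleaned)
    return result


# Hypothetical sample records; field names are assumptions.
SAMPLE_POSTS = [
    {"author": "alice", "reply_to": None, "text": "Opening post"},
    {"author": "bob", "reply_to": "alice", "text": "A reply"},
]
SAMPLE_KEY = b"replace-with-a-randomly-generated-key"

pseudonymised = pseudonymise_posts(SAMPLE_POSTS, SAMPLE_KEY)
```

Note that this only covers structured metadata. Names, places, or other identifying details inside the message text itself are not touched and would need separate treatment.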

Additionally, preparing messy data from the web is a tedious task. Many pre-processing steps, such as encoding normalization or language detection, are required before computational analyses like topic modelling can be carried out.
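
To make these steps concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the project's actual pipeline: the function names are hypothetical, the fallback to Windows-1252 is an assumed heuristic for legacy Western European pages, and the script check is only a crude first step towards language detection (it can separate Arabic-script from Latin-script posts, but not German from Turkish or English).

```python
import unicodedata
from collections import Counter


def normalise_text(raw: bytes) -> str:
    """Decode raw bytes and normalise the result to Unicode NFC.

    Tries UTF-8 first; if that fails, falls back to Windows-1252
    (an assumption, not a general solution for all legacy encodings).
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("cp1252", errors="replace")
    return unicodedata.normalize("NFC", text)


def dominant_script(text: str) -> str:
    """Return the most frequent script among the letters of a text.

    The script is read from the first word of each character's Unicode
    name (e.g. "ARABIC", "LATIN"). Returns "NONE" for texts without letters.
    """
    counts = Counter()
    for char in text:
        if char.isalpha():
            name = unicodedata.name(char, "")
            counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    if not counts:
        return "NONE"
    return counts.most_common(1)[0][0]


# The same word arriving in three different byte representations.
decomposed = normalise_text("Gru\u0308\u00dfe".encode("utf-8"))
legacy = normalise_text(b"Gr\xfc\xdfe")
composed = normalise_text("Gr\u00fc\u00dfe".encode("utf-8"))
```

After normalization, all three variants compare equal, which matters for tokenization and frequency counts in later steps such as topic modelling. For actual language identification, a dedicated tool or service would be needed.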

Solution

Text+ could offer secure archiving and handle access requests. At the same time, it could encourage the use of emerging standards for the exchange of computer-mediated communication data, such as the CMC profile of the TEI.

Projects can also benefit from text processing services for data preparation and computational analysis.