Text+ User Story

Dialect Dictionaries

Laura Sturm (Sächsische Akademie der Wissenschaften)

DFG subject area: 104 Linguistics

Text+ data domain: Lexical Resources

Motivation

The Mecklenburg Dictionary, the Thuringian Dictionary and the Berlin-Brandenburg Dictionary were compiled at the Saxon Academy of Sciences and Humanities in Leipzig. These dialect dictionaries have not yet been digitized. The data exists only in print and would have to be digitized to give researchers quick and easy access and to make it available to a wider, non-academic public, where interest in dialects is particularly strong.

Objectives

The shift of scholarly work from the analogue to the digital world did not begin with the COVID-19 pandemic: digital resources have long been used extensively in the scholarly community, and dialectology is no exception. Research data that is provided digitally is available quickly, at any time and regardless of where the researcher is working. For the Low German dictionaries in particular, which include the Mecklenburg and Berlin-Brandenburg dictionaries of the Saxon Academy of Sciences and Humanities, very little is available in digital form to date. Because Thuringian is a conglomerate of different dialects, its digitization is particularly important for dialect research and for research on the dialect continuum. As things stand, I can often complete essays or articles for other dictionaries only after visiting a library that holds the relevant volumes. Digitizing these dictionaries would therefore save me time and effort and make my own research more efficient.

When using dictionaries that have already been digitized, I repeatedly notice their poor readability, caused among other things by special characters that are not displayed and by poor text flow. I find the loss of the articles' clear structure particularly disruptive, because it then takes much longer to find the relevant data. This is where digitization can and must do more!

The Trier Dictionary Network already offers numerous dialect dictionaries that are linked with one another, so I can see immediately in which other dialects the words I am interested in are attested. I would therefore like to see the three dialect dictionaries of the Academy digitized and linked into the Trier Dictionary Network.

A word often has numerous references. Displaying them automatically on a regional map would contribute decisively to clarity and would also appeal more strongly to lay users.

For scholarly use, it is also important that both the volume and the page or column number of each article are given, so that articles can be cited easily. It would be equally important to record the year or date when the article was written, together with a history of changes, because only then can I take the state of research reflected in the article into account in my own work.
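As a rough illustration of what such citation metadata could look like, here is a minimal Python sketch. The class names, field names and citation format are assumptions made for this example and do not describe an existing Text+ or Academy data model; all values are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    date: str  # ISO date of the change, e.g. "2001-05-17" (placeholder)
    note: str  # short description of what was changed


@dataclass
class Article:
    lemma: str        # headword of the article
    dictionary: str   # title of the printed dictionary
    volume: int       # volume of the print edition
    column: int       # page or column number in that volume
    written: int      # year in which the article was written
    revisions: list[Revision] = field(default_factory=list)

    def citation(self) -> str:
        """Build a citation string (hypothetical format) from the print location."""
        return (f"{self.dictionary}, vol. {self.volume}, "
                f"col. {self.column}, s.v. {self.lemma} ({self.written})")

    def last_changed(self) -> str:
        """Return the date of the latest revision, or the year of writing if none."""
        if self.revisions:
            # ISO dates sort correctly as plain strings.
            return max(r.date for r in self.revisions)
        return str(self.written)


# Invented placeholder data, not a real dictionary article.
article = Article("lemma_a", "Example Dictionary", volume=1, column=10, written=1990)
article.revisions.append(Revision("2001-05-17", "placeholder correction"))
print(article.citation())
print(article.last_changed())
```

Keeping the print location and the revision dates as separate fields, rather than as free text, is what would allow a citation and the state of research to be derived automatically for every article.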

An elementary component of digital research that I often miss is the possibility to submit corrections and suggested changes to individual articles. I could imagine a contact form for this. The suggestion would be forwarded to the relevant experts, who could then revise the article digitally.

What I also find lacking in many digitization projects is thinking in terms of databases rather than mere digitized copies. For example, with appropriate tagging, one could display all the words with the meaning “head” and thus obtain an overview of the various expressions for it. Language maps built on this basis would again be very useful and especially popular with the non-academic public. Likewise, individual researchers would no longer have to compile word fields themselves if the entries were tagged in this way.
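The kind of query described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entry` structure, the concept labels and the function `expressions_for` are hypothetical, and the lemmas are placeholders rather than real attestations from the dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    lemma: str                # headword (placeholder strings below)
    dictionary: str           # dictionary the entry comes from
    concepts: frozenset[str]  # meaning tags assigned during digitization


# Invented placeholder entries; the tags are assumptions for this example.
ENTRIES = [
    Entry("lemma_a", "Mecklenburg Dictionary", frozenset({"head"})),
    Entry("lemma_b", "Thuringian Dictionary", frozenset({"head", "vessel"})),
    Entry("lemma_c", "Berlin-Brandenburg Dictionary", frozenset({"foot"})),
]


def expressions_for(concept: str, entries: list[Entry]) -> dict[str, list[str]]:
    """Group all lemmas tagged with `concept` by the dictionary they appear in."""
    by_dictionary: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        if concept in entry.concepts:
            by_dictionary[entry.dictionary].append(entry.lemma)
    return dict(by_dictionary)


print(expressions_for("head", ENTRIES))
```

Grouping the result by dictionary is a deliberate choice in this sketch: each dictionary covers a region, so the same grouping could serve as the input for a language map or a pre-built word field.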

Solutions

Text+ could digitize the dictionary data in the form of a database with an article editing system. With appropriate tagging, this would make it possible to answer various research questions (see above). In addition, articles could be made clearer by assigning their individual components to separate text fields in the database.

Challenges

In my opinion, the biggest challenge lies in tagging the data correctly, so that interesting research questions can ultimately be answered from the database.

Review by community

We would like to implement or use solutions to the problems described within the framework of Text+, and thus integrate our data resources and make them available as openly as possible. We are also open to a broad exchange of experience in the process.