Text+ User Story

A standardized metadata format for linking the Leibniz Edition (and beyond)

Harald Siebert (Berlin-Brandenburg Academy of Sciences and Humanities)

DFG subject areas: 102 History, 108 Philosophy

Text+ data domain: Editions

Motivation

The writings on natural science, medicine and technology in G. W. Leibniz’s collection (Nachlass), most of them unpublished, will be published in their entirety, together with the texts that have already appeared in print, in series VIII of the Leibniz Edition (Academy Edition). The series is planned to comprise twelve volumes, two of which have already been published. The writings in series VIII consist of linguistic and mathematical texts as well as drawings and tabular presentations; they belong to different text types: studies, essays, drafts, records, reviews, excerpts, notes, minutes, sketches and transcripts. The task of the critical edition is to clarify the inventory, sequence and genesis of the writings (including their dating) and to make them accessible and usable for research. Since Leibniz wrote much more than he published, only an edition of the collected writings can show which topics he engaged with, how intensively, with what results and during which periods. The insights gained will also help us to better understand the historical development of science in the 17th and 18th centuries.

Objectives 

The Leibniz Edition publishes the writings and letters from the collection in a total of eight series. This division was made for practical and organizational reasons and follows the current boundaries of scientific disciplines. Nevertheless, a long-term goal spanning all volumes is to make visible and to investigate the origins and interrelations of Leibniz’s universal scientific approach, as documented in the collection with its 100,000 units of letters and writings. The task of the Leibniz Edition would be to encode and crosslink the texts by means of metadata. The task of Text+ would be to provide a standardized metadata format for this purpose and to remain open to further data formats, especially PDF (the lowest common denominator of the Leibniz Edition; see Challenges below).

The development of such a metadata format, which would include bibliographic information as well as information on the content of the edition (e.g. what was written when, on what subject, and communicated to whom?), would not only benefit the Leibniz Edition. A standardized format for linking editions that are stored and accessed in a distributed manner could also form the basis of an index of digital editions.

Solution 

We would like a standardized metadata format for editions and digitized items in different data formats that also covers analog editions. Provided that the texts are encoded accordingly, this would make it possible to establish relationships in terms of content, space and time between texts from different sources as well as within the Leibniz Edition. Who researched or published when and where, and on what topic? Specific answers to these questions in individual cases would provide insights into the contexts and dynamics of science and technology in Leibniz’s time.
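To make the idea of linking by content, space and time more concrete, here is a minimal Python sketch of what such a metadata record and a simple linking query could look like. It is only an illustration under stated assumptions: the record type `EditionRecord`, its field names, the function `linked_records` and the sample entries are all hypothetical and are not part of any existing Text+ or Leibniz Edition schema.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class EditionRecord:
    """Hypothetical metadata record for one edited text (field names are assumptions)."""
    identifier: str                      # stable ID within the index
    source: str                          # edition or collection the text belongs to
    data_format: str                     # e.g. "TEI-XML", "PDF" or "print" (analog)
    year: Optional[int] = None           # year of writing, if the text is dated
    place: Optional[str] = None          # place of writing, if known
    topics: FrozenSet[str] = field(default_factory=frozenset)
    persons: FrozenSet[str] = field(default_factory=frozenset)


def linked_records(target: EditionRecord,
                   records: List[EditionRecord],
                   max_year_gap: int = 2) -> List[EditionRecord]:
    """Return records that share a topic with `target` and are dated within `max_year_gap` years."""
    hits = []
    for rec in records:
        if rec.identifier == target.identifier:
            continue  # do not link a record to itself
        if rec.year is None or target.year is None:
            continue  # undated texts cannot be compared in time
        if rec.topics & target.topics and abs(rec.year - target.year) <= max_year_gap:
            hits.append(rec)
    return hits


# Invented sample entries, for illustration only.
a = EditionRecord("ex-1", "Edition A", "PDF", 1675, "Paris", frozenset({"optics"}))
b = EditionRecord("ex-2", "Edition B", "TEI-XML", 1676, "London", frozenset({"optics", "mechanics"}))
c = EditionRecord("ex-3", "Edition C", "print", 1700, "Berlin", frozenset({"optics"}))

for rec in linked_records(a, [a, b, c]):
    print(rec.identifier, rec.source, rec.year, rec.place)
```

The point of the sketch is that the data format of the underlying text (XML, PDF or print) is just one field of the record, so linking works the same way regardless of how the edition itself is stored.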

We would also like the contents included in Text+ to be full-text searchable, so that strings can be searched using regular expressions. This would make it possible to discover unexpected similarities between texts whose authors are not otherwise placed in the same context. Such similarities in quantitative data (numbers, series of measurements) would make it possible, for example, to question empirical evidence and experimental claims with regard to origin and authorship. In this way, influences could be discovered and insights contextualized that originate from persons whom research has not yet noticed. (In the case of Leibniz, one such surprising interrelation could be shown thanks to a hint from his archive; in other cases this would only be possible through extensive comparisons with texts of other provenance.) Pointing out such unsuspected connections and entanglements could yield important evidence for a better understanding of scientific development as the result of collective activity and achievement, for focusing further on shared events in the history of science, and for moving further away from a person-oriented historiography that is all too often concentrated on the “Great White Men” view.
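As a rough illustration of the kind of regular-expression search we have in mind, the following Python sketch extracts numbers from plain-text versions of several texts and reports pairs of texts that share several of the same values. The function name `shared_numbers`, the threshold `min_shared` and the sample passages are assumptions made for this example; the passages are invented and do not quote any real source.

```python
import re

# Matches integers and decimals written with a point or a comma, e.g. "40", "27.5", "31,2".
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def shared_numbers(texts, min_shared=3):
    """Map each pair of text IDs to the sorted numbers they share, if at least `min_shared`."""
    numbers = {text_id: set(NUMBER.findall(body)) for text_id, body in texts.items()}
    ids = sorted(numbers)
    result = {}
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            common = numbers[first] & numbers[second]
            if len(common) >= min_shared:
                result[(first, second)] = sorted(common)
    return result


# Invented sample passages, for illustration only.
texts = {
    "text-a": "Measured heights: 27.5, 31.2 and 40 inches at noon.",
    "text-b": "The mercury stood at 31.2, then 27.5, finally 40.",
    "text-c": "In the year 1690 nothing of the kind was observed.",
}

for pair, values in shared_numbers(texts).items():
    print(pair, values)
```

A real search would need to normalize historical number notation and units first, and a shared series of values is only a hint that two texts may be related, not proof of it.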

Challenges 

The Leibniz Edition is planned to consist of 127 volumes (not counting sub-volumes). So far, 60 volumes (not counting sub-volumes) have been edited and published in print. Of these, 35 are available in PDF format, partly retro-digitized and partly generated from the various text-processing and typesetting systems used (TUSTEP, Plain TeX, LaTeX); the LaTeX data of volumes VIII,1 and VIII,2 have been converted into TEI-XML. Encoding the entire collection in XML is not expected in the foreseeable future. If the content is to be used digitally, the only remaining data format for the digital end product is PDF, which is common to all previous Leibniz Edition series as well as to other (already completed or still ongoing) edition projects. Therefore, a solution has to be found and tools have to be developed in order to access this large amount of data stored in PDF and to use the contents centrally from one platform.

In general, we would like Text+ not only to address projects that use standards such as XML/TEI, but also to take into account the community of editors who work with less common formats such as TUSTEP, or with standards that are not compatible with DH (digital humanities), such as PDF.

Review by Community 

Yes, I’d love to.