Text+ User Story

Graph Models for the Genesis of Goethe’s Faust

Thorsten Vitt, Sina Bock (Julius-Maximilians-Universität Würzburg)

DFG subject area: 105 Literary Studies

Text+ data domain: Editions


Johann Wolfgang Goethe worked on his drama Faust for almost his entire life. As witnesses of this work, 556 manuscripts are currently known. Together with 15 relevant prints that appeared during Goethe’s lifetime, a newly constituted text and supplementary material, they have been edited and published in a new hybrid edition (Goethe 2019).

However, the order in which the individual manuscripts were created and the dating of the acts of inscription have been the subject of more than 100 years of research and editorial activity. These studies have produced thousands of individual chronological assertions. Most of them, however, deal with only a handful of witnesses and provide either a relative chronology or a broad dating for them.

To date, the only attempts to aggregate individual statements in order to place all relevant objects in a chronological-stemmatic relation have been made by Renate Fischer-Lamberg. Her stemmata for two acts of Faust II (Fischer-Lamberg 1955, 150–66) probably mark the practical limit of how much of this information can be grasped by human means alone.


For the edition, this information needs to be aggregated in order to

  • integrate as many witnesses as possible into a chronology
  • associate each witness with at least an approximate timespan of creation within the more than 60 years Goethe worked on the Faust material
  • suggest a chronological order of the witnesses or the variants, respectively, for use in genetic visualisations and apparatus
  • justify those (automatic) decisions by linking them to the original assertions, so users can see why witnesses were ordered or dated in a specific way.

For a machine-supported solution, we formalized all assertions and combined them into a single directed multi-graph. If none of the assertions were contradictory, an ordering consistent with all assertions, as well as limits for the absolute dating of individual witnesses, could be inferred directly. The graph is contradictory, though, and since individual contradictions may involve hundreds of assertions, an algorithmic solution was chosen to suggest a minimal set of edges whose removal makes the graph conflict-free (Vitt and Brüning 2019).
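The general idea can be illustrated with a small sketch. It is not the edition's actual algorithm: it uses a simple greedy heuristic that repeatedly finds a cycle and drops the assertion with the lowest weight in it, which does not guarantee a minimal set of removed edges. The witness names, source labels, weights and all function names are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Assertion:
    earlier: str  # witness asserted to be earlier
    later: str    # witness asserted to be later
    source: str   # where the assertion comes from (hypothetical label)
    weight: int   # assumed trust in the assertion; low weights are dropped first


def find_cycle(assertions):
    """Return a list of assertions that form a cycle, or None if there is none."""
    outgoing = {}
    for a in assertions:
        outgoing.setdefault(a.earlier, []).append(a)
    state = {}  # node -> "active" (on the current path) or "done"
    path = []   # assertions on the current depth-first path

    def visit(node):
        state[node] = "active"
        for a in outgoing.get(node, []):
            if state.get(a.later) == "active":
                # Back edge: the cycle starts where the path leaves a.later.
                start = next((i for i, p in enumerate(path)
                              if p.earlier == a.later), len(path))
                return path[start:] + [a]
            if a.later not in state:
                path.append(a)
                cycle = visit(a.later)
                if cycle:
                    return cycle
                path.pop()
        state[node] = "done"
        return None

    for node in list(outgoing):
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def resolve_conflicts(assertions):
    """Greedily drop the weakest assertion of each cycle until none remain."""
    kept, removed = list(assertions), []
    while (cycle := find_cycle(kept)) is not None:
        weakest = min(cycle, key=lambda a: a.weight)
        kept.remove(weakest)
        removed.append(weakest)
    return kept, removed


def chronological_order(kept):
    """Return one ordering of the witnesses consistent with all kept assertions."""
    sorter = TopologicalSorter()
    for a in kept:
        sorter.add(a.later, a.earlier)  # a.later comes after a.earlier
    return list(sorter.static_order())


# Hypothetical example: three assertions form a cycle A < B < C < A.
assertions = [
    Assertion("A", "B", "source 1", 3),
    Assertion("B", "C", "source 2", 3),
    Assertion("C", "A", "source 3", 1),
    Assertion("C", "D", "source 1", 2),
]
kept, removed = resolve_conflicts(assertions)
order = chronological_order(kept)
```

In this example the low-weight assertion "C before A" is removed and the remaining assertions yield the order A, B, C, D. Because each removed assertion keeps its source label, the decision remains traceable to the original statement.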

Various visualizations and tables present the algorithmically eliminated assertions and the inferred information to the users. While the method always produces a consistent result, its decisions can be influenced by weighting and modifying assertions in the source data.
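One kind of inferred information is the range of possible dates for a witness. The following sketch shows how such limits might be propagated along a conflict-free graph: a witness cannot be earlier than the earliest possible date of anything that precedes it, nor later than the latest possible date of anything that follows it. The function name, the data layout and the dates are assumptions made for illustration and are not taken from the edition.

```python
from datetime import date
from graphlib import TopologicalSorter  # Python 3.9+


def infer_limits(before, not_before, not_after):
    """Propagate absolute dating limits along relative assertions.

    before:     (earlier, later) pairs from a conflict-free graph
    not_before: witness -> earliest possible date, where known
    not_after:  witness -> latest possible date, where known
    Returns witness -> (earliest, latest); None marks an unknown limit.
    Only witnesses that occur in `before` are included.
    """
    sorter = TopologicalSorter()
    preds, succs = {}, {}
    for earlier, later in before:
        sorter.add(later, earlier)
        preds.setdefault(later, []).append(earlier)
        succs.setdefault(earlier, []).append(later)
    order = list(sorter.static_order())

    earliest, latest = dict(not_before), dict(not_after)
    # Forward pass: lower limits are inherited from predecessors.
    for w in order:
        known = [earliest[p] for p in preds.get(w, []) if p in earliest]
        if w in earliest:
            known.append(earliest[w])
        if known:
            earliest[w] = max(known)
    # Backward pass: upper limits are inherited from successors.
    for w in reversed(order):
        known = [latest[s] for s in succs.get(w, []) if s in latest]
        if w in latest:
            known.append(latest[w])
        if known:
            latest[w] = min(known)
    return {w: (earliest.get(w), latest.get(w)) for w in order}


# Hypothetical example: A before B before C; only A and C carry dates.
limits = infer_limits(
    before=[("A", "B"), ("B", "C")],
    not_before={"A": date(1797, 6, 1)},
    not_after={"C": date(1801, 1, 1)},
)
```

Here the undated witness B inherits both limits from its neighbours, so all three witnesses fall between 1 June 1797 and 1 January 1801.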


Since digital editions are tremendously valuable research data, an infrastructure is needed that supports the development of sustainable digital editions and maintains their data so that they remain easily accessible for further investigation.

Modelling always comes with a loss of information. It is, however, not always clear at modelling and extraction time whether this loss is significant. Thus, it would be helpful to have the original sources digitally available and linked from the model in order to be able to revise and clarify assertions that appear problematic after integration into the model.

Feedback and annotation loops are quite complicated. It is possible to revise the model by issuing git pull requests against the XML version of the modelled assertions. This, however, requires considerable familiarity with the workflow. An easier annotation and feedback management system might be helpful.


While the model and the surrounding tooling are good at aggregating the information load, drawing conclusions automatically, and visualizing the conclusions and their basis, further challenges have arisen.

The sources not only use different sigil systems for identifying witnesses, but also often refer only to parts of a witness. For example, since paper was expensive, some manuscripts were re-used at a later stage although they already contained totally unrelated verses from an earlier working phase. Also, a manuscript may have been originally copied by one of Goethe’s writers, later revised by Goethe, and then furnished with additional marks, applied even later, that denote which of the revisions should be kept and which should not. Thus, in reality, we don’t date witnesses, but rather inscriptions that may have been applied to the same archival unit at several different times.
While the assertions that are modelled for the macrogenesis inference often deal with such inscriptions, those are not always clearly demarcated. In fact, it may be a research problem in itself to identify which physical marks on a sheet form one contiguous inscription.


Fischer-Lamberg, R. (1955). Untersuchungen zur Chronologie von Faust II 2 und 3. Diss. phil. (masch.). Berlin.

Goethe, J. W. (2019). Faust. Historisch-Kritische Edition. (Ed.) Bohnenkamp, A., Henke, S. and Jannidis, F. Version 1.2 RC. Frankfurt am Main; Weimar; Würzburg. http://faustedition.net/ (accessed 24 January 2019).

Vitt, T. and Brüning, G. (2019). Determining And Visualizing Genesis: A Digital Edition of Goethe’s Faust. DH2019 Book of Abstracts. Utrecht. https://dev.clariah.nl/files/dh2019/boa/0924.html (accessed 15 July 2019).