Goetzʼs Corpus Glossariorum Latinorum Online (CGLO)
The Corpus Glossariorum Latinorum Online (CGLO) provides digital access to the Corpus Glossariorum Latinorum (CGL, 1888–1923) and relevant archival material at the Thesaurus linguae Latinae (TLL). Hosted by the Bavarian Academy of Sciences and Humanities, it has received funding from the Text+ Initiative (DFG) and the Laboratoire dʼHistoire des Théories Linguistiques (HTL, UMR 7597) at the Centre national de la recherche scientifique (CNRS).
The project has developed in two stages. First, in 2021 the scanning of archival material at the TLL Archive was made possible by Dr. Franck Cinato with a grant from the HTL/CNRS. Second, a grant from the Text+ Initiative, awarded to Dr. Adam Gitner and running from August to December 2023, made possible the XML encoding of volumes VI and VII, which contain the Thesaurus glossarum, with the aim of integrating the encoded text into the German National Research Data Infrastructure (NFDI).
The main outputs of the Text+ phase of development are the following:
an XML version of the complete Thesaurus glossarum with TEI-compliant tags of the most important content, integrated into the Text+ Database and available on GitLab in open access;
the same text in Markdown format for easier reading;
a web interface on the Bavarian Academy website for searching and navigating the content, with images of the marginalia in W. Heraeus’ personal copy of the CGL.
To produce these outputs, OCR-generated text of the two volumes was manually corrected by a team of researchers, who concentrated mainly on the accuracy of the lemmata, the Greek interpretamenta, and the CGL cross-references; some uncorrected OCR errors remain. The corrected Markdown files were then cleaned up further and automatically tagged using Python scripts, which also generated the JSON data for the CGLO search interface. The XML schema has been adapted from the Thesaurus Glossariorum project so as to maintain interoperability and to support that project's work on a thorough and necessary revision of the entire CGL. Technical details about the implementation, the scripts, and the complete data can be found on the GitLab site.
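The tagging step described above can be pictured with a minimal Python sketch. It is not the project's actual script: the entry format (a bold lemma followed by glosses and references), the function names `parse_entry`, `to_tei`, and `to_json`, and the choice of TEI elements are all assumptions made for illustration, and the sample line with its reference numbers is invented. The real scripts and schema are documented on the GitLab site.

```python
import json
import re
import unicodedata
import xml.etree.ElementTree as ET

# Assumed (hypothetical) entry format in the corrected Markdown:
#   **lemma** gloss text II 3, 12; IV 201, 7
ENTRY = re.compile(r"^\*\*(?P<lemma>[^*]+)\*\*\s*(?P<body>.*)$")
# CGL cross-reference, assumed to be written as: volume (Roman) page, line
CGL_REF = re.compile(r"\b([IVX]+)\s+(\d+),\s*(\d+)")
# One or more space-separated words in the Greek and Greek Extended blocks
GREEK_WORD = r"[\u0370-\u03FF\u1F00-\u1FFF]+"
GREEK = re.compile(GREEK_WORD + r"(?:\s+" + GREEK_WORD + r")*")
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_entry(line):
    """Split one Markdown line into lemma, Greek glosses, and CGL references."""
    # NFC keeps accented Greek letters precomposed so the ranges above match.
    match = ENTRY.match(unicodedata.normalize("NFC", line.strip()))
    if match is None:
        return None  # not an entry line under the assumed format
    body = match.group("body")
    refs = [
        {"volume": vol, "page": int(page), "line": int(line_no)}
        for vol, page, line_no in CGL_REF.findall(body)
    ]
    return {
        "lemma": match.group("lemma").strip(),
        "greek": GREEK.findall(body),
        "refs": refs,
    }


def to_tei(entry):
    """Serialize a parsed entry as a TEI-style <entry> (illustrative element choice)."""
    el = ET.Element("entry")
    form = ET.SubElement(el, "form", {"type": "lemma"})
    ET.SubElement(form, "orth").text = entry["lemma"]
    for gloss in entry["greek"]:
        cit = ET.SubElement(el, "cit", {"type": "translation", XML_LANG: "grc"})
        ET.SubElement(cit, "quote").text = gloss
    for ref in entry["refs"]:
        label = f"{ref['volume']} {ref['page']}, {ref['line']}"
        ET.SubElement(el, "ref", {"type": "cgl", "n": label})
    return ET.tostring(el, encoding="unicode")


def to_json(entries):
    """Dump parsed entries as JSON, e.g. as input for a search interface."""
    return json.dumps(entries, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Invented sample line; the reference numbers are placeholders.
    sample = parse_entry("**abactor** ἀπελάτης II 3, 12; IV 201, 7")
    print(to_tei(sample))
    print(to_json([sample]))
```

Keeping the parsed entry as a plain dictionary means the XML and the JSON are both derived from one intermediate structure, which is one simple way for a single script run to feed both the TEI files and the search data.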
Text+ Team Members

(principal investigator)

(postdoctoral researcher)

(postdoctoral researcher)

(research assistant)

(student assistant)
Research Partner

Digital Humanities: Bavarian Academy of Sciences and Humanities
Dr. Eckhart Arnold · Dr. Piroska Lendvai (OCR) · Dr. Stefan Müller (the website)
This project would not have been possible without the support of many people at several institutions. From the Text+ Initiative, support and feedback were provided by the Lexical Resources Task Area, chaired by Alexander Geyken and Axel Herold. At the TLL, institutional support was provided by Michael Hillen, Josine Schrickx, and especially Manfred Flieger. The research and personnel departments of the BAdW provided essential infrastructure. Thanks are also due to Anne Grondeux at the Laboratoire dʼHistoire des Théories Linguistiques (HTL, CNRS, UMR 7597). The scanning was done in Munich by Pixelprint GmbH.