Introduction to "Open Digital Corpora of Greek and Latin"
Among the subdisciplines of classics, text-based studies might not highlight the transformative effect of computing quite as vividly as does, for example, classical archaeology. While the latter's adoption of computer-aided design software and drones often appears front and centre in academic publication, digital texts tend to lurk in the background of philological papers. Nevertheless, from the founding of the Thesaurus Linguae Graecae in the 1980s until the present, philologists have derived obvious benefits from digitalization: unlimited keyword search and, with the advent of the Internet, ubiquitous availability. Meanwhile, an ever-increasing number of scholars endeavour to match the needs of text-based research to the rapidly growing power of computation. We hope this volume will provide a milestone on this developing path, as it not only illustrates how newly expanded corpora for classical scholarship are being generated but also demonstrates best practices and new tools for their philological analysis.
These four papers began as presentations at the Open Philology Workshop held by the Humboldt Chair at the University of Leipzig in July 2014. They reflect the guiding principles of that institution and its leader, Professor Gregory Crane. Most importantly, all of these projects operate upon, and in turn provide, open data. In other words, they begin with data that have no copyright restrictions and are freely available for republishing and other reuse, and their results are similarly licensed so that copyright and other restrictions are waived, allowing them to be widely and freely used in turn.
This approach allowed the conference participants to consider the Latin or Greek digital collection far beyond a given website, CD-ROM, or online service for pay. They grappled with the challenges of digital corpora in the classics: How do we generate, convincingly search, and coordinate large digital collections of Greek and Latin texts and authors? Robertson and Boschetti describe how they transform public-domain page images containing ancient Greek into new corpora. Jovanović describes a digital method for discerning the important place of Lucretius in the Croatian Latin of the second millennium CE. Diddams and Gawley's paper employs the allusion-discovery tool "Tesserae" to launch into a closely read exploration of the intertext between Cicero's Orator and Augustine's De Doctrina Christiana IV. Through the sources of the Pentekontaetia, Martin and Berti highlight the challenges that fragmentary works present to digital libraries and information resources systems.
The openness of these projects extends beyond their inputs and results: wherever there are computational processes, these are similarly freely available to be downloaded, modified, and even repurposed. This approach produces a kind of verifiability, since anyone with the inclination should be able to reproduce these results; but it also fosters a spirit of cooperation, since the results of one project naturally can become the starting point for another project. This spirit imbued the workshop with a very congenial and inventive mood. We hope that the imaginations of readers of this edition will be engaged similarly.
All of these projects have, in the intervening years, progressed considerably beyond the state presented here. While further publications will, of course, delineate these advances in detail, some indication of their trajectory might be of interest. Diddams and Gawley have applied the formula presented here to ever larger data sets and, in a forthcoming publication, explore the possibilities it offers to future efforts to measure literary influence. In response to the issues raised in her paper with Martin, Berti has been developing a data model for annotating textual fragments in a digital environment. As part of this project, the complete digital version of the Fragmenta Historicorum Graecorum by Karl Müller is now online at http://www.dfhg-project.org. Jovanović has recently done large-scale comparisons of neo-Latin authors from the same period (the Renaissance) but from different countries (modern Italy and modern Croatia), developing a set of more detailed and rigorous tests of textual similarities. Finally, Robertson and Boschetti have published a new OCR process based on the one described here and developed an online editing environment for its output. These new efforts are some of the...