Patterns and Fingerprints in London, British Library, MS Harley 4431
Bespoke software developed during a transcription project on London, British Library, MS Harley 4431 makes it possible to analyze this very large corpus. Investigations so far have concentrated on variant spellings and the textual ornamentation; these are discussed in detail here. Further, guidance is given for other scholars wishing to use the software to examine the manuscript.
London, British Library, MS Harley 4431 contains a collection of Middle French works by Christine de Pizan (1365–c.1430), commissioned by Ysabel of Bavaria, Queen of France, and presented to her early in 1414. For literary critics and for codicologists, the Queen's manuscript presents a remarkable opportunity to chart Christine's progress as a writer and a publisher during fifteen years beginning in 1399 with the preparation of the Epistre Othea (Paris, BnF, MS fr. 848), continuing circa 1402 with the two extant copies of the Livre de Cristine (Chantilly, Musée Condé, MS 492–93 and Paris, BnF, MS fr. 12779), both subsequently enlarged by the author-publisher, and then the Duke's manuscript circa 1408 (Paris, BnF, MSS fr. 835, 606, 836, 605, 607), originally bound as a single volume.1 It has been argued that all of the Queen's manuscript was copied by scribe X, as were many of the other manuscripts prepared under Christine's direction, and that X is to be identified with the author herself.2 However, that view has not been universally accepted.3
The thirty works in the Queen's manuscript were prepared in ten fascicules, all of them designed, copied and corrected under the author's supervision. The order of the fascicules today is the order planned by Christine herself; however, it has been suggested that the original order was actually different.4 To illustrate the collection and ensure that it would be fit for the Queen's library, Christine engaged artists of the highest quality. The 132 miniatures, many of them painted by the Master of the Cité des dames,5 are complemented by rich secondary decoration: illuminated initials, paragraph marks, rubrics, and running titles.
| f. 2 | Table of Contents (no. 0) |
| ff. 3–94 | Prologue; lyric poetry (11 items); 4 narrative poems (nos. 1–16) |
| ff. 95–142 | Epistre Othea, in verse and prose (no. 17) |
| ff. 143–77 | Livre du duc des vrais amans, in verse (no. 18) |
| ff. 178–220 | Livre du chemin de long estude, in verse (no. 19) |
| ff. 221–36 | Dit de la pastoure, in verse (no. 20) |
| ff. 237–54 | Epistres sur le Roman de la Rose, in prose (no. 21) |
| ff. 255–89 | Devotional and didactic works, in prose and verse; Livre de Prudence, in prose (nos. 22–28) |
| ff. 290–375 | Livre de la Cité des dames, in prose (no. 29) |
| ff. 376–98 | Cent balades d'amant et de dame, in verse; Lay de dame, in verse (no. 30) |
This article examines how the project to transcribe the manuscript and to produce a digital version of it was managed and completed between 2004 and 2009, and how work continues to make the digital version freely available to the public via a regularly updated website. This article highlights the potential importance of certain variant spellings; the project team has identified these as a result of analysis of the transcription and digitization processes, using the software developed during the project. Finally, this article provides technical background for scholars who wish to use the more complex knowledge that has been embodied in the electronic transcription with a view to carrying out further research and thus contributing to our understanding of Christine's œuvre.
The literary and artistic importance of the Queen's manuscript has long been recognized: it is the latest in the series of collected manuscripts produced under the author's supervision, and presents Christine's last word for very many of the works it contains. More recently, scholars have drawn attention to its importance for the history of the French language: having been proofread by the author herself, this substantial corpus is an authentic witness to the French of Paris as it was written in the years immediately preceding 1414.
Over the ten years since the research project was launched, our understanding of the complex operations involved in the Making of the Queen's Manuscript has increased enormously, with the result that we have regularly adjusted, refined and expanded our transcription.10 That process continues. A key strength of XML is the ease with which a DTD can be modified, allowing Attributes to be adjusted and additional Elements to be incorporated. When such changes are made, our documentation is kept up to date to show users how the new Elements and Attributes relate to the TEI P5 guidelines.
From its inception the Making of the Queen's Manuscript has been an open-source project, in keeping with the conditions set by the AHRC and best practices of scholarship. The project website includes a section on "Working with the XML Transcription Files" together with a "Readers' Guide" to the problems met in transcribing Harley 4431, and to the solutions adopted. The Guide describes the almost seventy Elements currently used in the XML transcription and the Attributes associated with them.
That the Elements are so numerous is to be explained by the overlapping areas of inquiry (literary, art-historical, linguistic, codicological) opened up by Harley 4431. The decision to use XML to mark up the text, and thus to incorporate the knowledge and expertise of the academic team, was made because of an exciting prospect: if a standardized method of marking up medieval manuscripts were adopted, manuscripts could be shared and searched en masse for keywords (search strings). We hoped that hypertext files containing data and structured semantic content, stored on Internet servers, could be read by the scholarly community across the world. This prospect of sharing scholarly annotations in manuscripts from many international projects meant that an agreed way to encode and store knowledge associated with manuscript features had to be developed. The Oxford TEI (Text Encoding Initiative) team is active in this field, as is the Queen's Manuscript team, along with the group developing the Base de français médiéval at the École normale supérieure de Lyon,11 and the project centered on the Mystère des actes des apôtres.12 A further result of harmonizing the XML Elements is that an interested researcher from one project can quickly understand the annotations embedded by scholars engaged in other mark-up projects.
A well-formed DTD is fundamental to XML: it acts as a set of rules, decreeing which Elements and which associated Attributes are legal and setting their hierarchical relationship to one another. The Queen's manuscript DTD also serves to document our use of XML, as can be seen in the comment, set between "<!--" and "-->", on the Element <vs>:
<!-- <vs>, 'very special', used to mark a particularly exuberant, elaborate or unusually formed letter, extended (generally into the upper or lower) margin. Values of the Attribute 'rend' include 'arabesque', 'banderole', 'commas', 'flattened', 'point' and 'trumpet'. -->
<!ELEMENT vs (#PCDATA | c | abbr)*>
<!ATTLIST vs rend CDATA #IMPLIED>
A short extract from the Epistre au dieu d'amours (DAMO), a courtly narrative poem, will show how the text has been transcribed and how the scholarly annotations made by the team are inserted into the XML file. Lines 214–17 are copied in Harley 4431 as the first four lines of column 52d.13 The Elements used in the transcription are: <lb>, line break; <vs>, exuberant letter; <note>, annotation by the transcriber; <rhyme>, rhyme-word; <abbr>, abbreviation; and <l>, line number. The Attributes used are: "n", number; "rend", rendered; "type", type.
<lb n="DAMO.052d:01"/><vs rend="arabesque">C</vs>e <note>Ce ? </note> tesmongne \ <vs>l</vs>'escript où je le <rhyme>lui</rhyme>
<lb n="DAMO.052d:02"/>De tieulx parleurs \ en y a à grans <rhyme>sommes</rhyme>
<lb n="DAMO.052d:03"/>Dont grant honte est \ tel vice en gentilz <rhyme>ho<abbr type="m"/>mes</rhyme><note>Correction over erasure? Last three words in lighter ink.</note><l n=" 216"/>
<lb n="DAMO.052d:04"/>Je di à ceulx \ qui en sont <rhyme>entechié </rhyme>
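To make the role of these Elements concrete, the following Python sketch reads the four lines quoted above and lists the rhyme-words, the exuberant letters, and the transcribers' notes. It is an illustration only and not part of the project's own software: the wrapping <div> root, the helper name expand, and the assumption that an empty <abbr type="m"/> stands for the letter "m" are ours.

```python
import xml.etree.ElementTree as ET

# The extract quoted above, wrapped in a hypothetical <div> root so that it
# parses as a standalone document (the real file has its own structure).
EXTRACT = r"""<div>
<lb n="DAMO.052d:01"/><vs rend="arabesque">C</vs>e <note>Ce ? </note> tesmongne \ <vs>l</vs>'escript où je le <rhyme>lui</rhyme>
<lb n="DAMO.052d:02"/>De tieulx parleurs \ en y a à grans <rhyme>sommes</rhyme>
<lb n="DAMO.052d:03"/>Dont grant honte est \ tel vice en gentilz <rhyme>ho<abbr type="m"/>mes</rhyme><note>Correction over erasure? Last three words in lighter ink.</note><l n=" 216"/>
<lb n="DAMO.052d:04"/>Je di à ceulx \ qui en sont <rhyme>entechié </rhyme>
</div>"""


def expand(elem):
    """Return the text of an element, skipping <note> children.

    Assumption (ours): an <abbr> is expanded to the value of its "type"
    Attribute, so that ho<abbr type="m"/>mes reads "hommes".
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "abbr":
            parts.append(child.get("type", ""))
        elif child.tag != "note":
            parts.append(expand(child))
        # Text following the child still belongs to the parent element.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(EXTRACT)

# Rhyme-words in manuscript order, with abbreviations expanded.
rhymes = [expand(r).strip() for r in root.iter("rhyme")]

# Each exuberant letter with its "rend" value (None when the Attribute is absent).
letters = [(v.get("rend"), v.text) for v in root.iter("vs")]

# Transcribers' annotations.
notes = [n.text.strip() for n in root.iter("note")]

print(rhymes)
print(letters)
print(notes)
```

Because the Attribute "rend" is declared #IMPLIED, it may be absent, which is why the second exuberant letter is reported without a "rend" value.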
I. Mining the Data
The creation of the XML file, in itself an absorbing and challenging operation, is only a means to an end; it is not and cannot be an end in itself. Ways and means must be found to quarry the potentially rich seams of data embedded within the transcribed and annotated XML file. By creating a range of XSL transformation scripts the Queen's manuscript team has been able to exploit the XML file in a range of different ways. The main script, student.xsl, is intended to serve specialized and non-specialized Anglophone and Francophone readers, whose interest in Christine's texts is literary above all. The script student.xsl generates a scholarly edition of Harley 4431, whether of an individual work or of the entire manuscript, as the reader chooses; it focuses on the text of the work(s) and ignores the codicological Elements in the XML file marking quire divisions, page and column breaks, running titles, catchwords, and signatures. More specialized XSL scripts include: glossary.xsl to generate a glossary from the words glossed (in both English and French) in the transcription; propernames.xsl and rime.xsl that respectively list the proper names and the rhyme-words tagged in the XML transcription. All these XSL transformation scripts have been made available as open-access source code on the project website.
When these XSL transformation scripts process the XML file, they create HTML pages suitable for web browsing. The Queen's manuscript project has made considerable progress towards its goal of creating a web edition to mirror the layout and the contents of the manuscript, column by column, while giving the user access, by means of mouse-clicks, to the notes incorporated by the team in the transcription. Working agreements have been established with the team responsible for the DMF,14 and with the Mystère des actes des apôtres project, of which Mansfield is also a member. This collaborative effort means that hypertext links direct to the DMF are added in the "Editions" section of the website to provide easy consultation of the online dictionary whilst the user browses the transcribed edition.
While the range of XSL scripts just described is successful in transforming the XML transcription file into readable web pages, the results are predetermined and cannot be processed further. Colleagues have told us that it would be helpful if the Queen's manuscript website became something more akin to a workshop,15 where they themselves could interrogate the XML transcription file. Although it would be possible to do so by writing additional XSL script(s), the programming skills and the time involved would discourage most researchers from following that route. Thus a scholar interested in knowing whether the Queen's manuscript contains examples of the ligature œ, used today in such words as cœur, œuvre, and sœur, would almost certainly be deterred from designing an XSL script to answer so specific a question.
Realizing the need for a different research tool to handle such detailed inquiries, Mansfield has written Loceme. This software differs from most search engines in that it allows the scholar to locate up to three keywords (search strings) simultaneously. Loceme has its own website with instructions on how to interrogate the Queen's manuscript, which is set as the default corpus on the site (Mansfield, Loceme).16 The creation of a separate website means that searches can be made in Loceme while the images from the Queen's manuscript remain open on-screen in another browser page for cross-reference by the researcher. Numbering of every line in the transcription facilitates this cross-referencing. Loceme scans the entire XML transcription file, presenting its results in chronological order. The highlighted discoveries present the scholar with a graphical display, showing the distribution and the proximity of the search strings under investigation. Thus Loceme quickly shows that the spelling cuer occurs over 1,000 times, and that there is no example of cœur. What of œuvre? With the further help of Loceme it is seen that there is only one example of that spelling, in the Epistre Othea, at line OTEA.095b:10, "Presentement ceste œuvre à rimoyer." Using the URL <http://www.pizan.lib.ed.ac.uk/gallery/pages/095r.htm>, users can view the image of the line.
It is interesting to note that when, less than a hundred years later, L'Epitre d'Othéa à Hector, sive Les cent histoires de Troye, is printed in Paris by Philippe Pigouchet about 1501, the letters oe are not combined as a ligature. When results can be obtained interactively in this way, scholars are encouraged to pursue their own lines of inquiry. Loceme also makes the manuscript more accessible to a culturally aware public, interested in how French looked in the early fifteenth century. As such, it is an excellent interpretation tool opening up previously hidden aspects of the manuscript to a wider audience.
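The kind of search just described can be approximated in a few lines of Python. The sketch below is not Loceme's code, which we do not reproduce here; it is a minimal illustration that assumes only what has been said above: every transcribed line opens with an <lb n="…"/> marker, up to three search strings may be supplied, and hits are reported in the order of the file. The function name locate and the three-line sample are ours, and the OTEA line is given without its XML mark-up.

```python
import re

# Every transcribed line begins with a marker such as <lb n="DAMO.052d:02"/>.
LB = re.compile(r'<lb n="([^"]+)"/>')

# A three-line sample: two lines from the DAMO extract quoted earlier and the
# OTEA line quoted above (simplified, without its XML mark-up).
SAMPLE = r"""<lb n="DAMO.052d:02"/>De tieulx parleurs \ en y a à grans <rhyme>sommes</rhyme>
<lb n="DAMO.052d:03"/>Dont grant honte est \ tel vice en gentilz <rhyme>ho<abbr type="m"/>mes</rhyme>
<lb n="OTEA.095b:10"/>Presentement ceste œuvre à rimoyer
"""


def locate(xml_text, *needles):
    """Return (line reference, search string) pairs in file order."""
    if not 1 <= len(needles) <= 3:
        raise ValueError("supply between one and three search strings")
    # Splitting on a pattern with one capture group yields
    # [text before first marker, ref 1, line 1, ref 2, line 2, ...].
    pieces = LB.split(xml_text)
    hits = []
    for ref, content in zip(pieces[1::2], pieces[2::2]):
        for needle in needles:
            # The mark-up is treated as plain text, so a needle may also be
            # part of an XML Element.
            if needle in content:
                hits.append((ref, needle))
    return hits


for ref, needle in locate(SAMPLE, "œuvre", "cœur", "gran"):
    print(ref, needle)
```

On this sample the string "cœur" yields no hit, while "œuvre" is reported once, at OTEA.095b:10; the line reference is what allows the reader to move from a hit to the corresponding image.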
<lb n="CEBA.004a:20"/><fw type="rubric" place="laligned" rend="inset"><hi rend="red">Ci commencent Cent Balades •. ii.•</hi></fw>
<!-- 1 blank line -->
<lb n="CEBA.004a:22-23"/><group n=" 1"/><div2 n="1" type="Ballade" metre="10" length="28" refrain="1" stanza="ababbcbc" envoi="bcbc" rhyme="ffm"><hi rend="cap5">A </hi>ucunes gens me prïent que je <rhyme>face</rhyme><note> The large space reserved for the introductory lettrine made it impossible for the scribe to copy complete lines of verse alongside. The first four lines of Ballade 1 are copied in six ruled lines, as follows: 'Aucunes gens me prient que je /face • Quelxques beaulx dis et que /je leur envoye •• Et de dicter/dient que j'ay la grace • Maiz /sauve soit leur paix je ne sa-/roye •• Faire beaulx dis ne bons mais toutevoie'. The intercolumnar border has been adjusted round the last word of line 4.</note>
There are to date 7354 of these embedded notes. While they are searchable, they are primarily designed to complement the scholarly edition of Harley 4431 generated by student.xsl. Where a note has been inserted, an icon is displayed for the reader to click on in order to read it. The table below shows the size of the online corpus in its transcribed and annotated state.
The Annotated Transcription of Manuscript Harley 4431
Last page, column and line number: LAYD.398b:25
Lines of corpus: 62,855
File pointer at end of process: 2,697,268
Words including XML Elements and Attributes: 561,132
Characters, including spaces: 5,075,554
Disc storage required: 5,231,704 bytes
Total Project size: 3869 files, total 6,742,288,295 bytes.
An understanding of the size of the annotated transcription, including the number of Elements and Attributes, helps later users to design online queries using the software provided on the website. One of the key aims of the AHRC funding was to create an enhanced digital resource, open and freely available, so that researchers could reuse both the resource and the search tools.
II. Digital Heritage Management
Digital heritage management has matured out of the work of museums, field research in archaeology, tourism destination management, and special collections in libraries (Meyer et al.), into more specific scholarly projects funded by national research bodies, including the AHRC's Resource Enhancement Scheme in the UK, which funded this project (Laidlaw and Mansfield), and the projets blancs funded by the ANR (Agence Nationale de la Recherche) in France (Mansfield and Smith). Meyer et al. and Hooland et al. point out two key considerations for such interpretive and technical work: (1) maintenance of public access to the digital resource once it has been created, which means open internet pages stored on maintained servers; (2) the collection and incorporation of new knowledge, generated by users who apply their individual expertise and skills as they engage with the resource. To this a third consideration can be added: the far-sighted, strategic design of coding systems that will facilitate later, unanticipated interpretations of the digital resource, coupled with the provision and development of software tools that allow users of all levels to interrogate and find new ways of interpreting the cultural artifacts that have been digitized (Dong et al.). One such recent, unanticipated outcome has been the use of this project's software to interrogate the account books of the town of Montferrand (Lodge).
III. Scribal Footprints
This section gives examples of how the tool can be used to yield new research insights on large corpora. With the help of Loceme, a search of the XML transcription for particular word strings yields interesting patterns. Figures 3 and 4 show where the spellings doulx / doulz and peut / puet are to be found in the manuscript. Comparison of the two Figures strongly suggests that the Cent balades d'amant et de dame (CBAD) and the Lay de dame (LAYD), which together make up the last fascicule of the manuscript, were not copied by X but by a different scribe; the footprints are different.
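A distribution of this kind can be tabulated by counting each variant spelling under the siglum of the work in which it occurs. The Python sketch below shows one way of doing so. It is not the method used to produce the Figures: it assumes that the siglum is the part of the <lb> reference before the full stop, it leaves the text of <note> Elements out of the count, and its toy input, including the sigla AAAA and ZZZZ, is invented purely for illustration.

```python
import re
from collections import Counter, defaultdict

LB = re.compile(r'<lb n="([^"]+)"/>')                  # line marker with reference
NOTE = re.compile(r"<note\b[^>]*>.*?</note>", re.S)   # transcribers' annotations
TAG = re.compile(r"<[^>]+>")                           # any remaining mark-up


def spelling_by_work(xml_text, variants):
    """Count each variant spelling per work siglum (e.g. DAMO, OTEA)."""
    wanted = {v.lower() for v in variants}
    table = defaultdict(Counter)
    pieces = LB.split(xml_text)
    for ref, content in zip(pieces[1::2], pieces[2::2]):
        siglum = ref.split(".", 1)[0]
        # Drop the notes first, then strip the tags so that words interrupted
        # by mark-up are rejoined before whole words are compared.
        text = TAG.sub("", NOTE.sub("", content))
        for word in re.findall(r"\w+", text.lower()):
            if word in wanted:
                table[siglum][word] += 1
    return {siglum: dict(counts) for siglum, counts in table.items()}


# Invented toy input: these lines are NOT taken from Harley 4431.
TOY = (
    '<lb n="AAAA.001a:01"/>mon <rhyme>doulx</rhyme> ami\n'
    '<lb n="AAAA.001a:02"/>doulx regart<note>doulz?</note>\n'
    '<lb n="ZZZZ.002a:01"/>mon doulz ami\n'
)

print(spelling_by_work(TOY, ["doulx", "doulz"]))
```

Counting by work rather than by line keeps the output small enough to read at a glance; the same loop could equally group by folio or by quire.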
Following the introduction of PHP version 5.1.0, the function "simplexml_load_file" has allowed an XML file to be converted into an object, a development that offers exciting possibilities for the Queen's manuscript project and for the exploitation of well-formed XML transcriptions. A key advantage of Loceme is that it treats XML Elements as text. However, the obligatory chevrons delimiting each XML Element interfere with the rendering of the HTML display. In formulating search strings, the researcher must take care to omit the chevrons. Thus, to find the starting point of quire (gathering) 10, the TEI P5 Element <gb n="10"> must be simplified to become gb n="10". Users may perform this search at http://eserve.org.uk/loceme/loci.htm, choosing file "xml" rather than "4431." To associate these quires with a feature of the transcribed text the researcher must add a second search string; this may be a word string or part of an XML Element.17
Reference was made earlier to exuberant letter forms that are marked by the Element <vs> in the XML transcription. Thanks to the research of Mark Aussems, their significance is now better understood:
…the "decorated" letter forms (or cadeaux) which appear frequently in Christine's supervised manuscripts and contain exuberant ascenders (for characters on the top line) or descenders (for letter forms on the bottom line) are indeed the work of the scribes and may provide valuable information to distinguish between scribal hands in Christine's manuscripts: different scribes may use different forms of pen decoration, some more frequently than others.
With the help of Loceme we can ask whether each of the 53 quires in Harley 4431 contains examples of these exuberant letter forms or whether they occur only in certain quires. (As was seen previously, the Element used to mark a quire (gathering) is <gb>, "gathering begins.") By choosing "gb n=" as the first search string and "vs" as the second string, we can discover where the second string occurs in association with the first.
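The association of the two search strings can be pictured as a count of <vs> Elements between one <gb> marker and the next. The following Python sketch is a rough equivalent written for illustration, not a description of how Loceme works internally. It treats the mark-up as plain text and assumes that each quire opens with a marker of the form <gb n="…"> or <gb n="…"/>. The function name vs_per_quire is ours, and the toy input is invented.

```python
import re

# A quire marker, with or without the closing slash of an empty Element.
GB = re.compile(r'<gb n="(\d+)"\s*/?>')

# An opening <vs> tag, with or without Attributes; closing tags are not counted.
VS_OPEN = re.compile(r"<vs[\s>]")


def vs_per_quire(xml_text):
    """Map each quire number to the number of <vs> Elements it contains."""
    pieces = GB.split(xml_text)
    # pieces = [text before first quire, number 1, text 1, number 2, text 2, ...]
    return {
        int(number): len(VS_OPEN.findall(content))
        for number, content in zip(pieces[1::2], pieces[2::2])
    }


# Invented toy input: three short "quires", NOT taken from Harley 4431.
TOY = (
    '<gb n="1"/><lb n="X.001a:01"/><vs rend="arabesque">C</vs>e <vs>l</vs>ettre'
    '<gb n="2"/><lb n="X.009a:01"/>texte sans ornement'
    '<gb n="3"/><lb n="X.017a:01"/><vs rend="trumpet">D</vs>e'
)

for quire, count in vs_per_quire(TOY).items():
    print(f"quire {quire}: {count} exuberant letter(s)")
```

A quire with a count of zero is as informative as one with many, since it is the clustering and the gaps together that may point to a change of hand.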
The results show a pattern: the exuberant letters are not found throughout Harley 4431. They cluster in some sequences of quires and are absent elsewhere. Here, we think, is further prima facie evidence that the Queen's manuscript was copied by more than one scribe. Although Figures 3 and 4 provide some evidence for the analysis of hands in CBAD, more work needs to be done. Our quarry, in both senses of the word, will assuredly yield further results.
The diachronic approach to studying the French language rests on the premise that the language changes over time; for example, spelling traditions might change at a certain point. Researchers may consider the language as entropic or simply as changing, but both schools agree that a spelling convention may appear at a point in history and then survive for a given period. If researchers are certain from other evidence that an extensive corpus begins at one particular date and runs chronologically to its end, with no insertions or later amendments, then that single corpus can be examined for spelling shifts or for changes in letter forms. Consider the example discovered in this project: the distribution of the spellings doulx and doulz in Harley 4431 shows that the last fascicule of the manuscript consistently has the spelling doulz. Different research paradigms will offer different explanations of this observed phenomenon. For example, abductive inference would lead researchers using an interpretive paradigm to suggest either that the scribe adopted a new spelling variant later in their working life, since the change comes at the end of the manuscript, or that a different scribe stepped in to work on the final fascicule. However, researchers making deductive explanations from data supplied by the computer must exercise far more caution: they can say with certainty only that the spelling has changed, and must reserve any inference concerning the scribe for debate rather than assertion. Researchers from other paradigms can enrich interpretations of the role of the scribes beyond the data supplied by the computer. Phenomenologists or experiential archaeologists, for instance, have much to offer on the material practices of copying letter forms over an extended period of time.
1. Since the Duke's manuscript had been prepared in fascicules, it was an easy matter to rebind it as five separate volumes. See Laidlaw, "Christine de Pizan". We also refer to the introduction to this volume.
5. The Cité des Dames, a prose treatise, forms the ninth fascicule (ff. 290–375) of the Queen's manuscript. Miniatures by the Master of the Cité des Dames are found in at least ten other manuscripts of works by Christine, prepared under her supervision; see Ouy et al. 746.
17. A complete guide to the Oxford TEI P5 Elements is available at: <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/REF-ELEMENTS.html>.
Chantilly, Musée Condé, MS 492–93.
London, British Library, MS Harley 4431. (The Queen's MS) <http://www.pizan.lib.ed.ac.uk>
Paris, Bibliothèque nationale de France, MS fr. 12779. <http://gallica.bnf.fr/ark:/12148/btv1b60001038.r=?rk=21459;2>
Paris, Bibliothèque nationale de France, MS fr. 848. <http://gallica.bnf.fr/ark:/12148/btv1b9007146t.r=?rk=21459;2>
Paris, Bibliothèque nationale de France, MS fr. 836. <http://pizanmanuscripts.org/#book;Francais836>
Paris, Bibliothèque nationale de France, MSS fr. 835, 606, 836, 605, 607. (The Duke's MS). <http://pizanmanuscripts.org/#book;Francais835> <http://pizanmanuscripts.org/#book;Francais606> <http://pizanmanuscripts.org/#book;Francais836> <http://pizanmanuscripts.org/#book;Francais605> <http://pizanmanuscripts.org/#book;Francais607>