Digital Representation and the Text Model*
Does the digital representation of the text demand the definition of a model? In and of itself, every representation and, consequently, every form of text representation entails the implicit or explicit assumption of a model, at least if we accept the postulate that the "map is not the territory." 1 So, the conventional image of a text, handwritten or typed, is itself a text model. The same thing can be said, then, of its digital representation or, to be more precise, of every form of its digital representation, regardless of its specific kind. The problem of the model presents itself, therefore, as a problem of adequacy with respect to the conventional model and representation.
With respect to the conventional representation, an adequate digital representation should in no way impoverish the informative content of the text. If the digital text representation is not original, that is, if we consider a reproduction as opposed to a text produced directly in digital form, the first fundamental criterion for adequacy is the exhaustiveness of the representation. To achieve such exhaustiveness, one usually resorts to markup. In fact, textual information in machine-readable form is represented primarily by way of the binary coding of the characters: in this way, the "text" is conceived, from a computational point of view, as a type of data and the treatment of the text, that is, the "storage and processing of textual material," comes to consist in the treatment of "information coded as characters or sequences of characters." 2 It is evident, however, that the computational notion of the text as a type of data does not coincide with the notion of the text as a product of literary activity. The pure and simple character sequence is not adequate to represent all of the information contained in the "literary material as originally written by an author" (TP 1). Hence the need to furnish additional information by embedding markers defined by a given markup language.
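The contrast between the bare character sequence and a marked-up text can be illustrated with a small sketch. It is only an illustration under stated assumptions: the element names `line`, `del`, and `hi`, the attribute `rend`, and the sample sentence are hypothetical and are not taken from any markup scheme discussed in this article. The sketch parses a marked-up line, extracts the character sequence alone, and then lists the information that only the embedded markers carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up transcription of one line of a document.
# The element and attribute names are illustrative assumptions only.
marked_up = (
    '<line n="1">The <del>quick</del> '
    '<hi rend="italic">brown</hi> fox</line>'
)

root = ET.fromstring(marked_up)

# The text as a "type of data": nothing but the sequence of characters.
plain = "".join(root.itertext())

# The additional information supplied only by the embedded markers:
# which stretch of characters is affected, and how.
features = [
    (el.tag, dict(el.attrib), el.text)
    for el in root.iter()
    if el is not root
]

print(plain)
print(features)
```

The plain string no longer records that one word was deleted and another italicized; that information survives only in the list of marked features, which is the sense in which the character sequence alone is not an exhaustive representation.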
A second criterion for adequacy, equally important, concerns the amenability of the digital representation to automatic processing and its functionality with respect to the critical operations of reconstructing or interpreting the text. For particular analytical purposes, the digital text representation may provide distinct advantages and be preferable to conventional representation itself. For example, the possibility of combining digital images with transcripts of the text renders the mimetic function of diplomatic transcripts superfluous and changes their purpose. Combined with an image, a diplomatic transcript no longer serves to "reproduce the original," but rather to extract information from it and to represent it in an automatically processable form. In this light, the diacritical signs or the forms of markup are no longer conceived as an aid in visibly reconstructing an absent document, but rather as a means of "modelling" the physical and textual information contained in the original for the purpose of further processing. 3 The image itself, to the extent that it is a digital representation of visual information, does not provide merely a "facsimile" or "physical reproduction" of the original, but rather a set of "structured data," that is, a "logical representation" of the document's contents. 4
The structure of the digital representation becomes very important with regard to the conditions of adequacy, because it imposes precise conditions upon the procedures used in the automatic processing of the informational content of the document. The representation's form must serve the analytical operations necessary to the study of the text. Even the form of the conventional text representation poses some problems (particular typographic stratagems have been proposed, for example, for the critical edition of texts handed down by a fluid or very complex tradition) 5 and the same problems occur, with equal if not greater prominence, with digital text representations. At all events, no form that sets fundamental limits on any analytical and scholarly operation can be considered a suitable form of representation. An adequate digital text...