- Etexts and Archives
A consideration of the development of digital resources in Victorian Studies, and how these might influence the research of the future, raises more questions and points for clarification than it provides answers. We might therefore begin with some generic observations, which fall into two groups: those related to ICT and those related to academic studies (and in Victorian Studies, we immediately come up against a series of disciplinary differences: the way art history, literary criticism, or historiography might make use of ICT will vary enormously – my background is in literary criticism and most of what I have to say will relate to that subject). Digitisation is a specialist field in its own right; non-specialists will most likely interact with it in two ways, as users of resources or as part of teams engaging in multi- or inter-disciplinary research. ICT in Humanities research might also loosely be split into two: the use of internet-based platforms to host resources of different kinds – e-texts, hypertexts, visual resources, searchable databases – and the use of computers to analyse and interrogate materials – usually in the form of a database of some kind. Hence, we start with the question of what we mean by 'digitisation' and how this actually relates to our research: at the moment, most 'digital' activity is focused on the creation of resources, and any overt research element usually considers how a resource might advance upon the simple practice of making a text available as a document to be read (an e-text).
I say 'simple practice', but of course, this in itself is not without complexities: an e-text could be a Word document, a .txt document, a PDF, or a web-mounted file with HTML or XML mark-up. You might have to download the whole text and read it as a linear document, scrolling down through the pages; or it might be web-readable and require you to click to turn pages or move from chapter to chapter or poem to poem. It might be searchable, and if it is, it might be searchable for keywords only, or for specific sets of pre-marked generic categories or terms, or both. In editorial terms, as a literary product, it might bear no trace of its origins as a published text and not inform you of when it was published or what edition it is drawn from. Or it might contain all the publishing history you could desire and draw your attention to variants and revisions. The vast majority of scholars know how to read the signs of a publishing history in a text when they have it as a published book on their desk; but far fewer know how to determine the provenance and status of an e-text. Our main engagement with digitisation at present is to transfer materials that we have in printed form into electronically-based formats, and the variety of formats that could be used is the main focus of our research. What format we choose will determine the kinds of questions a scholar can subsequently demand of the text.
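To make the point about format concrete, the following is a minimal sketch in Python of how mark-up changes what can be asked of a text. Everything in it is an assumption made for illustration: the sample text, the element names (`text`, `poem`, `place`), the `edition` and `source` attributes, and the two helper functions are hypothetical and do not correspond to any particular archive or encoding standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up e-text; element and attribute names are illustrative only.
SAMPLE = """<text edition="1850" source="hypothetical first edition">
  <poem n="1"><title>Spring</title><line>The rose is red</line><line>A <place>London</place> morning</line></poem>
  <poem n="2"><title>London</title><line>No rose in the rain</line></poem>
</text>"""

root = ET.fromstring(SAMPLE)


def keyword_hits(root, word):
    """Plain keyword search: every poem whose text contains the word anywhere."""
    word = word.lower()
    return [
        poem.get("n")
        for poem in root.iter("poem")
        if word in "".join(poem.itertext()).lower()
    ]


def place_hits(root, name):
    """Category search: only poems in which the name was marked up as a <place>."""
    return [
        poem.get("n")
        for poem in root.iter("poem")
        if any(place.text == name for place in poem.iter("place"))
    ]


# The keyword search finds both poems; the category search finds only the one
# in which an editor tagged the word as a place.
print(keyword_hits(root, "london"))
print(place_hits(root, "London"))

# Provenance is recoverable only because it was recorded in the mark-up.
print(root.get("edition"), "|", root.get("source"))
```

A plain .txt version of the same poems would support only the first kind of search and would carry no record of the edition at all, which is the sense in which the choice of format constrains the questions a scholar can later ask.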
Creating an e-text is far harder and much more complex than most people assume. Those of us who have experience of scanning a printed document on a flatbed scanner will perhaps think this a strange assertion – it is almost like making a photocopy these days: an image can be scanned, resized, and put up on our university VLE in minutes. A text can also be posted as an image file, but using a basic OCR package to turn it into an e-text is also pretty simple. If this proves fiddly, and the text is not long, typing it into Word and then uploading it is always an option. However, these kinds of resources are deeply problematic and are not the way to create extensive academic archives. Sustainability is a key concept here. It may be fine to post a document in Word now, but in 50 years' time, will it still be readable? If it's short, you can do it again with whatever is the norm in 50 years' time, but...