
The Moving Image 3.2 (2003) 96-100




Looking at Preservation from the Digital Library Perspective

Carl Fleischhauer


My perspective represents the domain of digital libraries. The term "digital library" is sometimes used to name the great array of digital content that exists in the world or on the World Wide Web—the speakers see these bodies of information as libraries. My colleagues and I, however, use the term to refer to organized entities, public and private, that exist to collect, provide access to, and especially preserve content in digital form. The final verb is very much on our minds at the Library of Congress, where the new National Digital Information Infrastructure Preservation Program (NDIIPP) has content preservation as its focus.

The problem of preserving content in digital form has received widespread attention in the digital library community in recent years, starting in the early 1990s. The initial problem statements employed examples like old five-and-a-quarter-inch floppy disks that cannot be played in the three-and-a-half-inch drives found in newer computers, or highlighted problems like trying to read WordStar documents in a Microsoft Word environment. One of the most carefully developed and widely read problem statements was Jeff Rothenberg's article "Ensuring the Longevity of Digital Documents." 1

Rothenberg argued that the obsolescence of computer hardware and software means that, after time has passed, it will be challenging to merely retrieve digital bits from old media. But a greater challenge, he wrote, is the interpretation of the bitstream. "This task is not straightforward....interpreting the bitstream depends upon understanding its implicit structure, which cannot explicitly be represented in the stream." 2 The software used to read a file at the time of its creation will not be available years later, and when a file is examined in replacement software its look and feel (and very likely meaning) will have changed. Similar changes will occur when an old file is migrated into a new format. These "flaws in translation," Rothenberg argues, are inevitable and will change content, sometimes in damaging ways. The article concludes by recommending that bitstreams "be sealed in virtual envelopes" with contextual information that describes the content and its "transformation history."

The digital library community tends to see the preservation of digital content as a new problem. But as moving image and recorded sound archivists know all too well, the issues of format volatility and the risk of sheer loss are of long standing. Analog magnetic recordings are poster children for the problems of format volatility and the risk of loss that characterize all electronic content, not just digital. The brows of audiovisual archivists sprouted worry lines in the 1960s and 1970s, twenty or more years before they appeared on the foreheads of digital librarians.

Volatility and loss are two facets of the problem of preserving content in electronic or digital form. And as Rothenberg pointed out, another important facet is the risk of losing interpretability, or renderability. In the digital realm, interpretability pertains to reading the bitstream. We can keep a bitstream alive very well by copying it repeatedly—done properly, no bits will be lost. But after time passes, will we be able to understand what the bits are saying? (What do I do with my WordStar documents when that software is no longer available?) An analogous problem in the audiovisual realm is represented by videotape. Even though the video "signal," in the sense of what travels along the cable between devices, is standardized, each type of video recorder has a different way of encoding the signal and laying it on the tape, depending on the manufacturer's preferred head configuration, proprietary internal data compression schemes, and so on. The audiovisual community has been wrestling with a "rendering" problem for many years.

Solution statements started emerging from the digital library community in the early 2000s. Calls for action emerged on multiple fronts—policy and political, organizational, and technical. The technical proposals operate at different levels or in different categories, but they generally are based on the distinction between...
