British Broadcasting Corporation. Information and Archives Dept.
Digital preservation -- Great Britain.
The amount of digital content produced at academic research institutions is large, and libraries and archives at these institutions have a responsibility to bring this digital material under curatorial control in order to manage and preserve it over time. But this is a daunting task with few proven models, requiring new technology, policies, procedures, core staff competencies, and cost models. The MIT Libraries are working with the DSpace™ open-source digital repository platform to explore the problem of capturing research and teaching material in any digital format and preserving it over time. By collaborating on this problem with other research institutions using the DSpace platform in the United States, the United Kingdom, Europe, and other parts of the world, as well as with other important efforts in the digital preservation arena, we are beginning to see ways of managing arbitrary digital content that might make digital preservation an achievable goal.
Research journals are increasingly being published digitally. The advantage of digital publishing is obvious: immediate accessibility anywhere. Gradually a disadvantage is also becoming clear: digital publishing endangers the continuity of research information. As a consequence of the obsolescence of formats, hardware, software, and carriers, digital information will be lost unless we act. Digital publishing is also causing a shift in the roles and responsibilities of publishers and libraries concerned with archiving digital publications for future use. Archiving digital publications requires a major turnaround in the policy and practice of national libraries. Although some actions have been taken, digital preservation research and implementation are still in their infancy. National libraries will need substantial funding for venture research activities and development of archival infrastructures. They will also have to work together more closely to successfully organize digital archiving in the twenty-first century.
A primary role of national libraries is to document the published output of their respective countries. Traditionally, this has meant collecting, describing, and preserving for future generations at least one copy of every item published in print, including books, serials, newspapers, maps, music, posters, and pamphlets. In the last decade, online publishing has had a revolutionary impact on the creation, publication (dissemination), and use of information. This has presented libraries, particularly national (deposit) libraries and other cultural collecting institutions, with the daunting task of collecting, storing, describing, managing, and preserving the vast quantities of information that are being produced online.
The Web is a virtually infinite information space, and archiving it in its entirety, in all its aspects, is a utopian goal. The volume of information presents a challenge, but it is neither the only nor the most limiting factor, given the continuous drop in storage device costs. Significant challenges lie in the management and technical issues of locating and collecting Web sites. As a consequence, archiving the Web is a task that no single institution can carry out alone. This article presents various approaches undertaken today by different institutions; it discusses their focuses, strengths, and limits, as well as a model for appraisal and for identifying potential complementary aspects amongst them. A comparison of discovery accuracy is presented between the snapshot approach taken by the Internet Archive (IA) and the event-based collection carried out by the Bibliothèque Nationale de France (BNF) in 2002 for the presidential and parliamentary elections. The balanced conclusion of this comparison allows for the identification of future directions for improving the former approach.
Development of approaches to preservation metadata has been an integral component of international efforts in the field of digital preservation. The focus of the community engaged in this work is currently shifting, and there is, as yet, no formal agreement around a conceptual framework and identification of required data elements. At the same time attention is now turning to the more complex task of building sustainable technical, infrastructure, and policy frameworks that will enable organizations to implement preservation metadata strategies practically at a local level.
In 2003 the Online Computer Library Center (OCLC) and Research Libraries Group (RLG) established an international working group to develop a common, implementable core set of metadata elements for digital preservation. Most published specifications for preservation-related metadata are either implementation specific or broadly theoretical. PREMIS (Preservation Metadata: Implementation Strategies) was charged to define a set of semantic units that are implementation independent, practically oriented, and likely to be needed by most preservation repositories. The semantic units will be represented in a data dictionary and in a METS-compatible XML schema. In the course of this work, the group also developed a glossary of terms and concepts, a data model, and a typology of relationships. Existing preservation repositories were surveyed about their architectural models and metadata practices, and some attempt was made to identify best practices. This article outlines the history and methods of the PREMIS Working Group and describes its deliverables. It explains major assumptions and decisions made by the group and examines some of the more difficult issues encountered.
Detailed knowledge of the internal properties of digital representation formats is necessary to interpret properly the full information content of otherwise opaque digital objects. These properties form an important component of the representation information needed by repository workflows regardless of local preservation strategy and infrastructure decisions. The Digital Library Federation (DLF) has sponsored preliminary investigations toward establishing a Global Digital Format Registry (GDFR) that will function as a sustainable utility for maintaining the bindings between public identifiers for digital formats and the significant syntactic and semantic properties of those formats. A sustainable GDFR should prove to be of great utility to archives, libraries, digital repositories, and other organizations and individuals interested in the long-term viability of digital assets.
Archival materials -- Digitization -- California -- San Diego.
Digital preservation -- California -- San Diego.
Electronic public records -- California -- San Diego -- Management.
Information superhighway -- California -- San Diego.
San Diego Supercomputer Center.
The Persistent Archive Testbed and National Archives and Records Administration (NARA) research prototype persistent archive are examples of preservation environments. Both projects are using data grids to implement data management infrastructure that can manage technology evolution. Data grids are software systems that provide persistent names to digital entities, manage data that are distributed across multiple types of storage systems, and provide support for preservation metadata. A persistent archive federates multiple data grids to provide the fault tolerance and disaster recovery mechanisms essential for long-term preservation. The capabilities of the prototype persistent archives will be presented, along with examples of how the capabilities are used to support the preservation of email, Web crawls, office products, image collections, and electronic records.
National Digital Information Infrastructure and Preservation Program (U.S.)
Library materials -- Digitization -- United States.
Digital preservation -- United States.
Congress authorized the Library of Congress to undertake the National Digital Information Infrastructure and Preservation Program (NDIIPP) to prevent the loss of our digital heritage. This work, as with all digital preservation activities, is challenging because of technical issues and also because traditionally there have been few effective collaborative mechanisms to leverage resources and expertise. NDIIPP aims to address both issues while also ensuring the preservation of at-risk digital content. Concrete steps have been taken recently with the establishment of eight partnership consortia, each of which has committed to working with the others and with the Library on collaborative digital preservation initiatives. The eight consortia represent the formal launch of an NDIIPP national network of preservation partners. Currently, NDIIPP is exploring how best to involve states and territories in the network.