
9 ISSUES IN BUSINESS PLANNING FOR ARCHIVAL COLLECTIONS OF WEB MATERIALS

Paul Koerbin

The state of the art

Introducing the subject

The World Wide Web (the ‘web’) is such a pervasive part of our working and personal lives that it presents unprecedented challenges for libraries to collect and manage for preservation. We are more involved with the web every day, in a way in which we have not been with other information materials; it is at once seemingly omnipresent and elusive. Confronted with the idea of web archiving, an obvious first question to ask is ‘how do we do it?’. However, while the ‘how’ is certainly a challenge, it cannot be usefully addressed without first confronting the rather more difficult questions of ‘what to archive’ and ‘why?’.

We all have some concept of the web, but what do we really mean? The web is subject to all the conveniences (and excesses) of labels and jargon terminology. Even ostensibly straightforward labels can conceal a deeper complexity when we need to work practically with such entities. What, for example, is an ‘online newspaper’ exactly? Is it the news under the masthead, the blogs, the videos, the portal, the commercial services and advertisements which may be associated with the site? Perhaps all or some of these things, depending on one’s purpose or perspective. One problem we face, then, is what level of specificity we need in order to talk meaningfully about the web. Such questions are not merely rhetorical if we are to address the primary concern of what we perceive the web to be for the purpose of web archiving.
Not only do we need to understand the nature of the content and the format (or formats), but we also need to deal with the dilemmas posed by the characteristics which make the web what it is: its dynamic quality, which makes the dimension of time a profound challenge for collecting institutions; and the unmediated and non-discrete nature in which it manifests itself through re-use, linking, embedding, feeds, API technologies and so on.

What is web archiving?

‘Web archiving’ is the generally accepted term for the activity of collecting web-delivered materials for a digital repository with the intended purpose of long-term preservation. Other terms such as ‘Internet archiving’ or ‘Internet preservation’ may be understood as synonymous, though all these terms should be understood as narrower than ‘digital preservation’, which may encompass more than born-digital web material. In addition to the imprecise nature of the term ‘web’, ‘archiving’ is also problematic in some respects, particularly for libraries, since the term ‘archive’, which is implied by ‘web archiving’, can suggest a legal purpose associated with national or state archives which the library-based web archive does not necessarily fulfil. Nevertheless, the term does have the advantage of implying purposeful preservation, which a term such as ‘collection’ may not.

Web archiving is further defined by how the process is undertaken. The content of the archive is obtained through the use of harvest (or crawl) robots. The robot acts much like a web browser, and so what is collected is a static rendering of content – a browser view – as delivered by web servers. While it may be possible for content to be ingested by other means such as deposit, the web archive, certainly for the purpose of this discussion, is to be distinguished from institutional digital repositories in which curators or creators submit packages of content to the repository.
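The harvesting process described above can be illustrated with a minimal sketch. It is not drawn from any particular archive's tooling: the names `harvest`, `Snapshot`, `LinkExtractor` and `TOY_WEB` are hypothetical, the 'web' is an in-memory dictionary rather than live servers, and scope is assumed to mean 'same host as the seed'. It shows only the general idea of a robot that requests pages as a browser would, records the static content returned together with a capture time, and follows links that fall within scope.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of anchor tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


@dataclass
class Snapshot:
    """A static capture of one URL at one moment in time."""
    url: str
    captured_at: datetime
    body: str


def harvest(seed, fetch, max_pages=10):
    """Breadth-first harvest from a seed URL.

    `fetch` is any callable returning the page body for a URL, or None
    if nothing is delivered. Scope is assumed to be the seed's host.
    """
    scope_host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    snapshots = []
    while queue and len(snapshots) < max_pages:
        url = queue.popleft()
        body = fetch(url)
        if body is None:
            continue
        # Record what the server delivered, stamped with the capture time.
        snapshots.append(Snapshot(url, datetime.now(timezone.utc), body))
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            target = urljoin(url, href)
            # Follow only unseen links that stay within the selected scope.
            if urlparse(target).netloc == scope_host and target not in seen:
                seen.add(target)
                queue.append(target)
    return snapshots


# A toy stand-in for the web, so the sketch runs without network access.
TOY_WEB = {
    "http://example.org/": '<a href="/news">News</a> <a href="http://other.example/">Out</a>',
    "http://example.org/news": '<a href="/">Home</a> <a href="/blog">Blog</a>',
    "http://example.org/blog": "<p>No links here</p>",
}

snapshots = harvest("http://example.org/", TOY_WEB.get)
captured_urls = [s.url for s in snapshots]
```

Even in this toy form the sketch makes two of the earlier points concrete: the decision about scope (‘what to archive’) has to be made before the robot can run, and each capture is tied to a moment in time, so a later harvest of the same address may yield different content.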
Web archiving should also be understood as a workflow process involving selection (or scoping), collection (ingest), metadata creation and administration, data storage, preservation and access (delivery). The Reference Model for an Open Archival Information System (OAIS), published by the Consultative Committee for Space Data Systems in 2002, has been widely adopted as the basis for developing these workflows.

Who is involved?

As Brown (2006: 8) has noted, the history of web archiving is almost as long as that of the web itself. The early pioneers included the national libraries of Sweden and Australia, both commencing web archiving programmes in 1996, and the Internet Archive, also founded in 1996. National and state libraries with deposit library mandates continue to be the leading library institutions involved with web archiving. In recent times there has been growing interest from universities wishing to develop collections of web resources to support research analysis, research groups such as LiWA (Living...

