Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata by Seth van Hooland and Ruben Verborgh
Linked Data, a method of publishing and structuring data from different sources so that the data can be interlinked and thereby made more useful, is perhaps the most tangible and practical branch of the Semantic Web movement. Tim Berners-Lee, the inventor of the World Wide Web, also coined the term Semantic Web, which he defined in a 2001 Scientific American article as “an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” So far, Linked Data and the Semantic Web have remained inaccessible to much of the library and information science (LIS) community. Seth van Hooland and Ruben Verborgh make a welcome attempt to introduce these concepts to LIS professionals with Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata. In addition to a passion for their subject, both authors bring academic and practical technical experience with metadata, the data that describe other data. The handbook aims to lower barriers to the adoption of Linked Data in the LIS community, and each of its seven chapters includes hands-on exercises. The subtitle, How to Clean, Link and Publish Your Metadata, reflects both the organization of the book’s contents and a process for enriching and sharing existing institutional metadata according to the principles of Linked Data.
Because Linked Data has evolved largely as a technology-driven field, the current literature remains highly specialized, even though wider adoption of Linked Data depends on communities outside computer science. The authors make no secret of this critical fact in Chapter 1, defining Linked Data as a “set of best practices for the publication of structured data on the web” rather than a specific technology (p. 3). Throughout the book, the authors present complementary perspectives from both computer science and LIS. Chapter 2 (“Modeling”) reviews the history and evolution of information modeling (the organization of content so that it can be delivered and reused in a variety of ways) and traces historical parallels between the computer science and LIS communities. Metadata managers and practitioners will gain an appreciation of Linked Data as an approach for linking metadata across collections, systems, and institutions over the Web in an open and transparent manner. In Chapter 3 (“Cleaning”), which deals with metadata quality, the authors explain, “All metadata is dirty, but you can do something about it.” They argue that metadata quality is not only subjective but also contextual and relative to the needs of local users (p. 71). This matters because the premise of Linked Data is to repurpose and share local metadata, making it available to others globally in unknown and unpredictable contexts.
Chapter 4 (“Reconciling”) presents an overview of controlled vocabularies (established lists of standardized terminology, such as library subject headings, for use in the indexing and retrieval of information) and their evolution and resurgence in the context of Linked Data. The authors stress that “the failure of full-blown ontologies has created a new opportunity for controlled vocabularies in the context of linked data” (p. 136). LIS metadata practitioners will be elated to learn that computer professionals have finally come around to admitting the value of controlled vocabularies. Chapter 5 (“Enriching”) covers named-entity recognition (NER), the classification of text elements into predefined categories such as the names of persons, organizations, or locations, along with best practices for enriching lengthy, unstructured data according to Linked Data principles. When Linked Data is applied to library, archive, and museum metadata, NER services offer an efficient way to both identify and disambiguate terms and entities within unstructured data. This chapter includes a discussion of the often confusing distinction between uniform resource identifiers (URIs), which identify resources, and uniform resource locators...