HathiTrust Libraries Map a Shared Path: A Turning Point in Information Access
In October 2011, a constitutional convention established a governing structure, mission, and goals for the HathiTrust digital repository.1 Several of us from the portal editorial board were present, along with 130 representatives from 64 partner institutions in three countries. In one sense it was simply another of the many organizational meetings held every year in higher education, information technology, and related areas, in which we form consortia, undertake joint projects, and strategize responses to trends. This meeting and this organization, however, may lay the foundation for a fundamentally different world for libraries, publishing, and information access.
What became HathiTrust started in late 2006 as a proposal from the University of Michigan to its sister libraries in the Committee on Institutional Cooperation (CIC), to operate a shared digital repository to archive the large files that would be generated as the CIC libraries contracted with Google to digitize portions of their book collections. Nobody wanted to see a wasteful duplication of server architecture to store all those files. Over the course of the following year the preliminary functional objectives, collaborative principles, and the business and legal models were refined. By June 2008, when the name HathiTrust was coined, the repository already held the digital files for more than one million volumes; in October of that year, the University of California system joined forces with the CIC and all of a sudden it looked like this enterprise was going to have real legs. Paul Courant and John Price Wilkin at the University of Michigan deserve special credit for their vision and initiative, Indiana University for quickly partnering to architect a mirror site, and all of the CIC library directors for reorienting their thinking quickly from supporting a straightforward digital preservation archive to a dynamic multi-functional platform that would soon occupy a unique niche in the information landscape.
From the beginning, major aspects of the service were under development: quality control, public search interfaces, ingest of non-Google and non-book content, access for persons with disabilities, collection grouping, data mining, and other academic research tools. Value statements affirmed that "We believe the digital collections created through programs of mass digitization should be broadly and freely available to the extent possible by law or contractual agreements."2 Complex issues of collection planning, digital object management, and long-term preservation rapidly began to be addressed by various committees.3 In a surprisingly short period of time, almost all of the early objectives4 have been met such that the Constitutional Convention began to outline some major future directions.
The emergence of HathiTrust has been propelled by, but is not dependent on, the Google book digitizing project. There is still a morass of legal argument surrounding Google's rights to do the scanning at all, and disagreement over the fair use arguments for access to those files. Even as these wars are waged in the courts, however, publishers and distributors have shaped new approaches to electronic dissemination and rights management that are designed to expand and interlink digital platforms. HathiTrust is a distinct nonprofit entity whose infrastructure is envisioned to embrace digital content generated by many sources beyond Google, for example, born-digital files contributed by academic institutions, and opt-in content from university presses. The bulk of public domain materials in the repository keeps growing, with materials that can be openly searched and viewed. Demand is leaning in favor of more access; at the Constitutional Convention it was announced that HathiTrust had received only twenty requests from authors to take down specific titles, and more than 5000 requests from rights-holders to open up their files.
The next level of academic service from HathiTrust has just been launched in the HathiTrust Research Center (http://www.hathitrust-research.org/), which will enable computational access for nonprofit and educational users to works in the public domain. Significant advances have occurred in every discipline, using tools that allow researchers to mine hundreds of thousands of texts at once, searching for words and patterns, synchronic and diachronic contrasts, and interconnections that reveal historical...