Loose, Falling Characters and Sentences: The Persistence of the OCR Problem in Digital Repository E-Books

Diana Kichuk

doi:10.1353/pla.2015.0005

portal: Libraries and the Academy

Johns Hopkins University Press
Volume 15, Number 1, January 2015
pp. 59-91

Abstract

The electronic conversion of scanned image files to readable text using optical character recognition (OCR) software and the subsequent migration of raw OCR text to e-book text file formats are key remediation or media conversion technologies used in digital repository e-book production. Despite real progress, the OCR problem of reliability and accuracy in OCR-derived e-book text and metadata persists. This paper examines a selection of digitized e-books in several prominent digital repositories and discusses the impact of OCR technology on e-book text file formats, metadata, and the online reading experience.
