Abstract

The electronic conversion of scanned image files to readable text using optical character recognition (OCR) software, and the subsequent migration of raw OCR text to e-book text file formats, are key remediation or media conversion technologies in digital repository e-book production. Despite real progress, problems with the reliability and accuracy of OCR-derived e-book text and metadata persist. This paper examines a selection of digitized e-books in several prominent digital repositories and discusses the impact of OCR technology on e-book text file formats, metadata, and the online reading experience.

Additional Information

ISSN: 1530-7131
Print ISSN: 1531-2542
Pages: pp. 59-91
Launched on MUSE: 2015-01-27
Open Access: No