2. Capturing Data in Electronic Form

We are still far from being a paperless society.
Haynes (1998, 128)

Before you can analyse a text it needs to be in a format in which the computer can recognise it, usually in the format of a standard text file on a storage medium such as a floppy disk or a hard disk. If the text is only currently available in printed or written form this file must be created, since it is not normally convenient, or indeed possible, for the computer to read the data directly from the paper copy.
Barnbrook (1996, 30)

In order for a translator to take advantage of specialized translation technology, the texts to be processed must be in electronic form. It is increasingly common for clients to supply source texts to translators in electronic form (e.g., prepared using a word processor or desktop-publishing package), just as it is increasingly common for translators to be able to find parallel documents in electronic form (e.g., on the World Wide Web or on a CD-ROM). Nevertheless, this is not always the case. A survey undertaken by Webb (1998) reveals that freelance translators still receive approximately 45 percent, and translation agencies approximately 15 percent, of their source texts in hard copy. In addition, translators may wish to convert texts into electronic form in order to compile corpora (see section 3.1), prepare for automatic term extraction (see section 4.4), or build a translation memory (see section 5.2).

In any case, when data are not currently machine-readable (i.e., a document arrives as a fax or a printout), they must be converted. There are two main ways of doing this: using a combination of scanning hardware and optical-character-recognition software, and using voice-recognition technology. In addition, even when data arrive in or have been converted to electronic form, they will not necessarily be in a format that is compatible with the technology that the translator wishes to use.
Therefore, it may be necessary to convert these data into the appropriate file format. In this chapter, these technologies will be explained, and the advantages and disadvantages associated with each will be outlined.

The purpose of this book is not to overwhelm translators with technical details, but to give them an appreciation of the general approaches used by machines and of the types of difficulties that can be encountered when using technology. By reflecting on the differences between the ways in which humans and machines process data, translators will learn to have more reasonable expectations with regard to the capabilities of technology and how they can interact with machines in order to improve the quality of the machine output.

2.1 Scanning and optical character recognition

One method for converting a hard copy of a text into an electronic copy is to use a piece of hardware called a scanner combined with optical-character-recognition (OCR) software. For the sake of clarity, scanning and OCR will be explained as separate processes; however, in practice, the two can work together in a way that appears seamless to the user.

2.1.1 Scanning

A scanner is a computer peripheral, and there are a variety of models available. These range from small handheld devices, through desktop paper-sized scanners, to large freestanding units.

Handheld scanners are small and lightweight. A user holds the scanner in his or her hand and drags it across the paper. Typically, handheld scanners have a limited scanning window (usually 3 or 4 inches). This means that a user would have to make two or three passes over an ordinary letter-sized page. In addition, the user must have a very steady hand. Handheld scanners are relatively inexpensive, but they are typically too slow and inaccurate to be useful for large amounts of document conversion.
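
The "two or three passes" figure for handheld scanners can be checked with a little arithmetic. The sketch below is only illustrative and rests on assumptions that are not in the text: a US letter page 8.5 inches wide, 1-inch side margins around the printed text, and passes that sit side by side with no overlap. The function name `passes_needed` is hypothetical.

```python
import math

# Assumptions (not from the text): US letter width and 1-inch side margins.
LETTER_WIDTH_IN = 8.5
SIDE_MARGIN_IN = 1.0


def passes_needed(target_width_in: float, window_in: float) -> int:
    """Return how many side-by-side strips of width `window_in` are needed
    to cover `target_width_in`, assuming the strips do not overlap."""
    if target_width_in <= 0 or window_in <= 0:
        raise ValueError("widths must be positive")
    return math.ceil(target_width_in / window_in)


# Width of the printed text block under the margin assumption above.
text_width_in = LETTER_WIDTH_IN - 2 * SIDE_MARGIN_IN

for window_in in (3.0, 4.0):  # the scanning windows mentioned in the text
    full_page = passes_needed(LETTER_WIDTH_IN, window_in)
    text_only = passes_needed(text_width_in, window_in)
    print(f"{window_in:.0f}-inch window: {full_page} passes for the full page, "
          f"{text_only} for the text block")
```

Under these assumptions, covering the full page width takes three passes with either window, while covering only the text block takes two passes with a 4-inch window. In practice a user would probably overlap the passes slightly, which could raise the count.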
At the other end of the scale, large freestanding units are capable of processing vast amounts of data quickly and accurately. However, they are quite large and relatively expensive, and they are therefore beyond the means of many translators. Generally, such machines are used by large institutions that have significant scanning needs (e.g., libraries, archives). In between these two...

Figure 2.1 A text divided into large pixels. (Normally, the pixels used by a scanner would be much smaller, often on the order of several hundred or even several thousand pixels per square inch.)

(From Computer-Aided Translation Technology.)
