11 Automatic Speech Recognition and Its Applications
Harry Levitt

Recent advances in automatic speech recognition (ASR) technology have been dramatic. With the help of a laptop computer and appropriate programs, an ASR device can recognize as many as 30,000 words. Although such progress is impressive, ASR technology has limitations. ASR devices that can handle large vocabularies (vocabularies of tens of thousands of words) are limited to discrete word recognition; that is, the speaker must insert short pauses between words. The process requires a cooperative speaker, and, in order to minimize machine recognition errors, it is usually necessary to train the ASR device on each speaker. Some of the more advanced ASR devices work quite well when trained on a given accent, but the best results are obtained when each individual speaker trains with the device.

Under ideal conditions, speech recognition error rates are on the order of 1 percent; that is, with no background noise and a cooperative speaker who has spent some time training, the ASR device misrecognizes about one word in a hundred. Ideal conditions do not always prevail in practice, but there are many practical situations in which the device maintains error rates well under 5 percent.

Much of the research in ASR is now directed toward developing systems for continuous speech. Automatic recognition of continuous speech is possible, but the vocabulary size for a reliable system is much smaller than that for discrete word recognition.

Research in ASR is driven largely by its potential application in commerce and industry. There are, however, noncommercial applications of ASR technology that may have a significant impact on other aspects of our highly computerized society. This chapter addresses possible applications of ASR technology for improving the means of communication between deaf and hearing people, as well as providing new tools for the education of deaf children.
Preparation of this paper was supported by Grant #H133E80015, a Rehabilitative Engineering Research Center on Hearing Enhancement and Assistive Devices, from the National Institute on Disability and Rehabilitation Research (NIDRR).

[Figure 11.1. Voice-operated teletypewriter. Diagram labels: Hearing Person, Deaf Person, Telephone, TTY, Speech Recognizer, Speech Synthesizer.]

A VOICE-OPERATED TELETYPEWRITER

In this application of technology, an ASR device converts to text the speech of a hearing person in a telephone-to-teletypewriter (TTY) connection. The text is then transmitted to the TTY of the deaf person who, in turn, can respond by means of either text or speech. If text, a text-to-speech synthesizer converts it to speech. The voice-operated teletypewriter (VO-TTY) allows a deaf and a hearing person to communicate with each other over the telephone; each party in the conversation uses the mode of communication (speech or text) that is most convenient for them.

Figure 11.1 shows a diagram of how a telephone connection works with a VO-TTY. Note that the equipment required consists of a conventional telephone for the voice-user, a conventional TTY for the deaf text-user, and a computer containing an ASR device and a text-to-speech synthesizer at some point in the telephone circuit. Three possible locations for the computer are (1) at the position of the voice-user, (2) at the position of the text-user, and (3) at a remote location, such as the telephone exchange.

The most effective, but unfortunately most expensive, mode of operation is to place the computer with the voice-user. In this way, the hearing person can see the textual representation of his or her speech before it is transmitted and can correct any errors prior to transmission. This arrangement is expensive because each voice-user must have a computer with ASR software.
Although the use and ownership of personal computers are growing rapidly, it will still be some time before personal computers are as common as telephones. The placement of the ASR computer system at the location of the TTY-user is a less expensive approach, but it places the financial burden on the TTY-user. The least expensive approach for both the voice-user and TTY-user is for the computer to be located at the telephone exchange, or some other remote location accessible by telephone. In this arrangement, neither the voice-user nor the TTY-user need purchase any special equipment, and the computer at the remote location can be time-shared by a number of users. This approach limits...
