Speaking to Write: Realizing the Potential of Speech Recognition for Secondary Students with Disabilities


The Resource Laboratory

Understanding the Use of
Continuous Speech Recognition Software for Writing

Which products:
The primary continuous speech recognition products are various versions of Dragon NaturallySpeaking, IBM ViaVoice, and L&H Voice Xpress. The discussion below is general to all of them, although some specifics differ from product to product. If you need a specific feature, check that the product supports it before purchasing.

What it does:
Continuous speech recognition (CSR) software permits two basic functions:

  1. writing by voice input; and
  2. limited control of the Windows environment and application functions by voice input
    (this feature varies considerably from product to product and version to version).

To write with CSR, one dictates into a word processor. CSR products often come with their own word processing environments, but they are often used with other applications as well. Some CSR products can also access the menus in an application, open and close applications, and even move the mouse through voice commands to operate within applications or in Windows itself.

How it works:
CSR software attempts to match the user's oral language to its own built-in model of oral language, based on three sources of information:

  1. Acoustic - what a "typical" voice sounds like;
  2. Linguistic - how a "typical" person puts words together; and
  3. Lexical - a "typical" English vocabulary.

All CSR products require that the individual user undergo training that helps the software match the acoustic characteristics of the user's voice with the program's acoustic model. This training of the acoustic model is the most critical element (see below) in beginning to use CSR. All CSR products also allow the user to add new vocabulary to personalize the lexical model, and some also incorporate routines for customizing the linguistic model to better match the user's particular writing style.

When the user speaks a string of words, the CSR software analyzes the acoustic patterns of the utterance and matches them both to single words in the vocabulary and to patterns of word use in the language model in order to generate text. Thus, the utterance "We went on a field trip..." would be analyzed not only for the sounds of individual words, but also for the fact that "We went" may be a common construction at the beginning of a sentence, and that "field trip" is a known pattern for this user, whereas the acoustically similar "feel drip" is not.
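
The way acoustic and linguistic information combine in the "field trip" example can be illustrated with a minimal Python sketch. It is not how any particular CSR product is implemented. All of the probabilities, the table names, and the function names are invented for illustration, and the language model is assumed to be a simple table of word-pair probabilities.

```python
import math

# Hypothetical acoustic scores: how well each candidate matches the sound.
# The two candidates sound nearly alike, so their scores are close.
ACOUSTIC_PROB = {
    ("field", "trip"): 0.48,
    ("feel", "drip"): 0.52,
}

# Hypothetical word-pair (bigram) probabilities standing in for the
# linguistic model: how likely one word is to follow another.
BIGRAM_PROB = {
    ("a", "field"): 0.02,
    ("field", "trip"): 0.30,
    ("a", "feel"): 0.005,
    ("feel", "drip"): 0.0001,
}
UNSEEN_PROB = 1e-6  # assumed small probability for word pairs not in the table


def language_log_prob(previous_word, words):
    """Sum the log probabilities of each word given the word before it."""
    total = 0.0
    for word in words:
        total += math.log(BIGRAM_PROB.get((previous_word, word), UNSEEN_PROB))
        previous_word = word
    return total


def best_candidate(previous_word, candidates):
    """Pick the candidate with the best combined acoustic and language score."""
    def score(words):
        return math.log(ACOUSTIC_PROB[words]) + language_log_prob(previous_word, words)
    return max(candidates, key=score)


# The user has just said "...on a" and the next sounds are ambiguous.
choice = best_candidate("a", [("field", "trip"), ("feel", "drip")])
print(" ".join(choice))  # prints "field trip"
```

In this sketch the sound alone slightly favors "feel drip," but the word-pair probabilities tip the result to "field trip." This is also why dictating in multiple-word phrases gives the software more to work with than dictating one word at a time.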

During use, CSR has available in RAM an active vocabulary of thousands of words (and commands); this means that they are readily called up when spoken. Initially, the active vocabulary consists of the most common words in general English usage. This active vocabulary is backed up by a large vocabulary of about 200,000 words, which contains lower incidence words and many names from geography, history, etc. When a word is called up from the back-up dictionary, it becomes part of the active vocabulary. New words that are not in the back-up dictionary can be added to the active vocabulary directly, and they also become readily available for use.
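
The relationship between the active vocabulary and the back-up dictionary can be sketched in Python as follows. This is only an illustration of the behavior described above, not the data structure any CSR product actually uses. The class name, method names, and sample words are hypothetical.

```python
class Vocabulary:
    """A two-tier word list: a small active set backed by a larger dictionary."""

    def __init__(self, active_words, backup_words):
        self.active = set(active_words)   # readily available words
        self.backup = set(backup_words)   # large back-up dictionary

    def look_up(self, word):
        """Report where a word was found, promoting back-up words to active."""
        if word in self.active:
            return "active"
        if word in self.backup:
            # A word called up from the back-up dictionary joins the active set.
            self.active.add(word)
            return "backup"
        return "unknown"

    def add_word(self, word):
        """Add a new word directly to the active vocabulary."""
        self.active.add(word)


vocab = Vocabulary(
    active_words={"we", "went", "on", "a", "trip"},
    backup_words={"Antarctica", "Mesopotamia"},
)
print(vocab.look_up("Antarctica"))  # prints "backup" (first use)
print(vocab.look_up("Antarctica"))  # prints "active" (now promoted)
print(vocab.look_up("Pikachu"))     # prints "unknown"
vocab.add_word("Pikachu")
print(vocab.look_up("Pikachu"))     # prints "active"
```

The sketch shows why a lower incidence word may be recognized more readily after its first use, and why adding a student's own names and terms ahead of time can help.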

How to begin:
Training is critical for successful use of CSR. The training involves three components: (1) customizing the acoustic, linguistic, and lexical models for the individual user; (2) learning how to "speak to" the computer to optimize recognition; and (3) learning the critical operational procedures of the software. To accomplish the first, all CSR products require an initial enrollment to teach the software how to interpret the user's voice (i.e., build an acoustic voice file). Modifications to the language model and vocabulary are optional, but often recommended.

After the initial enrollment of the voice file, one can use the software immediately to accomplish actual work, but this must be carefully monitored over the first few hours for misrecognized words or other problems in software operation. The voice file continues to get stronger with use only if the user makes corrections for misrecognized words and utterances. When the program makes errors in recognition, one should make an effort to understand what the error was and how to dictate the word(s) differently to achieve better results.

While doing this, the user can also be learning the operation of the software: how to make corrections, how to move the cursor around by voice, and generally how to get the software to do what the user wants. The introduction of these commands should be planned around which commands are most commonly needed and which best address the user's specific needs. As with most training, the supervision and support are more intensive initially and become less vigilant as the user gains greater fluency and becomes more independent in use of the tool.

Writing by voice:
The thing to remember about writing by voice input is simply that it is writing using another method of text creation, an alternative to the pencil and the keyboard. IT IS NOT TALKING TO THE COMPUTER. This is hard to remember when one is using CSR, which seems so much like talking, but the student must develop a different image of creating text by dictation. When one mumbles, elides words, or "swallows words" in conversation, the listener, as a capable language listener, can often "fill in the blanks." CSR cannot do that; it has to hear every word spoken and enunciated. This model must be firmly established in the student's mind.

In order to make the most effective use of CSR (i.e., its various sources of information), dictation should be done in multiple-word "chunks" rather than in a word-by-word manner. For example, the text "It was a dark and stormy night." should be spoken together or in two parts, rather than as "It, was, a, dark, and, stormy, night." This often requires that the user formulate and even rehearse some or all of the next sentence before beginning to dictate it.

Potential problems:
CSR software has language, cognitive, and affective requirements for successful use, including the ability to: "put thoughts into words" before beginning to dictate; speak more than one word at a time when dictating; enunciate words clearly and consistently within the stream of continuous speech; reflect on and modify performance while dictating; and persevere in the face of training demands and possible problems achieving success. Difficulties in any of these areas could make CSR untenable for an individual, but success with CSR seems to vary significantly between individuals with similar characteristics, so a carefully planned trial use of CSR may always be in order.



Copyright 1999. Education Development Center, Inc. (EDC). This material was produced through a collaboration between EDC and Boston Children's Hospital. This document was downloaded from the Speaking to Write Web site at: http://www.edc.org/spk2wrt.



This Web site is funded through the U.S. Department of Education, National Institute on Disability and Rehabilitation Research. Contract #HI33G70143. The views expressed within this site do not necessarily reflect the views of the Government. Site hosted by Education Development Center, Inc. ©2000 Education Development Center, Inc. All Rights Reserved.

Should you have comments or questions about this Web site, please contact: spk2wrt@edc.org