Understanding the Use of
Continuous Speech Recognition Software for Writing
Which products:
The primary continuous speech recognition products are various
versions of Dragon NaturallySpeaking, IBM ViaVoice, and L&H
Voice Express. The discussion below applies generally to all of them,
although some specifics differ from product to
product. If you are looking for a specific feature, evaluate each
product carefully before purchasing.
What it does:
Continuous speech recognition (CSR) software permits two basic
functions:
- writing by voice input; and
- limited control of the Windows environment and application
functions by voice input
(this feature varies considerably from product to product and
version to version).
To write with CSR, one dictates into a word processor.
CSR products often come with their own word-processing environments,
but they are also commonly used with other applications. Some CSR
products can also access the menus in an application, open and
close applications, and even move the mouse through voice commands
to operate within applications or in Windows itself.
How it works:
CSR software attempts to match the user's oral language to its
own built-in model of oral language, based on three sources of
information:
- Acoustic - what a "typical" voice sounds like;
- Linguistic - how a "typical" person puts words
together; and
- Lexical - a "typical" English vocabulary.
All CSR products require that the individual user undergo
training that helps the software match the acoustic characteristics
of the user's voice with the program's acoustic model. This training
of the acoustic model is the most critical element (see below)
in beginning to use CSR. All CSR products also allow the user
to add new vocabulary to personalize the lexical model, and some
also incorporate routines for customizing the linguistic model
to better match the user's particular writing style.
When the user speaks a string of words, the CSR software analyzes
the acoustic patterns of the utterance and matches them both to
individual words in the vocabulary and to patterns of word use in
the language model, in order to generate the most likely text.
Thus, the utterance "We went on a field trip..." would
be analyzed not only for the sounds of individual words, but
also for the fact that "We went" may be a common construction
at the beginning of a sentence, and that "field trip"
is a known pattern for this user, whereas the acoustically similar
"feel drip" is not.
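To make the idea concrete, here is a minimal Python sketch of how an acoustic score and a language-model score might be combined to choose between "field trip" and "feel drip". The probabilities, the `best_candidate` function, and the `lm_weight` parameter are hypothetical illustrations, not the internal method of any product named above; the point is only that a phrase which sounds slightly less like the audio can still win when the language model strongly favors it.

```python
import math

# Hypothetical acoustic scores: how well each candidate phrase matches the
# sounds the user produced (log-probabilities; higher is better).
# In this toy example "feel drip" is acoustically a slightly closer match.
ACOUSTIC_LOG_PROB = {
    "field trip": math.log(0.45),
    "feel drip": math.log(0.55),
}

# Hypothetical language-model scores: how likely each phrase is after
# "We went on a", given this user's writing.
LANGUAGE_LOG_PROB = {
    "field trip": math.log(0.02),
    "feel drip": math.log(0.00001),
}


def best_candidate(candidates, lm_weight=1.0):
    """Return the candidate with the highest combined log score.

    lm_weight scales how much the language model counts relative to the
    acoustic evidence (an assumed tuning knob for this sketch).
    """
    def score(phrase):
        return ACOUSTIC_LOG_PROB[phrase] + lm_weight * LANGUAGE_LOG_PROB[phrase]

    return max(candidates, key=score)


print(best_candidate(["field trip", "feel drip"]))                 # field trip
print(best_candidate(["field trip", "feel drip"], lm_weight=0.0))  # feel drip
```

With `lm_weight=0.0` (acoustic evidence only), the same function picks "feel drip", which shows why the linguistic model matters for choosing between acoustically similar phrases.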
During use, CSR keeps an active vocabulary of thousands of words
(and commands) in RAM, so that they can be recognized quickly when
spoken. Initially, the active vocabulary consists of the most common
words in general English usage. This active vocabulary is backed up
by a larger vocabulary of about 200,000 words, which contains
lower-frequency words and many names from geography, history, etc.
When a word is called up from the back-up dictionary, it becomes
part of the active vocabulary. New words that are not in the
back-up dictionary can be added to the active vocabulary directly,
and they also become readily available for use.
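The relationship between the active vocabulary and the back-up dictionary can be sketched as a small data structure. This is a hypothetical Python illustration: the `Vocabulary` class, its method names, and the sample words are invented for the example, and real products also store pronunciations and usage statistics, which are omitted here.

```python
class Vocabulary:
    """Toy model of an active vocabulary backed by a larger back-up dictionary.

    Assumption: words are plain strings; pronunciations and usage
    statistics kept by real CSR products are not modeled.
    """

    def __init__(self, common_words, backup_words):
        self.active = set(common_words)   # small set held in memory, checked first
        self.backup = set(backup_words)   # larger list of lower-frequency words

    def look_up(self, word):
        """Return True if the word can be recognized, promoting it if needed."""
        if word in self.active:
            return True
        if word in self.backup:
            self.active.add(word)         # called up from back-up: now active
            return True
        return False                      # unknown: the user must add it

    def add_word(self, word):
        """Add a new word (e.g. a name) directly to the active vocabulary."""
        self.active.add(word)


vocab = Vocabulary(common_words={"the", "went", "trip"},
                   backup_words={"Mesopotamia"})
print(vocab.look_up("Mesopotamia"))   # True
print("Mesopotamia" in vocab.active)  # True: promoted to the active vocabulary
print(vocab.look_up("Kowalczyk"))     # False: not in either list yet
vocab.add_word("Kowalczyk")
print(vocab.look_up("Kowalczyk"))     # True
```

A set is used for each list because the only operations needed here are membership checks and additions.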
How to begin:
Training is critical for successful use of CSR. The training
involves three components: (1) customizing the acoustic, linguistic,
and lexical models for the individual user; (2) learning how
to "speak to" the computer to optimize recognition;
and (3) learning the critical operational procedures of the software.
To accomplish the first, all CSR products require an initial
enrollment to teach the software how to interpret the user's
voice (i.e., build an acoustic voice file). Modifications to
the language model and vocabulary are optional, but often recommended.
After the initial enrollment of the voice file, one can use
the software immediately to accomplish actual work, but this use
must be carefully monitored over the first few hours for misrecognized
words or other problems in software operation. The voice file
continues to improve with use only if the user corrects
misrecognized words and utterances. When the program makes
errors in recognition, the user should try to understand
what the error was and how to dictate the word(s) differently
to achieve better results.
While doing this, the user can also be learning the operation
of the software: how to make corrections, how to move the cursor
around by voice, and generally how to get the software to do
what is needed. The introduction of these commands should be
planned so that the most commonly needed commands come first,
along with those that best address the user's specific needs.
As with most training, supervision and support are more intensive
initially and taper off as the user gains greater fluency and becomes
more independent in use of the tool.
Writing by voice:
The thing to remember about writing by voice input is simply
that it is writing using another method of text creation, an
alternative to the pencil and the keyboard. IT IS NOT TALKING
TO THE COMPUTER. This is hard to remember when using CSR,
which seems so much like talking, but the student must develop
a different image of creating text by dictation. When a speaker
mumbles, elides words, or "swallows words" in conversation,
the listener, as a capable language user, can often
"fill in the blanks." CSR cannot do that: it has to hear
every word spoken and clearly enunciated. This model of
dictation must be firmly established in the student's mind.
In order to make the most effective use of CSR (i.e., its
various sources of information), dictation should be done in
multiple-word "chunks," rather than in a word-by-word
manner; for example, the text "It was a dark and stormy
night." should be spoken together or in two parts rather
than: "It, was, a, dark, and, stormy, night". This
often requires that the user formulate and even rehearse some
or all of the next sentence before beginning to dictate it.
Potential problems:
CSR software has language, cognitive, and affective requirements
for successful use, including the ability to:
- "put thoughts into words" before beginning to dictate;
- speak more than one word at a time when dictating;
- enunciate words clearly and consistently within the stream of
continuous speech;
- reflect on and modify performance while dictating; and
- persevere in the face of training demands and possible difficulty
achieving success.
Difficulties in any of these areas could make CSR untenable for an
individual, but success with CSR seems to vary significantly between
individuals with similar characteristics, so a carefully planned
trial of CSR is always in order.
Copyright 1999. Education Development Center, Inc. (EDC). This
material was produced through a collaboration between EDC and
Boston Children's Hospital. This document was downloaded from
the Speaking to Write Web site at: http://www.edc.org/spk2wrt.
This Web site is
funded through the U.S. Department of Education, National
Institute on Disability and Rehabilitation Research. Contract
#HI33G70143. The views expressed within this site do not necessarily
reflect the views of the Government. Site hosted by Education
Development Center, Inc. ©2000
Education Development Center, Inc. All Rights Reserved.
Should you have comments or questions about
this Web site, please contact: spk2wrt@edc.org