Spotlight
on Speech Recognition: HTML Version without frames
Last updated: November 18,
1999.
THE FUNDAMENTALS
OF SPEECH RECOGNITION
1.
What is speech recognition technology?
Speech recognition
(also referred to as voice recognition) is a computer application
that lets people control a computer by speaking to it. In other
words, rather than using a keyboard and mouse to communicate
with the computer, the user speaks commands into a microphone
(on a headset, or mounted on a lapel pin, the desktop, etc.)
that is connected to a computer.
By speaking into
the microphone, users can do two things. First, they can tell
their computers to execute commands such as open a document,
save changes, delete a paragraph, or even move the cursor,
all without touching a key. Second, users can write using speech
recognition in conjunction with a standard word processing program.
When users speak into the microphone their words can appear on
a computer screen in a word processing format, ready for revision
and editing. The discussion that follows will focus primarily
on using speech recognition for writing.
2.
How does speech recognition work?
There are two
kinds of speech recognition software now available: discrete
speech and continuous speech. The older technology, discrete
speech recognition, requires the user to speak one - word - at
- a - time. A newer technology, continuous speech recognition,
allows the user to dictate by speaking (at a more or less normal
rate). Both have their advantages and disadvantages for individuals
with difficulties in writing, which we will discuss in more detail
later.
As the user speaks,
the software puts one or more words on the screen by matching
the sound input with the information it has in the user's voice
file. Both kinds of speech recognition store frequently used
words and related information in the computer's memory (RAM)
for immediate use in guessing a word or string of words;
this is called the active dictionary. When new vocabulary
is added, it enters the active dictionary. For less common vocabulary,
all speech recognition products have a large back-up dictionary
stored on the hard drive, so that it is relatively rare that
one would use a word that is entirely unknown to the software.
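The two-tier arrangement described above can be sketched roughly as follows. This is only an illustration, not how any particular product is implemented: the class name, the method names, and the rule that a word found in the back-up dictionary is promoted into the active dictionary on first use are all assumptions made for the example.

```python
# Hypothetical sketch of an active dictionary backed by a larger back-up
# dictionary. All names and the promotion rule are assumptions.
class Vocabulary:
    def __init__(self, active_words, backup_words):
        self.active = set(active_words)  # small set kept in memory (RAM)
        self.backup = set(backup_words)  # stands in for the dictionary on disk

    def lookup(self, word):
        """Return where the word was found: 'active', 'backup', or 'unknown'."""
        if word in self.active:
            return "active"
        if word in self.backup:
            # Assumption: a back-up word is promoted once it has been used.
            self.active.add(word)
            return "backup"
        return "unknown"

    def add_word(self, word):
        """New vocabulary enters the active dictionary directly."""
        self.active.add(word)


vocab = Vocabulary(["open", "save", "the"], ["aphasia", "apraxia"])
print(vocab.lookup("save"))     # found in the active dictionary
print(vocab.lookup("aphasia"))  # found in the back-up dictionary, then promoted
print(vocab.lookup("aphasia"))  # now found in the active dictionary
```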
As the user trains
and speaks to the system, the software creates a user-specific
voice file that contains a lot of information about user voice
qualities and pronunciations, and with continuous speech recognition,
patterns of word usage. Both types of speech recognition software
also capture the user's preferred vocabulary. The voice file
in discrete speech recognition software is built primarily on
the user's pronunciation of individual words. The voice file
in continuous speech recognition also contains information about
the user's grammar and word usage (i.e., which words/phrases
tend to be used in what order). The software uses this acoustic
and linguistic information to make its best guess at each word
or phrase as it is dictated.
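As a rough illustration of how acoustic and word-usage information might be combined, the sketch below picks between two similar-sounding candidates using the word that came before. The scoring formula, the function names, and the sample numbers are invented for this example; commercial products use far more elaborate statistical models.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count how often each pair of adjacent words occurs in the user's text."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def best_guess(previous_word, acoustic_scores, bigrams):
    """Pick the candidate with the highest combined score.

    Assumed toy formula: acoustic score * (1 + times the candidate has
    followed previous_word in the user's earlier text).
    """
    def combined(word):
        return acoustic_scores[word] * (1 + bigrams[(previous_word, word)])

    return max(acoustic_scores, key=combined)


usage = train_bigrams(["I read the entire book", "the entire class left"])
sounds_like = {"attire": 0.55, "entire": 0.45}  # made-up acoustic scores

print(best_guess("the", sounds_like, usage))     # usage history favors "entire"
print(best_guess("formal", sounds_like, usage))  # no history, so acoustics decide
```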
Some continuous
speech products also offer one additional tool for improving
the voice model: the software can analyze documents that the
user has created previously for the vocabulary and language/grammar
used, and incorporate this into its prediction routines. This
is a very powerful tool in terms of increasing accuracy when
dictating about specific subjects, but it may not always help
students if they are writing about widely divergent topics, or
writing in different styles.
The process of
creating a strong voice file (i.e., "familiarizing"
the speech recognition software with an individual voice and
language pattern) takes time. When a user takes the time
to properly train and use the speech recognition system, which
creates a strong and accurate voice file, the system will supply
the correct word or phrase most of the time. However, the system
will never achieve a 100% accuracy rate in all situations. Sometimes
the software just doesn't get it right and suggests the wrong
word. The user must then stop and make a correction.
3.
What happens when the computer does not recognize a dictated
word or phrase correctly?
Correcting the
software's incorrect guesses is an important way to improve the
performance of speech recognition software (that is, to help it
recognize a user's voice better). Discrete and continuous speech
recognition systems operate a little differently, but the principle
is the same. When the software guesses an incorrect word, the
user indicates this to the software, and the program may generate
a list of alternative words that appears in a separate window.
The user can correct a mistake by choosing the desired word from
this list if it appears there, or by spelling the correct word
by the keyboard or by voice. Some programs use each letter entered
while spelling the correct word to narrow the list of suggested words, so
that the user does not need to spell the entire word. This feature
makes the program operate like word prediction.
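The letter-by-letter narrowing described above can be sketched as a simple prefix filter. The function name, the word list, and the alphabetical ordering are assumptions for illustration; a real product would presumably rank suggestions by likelihood rather than alphabetically.

```python
def narrow_alternatives(vocabulary, typed_letters, limit=5):
    """Return up to `limit` words that start with the letters spelled so far."""
    prefix = typed_letters.lower()
    matches = [word for word in vocabulary if word.lower().startswith(prefix)]
    return sorted(matches)[:limit]


# Hypothetical vocabulary; the user wanted "entire" but got "attire".
words = ["attire", "enter", "entire", "entirely", "entity"]

print(narrow_alternatives(words, "e"))     # every word beginning with "e"
print(narrow_alternatives(words, "enti"))  # a shorter list after four letters
```

With each added letter the list shrinks, so the user can usually pick the intended word before spelling all of it.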
When using discrete
speech recognition software, the connection between the user's
dictated word and the program's guess is typically one-to-one,
so that identifying a mistake is relatively straightforward.
For example, the user says, "entire" and the program
produces, "attire."
However, with
continuous speech software, the correction can be a bit more
complicated. The software may misinterpret a single word as multiple
words, or vice versa, and often misinterprets more than one word
as part of a longer phrase. Actual examples: the user says "even
ungrammatical," and the program produces, "even on
grammatical," or the user says "in the habit,"
and the program produces, "inhabit." For corrections
like this, identifying the errors and deciding on how to make
the necessary changes can be more difficult, especially for students
with learning difficulties.
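To show why these errors are harder to pin down, the sketch below lines up what was said against what appeared on screen and reports the spans that differ. It uses Python's standard difflib module; the function name is an assumption, and the comparison presumes someone (a trainer, for instance) knows what was actually spoken, which the software itself does not.

```python
import difflib


def find_misrecognitions(spoken, recognized):
    """Return (spoken span, recognized span) pairs where the two texts differ."""
    said = spoken.lower().split()
    heard = recognized.lower().split()
    matcher = difflib.SequenceMatcher(None, said, heard)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((" ".join(said[i1:i2]), " ".join(heard[j1:j2])))
    return errors


# The two examples from the text above.
print(find_misrecognitions("even ungrammatical", "even on grammatical"))
print(find_misrecognitions("in the habit", "inhabit"))
```

In both cases the differing spans contain different numbers of words, which is exactly what makes the correction less straightforward than the one-to-one errors of discrete speech.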
4.
What exactly constitutes a speech recognition system?
A speech recognition
system is made up of a multimedia computer with speech recognition
software, a microphone (which typically comes with the software),
and usually a sound card. To use speech recognition to write,
a word processing program or other text input software such as
email is also needed, although some systems have a built-in word
processor.
Each kind of
speech recognition program has different hardware requirements.
- Discrete speech software (e.g.,
DragonDictate, v. 3) can operate on older machines, generally
a Pentium with 32 MB of RAM and a 16-bit soundcard.
- The newer continuous
speech recognition software generally requires a more powerful
computer, and the latest versions tend to operate best with the
high end machines available at the time of the software's release
(e.g., at the end of 1999, a Pentium III or comparable processor
running at 350 MHz or faster, with 128 MB of RAM).
5.
How do speech recognition systems differ from one another?
Speech recognition
software varies in several ways. First, as noted above, there
are two general types of speech recognition: discrete and continuous.
There is currently (end of 1999) only one discrete
speech recognition product for sale, DragonDictate for Windows, Classic,
and it will not run on Windows 2000 or later operating systems.
It has an active vocabulary
There are new
versions of continuous speech recognition software coming out
every few months as the major companies (at this point, Dragon
Systems, IBM, Lernout & Hauspie, and Philips)
vie for the market. Examination of their features can help determine
which product is most appropriate to any individual's needs.
Below is a partial list of features to consider if selecting
from among continuous speech recognition products:
- Ages for which
voice files were developed (this is particularly important for
younger users)
- Use of synthesized
speech read back of written work (this is particularly important
for users with learning disabilities)
- Methods of correcting
errors
- Length of training
required
- Degree of hands-free
use allowed (this is particularly important for users with physical
disabilities)
Continuous speech
recognition software packages also differ in the size of vocabularies
offered, but all products have at least 60 thousand words and
phrases, and are certainly adequate for most non-specialized
users.
6.
Aren't speech recognition systems prohibitively expensive?
When speech recognition
first appeared in the late 1980s, the basic software system cost
$9,000, not including the computer itself, which had to be
relatively powerful and therefore also costly. Fortunately,
the cost of speech recognition software has dropped dramatically
since continuous speech recognition was introduced and
became a general market item. Most packages designed for home
use sell for well under $100, and systems for professional users
are mostly under $200. The older discrete speech software that
cost almost $700 in early 1997 is now available for under $100.
Speech recognition
generally operates best with more powerful computers. Even so,
computer hardware prices have dropped to the point that one can
now run older speech recognition software on machines that cost
well under $1000. If one is using such a system successfully,
there would be no reason to upgrade the computer for years. However,
newer versions of speech recognition software always take full
advantage of advances in computer hardware, so that upgrading
the software might require also upgrading one's hardware to newer,
and more expensive, systems. It is critical to read the fine
print regarding system requirements and to carefully gauge the
benefit of newer software versions against the expense of upgrading
your hardware, building a new voice file, and so forth.
Perhaps the greatest
expense in using speech recognition software, especially in a
school, is the cost of implementation, particularly in terms
of providing teacher release time for training and subsequent
support. For the teacher who is expected to help students use
speech recognition in schools, training can run to 5 hours or
more for the first student, although this number obviously drops
with subsequent users.
7.
Which are the leading speech recognition systems on the market?
For Windows users,
there are four companies producing speech recognition software:
Dragon Systems (NaturallySpeaking, DragonDictate), IBM (ViaVoice),
Lernout & Hauspie (Voice Express), and Philips (Free Speech).
As of late 1999, three companies have announced Macintosh versions
of continuous speech recognition software, but none have been
released: Dragon Systems, IBM, and MacSpeech.
8.
How fast can a person "type" or input text using speech
recognition?
This varies according
to the type of speech recognition software one is using and of
course also varies greatly from user to user. Manufacturer claims
for conversational speed of text input should be taken with a
grain of salt, considering a variety of other factors. Of course,
discrete speech recognition is typically slower than continuous
speech recognition for most individuals.
Generally users
of speech recognition are not simply transcribing information
but are composing it. For such tasks, the real limiting
factor may be how quickly one can generate and formulate ideas.
In this sense, it is no different from an accomplished typist
who may be able to copy information quickly, but is slowed considerably
when having to compose original text.
Since many students
with disabilities have both (a) less experience with the writing
process, and (b) difficulties specifically in the language processes
involved in writing, the question of "how many words per
minute" is less important than the question, "Does
speech recognition provide a way for this student to produce
more text or perform more efficiently or independently than any
other input mode?" By most standards, the words per minute
rate might be low, but the comparison with other methods for
individual users might still argue for use of speech recognition.
For this reason, discrete speech recognition is not necessarily
always slower than continuous for students with disabilities.
9.
How does one determine whether to use discrete or continuous
speech recognition?
Discrete speech
recognition never "made a splash" in the broad marketplace
for a reason: it is too slow for most users. Typically, one would
only consider using discrete speech products now if there were
some extenuating circumstance that limits one's ability to use
continuous speech, such as significant learning disabilities, motor
speech difficulties, and so forth. This will be discussed at
greater length in the next section. When presented with both
options, many students with learning disabilities prefer the
operation of discrete speech rather than continuous because they
find the pacing provided by discrete speech better matches their
text production style, at least at that point in their development.
On the other
hand, continuous speech recognition may be appropriate for anyone
who needs to write but finds typing problematic for some
reason, and who has reasonably good command of the processes
of oral language production, such as generating and formulating
ideas, and speaking clearly.
10.
If a student uses speech recognition, how will they learn to
write?
Use of speech
recognition for writing is no different in principle from using
any other technology for writing, from the pencil to the word
processor. It is simply a way to get words onto paper. When speech
recognition is properly implemented, students understand that
they cannot simply talk to the computer, but that they are "writing
by voice." As one uses speech recognition, the style of
language gradually moves from an informal, conversational style
to one that is more formal and appropriate for writing, much
as students must develop a written language style
when writing by other means. Learning to write requires both
practice and instruction. The instruction students receive may
or may not be adequate to their needs, but use of speech recognition
offers those students who need it an unparalleled way to get
that practice.
POTENTIAL USERS
1. How
can speech recognition benefit students with physical disabilities?
Some students and adults have physical
disabilities that preclude their using a standard keyboard or
mouse effectively. For these students, speech recognition is
one of several alternative input methods to be explored. Speech
recognition may provide a more efficient means of controlling
a computer that is less physically and cognitively taxing than
other alternative input methods.
However, a student may seem to have the ability to use the keyboard,
but have subtler physical difficulties that make speech recognition
a more attractive option for them.
Take, for example, Jason, a 19-year-old
young man who sustained a head injury at the age of 14 in a boating
accident. Jason suffered a significant impairment, known as "aphasia,"
in his production of oral language. This was characterized mostly
by great difficulty recalling words and formulating sentences.
In addition, he incurred a variety of other cognitive impairments,
as well as subtle physical difficulties, including a difficulty
with intentional movement called "apraxia," which limited
his ability to gain facility with the keyboard.
When we saw Jason, he was two and a half years
post-accident and making considerable progress in regaining language.
However, prior to his injury, he had a diagnosis of "dyslexia"
which had already affected his ability to read and write. Consequently,
he was overcoming the aphasia and apraxia, but also was still
suffering from dyslexia, all of which made written language production
very difficult for him.
Jason had already had an assistive technology
consult elsewhere and had been using word prediction, but with
little apparent success or interest. We explored it again, looking
at different and newer programs, but found that he frequently
lost his train of thought as he coped with the multiple demands
of formulating and remembering a sentence, locating the desired
key on the keyboard, beginning to spell individual words, locating
them in a list, looking back and forth from the keyboard to the
monitor, etc.
We then presented speech recognition
with synthetic speech readback of the text Jason had created.
In the very supported examination environment, it worked very
well for him; he could keep his attention focused in one place
for much of the time, the preferred word choice was usually given
first, and so forth.
Based on our recommendation, Jason's
parents and school district collaborated to purchase a speech
recognition system on a notebook computer for him so that he
could work at home and at school. At school he did his dictation
with his tutor/aide in a resource room, where it was relatively
quiet compared to many other environments in the school. The
school also placed another speech recognition system in Jason's
classroom for other students to get trained on and to use for
some writing. As of this year, there are up to five students
in the school who are beginning to use speech recognition for
writing. Jason graduated last year and has gone on to an art
college in another state, where he continues to be a successful
and increasingly more independent speech recognition user.
2. How
can speech recognition benefit students with learning disabilities?
Speech recognition technology can benefit
students who have learning disabilities that interfere with their
ability to spell and write. While many such students benefit
from standard word processing, the visual-motor demands of keyboarding
can be a major stumbling block that compounds the writing difficulties.
Similarly, students who are the poorest spellers are frequently
unable to effectively use standard spell checkers. For whatever
reason, if students' oral language skills far outstrip their
ability to generate text with pencil and paper or standard word
processing, speech recognition may enable them to become accomplished
writers by circumventing the most frustrating aspects of text
generation.
Take, for example, Sara, a 15-year-old
sophomore in high school. Sara is a very bright young woman with
a learning disability in the area of written language. Like many
students with written output difficulties, Sara has the "gift
of gab," and readily provides vivid oral descriptions and
explanations. Unlike many such students, Sara loves to read and
has always been reasonably successful at it. Writing has been
a different story for Sara. Her spelling is idiosyncratic at
best, and her handwriting is very labored and difficult to read.
I first saw Sara as a fifth grader,
after her parents had already purchased a computer for her in
hopes that it would address her writing difficulties. The purpose
of this visit was to address issues about using the computer
in school. However, we quickly discovered that Sara was still
struggling. As bad as her handwriting was, it was still faster
than her ability to use the keyboard, and she did not have the
patience to plod along in her "hunt and peck mode."
Despite several months of keyboarding instruction in a computer
lab at school, Sara was still struggling with learning key locations.
The computer provided little support
in spelling as well. Her attempted spellings were so discrepant
from the correct form that they foiled regular spellcheckers.
Despite recommendations for training and support, by the end
of fifth grade, Sara was not progressing in using the computer,
and was getting ever more discouraged about school. Her preferred
mode was to write as little as she could, and if possible, not
at all.
We decided to launch a series of trial
sessions with speech recognition over the summer. Within two
sessions, Sara had begun to tell a yarn that would eventually
spin out over the summer to a 10-page neighborhood epic. She
was very enthusiastic and felt she had found the answer to her
problems. Unfortunately, speech recognition systems at that point
cost thousands of dollars and required a different computer than
Sara had access to at home or at school. Two years passed as
Sara became more discouraged about school and recommendations
for the system fell on deaf ears at the school department. Eventually
Sara's parents were able to secure a system for her to use at home.
Sara learned quickly and once again her natural writing talents
came to the fore. At the end of that year, Sara was one of two
school-wide recipients of a coveted creative writing award.
3. If learning
disabled students use speech recognition for writing, are they
still able to use other methods?
Certainly. Usually, learning disabled
students who use speech recognition are only able to do so in
certain circumstances and therefore must use other methods of
writing at other times. However, they often come to view speech
recognition as their text-entry method of choice whenever they
have a chance. Moreover, for some students, using speech recognition
for writing enables them to regain confidence in themselves as
writers, and in turn to persevere with other writing methods.
Ben, a 17-year-old boy, is a clear example
of this phenomenon. He is a very bright youngster who came to
our program during fifth grade for an assistive technology consult
because of his increasing frustration with difficulties in getting
his ideas down in writing. He had used the computer and word
processors for a couple of years, and he still was not working
at a pace that satisfied him. His parents, who brought him to
the session, were very concerned about his frustration and perception
of himself as unsuccessful and even incapable. They told me later
that Ben, who had always loved school, had grown to dread this
daily, negative experience because it reinforced his image of
himself as a poor writer. Even at that young age, Ben had
expressed a desire to quit school.
We looked at a number of "lower-tech"
options, but none of these worked for Ben. He simply could not
manage to write efficiently enough, even with the benefit of
word prediction. However, when he first tried speech recognition,
it was like watching a light go on over his head. He, and his
parents, were immediately very excited by the potential they
saw in this system, and they went about obtaining a system for
Ben on their own that he could use at home.
Ben used speech recognition throughout
sixth and most of seventh grade. At the end of seventh grade,
two critical events occurred: he got a terrible head cold and
his voice changed! During this period, the speech recognition
software had great difficulty understanding his changed voice
and Ben found himself typing lots of corrections for the software.
Despite his frustration, Ben learned to type in the process of
correcting the software. In fact, during that time, Ben became
so proficient with the keyboard that he dropped speech recognition
altogether.
It has now been more than two years
since Ben stopped using speech recognition, and he has successfully
maintained his transition back to typing. He now attends an academically
challenging high school in the Boston area and is doing very
well. In his own estimation, Ben thinks he is an "average"
writer among his peers.
Despite the fact that Ben's family bought
the speech recognition system when it was still fairly expensive,
they think it was money very well spent. His father said that
using speech recognition "saved Ben's life" in the
sense that it kept him from giving up in school.
4. Is speech
recognition appropriate for all students with writing difficulties?
No. Speech recognition IS a promising
technology, but like all other technological solutions, it is
not necessarily appropriate for every student who experiences
difficulty with writing. Speech recognition should be considered as part of
an overall assistive technology plan designed
to address the student's needs.
Using speech recognition requires several
different kinds of learning and performance which fall into roughly
three categories: (1) remembering and using the program's commands
and features; (2) composing ideas in one's head and saying them
aloud; and (3) speaking in a way that the software can understand
(which includes being able to alter that when the software does
not). In determining whether speech recognition is appropriate
for an individual (that is, whether they can manage these
three areas), one should consider several areas of student
ability:
Cognitively, students are asked to attend
to several tasks at the same time. For example, students must
be able to compose orally while operating the system through
oral commands. They must be able to tell which aspect of the
program is speech recognition and which is word processing. In
other words, students will most likely fare better if they are
somewhat flexible in their thinking and are able to juggle several
tasks at once. In addition, when the program makes an incorrect
guess about a word or phrase, the student must decide how to
correct the system and avoid the problem again.
Linguistically, students must eventually
understand the differences between written and spoken forms of
language so that they can adopt a more formalistic style of talking
for writing. They must be able to dictate and simultaneously
monitor both their written language and the software. Continuous
speech software places an extra burden on students, in that they
must be able to enunciate each word clearly while dictating in
a more continuous manner. This is more difficult for most people
because of the nature of the oral muscles involved in speaking
continuously, but it is particularly difficult for many students
with learning disabilities.
Academically, students must have sufficient
word reading skills to accurately read alternative word lists
and distinguish between visually similar words. They must be
able to detect when the system makes a mistake. And, they must
have sufficient phonetic spelling skills to prompt the system
to generate the correct word when it has made a mistake. Again,
continuous speech recognition places an extra burden on the student
because it is harder to monitor one's performance when the words
appear in longer segments (i.e., in entire phrases, fragments,
or whole sentences).
Behaviorally, students must be motivated
to learn the system and improve their writing skills. They must
persevere through training and accept that they use a methodology
different from the one most of their peers use. If students bring
a positive attitude to the process, they can help themselves
a great deal.
Please visit the Resource
Laboratory at the top page of the Speaking
to Write website to find a document which describes in greater
detail the user characteristics to take into account when considering
the use of speech recognition.
5. Do students
need to have all the skills you mentioned in the preceding question
to be able to use speech recognition?
Not necessarily. A wide spectrum of
individuals can use speech recognition if enough external support
is provided.
Most individuals with relative strength
in the skill areas mentioned will likely be able to write independently
after being properly trained in the use of speech recognition.
However, if a student is weak in one or two of these areas, he
or she may never become completely independent in using speech
recognition, or may become proficient with speech recognition
only with more intensive training. In either case, more intensive
instructional support may be needed, particularly in the early
stages of training and use, but possibly in all situations. Nonetheless,
even without complete independence, the improvement in writing
outcomes and self-esteem for the student may outweigh the drawbacks,
and these decisions would be warranted on an individual basis.
The amount of support, or "scaffolding,"
needed by an individual student to use speech recognition might
vary not only by individual abilities, but also by the kind of
task involved. If the task is straightforward, less demanding
cognitively, and possibly somewhat shorter (e.g., writing single
sentences to define vocabulary words, writing in a journal),
then less scaffolding may be needed. Longer tasks involving greater
complexity or more difficult content would require more scaffolding.
In this way, writing by voice is no different from doing so by
any other means.
6. How
can one best determine whether or not an individual student can
use speech recognition?
To rule speech recognition in or out,
the student must have the opportunity to try speech recognition,
perhaps over several sessions. If a school has purchased a system
for multiple users, appropriate students can experiment with
the approach in this setting. Alternatively, this exploration
can be done with the help of an assistive technology evaluation
team in a clinical setting or a person who routinely trains users.
In either case, be cautious when working with trainers who also
sell the software, because their assessment of the student's
potential may be colored by their desire to sell the product.
7. Can
students with speech impairments use speech recognition?
Some students with physical disabilities
may also have labored or inconsistent speech. Some students with
learning disabilities may have more subtle articulation difficulties,
which sometimes only appear in particular contexts. Even though
such speech impairments may complicate the picture, they do not
necessarily preclude the student's using speech recognition,
although they may limit the user to discrete speech recognition
systems, which are more readily adapted to impaired speech.
One thing that has been true in the
past and seems to remain true to this point (late 1999) is that
discrete speech recognition tends to be more forgiving than continuous
of speech difficulties, and even variations in speech such as
accents of second language speakers or even accents of English-speakers
that differ greatly from a "typical" middle American
accent. The reason that many people with speech difficulties
have trouble using continuous speech is that control over articulation
of individual words becomes harder when they are embedded in
connected speech (that is, with words before and after). Therefore,
the demands of continuous speech make the clear enunciation of
single words harder. Students with speech difficulties will need
to spend more time training the software to recognize their voices
than students without such impairments.
The problem with recognition of speech
with accents in continuous speech may be rectified for some by
completing more training in the software, but for others it simply
does not work. One might point to the "intensity"
of the accent as the cause, but there is no clear evidence on this.
Having said this, we have personal knowledge
of at least one student with significant writing disabilities
and mild-moderate articulation problems who likes to write. He
is so excited about using continuous speech recognition to get
his ideas out that he tolerates a relatively high rate of recognition
errors (e.g., 20%, or one word out of every five, wrong), and
is willing to spend the time afterwards to make the corrections.
8. Because educational research on the use of speech
recognition technology is in its infancy, very few studies exist
to date on the possible benefits of this system for students
with disabilities. One promising study (Higgins & Zvi, 1995)
at California State University at Northridge explored the performance
of learning disabled college students using speech recognition
technology to complete the university's written proficiency exam.
With the use of this innovation, the learning disabled students
achieved the same distribution of scores on the exam as their
nondisabled peers. With a human transcriber's assistance or
with no assistance at all, these same learning disabled students'
score distribution fell below that of their nondisabled peers.
Another exploratory study (Wetzel, 1996)
focused on a single subject, a sixth-grade student with learning
disabilities. Wetzel was interested in whether middle school
students could learn to use a speech recognition system, in this
case IBM VoiceType, and whether this system would enhance their
communication skills. Wetzel found that the student was able
to learn to use the software, but that difficulties with the
system's recognition accuracy and the complexity of editing compromised
this student's success. This early research points to some of
the difficulties in using this technology with students who have
disabilities as well as to the potential benefits. For example,
because the technology was developed with adult voice models,
the software is not as proficient at recognizing the speech of
prepubescent youth. The research also suggests that younger students
may struggle to a greater degree with the cognitive demands of
composing orally while also giving the computer oral directions.
Anecdotal evidence through the Speaking
to Write project and listserv, and reports at educational conferences
by various practitioners are very promising in terms of the positive
effects of speech recognition for students who need it. However,
it is clear that implementation is often not easy. Every successful
implementation of which we are aware can be contrasted with many
unsuccessful stories, and while the source of the difficulties can
often be identified in these unsuccessful cases, it cannot always
be addressed (e.g., when no capable trainer is available locally).
References:
Higgins, E.L., & Zvi, J.C. (1995). Assistive technology for
postsecondary students with learning disabilities: From research
to practice. Annals of Dyslexia, 45: 123-143.
Wetzel, K. (1996). Speech-recognizing computers: A written-communication
tool for students with learning disabilities? Journal of Learning
Disabilities, 29(4): 371-380.
TRAINING
1. How important
is training for the user?
Proper training is critical. A solid
training foundation is the key to on-going success with speech
recognition for all users regardless of skill or age. There are
in fact four aspects of training with speech recognition: developing
an individual voice/language file in the software; learning to
use the software itself; developing a dictation "style";
and becoming a better writer.
First, the speech recognition system
itself must be properly trained to recognize the user's voice.
This is done through an initial training process called "enrollment,"
and through proper use of the program subsequently. Rather like
a speaker and listener who both know the same language, but have
widely differing accents, the software tries to accustom itself
to the user's voice. The goal is for the software to understand
every word the user says, including words that he or she
has never said to the software before. The user does not
have to say every word in advance for it to be understood
by the system, because a well-trained voice file can recognize
many new words as well.
As we discussed earlier, the software
gets accustomed to the user's voice by building an individual
model that is established through enrollment and then modified
through subsequent use. This model helps the software decide what word
or phrase to predict and display from the active dictionary with
every subsequent user utterance. The better the model, the better
the prediction, and if the software is used correctly, prediction
improves with increased usage. Therefore, trainers should help
students gain a general understanding of how the speech recognition
software works, so that they understand the importance of proper
usage.
Discrete and continuous speech recognition
products differ somewhat in their initial training routines:
- Discrete speech recognition training involves less initial training
and more use of correction strategies during early usage to achieve
an accurate voice file.
- Continuous speech recognition training is embedded in the longer
initial enrollment of the voice required by the software (although
recent versions of speech recognition software have greatly reduced
this time period), so that somewhat greater accuracy is possible
after the initial training period. However, subsequent correction
strategies are still very important to building a strong voice
file.
Second, the student must be trained in
all aspects of the system that they need to know. All users,
and especially those who are younger, must be properly trained
in the process of dictating. Additionally, students must learn
how to correct any mismatches between the user's spoken word
and the software's predictions. Beyond this, some students may
also want to use other command features of speech recognition
software that allow for some use of programs without using the
keyboard and mouse, such as learning how to spell by voice, giving
voice commands to the computer, or even operating the mouse in
order to play their favorite game.
It is also critical that parents, teachers,
tutors, or aides who work most closely with the student when
he or she is writing attend and observe some of the initial training.
If they (the professionals) have an opportunity to learn to use
the system themselves, this can help them gain some insight into
the student's needs during use. However, attendance at some of
the training is a minimum requirement.
Third, the student must learn how to
write through speaking. Students are learning a new text input
mode (just like keyboarding) that uses the voice, but is not
like talking. This requires time. Developing this facility is
mostly a matter of practice, of just doing it. However, it
is most helpful to have some form of instructional support or
monitoring available, and even training, during early dictation
experiences in order to help with problem solving when recognition
errors arise that could be corrected by adopting a slightly different
dictation strategy. This latter point is especially important
with continuous speech recognition, where there often is not
a one-to-one correspondence between words dictated and words
predicted.
Fourth, and most important, as students
gain mastery in using the software, and dictating by voice, they
can begin the task of becoming better writers. Students who have
struggled with writing do not automatically become accomplished
writers with speech recognition, any more than they can become
writers without instructional support. They will continue to
need help with such skills as idea generation, organization,
grammar, and vocabulary.
2. Is training
school-age students different from training adults?
It may not be, but it probably should
be. Current product tutorials and materials developed for speech
recognition are designed mostly for adults, to move them toward
independence with the system at a relatively fast clip and provide
few accommodations for individual differences. Younger students
can rapidly become overwhelmed during the training process unless
modifications are made. In fact, training goals and methods need
to be reconceptualized for students, and a slower, more incremental
approach is often more successful with this population.
When initiating speech recognition training
with students, trainers should consider building knowledge and
mastery of the four interrelated aspects of the task mentioned
above: building a strong voice/language file through enrollment
and correct usage; learning to operate the software; developing
a dictation "style" for writing; and learning to write.
Maximum support from a teacher (or trainer or other adult) is
necessary to address the first two aspects: building a strong
voice/language file and learning to operate the software. Students
must learn the most efficient ways to use the software for their
purposes while they also develop their voice files through correct
usage, particularly in terms of making corrections. Close supervision
of the process is needed during this phase, which could last
up to five hours of total use.
During this initial time, the student
will also be developing a dictation "style," but this
process will probably take somewhat longer, possibly through
many hours of use. Close supervision is not needed here, but
regular check-ins of diminishing frequency are advisable to ensure
that the student is still using the software to best advantage.
Learning to write is obviously a lifelong
process. However, training to be a writer should be an essential
and ongoing element. In terms of learning a new input technique
like speech recognition, it is important that teachers not assume
that the use of speech input serves as a substitute for involvement
in regular instruction for writing. The process of writing by
voice may be slightly different, as different kinds of errors
occur in earlier drafts, but the expected outcomes should eventually
be no different.
Click here to view an illustration that will help you visualize the various aspects
of training.
3. What
might a speech recognition training sequence for students look
like?
The process of teaching students to
use speech recognition must be individualized to their own learning
needs and style. Remember that we are talking about students
who may be skeptical of their own abilities and who may lack
experience in writing. However, experience and common sense about
teaching suggest that the process should usually adhere to some
variation of the following steps:
- The student observes the evaluator
or trainer using some of the basic functions of speech recognition
software: dictating in a proper mode, making corrections (including
selection of alternative words), spelling to generate additional
choices, and so forth.
- The student undergoes enough initial
training so that the software begins to create an individual
voice file. For discrete speech, this can occur immediately after
initial enrollment (maybe ten minutes). For continuous speech,
the initial enrollment and training is somewhat lengthier (up
to 30 minutes).
- The student is prompted to generate
a single, simple sentence (e.g., "I like to go snowboarding
and skateboarding."). This is done with the speech recognition
system turned off. The student says the sentence aloud so that
the trainer knows what is going to be said.
- With a word processor appropriate to
the student's developmental level and interests, the student
begins dictation with the teacher attending to all other operational
matters, such as turning the microphone on and off, using the
keyboard for corrections, and making alternative selections.
The teacher completes all the corrections with the student's
input, and helps the student make any necessary adjustments to
the dictation style. The goal at this point is to have the student
dictate one sentence so that the system is familiar with the
words.
- The student dictates the same sentence
once or twice more, which allows him or her to experience a greater
level of fluency and try different dictation strategies.
- The student and trainer decide on a
second sentence which uses some of the same words (e.g., "Snowboarding
and skateboarding are popular sports.").
- The student dictates the new sentence
and the preceding steps are repeated with the student taking
gradual responsibility for operating the system.
- The student undergoes additional training
to facilitate accurate dictation.
4. Once
the voice file has been set up, how does the student learn to
operate the system?
As the student tries new sentences and
gradually assumes responsibility for an increasing number of
functions in the software, the trainer should carefully introduce
him or her to the later steps in learning to operate
the system. The following sequence is the one that we
use:
- dictation only
- dictation plus operation of the microphone
hotkey (to turn it on and off)
- dictation and mic operation plus identification
of errors in dictation and selection of words from the list of
alternatives
- dictation, mic operation, selection
plus spelling to train new words or elicit them from the background
dictionary
- dictation, mic operation, selection,
spelling plus alternative models of error correction
- dictation, mic operation, selection,
spelling, error correction, plus...(At this point, the sequence
can be customized even more to the individual student's needs.
For example, does he or she need to use voice to control the
mouse or access the menus?)
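The sequence above is cumulative: each stage keeps every skill from the previous stage and adds one more. A minimal sketch of that structure follows, assuming short skill labels paraphrased from the list; the names SKILLS and training_stages are illustrative only.

```python
# Skills in the order the list above introduces them (labels paraphrased).
SKILLS = [
    "dictation",
    "microphone hotkey",
    "selecting from the list of alternatives",
    "spelling new words",
    "alternative models of error correction",
]


def training_stages(skills):
    """Return cumulative stages: stage n contains the first n skills."""
    return [skills[:n] for n in range(1, len(skills) + 1)]


for number, stage in enumerate(training_stages(SKILLS), start=1):
    print(f"Stage {number}: " + ", ".join(stage))
```

Student-specific skills, such as controlling the mouse by voice, could be appended to the list for an individual student without changing the earlier stages.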
IMPLEMENTATION
1. Where should a system reside? At school, at
home, or both?
Once it has been determined that speech
recognition is a good fit for a particular student, the issue
of where to place the system arises. Ideally, the system should
be placed in the environment where it will be most effective
in meeting the student's writing goals.
Most secondary students that we know
use their speech recognition systems at home. This makes sense
for several reasons: students generally have larger blocks of
time to write at home than during school; the potential to find
a relatively quiet spot for dictation may be greater at home;
and students can work at home without concern over how they might
appear to others while dictating.
On the other hand, many students who
will benefit from using speech recognition are not experienced
writers; they may need a considerable amount of instructional
support while they compose. A tutor working with the student
at home can help remedy this problem to some extent, but is
rarely available for the entire time during dictation.
For this reason use of speech recognition
in school is also an important option to consider. Jason used
a notebook computer and carried his speech recognition system
back and forth from home to school. At school he worked in the
resource room while writing or dictating, with his instructional
aide nearby to provide any assistance or guidance needed. (Note
that, based on the school's experience with Jason, five other
students now have access to speech recognition systems that they
also use in the resource room.)
2. What
about the noise factor at school?
Yes, excessive background noise can
be a problem. To use speech recognition in the school setting,
the student needs a relatively quiet, more or less private place
to work. Even though speech recognition is being used in office
environments and the software offers a number of internal settings
that help control for ambient noise, middle and secondary school
students can generate a lot of background noise; school rooms
also are often not constructed to minimize environmental noise.
Therefore, placement of the computer must be a consideration.
Sometimes speech recognition systems are placed in resource rooms,
but a corner of the library or a station in the computer lab
can also serve the purpose under the right circumstances (e.g.,
during a quieter, lower use period). One or two students that
we are aware of are actually using speech recognition in the
classroom.
3. Who
should provide ongoing training and technical assistance?
At least one person in the educational
or home setting should have a deeper technical knowledge of the
operations and requirements of the software so that he or she
can provide ongoing training and technical assistance to the
student without having to depend on outside trainers or consultants
indefinitely.
4. How
will students and teachers in the school setting react to a student
using speech recognition?
All teachers and service providers working
with the student should have a fundamental understanding of speech
recognition and what it does and doesn't provide the student.
A lack of such understanding became a problem for Sara, who used the system at home. When
Sara brought in papers that she wrote with speech recognition,
the teacher, who assumed that Sara's system would automatically
correct any mistakes she made, was surprised to see occasional
grammatical errors.
Peer acceptance of this technology is generally fairly high,
especially if the student is perceived as successful or capable
in other ways. However, it is rarely the perceptions of others
that matter, but the student's own perception of his or her
abilities and potential when using the technology. We have not
found any single approach that works in this regard, other than
general sensitivity to the issue and helping the student find
the location in which he or she feels most comfortable working.
It is also important to remember that
writing is sometimes a private act, where one is exposing one's
ideas to review by others. Some students may feel this way more
strongly than others. They should not be put in a position where
use of speech recognition makes them feel unnecessarily vulnerable.
5. What
about ongoing instructional support?
We have found that even if students
do not need help in producing their first drafts by speech recognition,
they usually require some additional support in editing and revising.
This is particularly the case with students who have an aversion
to writing and therefore have little experience with editing
and revision. As a consequence, these students have often missed
many of the incremental steps learned about writing in the earlier
grades.
Therefore, many students who are successfully
using speech recognition to create lengthy first drafts of texts
(often, for the first time) require help in knowing how to proceed
with these texts. Consequently, they should not be cut off from
individual instructional support simply because they are using
the speech recognition system.
6. What
are the implications for the teaching of writing and the curriculum?
Teachers may need assistance in thinking
through the implications of speech recognition technology for
the teaching of writing at the classroom level and for writing
as it is integrated throughout the curriculum. With careful planning,
speech recognition can be used to facilitate various stages of
the writing process (i.e., brainstorming, outlining, drafting,
revising, editing, publishing). Speech recognition software generally
provides a means of segmenting the vocabulary so that the system
can be fine-tuned for specific writing assignments in various
subjects such as history or science.
This
Web site was funded from 1997-2001 by the U.S. Department of Education,
National
Institute on Disability and Rehabilitation Research (NIDRR). Contract #HI33G70143.
The views expressed within this site do not necessarily reflect the
views of the Government. Site hosted by Education Development Center,
Inc.
©2000
Education Development Center, Inc. All Rights Reserved.
Material
on this site is no longer updated. Final update 2/02.
|