This document has been downloaded from the spk2wrt Web site at: http://www.edc.org/spk2wrt

SPOTLIGHT ON SPEECH RECOGNITION: ASCII (text-only) version
Last updated: November 18, 1999.

THE FUNDAMENTALS OF SPEECH RECOGNITION

1. What is speech recognition technology?

Speech recognition (also referred to as voice recognition) is a computer application that lets people control a computer by speaking to it. In other words, rather than using a keyboard and mouse to communicate with the computer, the user speaks commands into a microphone (on a headset, or mounted on a lapel pin, the desktop, etc.) that is connected to a computer. By speaking into the microphone, users can do two things. First, they can tell their computers to execute some commands such as open a document, save changes, delete a paragraph, even move the cursor - all without touching a key. Second, users can write using speech recognition in conjunction with a standard word processing program. When users speak into the microphone their words can appear on a computer screen in a word processing format, ready for revision and editing. The discussion that follows will focus primarily on using speech recognition for writing.

2. How does speech recognition work?

There are two kinds of speech recognition software now available: discrete speech and continuous speech. The older technology, discrete speech recognition, requires the user to speak one - word - at - a - time. A newer technology, continuous speech recognition, allows the user to dictate by speaking (at a more or less normal rate). Both have their advantages and disadvantages for individuals with difficulties in writing, which we will discuss in more detail later. As the user speaks, the software puts one or more words on the screen by matching the sound input with the information it has in the user's voice file.
Both kinds of speech recognition store frequently used words and related information in the computer's memory (RAM) for immediate use in guessing a word or string of words - this is called the active dictionary. When new vocabulary is added, it enters the active dictionary. For less common vocabulary, all speech recognition products have a large back-up dictionary stored on the hard drive, so that it is relatively rare that one would use a word that is entirely unknown to the software. As the user trains and speaks to the system, the software creates a user-specific voice file that contains a lot of information about user voice qualities and pronunciations, and with continuous speech recognition, patterns of word usage. Both types of speech recognition software also capture the user's preferred vocabulary. The voice file in discrete speech recognition software is built primarily on the user's pronunciation of individual words. The voice file in continuous speech recognition also contains information about the user's grammar and word usage (i.e., which words/phrases tend to be used in what order). The software uses this acoustic and linguistic information to make its best guess at each word or phrase as it is dictated. Some continuous speech products also offer one additional tool for improving the voice model: the software can analyze documents that the user has created previously for the vocabulary and language/grammar used, and incorporate this into its prediction routines. This is a very powerful tool in terms of increasing accuracy when dictating about specific subjects, but it may not always help students if they are writing about widely divergent topics, or writing in different styles. The process of creating a strong voice file - i.e., "familiarizing" the speech recognition software with an individual voice and language pattern - takes time. 
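As a rough illustration of how acoustic and linguistic information might be combined into a "best guess," the sketch below scores two similar-sounding candidate words. It is a toy under stated assumptions, not a description of how any commercial product works: the names `ACOUSTIC_SCORES`, `USAGE_COUNTS`, and `best_guess`, and all of the numbers, are hypothetical.

```python
# Toy illustration only: every score and count below is invented.

# How well each candidate matches the sound just heard (0 to 1).
ACOUSTIC_SCORES = {"entire": 0.48, "attire": 0.52}

# How often the user has said each word after a given previous word,
# standing in for the "patterns of word usage" kept in a voice file.
USAGE_COUNTS = {
    ("the", "entire"): 9,
    ("the", "attire"): 1,
    ("formal", "attire"): 6,
}


def best_guess(previous_word, acoustic_scores, usage_counts):
    """Return the candidate word with the highest combined score."""

    def combined(word):
        # Add 1 so an unseen word pair is unlikely rather than impossible.
        usage = usage_counts.get((previous_word, word), 0) + 1
        return acoustic_scores[word] * usage

    return max(acoustic_scores, key=combined)


print(best_guess("the", ACOUSTIC_SCORES, USAGE_COUNTS))
print(best_guess("formal", ACOUSTIC_SCORES, USAGE_COUNTS))
```

In this toy, the word-usage counts outweigh a slightly better acoustic match, which is one way to picture why a well-trained voice file can raise accuracy.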
When a user takes the time to properly train and use the speech recognition system, which creates a strong and accurate voice file, the system will supply the correct word or phrase most of the time. However, the system will never achieve a 100% accuracy rate in all situations. Sometimes the software just doesn't get it right and suggests the wrong word. The user must then stop and make a correction.

3. What happens when the computer does not recognize a dictated word or phrase correctly?

Correcting the software's incorrect guesses is an important way to improve the performance of speech recognition software - that is, to help it recognize a user's voice better. Discrete and continuous speech recognition systems operate a little differently, but the principle is the same. When the software guesses an incorrect word, the user indicates this to the software, and the program may generate a list of alternative words that appears in a separate window. The user can correct a mistake by choosing the desired word from this list if it appears there, or by spelling the correct word with the keyboard or by voice. Some programs use each letter entered while spelling the correct word to update the list of suggested words, so that the user does not need to spell the entire word. This feature makes the program operate like word prediction.

When using discrete speech recognition software, the connection between the user's dictated word and the program's guess is typically one-to-one, so that identifying a mistake is relatively straightforward. For example, the user says, "entire" and the program produces, "attire." However, with continuous speech software, the correction can be a bit more complicated. The software may misinterpret a single word as multiple words, or vice versa, and often misinterprets more than one word as part of a longer phrase.
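The letter-by-letter narrowing of the suggestion list described under question 3 can be sketched as a simple prefix filter. This is only a minimal illustration: the function name `narrow_choices`, the `limit` parameter, and the sample vocabulary are hypothetical, and real products presumably also weigh how the word sounded.

```python
def narrow_choices(vocabulary, typed_letters, limit=5):
    """Return up to `limit` words that start with the letters spelled so far."""
    prefix = typed_letters.lower()
    matches = [word for word in vocabulary if word.lower().startswith(prefix)]
    return matches[:limit]


# Hypothetical vocabulary, listed from most to least frequently used.
VOCABULARY = ["and", "entire", "attire", "enter", "entirely", "attic", "end"]

# Each additional letter shrinks the list of suggestions.
for letters in ("e", "ent", "enti"):
    print(letters, narrow_choices(VOCABULARY, letters))
```

Because the list shrinks with every letter, the user can usually pick the intended word after spelling only its first few letters.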
Actual examples of such misrecognitions: the user says "even ungrammatical," and the program produces, "even on grammatical," or the user says "in the habit," and the program produces, "inhabit." For corrections like this, identifying the errors and deciding on how to make the necessary changes can be more difficult, especially for students with learning difficulties.

4. What exactly constitutes a speech recognition system?

A speech recognition system is made up of a multimedia computer with speech recognition software, a microphone (which typically comes with the software), and usually a sound card. To use speech recognition to write, a word processing program or other text input software such as email is also needed, although some systems have a built-in word processor. Each kind of speech recognition program has different hardware requirements.

- Discrete speech software (e.g., DragonDictate, v. 3) can operate on older machines, generally a Pentium with 32 MB of RAM and a 16-bit sound card.
- The newer continuous speech recognition software generally requires a more powerful computer, and the latest versions tend to operate best with the high end machines available at the time of the software's release (e.g., at the end of 1999, a Pentium III or alternative processor operating at 350 MHz or faster and with 128 MB RAM).

5. How do speech recognition systems differ from one another?

Speech recognition software varies in several ways. First, as noted above, there are two general types of speech recognition: discrete and continuous. There is currently (end of 1999) only one version of discrete speech recognition for sale, DragonDictate for Windows, Classic, and it will not operate on operating systems beginning with Windows 2000. It has an active vocabulary

There are new versions of continuous speech recognition software coming out every few months as the major companies (at this point, Dragon Systems, IBM, Lernout & Hauspie, and Philips) vie for the market.
Examination of their features can help determine which product is most appropriate to any individual's needs. Below is a partial list of features to consider if selecting from among continuous speech recognition products:

- Ages for which voice files were developed (this is particularly important for younger users)
- Use of synthesized speech read back of written work (this is particularly important for users with learning disabilities)
- Methods of correcting errors
- Length of training required
- Degree of hands-free use allowed (this is particularly important for users with physical disabilities)

Continuous speech recognition software packages also differ in the size of vocabularies offered, but all products have at least 60 thousand words and phrases, and are certainly adequate for most non-specialized users.

6. Aren't speech recognition systems prohibitively expensive?

When speech recognition first appeared in the late 1980s, the basic software system cost $9,000, not including the computer itself, which had to be relatively powerful, and therefore also costly. Fortunately, the cost of speech recognition software has dropped dramatically since continuous speech recognition was introduced and became a general market item. Most packages designed for home use sell for well under $100, and systems for professional users are mostly under $200. The older discrete speech software that cost almost $700 in early 1997 is now available for under $100.

Speech recognition generally operates best with more powerful computers. Even so, computer hardware prices have dropped to the point that one can now run older speech recognition software on machines that cost well under $1000. If one is using such a system successfully, there would be no reason to upgrade the computer for years.
However, newer versions of speech recognition software always take full advantage of advances in computer hardware, so that upgrading the software might require also upgrading one's hardware to newer, and more expensive, systems. It is critical to read the fine print regarding system requirements and to carefully gauge the benefit of newer software versions against the expense of upgrading your hardware, building a new voice file, and so forth.

Perhaps the greatest expense in using speech recognition software, especially in a school, is the cost of implementation, particularly in terms of providing teacher release time for training and subsequent support. For the teacher who is expected to help students use speech recognition in schools, training can run to 5 hours or more for the first student, although this number obviously drops with subsequent users.

7. Which are the leading speech recognition systems on the market?

For Windows users, there are four companies producing speech recognition software: Dragon Systems (NaturallySpeaking, DragonDictate), IBM (ViaVoice), Lernout & Hauspie (Voice Express), and Philips (Free Speech). As of late 1999, three companies have announced Macintosh versions of continuous speech recognition software, but none have been released: Dragon Systems, IBM, and MacSpeech.

8. How fast can a person "type" or input text using speech recognition?

This varies according to the type of speech recognition software one is using and of course also varies greatly from user to user. Manufacturer claims for conversational speed of text input should be taken with a grain of salt, considering a variety of other factors. Of course, discrete speech recognition is typically slower than continuous speech recognition for most individuals. Generally users of speech recognition are not simply transcribing information but are composing it. For such tasks, the real limiting factor may be how quickly one can generate and formulate ideas.
In this sense, it is no different from an accomplished typist who may be able to copy information quickly, but is slowed considerably when having to compose original text. Since many students with disabilities have both (a) less experience with the writing process, and (b) difficulties specifically in the language processes involved in writing, the question of "how many words per minute" is less important than the question, "Does speech recognition provide a way for this student to produce more text or perform more efficiently or independently than any other input mode?" By most standards, the words per minute rate might be low, but the comparison with other methods for individual users might still argue for use of speech recognition. For this reason, discrete speech recognition is not necessarily always slower than continuous for students with disabilities.

9. How does one determine whether to use discrete or continuous speech recognition?

Discrete speech recognition never "made a splash" in the broad marketplace for a reason: it is too slow for most users. Typically, one would only consider using discrete speech products now if some extenuating circumstance limits one's ability to use continuous speech, such as significant learning disabilities, motor speech difficulties, and so forth. This will be discussed at greater length in the next section. When presented with both options, many students with learning disabilities prefer the operation of discrete speech rather than continuous because they find the pacing provided by discrete speech better matches their text production style, at least at that point in their development. On the other hand, continuous speech recognition may be appropriate for anyone who needs to write but finds typing problematic for some reason, and who has reasonably good command of the processes in oral language production, such as generating and formulating ideas, and speaking clearly.

10.
If a student uses speech recognition, how will they learn to write?

Use of speech recognition for writing is no different in principle from using any other technology for writing, from the pencil to the word processor. It is simply a way to get words onto paper. When speech recognition is properly implemented, students understand that they cannot simply talk to the computer, but that they are "writing by voice." As one uses speech recognition, the style of language gradually moves from an informal, conversational style to one that is more formal and appropriate for writing, much as students must develop a written language style when writing by other means. Learning to write requires both practice and instruction. The instruction students receive may or may not be adequate to their needs, but use of speech recognition offers those students who need it an unparalleled way to get that practice.

POTENTIAL USERS

1. How can speech recognition benefit students with physical disabilities?

Some students and adults have physical disabilities that preclude their using a standard keyboard or mouse effectively. For these students, speech recognition is one of several alternative input methods to be explored. Speech recognition may provide a more efficient means of controlling a computer that is less physically and cognitively taxing than other alternative input methods. However, a student may seem to have the ability to use the keyboard, but have subtler physical difficulties that make speech recognition a more attractive option for them.

Take, for example, Jason, a 19-year-old young man who sustained a head injury at the age of 14 in a boating accident. Jason suffered a significant impairment, known as "aphasia," in his production of oral language. This was characterized mostly by great difficulty recalling words and formulating sentences.
In addition, he incurred a variety of other cognitive impairments, as well as subtle physical difficulties, including a difficulty with intentional movement called "apraxia," which limited his ability to gain facility with the keyboard. When we saw Jason he was 2 and 1/2 years post-accident and making considerable progress in regaining language. However, prior to his injury, he had a diagnosis of "dyslexia" which had already affected his ability to read and write. Consequently, he was overcoming the aphasia and apraxia, but also was still suffering from dyslexia, all of which made written language production very difficult for him. Jason had already had an assistive technology consult elsewhere and had been using word prediction, but with little apparent success or interest. We explored it again, looking at different and newer programs, but found that he frequently lost his train of thought as he coped with the multiple demands of formulating and remembering a sentence, locating the desired key on the keyboard, beginning to spell individual words, locating them in a list, looking back and forth from the keyboard to the monitor, etc. We then presented speech recognition with synthetic speech readback of the text Jason had created. In the very supported examination environment, it worked very well for him; he could keep his attention focused in one place for much of the time, the preferred word choice was usually given first, and so forth. Based on our recommendation, Jason's parents and school district collaborated to purchase a speech recognition system on a notebook computer for him so that he could work at home and at school. At school he did his dictation with his tutor/aide in a resource room, where it was relatively quiet compared to many other environments in the school. The school also placed another speech recognition system in Jason's classroom for other students to get trained on and to use for some writing. 
As of this year, there are up to five students in the school who are beginning to use speech recognition for writing. Jason graduated last year and has gone on to an art college in another state, where he continues to be a successful and increasingly more independent speech recognition user.

* Meet Jason:

Before speech recognition, writing was impossible for me to do. My handwriting was very hard for me, or any other person to read. My spelling was terrible. I had good ideas but couldn't get them down on paper. My writing and reading are excellent now. I'm able to get my ideas down on paper but it hasn't helped me in grammar and proofreading. I continue to use speech recognition at home and at school. At home I write email and letters. At school I take notes from a text. I write reports. I do all my work using voice input. It was not hard to learn commands. It was a little frustrating initially because it didn't know my voice. Every time I used it, it got a little easier. The speech recognition has put a more independent and confident edge on my life. Speech recognition got me to do things on my own and I don't need assistance. In college, I will be able to do anything that I want with writing.

2. How can speech recognition benefit students with learning disabilities?

Speech recognition technology can benefit students who have learning disabilities that interfere with their ability to spell and write. While many such students benefit from standard word processing, the visual-motor demands of keyboarding can be a major stumbling block that compounds the writing difficulties. Similarly, students who are the poorest spellers are frequently unable to effectively use standard spell checkers. For whatever reason, if students' oral language skills far outstrip their ability to generate text with pencil and paper or standard word processing, speech recognition may enable them to become accomplished writers by circumventing the most frustrating aspects of text generation.
Take, for example, Sara, a 15-year-old sophomore in high school. Sara is a very bright young woman with a learning disability in the area of written language. Like many students with written output difficulties, Sara has the "gift of gab," and readily provides vivid oral descriptions and explanations. Unlike many such students, Sara loves to read and has always been reasonably successful at it. Writing has been a different story for Sara. Her spelling is idiosyncratic at best, and her handwriting is very labored and difficult to read.

I first saw Sara as a fifth grader, after her parents had already purchased a computer for her in hopes that it would address her writing difficulties. The purpose of this visit was to address issues about using the computer in school. However, we quickly discovered that Sara was still struggling. As bad as her handwriting was, it was still faster than her ability to use the keyboard, and she did not have the patience to plod along in her "hunt and peck mode." Despite several months of keyboarding instruction in a computer lab at school, Sara was still struggling with learning key locations. The computer provided little support in spelling as well. Her attempted spellings were so discrepant from the correct form that they foiled regular spellcheckers. Despite recommendations for training and support, by the end of fifth grade, Sara was not progressing in using the computer, and was getting ever more discouraged about school. Her preferred mode was to write as little as she could, and if possible, not at all.

We decided to launch a series of trial sessions with speech recognition over the summer. Within two sessions, Sara had begun to tell a yarn that would eventually spin out over the summer to a 10 page neighborhood epic. She was very enthusiastic and felt she had found the answer to her problems.
Unfortunately, speech recognition systems at that point cost thousands of dollars and required a different computer than Sara had access to at home or at school. Two years passed as Sara became more discouraged about school and recommendations for the system fell on deaf ears at the school department. Eventually Sara's parents were able to secure a system for her to use at home. Sara learned quickly and once again her natural writing talents came to the fore. At the end of that year, Sara was one of two school-wide recipients of a coveted creative writing award.

* Meet Sara:

When I was six years old and in the first grade, my teacher discovered that I was "different" than the other children in my class. I had very weak writing skills and appeared to be lazy and stupid. This is how my teachers described me. Throughout my elementary school years, my teachers tried everything. They tried to improve my penmanship through repetitious teaching and yelling. Even when I was in 3rd grade my printing was worse than a first grader's. I was embarrassed and felt stupid. I was taunted and ridiculed by my classmates because I was in a learning center class and I could not even write cursive let alone print. It was also difficult because my sister and my best friend always made the honor roll. I never made it. Then in 3rd grade my classmates and I began typing class, I was psyched. I knew that if I learned how to type no one would have to see my writing. My hopes were soon shattered when I discovered that no matter how hard I tried I could not type fast at all. I just could not remember the location of the keys. It took me longer to type than it did to write. I was frustrated and felt trapped. It seemed as though I could not accomplish anything. Finally one day my father came home from work and informed me that I would be spending my afternoons after school at Children's Hospital working in some new computer program. The program was called DragonDictate.
When I went to Children's I met a wonderful man named Bob Follansbee. He explained to me that I was their guinea pig! Bob wanted to see if a program made for paraplegics would work for L.D. kids. He told me that he would teach me and trained me on the computer. I also could do my homework on it. Now, I thought DragonDictate was going to be easy from the start; I was dead wrong. Learning how to use the program was very fastidious. It required a lot of practice, time and patience. The program had to get use to my voice. This took time because I have a really strong Boston accent. So after wanting to destroy the computer, I finally learned how to use the program to the best of my advantage. My parent's thought that the DragonDictate was so great that they bought the program for our own computer. I began using the program regularly for my homework in the middle of my freshman year in high school. There was a vast improvement in my grades and by the end of that quarter I made honor roll. I was ecstatic, for I had never made honor roll before. My teachers were also very happy in the improvement of my writing skills. I still actively use my speech recognition in the family room in my house for my nightly homework assignments. My writing skills have improved greatly on account of the DragonDictate because I can now easily get my thoughts down on paper and am not limited because of my poor penmanship.

3. If learning disabled students use speech recognition for writing, are they still able to use other methods?

Certainly. Usually, learning disabled students who use speech recognition are only able to do so in certain circumstances and therefore must use other methods of writing at other times. However, they often come to view speech recognition as their text-entry method of choice whenever they have a chance.
Moreover, for some students, using speech recognition for writing enables them to regain confidence in themselves as writers, and in turn to persevere with other writing methods.

Ben, a 17-year-old boy, is a clear example of this phenomenon. He is a very bright youngster who came to our program during fifth grade for an assistive technology consult because of his increasing frustration with difficulties in getting his ideas down in writing. He had used the computer and word processors for a couple of years, and he still was not working at a pace that satisfied him. His parents, who brought him to the session, were very concerned about his frustration and perception of himself as unsuccessful and even incapable. They told me later that Ben, who had always loved school, had grown to dread this daily, negative experience because it reinforced his image of himself as a poor writer. Even at that young age, Ben had expressed a desire to quit school.

We looked at a number of "lower-tech" options, but none of these worked for Ben. He simply could not manage to write efficiently enough, even with the benefit of word prediction. However, when he first tried speech recognition, it was like watching a light go on over his head. He, and his parents, were immediately very excited by the potential they saw in this system, and they went about obtaining a system for Ben on their own that he could use at home.

Ben used speech recognition throughout sixth and most of seventh grade. At the end of seventh grade, two critical events occurred: he got a terrible head cold and his voice changed! During this period, the speech recognition software had great difficulty understanding his changed voice and Ben found himself typing lots of corrections for the software. Despite his frustration, Ben learned to type in the process of correcting the software. In fact, during that time, Ben became so proficient with the keyboard that he dropped speech recognition altogether.
It has now been more than two years since Ben stopped using speech recognition, and he has successfully maintained his transition back to typing. He now attends an academically challenging high school in the Boston area and is doing very well. In his own estimation, Ben thinks he is an "average" writer among his peers. Despite the fact that Ben's family bought the speech recognition system when it was still fairly expensive, they think it was money very well spent. His father said that using speech recognition "saved Ben's life" in the sense that it kept him from giving up in school.

* Meet Ben:

I began using speech recognition typing in the fall of 1993 as a means of bypassing my problems with manual typing. Before the voice typing came my way, my typing was horrible, leading me to do poorly on assignments involving writing. Writing was slow and frustrating both by hand and on a keyboard. Often, I would have to abridge my creative writing for school, as typing it out was very tedious and difficult. Once I had procured my voice typing system, I was able to write and edit papers with relative ease. This was probably the only thing that kept me afloat through my seventh grade year. When I returned from my summer vacation, my voice had changed significantly enough to set my voice typing askew. Words that I did not want or intend found their way into my papers more and more often, instead of the longer and more tedious voice correction. (No dammit! I said Foxtrot, OSCAR, Romeo. OSCAR!!!) Thus, my eighth grade year became a weaning period from my voice typing. I began to type more papers, often typing and voice typing at the same time. It ultimately became easier for me to type my papers than to "speak" them. I still credit voice activated typing for allowing me to type at a semi-normal speed and accuracy rate. Learning to voice type was extremely easy, although time-consuming.
It required a professional trainer to come in and spend about two days, working on training the computer to my voice, as well as instructing me in all the necessary commands. Voice typing allowed me to create longer and more detailed papers, and impacted my spelling and punctuation use. Voice typing was and is, an indispensable tool for the betterment of my typing skills.

4. Is speech recognition appropriate for all students with writing difficulties?

No. Speech recognition IS a promising technology, but like all other technological solutions, it is not necessarily appropriate for every student who experiences difficulty with writing. Speech recognition should be considered as part of an overall assistive technology plan designed to address the student's needs. Using speech recognition requires several different kinds of learning and performance which fall into roughly three categories: (1) remembering and using the program's commands and features; (2) composing ideas in one's head and saying them aloud; and (3) speaking in a way that the software can understand (which includes being able to alter one's speech when the software does not). In determining whether speech recognition is appropriate for an individual - that is, whether they can manage these three areas - one should consider several areas of student ability:

Cognitively, students are asked to attend to several tasks at the same time. For example, students must be able to compose orally while operating the system through oral commands. They must be able to tell which aspect of the program is speech recognition and which is word processing. In other words, students will most likely fare better if they are somewhat flexible in their thinking and are able to juggle several tasks at once. In addition, when the program makes an incorrect guess about a word or phrase, the student must decide how to correct the system and avoid the problem again.
Linguistically, students must eventually understand the differences between written and spoken forms of language so that they can adopt a more formalistic style of talking for writing. They must be able to dictate and simultaneously monitor both their written language and the software. Continuous speech software places an extra burden on students, in that they must be able to enunciate each word clearly while dictating in a more continuous manner. This is more difficult for most people because of the nature of the oral muscles involved in speaking continuously, but it is particularly difficult for many students with learning disabilities.

Academically, students must have sufficient word reading skills to accurately read alternative word lists and distinguish between visually similar words. They must be able to detect when the system makes a mistake. And, they must have sufficient phonetic spelling skills to prompt the system to generate the correct word when it has made a mistake. Again, continuous speech recognition places an extra burden on the student because it is harder to monitor one's performance when the words appear in longer segments (i.e., in entire phrases, fragments, or whole sentences).

Behaviorally, students must be motivated to learn the system and improve their writing skills. They must persevere through training and accept that they use a methodology different from the one most of their peers use. If students bring a positive attitude to the process, they can help themselves a great deal.

Please visit the Resource Laboratory at the top page of the Speaking to Write website to find a document which describes in greater detail the user characteristics to take into account when considering the use of speech recognition.

5. Do students need to have all the skills you mentioned in the preceding question to be able to use speech recognition?

Not necessarily.
A wide spectrum of individuals can use speech recognition if enough external support is provided. Most individuals with relative strength in the skill areas mentioned will likely be able to write independently after being properly trained in the use of speech recognition. However, if a student is weak in one or two of these areas, he or she may never become completely independent in using speech recognition, or may become proficient with speech recognition only with more intensive training. In either case, more intensive instructional support may be needed, particularly in the early stages of training and use, but possibly in all situations. Nonetheless, even without complete independence, the improvement in writing outcomes and self-esteem for the student may outweigh the drawbacks, and these decisions would be warranted on an individual basis. The amount of support, or "scaffolding," needed by an individual student to use speech recognition might vary not only by individual abilities, but also by the kind of task involved. If the task is straightforward, less demanding cognitively, and possibly somewhat shorter (e.g., writing single sentences to define vocabulary words, writing in a journal), then less scaffolding may be needed. Longer tasks involving greater complexity or more difficult content would require more scaffolding. In this way, writing by voice is no different from doing so by any other means. 6. How can one best determine whether or not an individual student can use speech recognition? To rule speech recognition in or out, the student must have the opportunity to try speech recognition, perhaps over several sessions. If a school has purchased a system for multiple users, appropriate students can experiment with the approach in this setting. Alternatively, this exploration can be done with the help of an assistive technology evaluation team in a clinical setting or a person who routinely trains users. 
In either case, be cautious when working with trainers who also sell the software, because their assessment of the student's potential may be colored by their desire to sell the product. 7. Can students with speech impairments use speech recognition? Some students with physical disabilities may also have labored or inconsistent speech. Some students with learning disabilities may have more subtle articulation difficulties, which sometimes only appear in particular contexts. Even though such speech impairments may complicate the picture, they do not necessarily preclude the student's using speech recognition, although they may limit the user to discrete speech recognition systems, which are more readily adapted to impaired speech. One thing that has been true in the past and seems to remain true to this point (late 1999) is that discrete speech recognition tends to be more forgiving than continuous of speech difficulties, and even of variations in speech such as accents of second language speakers or accents of English-speakers that differ greatly from a "typical" middle American accent. The reason that many people with speech difficulties have trouble using continuous speech is that control over articulation of individual words becomes harder when they are embedded in speech - that is, with words before and after. Therefore, the demands of continuous speech make the clear enunciation of single words harder. Students with speech difficulties will need to spend more time training the software to recognize their voices than students without such impairments. The problem with recognition of speech with accents in continuous speech may be rectified for some by completing more training in the software, but for others it simply does not work. One might point to the "intensity" of the accent as the deciding factor, but there is no clear evidence on this. 
Having said this, we have personal knowledge of at least one student with significant writing disabilities and mild-moderate articulation problems who likes to write. He is so excited about using continuous speech recognition to get his ideas out that he tolerates a relatively high rate of recognition errors (e.g., 20%, or one word out of every five, wrong), and is willing to spend the time afterwards to make the corrections. 8. Is there any research on the success of speech recognition? Because educational research on the use of speech recognition technology is in its infancy, very few studies exist to date on the possible benefits of this system for students with disabilities. One promising study (Higgins & Zvi, 1995) at California State University at Northridge explored the performance of learning disabled college students using speech recognition technology to complete the university's written proficiency exam. With the use of this innovation, the learning disabled students achieved the same distribution of scores on the exam as their nondisabled peers. With a human transcriber's assistance or with no assistance at all, these same learning disabled students' score distribution fell below that of their nondisabled peers. Another exploratory study (Wetzel, 1996) focused on a single subject, a sixth-grade student with learning disabilities. Wetzel was interested in whether middle school students could learn to use a speech recognition system, in this case IBM VoiceType, and whether this system would enhance their communication skills. Wetzel found that the student was able to learn to use the software, but that difficulties with the system's recognition accuracy and the complexity of editing compromised this student's success. This early research points to some of the difficulties in using this technology with students who have disabilities as well as to the potential benefits. 
For example, because the technology was developed with adult voice models, the software is not as proficient at recognizing the speech of prepubescent youth. The research also suggests that younger students may struggle to a greater degree with the cognitive demands of composing orally while also giving the computer oral directions. Anecdotal evidence through the Speaking to Write project and listserv, and reports at educational conferences by various practitioners are very promising in terms of the positive effects of speech recognition for students who need it. However, it is clear that implementation is often not easy. Every successful implementation of which we are aware can be contrasted with many unsuccessful stories, and while the sources of difficulty can often be identified in these unsuccessful cases, they cannot always be addressed (e.g., having a capable trainer available locally). References: Higgins, E.L., & Zvi, J.C. (1995). Assistive technology for postsecondary students with learning disabilities: From research to practice. Annals of Dyslexia, 45: 123-143. Wetzel, K. (1996). Speech-recognizing computers: A written-communication tool for students with learning disabilities? Journal of Learning Disabilities, 29(4): 371-380. TRAINING 1. How important is training for the user? Proper training is critical. A solid training foundation is the key to on-going success with speech recognition for all users regardless of skill or age. There are in fact four aspects of training with speech recognition: developing an individual voice/language file in the software; learning to use the software itself; developing a dictation "style"; and becoming a better writer. First, the speech recognition system itself must be properly trained to recognize the user's voice. This is done through an initial training process called "enrollment," and through proper use of the program subsequently. 
Rather like a speaker and listener who both know the same language, but have widely differing accents, the software tries to accustom itself to the user's voice. This is so that the software can understand every word the user says, even when it is a word that he or she has never said to the software before. However, this does not mean that the user has to say every word before it can be understood by the system, because a well-trained voice file can understand many new words as well. As we discussed earlier, the software gets accustomed to the user's voice by building an individual model that is established through enrollment and then modified through subsequent use. This model helps the software decide what word or phrase to predict and display from the active dictionary with every subsequent user utterance. The better the model, the better the prediction, and if the software is used correctly, prediction improves with increased usage. Therefore, trainers should help students gain a general understanding of how the speech recognition software works, so that they understand the importance of proper usage. Discrete and continuous speech recognition products differ somewhat in their initial training routines: - Discrete speech recognition training involves less initial training and more use of correction strategies during early usage to achieve an accurate voice file. - Continuous speech recognition training is embedded in the longer initial enrollment of the voice required by the software (although recent versions of speech recognition software have greatly reduced this time period), so that somewhat greater accuracy is possible after the initial training period. However, subsequent correction strategies are still very important to building a strong voice file. Second, the student must be trained in all aspects of the system that he or she needs to know. All users, and especially those who are younger, must be properly trained in the process of dictating. 
Additionally, students must learn how to correct any mismatches between the user's spoken word and the software's predictions. Beyond this, some students may also want to use other command features of speech recognition software that allow for some use of programs without using the keyboard and mouse, such as learning how to spell by voice, giving voice commands to the computer, or even operating the mouse in order to play their favorite game. It is also critical that parents, teachers, tutors, or aides who work most closely with the student when he or she is writing attend and observe some of the initial training. If they (the professionals) have an opportunity to learn to use the system themselves, this can help them gain some insight into the student's needs during use. However, attendance at some of the training is a minimum requirement. Third, the student must learn how to write through speaking. Students are learning a new text input mode (just like keyboarding) that uses the voice, but is not like talking. This requires time. Developing this facility is mostly a matter of practice - just doing it. However, it is most helpful to have some form of instructional support or monitoring available, and even training, during early dictation experiences in order to help with problem solving when recognition errors arise that could be corrected by adopting a slightly different dictation strategy. This latter point is especially important with continuous speech recognition, where there often is not a one-to-one correspondence between words dictated and words predicted. Fourth, and most important, as students gain mastery in using the software, and dictating by voice, they can begin the task of becoming better writers. Students who have struggled with writing do not automatically become accomplished writers with speech recognition, any more than they can become writers without instructional support. 
They will continue to need help with such skills as idea generation, organization, grammar, and vocabulary. 2. Is training school-age students different from training adults? It may not be, but it probably should be. Current product tutorials and materials developed for speech recognition are designed mostly for adults, to move them toward independence with the system at a relatively fast clip, and provide few accommodations for individual differences. Younger students can rapidly become overwhelmed during the training process unless modifications are made. In fact, training goals and methods need to be reconceptualized for students, and a slower, more incremental approach is often more successful with this population. When initiating speech recognition training with students, trainers should consider building knowledge and mastery of the four interrelated aspects of the task mentioned above: building a strong voice/language file through enrollment and correct usage; learning to operate the software; developing a dictation "style" for writing; and learning to write. Maximum support from a teacher (or trainer or other adult) is necessary to address the first two aspects: building a strong voice/language file and learning to operate the software. Students must learn the most efficient ways to use the software for their purposes while they also develop their voice files through correct usage, particularly in terms of making corrections. Close supervision of the process is needed during this phase, which could last up to five hours of total use. During this initial time, the student will also be developing a dictation "style," but this process will probably take somewhat longer, possibly through many hours of use. Close supervision is not needed here, but regular check-ins of diminishing frequency over time are needed to ensure that the student is still using the software to best advantage. Learning to write is obviously a lifelong process. 
However, training to be a writer should be an essential and on-going element. In terms of learning a new input technique like speech recognition, it is important that teachers not assume that the use of speech input serves as a substitute for involvement in regular instruction for writing. The process of writing by voice may be slightly different, as different kinds of errors occur in earlier drafts, but the expected outcomes should eventually be no different. 3. What might a speech recognition training sequence for students look like? The process of teaching students to use speech recognition must be individualized to their own learning needs and style. Remember that we are talking about students who may be skeptical of their own abilities and who may lack experience in writing. However, experience and common sense about teaching suggest that the process should usually adhere to some variation of the following steps: 1. The student observes the evaluator or trainer using some of the basic functions of speech recognition software: dictating in a proper mode, making corrections (including selection of alternative words), spelling to generate additional choices, and so forth. 2. The student undergoes enough initial training so that the software begins to create an individual voice file. For discrete speech, this can occur immediately after initial enrollment (maybe ten minutes). For continuous speech, the initial enrollment and training is somewhat lengthier (up to 30 minutes). 3. The student is prompted to generate a single, simple sentence (e.g., "I like to go snowboarding and skateboarding."). This is done with the speech recognition system turned off. The student says the sentence aloud so that the trainer knows what is going to be said. 4. 
With a word processor appropriate to the student's developmental level and interests, the student begins dictation with the teacher attending to all other operational matters, such as turning the microphone on and off, using the keyboard for corrections, and making alternative selections. The teacher completes all the corrections with the student's input, and helps the student make any necessary adjustments to the dictation style. The goal at this point is to have the student dictate one sentence so that the system is familiar with the words. 5. The student dictates the same sentence once or twice more, which allows him or her to experience a greater level of fluency and try different dictation strategies. 6. The student and trainer decide on a second sentence which uses some of the same words (e.g., "Snowboarding and skateboarding are popular sports."). 7. The student dictates the new sentence and the preceding steps are repeated with the student taking gradual responsibility for operating the system. 8. The student undergoes additional training to facilitate accurate dictation. 4. Once the voice file has been set up, how does the student learn to operate the system? As the student tries new sentences and gradually assumes responsibility for an increasing number of functions in the software, the trainer should carefully introduce him or her to the sequence of steps in learning to operate the system. 
The following sequence is the one that we use: - dictation only - dictation plus operation of the microphone hotkey (to turn it on and off) - dictation and mic operation plus identification of errors in dictation and selection of words from the list of alternatives - dictation, mic operation, selection plus spelling to train new words or elicit them from the background dictionary - dictation, mic operation, selection, spelling plus alternative models of error correction - dictation, mic operation, selection, spelling, error correction, plus...(At this point, the sequence can be customized even more to the individual student's needs. For example, does he or she need to use voice to control the mouse or access the menus?) IMPLEMENTATION 1. Where should a system reside? At school, at home, or both? Once it has been determined that speech recognition is a good fit for a particular student, the issue of where to place the system arises. Ideally, the system should be placed in the environment where it will be most effective in meeting the student's writing goals. Most secondary students that we know use their speech recognition systems at home. This makes sense for several reasons: students generally have larger blocks of time to write at home than during school; the potential to find a relatively quiet spot for dictation may be greater at home; and students can work at home without concern over how they might appear to others while dictating. On the other hand, many students who will benefit from using speech recognition are not experienced writers; they may need a considerable amount of instructional support while they compose. A tutor working with the student at home can help remedy this problem to some extent, but may rarely be accessible for the entire time during dictation. For this reason use of speech recognition in school is also an important option to consider. 
Jason used a notebook computer and carried his speech recognition system back and forth from home to school. At school he worked in the resource room while writing or dictating, with his instructional aide nearby to provide any assistance or guidance needed. (Note that, based on the school's experience with Jason, five other students now have access to speech recognition systems that they also use in the resource room.) 2. What about the noise factor at school? Yes, excessive background noise can be a problem. To use speech recognition in the school setting, the student needs a relatively quiet, more or less private place to work. Even though speech recognition is being used in office environments and the software offers a number of internal settings that help control for ambient noise, middle and secondary school students can generate a lot of background noise; school rooms also are often not constructed to minimize environmental noise. Therefore, placement of the computer must be a consideration. Sometimes speech recognition systems are placed in resource rooms, but a corner of the library or a station in the computer lab can also serve the purpose under the right circumstances (e.g., during a quieter, lower use period). One or two students that we are aware of are actually using speech recognition in the classroom. 3. Who should provide ongoing training and technical assistance? At least one person in the educational or home setting should have a deeper technical knowledge of the operations and requirements of the software so that he or she can provide ongoing training and technical assistance to the student without having to depend on outside trainers or consultants indefinitely. 4. How will students and teachers in the school setting react to a student using speech recognition? All teachers and service providers working with the student should have a fundamental understanding of speech recognition and what it does and doesn't provide the student. 
This became a problem for Sara, who used the system at home. When Sara brought in papers that she wrote with speech recognition, the teacher, who assumed that Sara's system would automatically correct any mistakes she made, was surprised to see occasional grammatical errors. Peer acceptance of this technology is generally fairly high, especially if the student is perceived as successful or capable in other ways. However, it is rarely the perceptions of others that matter, but the student's own perceptions of his or her abilities and potential when using the technology. We have not found any single approach that works in this regard, other than general sensitivity to the issue and helping the student find the location in which he or she feels most comfortable working. It is also important to remember that writing is sometimes a private act, where one is exposing one's ideas to review by others. Some students may feel this way more strongly than others. They should not be put in a position where use of speech recognition makes them feel unnecessarily vulnerable. 5. What about ongoing instructional support? We have found that even if students do not need help in producing their first drafts by speech recognition, they usually require some additional support in editing and revising. This is particularly the case with students who have an aversion to writing and therefore have little experience with editing and revision. As a consequence, these students have often missed many of the incremental steps learned about writing in the earlier grades. Therefore, many students who are successfully using speech recognition to create lengthy first drafts of texts (often, for the first time) require help in knowing how to proceed with these texts. Consequently, they should not be cut off from individual instructional support simply because they are using the speech recognition system. 6. What are the implications for the teaching of writing and the curriculum? 
Teachers may need assistance in thinking through the implications of speech recognition technology for the teaching of writing at the classroom level and for writing as it is integrated throughout the curriculum. With careful planning, speech recognition can be used to facilitate various stages of the writing process (i.e., brainstorming, outlining, drafting, revising, editing, publishing). Speech recognition software generally provides a means of segmenting the vocabulary so that the system can be fine-tuned for specific writing assignments in various subjects such as history or science. For more information, contact: Speaking to Write: Realizing the Potential of Speech Recognition for Secondary Students with Disabilities. Education Development Center, Inc. (EDC) Attention: Lucy Lorin 55 Chapel Street Newton, MA 02458 Phone: 617-969-7100 ext. 2111 Email: spk2wrt@edc.org http://www.edc.org/spk2wrt The Spotlight on Speech Recognition (also referred to as voice recognition) was originally developed by the National Center to Improve Practice (NCIP) in the Spring of 1997. This version was last updated by the Speaking to Write project on November 18, 1999. Should you have questions, experiences, or perspectives pertaining to voice recognition after reading the Spotlight, please share them by joining the current discussion on the spk2wrt listserv (information about subscribing can be found on this Web site). With your help, we can continue to build and share knowledge about the ins and outs of using speech recognition technology with students who struggle with writing. The Speaking to Write project is funded by the U.S. Department of Education, National Institute on Disability and Rehabilitation Research. Contract #HI33G70143.