Speaking to Write: Realizing the Potential of Speech Recognition for Secondary Students with Disabilities

Update on Continuous Speech

More information is available in the article "An Overview of Speech Recognition" (Winter 2002), posted in the Resource Lab (http://www.edc.org/spk2wrt/lab.html).

The current situation with speech recognition. Generally, when people today speak of "speech recognition" they are referring to continuous speech recognition, so named because of its ability to interpret continuous speech - in other words, speech as we use it for talking. Continuous speech recognition first appeared commercially in 1997 and has been characterized by frequent changes in software versions. The primary products now available (Winter, 2002) are various editions of Dragon NaturallySpeaking (now owned by ScanSoft) and IBM ViaVoice. Dragon products operate in Windows only, while ViaVoice offers versions for both Windows and the Macintosh. There is also a new (2001) Macintosh-only product, iListen, by MacSpeech. (The older speech recognition technology, discrete speech recognition, represented by DragonDictate, is less readily available now but is still preferable for some individuals - see below.) Product information can be obtained at the following sites:

  1. For Dragon NaturallySpeaking: http://www.lhsl.com/naturallyspeaking/ (this address will certainly change after the purchase of Dragon from L&H is completed)
  2. For IBM ViaVoice: http://www-4.ibm.com/software/speech/desktop/
  3. For MacSpeech iListen: http://www.macspeech.com/products/iListen.html

The promise and the reality. Continuous speech recognition is very appealing because, based on its claims, one can speak to the computer in a natural voice and at a normal rate of speech. This is a fine example of marketing hyperbole that glosses over important aspects of how one learns to use the software. Failure to understand the training and usage requirements of speech recognition products often leads to unsuccessful experiences with this technology. Additional unsuccessful experiences are not what you want for your students with writing disabilities.

On the other hand, with the proper introduction to and training on the technology, and with reasonable expectations, speech recognition software can operate with remarkable ease and accuracy, and can be a tremendous boon to some students who might otherwise never have a successful writing experience. I will discuss the introduction and training needs below.

Our recommendations. The various continuous speech products operate differently in ways that matter for individual users, especially children and students with disabilities, so it helps to know these differences. A few pointers are listed below.

All other things being equal, Dragon NaturallySpeaking Preferred is our recommended system for students with LD in schools. The Preferred version is recommended over less expensive versions of NaturallySpeaking because it includes several features that can be of particular benefit to students with learning disabilities: digital playback of the user's voice (what the user actually said) and synthesized speech readback of the text (what the software actually put on the page). Students with reading difficulties can use these features to check for errors. All versions of NaturallySpeaking also include other features that can be important for students with LD: dynamic updating of the words in the correction window, easy management of the correction window, and presence of an arrow to signal location (like a bouncing ball) during digital readback. NaturallySpeaking versions 4 and above include voice models for adolescents (from a discontinued product, NaturallySpeaking for Teens).

According to most reviews (and the author's personal experience), IBM ViaVoice is equal to the Dragon products in accuracy for most adult and adolescent users. In our experience, it has even been slightly more accurate than Dragon products during initial usage with several adolescent users, but with proper training these differences became insignificant. ViaVoice also includes synthesized speech readback of the text to check what the software actually put on the page, but has more limited digital readback of the speaker's voice. On the whole, we believe that the interface is somewhat less effective than that of the Dragon products for some students with LD, as management of the correction window and cursor location seems more intrusive. These characteristics might not be a problem for many users. Certainly, if students need a Macintosh product, ViaVoice is a fine alternative.

MacSpeech iListen is a Macintosh-only alternative. The author has only used an early release version that seemed promising, but was not a complete product. Several members of the Speak to Write listserv (www.edc.org/spk2wrt/hypermail) report that the product performed well with middle-school and older students, and compared it favorably to ViaVoice for the Macintosh. It is not clear whether the final version has text-to-speech readback or not.

Introduction and training. Learning to use speech recognition effectively requires four things.

  1. First, the new user must train the software to recognize his/her voice through an initial enrollment process that can be demanding. Then the user must learn to:
  2. Speak so that the software can understand what is said. This IS NOT the same as speaking in conversation, although some individuals' voices and speaking styles may adapt more readily to the stylistic requirements of speech recognition. This process will require some monitoring by a knowledgeable support person and some time spent practicing.
  3. Operate the software. This is especially important in learning to make corrections through the software so that the software learns the user's voice better.
  4. Compose through a new medium. Composing via speech is different from doing so through a pencil or the keyboard.

The most common complaint encountered in speech recognition implementation is that the software "just doesn't work" when the student talks to it. Digging below the surface of such complaints often reveals that the student and adult supporter have not really understood some of the basic requirements for training the software and have expectations of ease that exceed the capacities of the technology. Users must remember that the computer does not have any comprehension of language but is operating behind the scenes through a series of mathematical algorithms. In the initial use of the software the user is developing a "voice file." This voice file is the software's matching of the individual user's voice to the algorithms that it already has. It is essential that the user learn both how to speak to the software to maximize its understanding (#2 above), and also how to make corrections the proper way to provide the software voice file with useful information about its recognition errors (#3). Proper training and patience are often the solution.

For successful implementation of speech recognition, students, staff, and schools have specific responsibilities. Schools must provide:

  • Training in implementation for staff. This implies not only actual workshop time, but also supported practice time for the staff.
  • Consulting support for staff and students as needed during implementation with students.
  • Adequate hardware and technical support for hardware problems, software installation, etc.
  • Space for use of speech recognition. This technology does not require absolute silence, and can be used with considerable background noise if set up properly. However, some environments are very difficult to accommodate. A typically problematic space is the kind often encountered in older school buildings: high ceilings with hard surfaces (tile, plaster, etc.) everywhere and no acoustic absorption. Finding smaller spaces or making area adjustments (e.g., a carpeted corner, use of a carrel, etc.) can help with this.
  • Space considerations for speech recognition are also important because the act of composing is often a private matter and students may feel awkward writing out loud in front of others.
  • Time for staff to work with the student during initial stages of speech recognition use. Students need the most support when they are first using the software, and staff should have some leeway to provide this.
  • Academic (substitute) credit for students who learn to use speech recognition. Rather than adding an extra requirement for an already overburdened student, learning to use speech recognition might take the place of part of a class in computer literacy or be integrated into the requirements of an English/writing class.

This article was written in February 2002 by Dr. Follansbee.


This Web site was funded from 1997-2001 by the U.S. Department of Education, National Institute on Disability and Rehabilitation Research (NIDRR). Contract #HI33G70143. The views expressed within this site do not necessarily reflect the views of the Government. Site hosted by Education Development Center, Inc. ©2000 Education Development Center, Inc. All Rights Reserved.


Material on this site is no longer updated. Final update 2/02.