My experience with speech recognition, including a speech (non)recognition dictionary


View content online at: http://www.appliedradiology.com/Issues/2002/12/Articles/My-experience-with-speech-recognition,-including-a-speech-(non)recognition-dictionary.aspx

Abstract:  The author describes his experience with speech recognition software and suggests ways in which other radiologists can integrate this still developing technology into their routine practices.

Dr. Tobin is an Attending Radiologist at the Metropolitan Hospital Center, New York, NY.

As a radiologist practicing in a busy inner-city hospital, I describe here my experience with speech recognition software and suggest ways in which other radiologists can integrate this still developing technology into their routine practices.

Speech recognition software automatically transcribes words spoken into a microphone as computer-generated text. It has been advocated as a replacement for human transcriptionists. However, I have found that the word-recognition error rate associated with this technology means a significant amount of additional time must be devoted to editing reports manually.

This article is for the benefit of my fellow radiologists who are destined, through the use of speech recognition software, to become typists, proofreaders, editors, and word processors, as well as less productive image interpreters. I created a "(non)recognition dictionary" (Table 1) as a guide for those clinicians who must now read radiology reports that were computer-generated from my speaking into a microphone.

Things were not always so. Formerly, the radiologist would dictate a report by telephone into a radiology information system (RIS); a typist would then listen to the recording and type up a report as a permanent record. As detractors of this method pointed out, a signed, written report could take 24 to 72 hours to generate. The fact is, however, that once a report was dictated, the results were immediately available to clinicians by telephone and, of course, the radiologist interpreting the images was available as well.

Now the typists are gone, and the radiologist must function as transcriber and proofreader without delaying image interpretation. If speech recognition software could transcribe a radiologist's reports as well as a typist, all would be well. In my experience, this has not been the case. A typist becomes accustomed to my voice and the words I use; I have found that speech recognition software does not.

Further, correcting these software-generated reports is not a trivial process. Speech recognition errors are not misspellings, which are easy to detect, but inappropriate word substitutions, which can be surprisingly difficult to identify, even during careful rereading. These substitutions are often amusing, as seen in Table 1, but if left uncorrected they can change the meaning of a report.

Using speech recognition

When you start using speech recognition software, you first "train" it to your voice by reading one or more selections offered by the program. Later, if the program types a different word from the one you said, you can, at least in theory, train the software by typing the word that was misrecognized and then pronouncing it clearly when prompted by the computer.

As soon as I started working with speech recognition, I found there were many words I used that the software failed to recognize. Of course, the software didn't "know" that it "misunderstood" me, and so it typed what it "thought" I said, no matter how ridiculous or misleading.

The first thing I do is read my dictation very carefully and try to identify misrecognized words. As my (non)recognition dictionary shows (Table 1), these incorrect words are often sound-alikes of the dictated word. I then try to correct each error by redictating the misrecognized word, paying careful attention to my pronunciation and thus giving the system a second chance to get it right. Sometimes the software rises to the challenge and types the correct word; other times it does not. For example, the software consistently refuses to type the word phalanges, insisting instead on phalanx east or phalanx cheese. Similarly, no matter how carefully I pronounce the word hypo, the software types high pole.

Consistently misrecognized words, such as phalanges, require additional "training." I did this by opening the software's dictionary and recording my pronunciation of each such word next to its typed equivalent. However, with the software package we use, changes made to the dictionary are not saved permanently until one exits the program. Thus, if the program crashes during dictation, all corrections made to the dictionary are lost and must be repeated.

Training the software to recognize my pronunciation is not always successful. After multiple training sessions and continual use, the software still types PA when I say period to end a sentence. I have also had no success teaching the program to type ultrasound; I now use the word sonography instead. In a real sense, the speech recognition software has trained me to speak, rather than my having trained it.

Part of the problem, we are told, is that currently available speech recognition software is not "context-sensitive." In other words, the software doesn't know which words make sense in the context of a radiology report. If the software were more context-sensitive, it presumably would not type Clinton instead of colon in a report describing an abdominal radiograph of a patient whose colon is filled with abundant feces.
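To make the idea of context sensitivity concrete, here is a toy sketch, not a description of how our software or any commercial product works: among sound-alike candidates, pick the one that has most often appeared next to the neighboring words in earlier reports. The miniature corpus, the function names (bigram_counts, pick_word), and the add-one scoring are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical miniature "corpus" of earlier report phrases (illustrative only).
CORPUS = [
    "abundant feces throughout the colon",
    "the transverse colon is distended",
    "gas in the colon and rectum",
    "no free air beneath the diaphragm",
]


def bigram_counts(sentences):
    """Count how often each ordered pair of adjacent words occurs."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


BIGRAMS = bigram_counts(CORPUS)


def pick_word(prev_word, next_word, candidates):
    """Choose the sound-alike candidate that best fits its neighbors.

    Each candidate is scored by how often it followed prev_word and preceded
    next_word in the corpus. Adding 1 to each count keeps unseen pairs from
    zeroing out the score, so a candidate never seen in context scores 1.
    """
    def score(candidate):
        word = candidate.lower()
        return (BIGRAMS[(prev_word, word)] + 1) * (BIGRAMS[(word, next_word)] + 1)

    return max(candidates, key=score)


if __name__ == "__main__":
    # "...the ___ is distended": the corpus favors "colon" over "Clinton".
    print(pick_word("the", "is", ["Clinton", "colon"]))
```

A real recognizer would combine such word-sequence statistics with the acoustic evidence rather than rely on context alone; the point of the sketch is only that a system which has seen radiology text "knows" colon is far more plausible than Clinton after "the" in an abdominal report.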

Transcriptionists would be unlikely to confuse Clinton and colon. Their errors are usually spelling misteaks... er, mistakes, in which the meaning of the misspelled word remains evident to the reader. In my experience, the best transcriptionists also question possible inconsistencies in a report. Human beings can make judgments; computer software cannot.

Also, transcriptionists will not type words or sentences that sound garbled to them, preferring to insert question marks. When words "sound garbled" to speech recognition software, it will type gibberish, which is harder to find and, therefore, easier to miss during proofreading.

After more than a year of continuous use, I can report that, because of erroneous word substitutions, speech recognition slows my rate of dictation and thereby decreases my productivity. I estimate my productivity loss at 20% to 25%, depending on the imaging modality and the complexity of the report. Normal chest radiographs, eg, are quick dictations that can be saved as "normals" with speech recognition just as with transcriptionists, so little time is gained or lost on these. Other examinations, such as breast ultrasounds (I mean, sonograms), are harder for me to dictate with speech recognition when each breast contains multiple cysts or solid areas, each of which I try to characterize by size, shape, position, orientation, echotexture, and the like.

Recommendations

Until speech recognition becomes more accurate, I suggest that radiologists required to use it do the following:

1. Insist on being involved in planning and implementing speech recognition. We all feel better if we have some control over what happens to us. The more we are included in the decision-making process, the less likely we are to feel resentful. Having a good attitude about speech recognition is important because, as I have stressed, the technology is far from perfect.

2. Insist on the latest software. The current implementation of speech recognition is "continuous," meaning that you can dictate reports at your normal rate of speech. This technology replaces "discrete" speech software, which required pausing between words. Bug fixes and minor upgrades are also offered from time to time.

3. Insist on sufficiently powerful hardware. Current speech recognition software benefits from powerful processors and large amounts of random access memory (RAM).

4. Insist on a reliable vendor who can install and maintain a complex system. Issues such as bandwidth and integration of speech-recognition-generated reports with the RIS, the hospital information system (HIS), and the picture archiving and communication system (PACS) require specialized knowledge. It is much more involved than installing speech recognition software on your laptop at home. Also, you really need technical-support people who know what to do when things go wrong.

5. Insist on training. Speech recognition takes getting used to, and good training can really jump-start a new user. People differ widely in their capacity to adjust to speech recognition, and some need extra help. No one should have to enter into this technology cold.

6. Insist on a quiet place to dictate. I don't care what anyone has to say about this one. Noise, if loud enough, will wind up on your computer screen as gibberish. As human beings, we can focus on a single conversation amid a sea of noise and other conversations. Software is much less discriminating. Noise-canceling microphones are a real advance.

7. Compose your thoughts prior to dictating. This will help to eliminate the hesitations, ie, the "errs" and "ems," that are common in everyday speech. Put bluntly, you cannot "think" into the microphone, which may be a major change for many of us.

8. Keep your dictations concise. The more you say, the more errors the software can make.

9. Create easily modifiable report templates, if your software allows it. This can be a real time-saver, similar to the "standard normals" we used with the transcriptionists. The most satisfied users, in my experience, are those who create a lot of macros to cover many different normal and abnormal findings and then modify them slightly for individual dictations. However, the number of macros quickly rises as you cover more modalities and as your studies and your patients become more complex. Then you are faced with multiple searches for multiple macros, and the issues of naming and organizing your macros become major. You'll see what I mean!

10. Finally, be realistic about the limitations of speech recognition software. It is a small miracle that it works at all.

Conclusion

Based on my experience as a radiologist and a long-time computer user, I find speech recognition software, at present, to be error-prone, often producing time-wasting, if humorous, word substitutions, some of which change the meaning of my reports. Proofreading reports generated by speech recognition is fatiguing, lets some errors slip through (leading to inaccurate reports), and distracts me from image interpretation.

Correcting speech recognition errors and training the software to recognize words is frustrating and tedious. It is an ongoing process, not something that ends a week or two after initial use. Some words require multiple training sessions, and other words, despite my attempts at training, remain unrecognizable to my software. Over time, I have learned which words and combinations of words to avoid. I have not yet reached the extreme of one of my colleagues, who has tried to reduce the frequency of nonrecognition errors by laboriously deleting from the application's dictionary all the words he felt he was unlikely to use.

Because of current deficiencies in speech recognition software, I am less productive in generating reports. Although standard, predictated reports (macros) and a more concise style of dictation have helped, speech recognition remains inefficient for me. Sometimes my frustration with the software becomes so great that I resort to typing parts of my reports, which the application allows.

Although speech recognition is touted as cheaper than a typing service, its true cost (setting aside the fixed cost of hardware, software, and networking) must include the loss of radiologist productivity and the cost of the additional technical support required to keep the system functioning. Since our administration introduced PACS and speech recognition, several computer specialists have been added to the payroll and others have been reassigned to application support.

There are hidden costs as well, because speech recognition competes for dollars with other capital projects. Some radiology administrators have confided to me that they are eager to adopt speech recognition because they believe it will improve their bottom line, but are reluctant to make other needed capital purchases that might not. The irony of producing reports in record turnaround time for studies acquired on technologically dated imaging systems belies the argument that speech recognition is being implemented primarily for patient care.

Speech recognition is the wave of the future because patient records will, undoubtedly, become digital. So, all physicians, not just radiologists, will be typing, dictating, or using some combination of these and other digital input technologies, as the electronic patient record replaces the thick multivolume patient chart, which is sometimes difficult to locate and occasionally challenging to decipher.

Some believe that radiologists will eventually generate reports with a "structured reporting" system, in which they select findings from a list of possibilities contained in a standard lexicon and the computer generates the report. Some authors (eg, Langlotz) claim that structured reporting can be faster than either conventional dictation or speech recognition. For more on this subject, see the list of suggested readings at the end of this article.

In the early 1990s, I purchased a program called VoRecOne (Impulse, Inc., Minneapolis, MN) for the Amiga computer. When I found that the program had difficulty distinguishing hello from help, I decided that other, more promising applications were worth trying. Since then, speech recognition technology has made remarkable progress, and there is every hope that with faster computers and more sophisticated algorithms, the technology will become context-sensitive and the software will "know" that feet have five toes and not five towels. That time has not yet arrived.

In the meantime, I hope you enjoyed the misinterpretations our speech software has provided, and I encourage you to make your own dictionary.

Suggested readings

Gale B, Safriel Y, Lukban A, et al. Radiology report production times: voice recognition vs transcription. Radiol Manage. 2001;23:18-22.

Hayt DB, Alexander S. The pros and cons of implementing PACS and speech recognition systems. J Digit Imaging. 2001;14:149-157.

Heilman RS. Voice recognition transcription: surely the future but is it ready? [editorial]. RadioGraphics. 1999;19:2.

Klevans RL, Rodman RD. Voice Recognition. Norwood, MA: Artech House, Inc.; 1997.

Langlotz CP, Meininger L. Enhancing the expressiveness and usability of structured image reporting systems. Proc AMIA Symp. 2000:467-471.

Ramaswamy MR, Chaljub G, Esch O, et al. Continuous speech recognition in MR imaging reporting: advantages, disadvantages, and impact. AJR Am J Roentgenol. 2000;174:617-622.

Rodman RD. Computer Speech Technology. Norwood, MA: Artech House, Inc.; 1999.