Speech recognition: Evaluation, implementation, and use

By David L. Weiss, MD

Dr. Weiss is the Clinical Section Head for Imaging Informatics at Geisinger Health System, Danville, PA. He is also a member of the editorial board of this journal.

Currently, there are a handful of ways to create a radiology report. For decades, the standard has simply been transcription, coupled in more recent years with digital dictation. Another option, structured reporting, is used by many radiologists in mammography but is not widespread in general radiology. This article will focus on speech recognition, which is being adopted by more and more radiology departments, in both academic medical centers and private practice.

One of the purported advantages of speech recognition is cost savings. In reality, a speech recognition system may, at least in part, shift costs rather than save them: the department no longer pays a transcriptionist, but the radiologist spends time editing and typing. However, without question, improved turnaround time is an advantage of speech recognition, as has been documented in a number of studies. 1,2

One of the disadvantages of speech recognition is a time penalty. A majority of radiologists report spending more time creating and finalizing reports using speech recognition as compared with conventional dictation, with a resulting decrease in overall radiologist productivity. 3,4 However, the most serious problem with speech recognition is its potential to distract the radiologist from viewing images. If the radiologist’s eyes are on the dictation screen rather than on images, the risk of error increases. 5

Acceptance of speech recognition by radiologists has been complicated by a misalignment of incentives. Radiologists only indirectly benefit from the advantages of speech recognition. If cost savings help the department to stay under budget, radiologists might receive a bonus, but it will be slow in coming and will be shared by many others. Similarly, improved turnaround time will help to achieve departmental goals, but diagnostic accuracy and productivity are more important to most radiologists.

On the other hand, the disadvantages of speech recognition fall directly on radiologists, as they suffer the potential for a time penalty, productivity decrease, and distraction from image viewing. Administrators and information technology staff must pay attention to this misalignment of incentives and find a solution to it if they want radiologists to adopt speech recognition.

Case studies

The following case studies illustrate both the success and, in some ways, the failure of speech recognition.

Community hospital

The first case study involves Chestnut Hill Hospital, a small community hospital in Philadelphia, PA, with a general radiology practice of 4 to 5 radiologists reading 100,000 examinations annually. In 1998, we installed speech recognition, shortly after hardware and software advances made it practical for radiology use. The next year we installed a picture archiving and communications system (PACS) and, in early 2000, we integrated the 2 systems.

All of the radiologists agreed to implement speech recognition. We discontinued using a transcriptionist after about a week. What followed was 4 weeks of difficulty. First, we ordered the wrong microphones for data entry (ones without a barcoder). We weren’t proficient in using the product as it was designed. Macro techniques were still in their infancy. The navigation controllers we use today were not available. And, initially, there was no integration between the PACS and speech recognition.

Nonetheless, report turnaround time decreased from approximately 72 hours to 20 to 24 hours immediately after implementation of speech recognition (Figure 1). There was no further significant reduction in turnaround time during the first 6 months after installation of the PACS. However, after we streamlined workflow to make the best use of the PACS, we did see a major drop in turnaround time to an average of <4 hours for all studies.

A major change in workflow involved the radiologists’ work schedules. We were accustomed to leaving the hospital at 5 pm each day, when the film library stopped distributing hardcopy studies to be read. Several months after installation of PACS we realized that, since PACS produced images hour after hour, we could reconfigure our work schedule. All but one radiologist began to leave the hospital at 4 pm, while the on-call radiologist stayed until 7 pm. This new schedule consisted of the same total number of work hours, but resulted in much better turnaround time.

Large medical center

At Geisinger Medical Center, we have digital dictation and use structured reporting for mammography. We installed speech recognition in the third quarter of 2004, and radiologists were encouraged, but not required, to use it. As a result, each workstation still has both a speech recognition microphone and software, and a conventional digital dictation system.

An interesting pattern developed. After 6 months, half of our radiologists were using speech recognition 80% to 90% of the time, and the other half were using it ≤30% of the time (Figure 2). This was a bit surprising. It is more common to see a bell-shaped usage curve, with most of the radiologists accepting speech recognition, a handful really embracing it, and a handful really struggling with it.

What happened at Geisinger? We did get buy-in from the radiologists initially, but we may have had unrealistic expectations of accuracy that led to frustration with the product after implementation. A training session with a trainer who knows the software inside and out (and with software that has become highly accurate in recognizing that trainer’s speech) is far different from a new user’s initial experience. In addition to problems with the speech engine, we also had some disruption attributable to the PACS and the radiology information system (RIS).

These were not the real reasons speech recognition was not widely used at Geisinger, however. The main problem was that we did not communicate a consistent message that the department would be adopting this technology. We also had no defined endpoint for eliminating digital dictation and have continued to support 2 separate reporting systems.

Ensuring success

Table 1 outlines some of the steps that can be taken to ensure adoption of speech recognition. First, provide meaningful incentives to users. The incentives could include bonuses, extra time off, or a reduction in productivity requirements for radiologists who use speech recognition. Don’t allow use of the system to be voluntary.

Have realistic expectations and plan for a drop in productivity. Most studies show at least a 10% reduction in radiologist productivity after implementation of speech recognition. There are some exceptions, however. For example, at the community hospital profiled earlier, my colleagues and I all felt strongly that we were more efficient when using the integrated PACS-speech recognition product.

The University of Pittsburgh showed at least a time-neutral effect after installation of speech recognition using a hybrid transcriptionist-radiologist editing process and some modifications in workflow. 6 Massachusetts General Hospital (MGH) recently did a productivity study of PACS and speech recognition in collaboration with New York University. They were able to show that at MGH the use of PACS had a positive impact on radiologist productivity, while speech recognition had no statistically significant effect on productivity. 7

Plan for training and ongoing support. Even experienced users will have problems now and then. When new radiologists join the department or locum tenens radiologists arrive, they will need support on the system.

Set a deadline for the removal of conventional transcription and transition to speech recognition. Consider using a hybrid model for report editing. One way to use speech recognition is to have radiologists dictate, edit, correct, and sign their own reports. Another way is to have radiologists dictate reports, and then use “back-end” transcription for editing. In this process a transcriptionist listens to an audio file while viewing the text and making corrections. Although this approach relieves radiologists of clerical work, it reduces cost savings and results in variable turnaround times. Consider combining these 2 approaches, so that radiologists who are proficient with speech recognition can work independently and those who are struggling or in a time bind can send reports to back-end transcription.
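The hybrid model described above amounts to a simple routing rule. The following Python sketch is illustrative only: the names `Report`, `route_report`, and `self_editors`, and the `time_pressed` flag, are hypothetical and are not taken from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Report:
    accession: str    # hypothetical identifier for the examination
    radiologist: str  # who dictated the report
    draft_text: str   # text produced by the speech engine


def route_report(report: Report, self_editors: set, time_pressed: bool = False) -> str:
    """Decide who edits a dictated report under a hybrid model.

    Radiologists listed in `self_editors` correct and sign their own
    reports unless they flag a time bind; everyone else sends the draft
    and audio to back-end transcription.
    """
    if report.radiologist in self_editors and not time_pressed:
        return "self_edit"
    return "back_end_transcription"
```

A proficient user is routed to self-editing by default but can still fall back to back-end transcription during a busy session by setting `time_pressed=True`.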

Christiana Care Health System in Wilmington, DE, provides an example of a hybrid model for report editing. All radiologists use speech recognition but are given the choice of self-editing or back-end transcription. Approximately half of the radiologists choose to use self-editing 80% to 90% of the time, and about half of the radiologists send their reports to back-end transcription. 8

Take steps to maximize efficiency. With speech recognition, efficiency is determined by accuracy, navigation, integration, and macro use. To maximize accuracy, it is best to use a headset microphone. This will improve accuracy by standardizing the distance and the position of the microphone in relation to the mouth. With a hand-held microphone, approximately half of the errors can be attributed to not holding the microphone in the correct position.

Make corrections properly, either using the correction mode or, in the case of some speech recognition systems, the vocabulary editor. This is a way of training the system to learn each user’s specific speech patterns.

Pay close attention to ambient noise. It may be helpful to have the walls of the reading room covered in acoustic paneling and to have acoustic tile or carpeting installed on the floor. If ambient noise remains a problem, consider putting in a fan or some device that generates “white noise.”

Speech recognition requires not just dictation but navigation through the text and various screens of the speech recognition system, as well as simultaneous navigation through the PACS. It is important to consider how to make this process seamless through use of a mouse, a microphone with programmable buttons, or some other navigation aid.

Integration is another critical factor in efficiency. It is fairly easy to use a stand-alone speech recognition system. It is much more difficult to achieve interoperability between the speech recognition system and the PACS or RIS. The interface with the RIS needs to be bidirectional, enabling the accession number to pass from the RIS to the speech recognition system, and in the other direction, for text and any other information to pass back to the RIS.

Even simple integration can eliminate unnecessary tasks. Without it, the radiologist must first open a case in the PACS, then open dictation in the speech recognition system, and then add demographic information. Integration reduces the workflow to viewing images, dictating, and signing the report: when a report is signed, the case closes automatically and the next case opens automatically.
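The bidirectional exchange and the automatic case advance can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real interface: `MockRIS`, `Order`, `read_worklist`, and the `dictate` callback are all hypothetical names, and a production system would exchange these messages through the vendors' own interfaces rather than in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    accession: str
    patient_name: str
    exam: str


@dataclass
class MockRIS:
    """Stand-in for a RIS: holds orders and receives signed reports."""
    orders: dict = field(default_factory=dict)
    reports: dict = field(default_factory=dict)

    def get_order(self, accession: str) -> Order:
        return self.orders[accession]

    def store_report(self, accession: str, text: str) -> None:
        if accession not in self.orders:
            raise KeyError(f"unknown accession: {accession}")
        self.reports[accession] = text


def read_worklist(ris: MockRIS, worklist: list, dictate) -> None:
    """Step through a worklist with no manual case opening or data entry."""
    for accession in worklist:
        # RIS -> speech recognition: accession number and demographics
        order = ris.get_order(accession)
        header = f"{order.patient_name} | {order.exam} | {order.accession}"
        # The radiologist views the images and dictates (simulated by a callback)
        body = dictate(order)
        # Speech recognition -> RIS: signed report text; the loop then
        # advances to the next case automatically
        ris.store_report(accession, f"{header}\n{body}")
```

The point of the sketch is that the radiologist's only contribution inside the loop is the `dictate` step; everything else is passed between systems.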

Macros and templates are essentially canned reports that the radiologist can pull up anytime and modify. With speech recognition, the use of macros magnifies the time savings. Not only does it save time in dictation, it saves time in proofreading.

Figure 3 shows the modification of a macro for a CT scan of the abdomen and pelvis. The macro states: “Aorta is normal in size.” To modify that text, the user verbally selects the sentence and substitutes: “There is an aneurysm of the lower abdominal aorta measuring 4.8 centimeters in diameter.” The user also changes the impression from: “No significant abnormality in CT scan of abdomen and pelvis” to “Abdominal aortic aneurysm. No other significant abnormality….”

All of the radiologist’s edits are done verbally, without the eyes ever leaving the images. Not only is this method much faster than dictating an entire report, the radiologist has only 2 phrases to proofread.
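The proofreading benefit can be made concrete with a small Python sketch. It is an illustration under stated assumptions: the section keys, the `liver` entry, and the functions `apply_edits` and `changed_sections` are hypothetical, and a real macro would be stored and edited inside the speech recognition product.

```python
# A macro modeled as named sections of canned text. The "aorta" and
# "impression" wording follows the example in the text; the "liver"
# entry is a hypothetical placeholder for the rest of the macro.
CT_ABDOMEN_PELVIS_MACRO = {
    "liver": "Liver is normal.",
    "aorta": "Aorta is normal in size.",
    "impression": "No significant abnormality in CT scan of abdomen and pelvis.",
}


def apply_edits(macro: dict, edits: dict) -> dict:
    """Return a new report with the selected sections replaced."""
    unknown = set(edits) - set(macro)
    if unknown:
        raise KeyError(f"no such section(s): {sorted(unknown)}")
    return {**macro, **edits}


def changed_sections(macro: dict, report: dict) -> list:
    """List the sections that differ from the macro and need proofreading."""
    return [name for name in macro if report[name] != macro[name]]
```

Only the sections returned by `changed_sections` contain newly dictated text, which is why the radiologist in the example has just 2 phrases to check.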


Conclusion

Effective use of speech recognition can yield major improvements in report turnaround time. This technology is not always well accepted by radiologists, however, in part because it can reduce productivity, at least initially.

To encourage radiologists to adopt speech recognition, it is essential to offer meaningful incentives. It is also helpful to identify departmental champions who can generate excitement about the new technology, to set a firm date for discontinuing conventional transcription, and to maximize efficiency by improving accuracy, streamlining navigation, integrating speech recognition with the PACS and RIS, and taking advantage of macro functionality.

Speech recognition: Evaluation, implementation, and use.  Appl Radiol. 

December 14, 2008
Categories:  Imaging Informatics

Copyright © Anderson Publishing 2020