is the Clinical Section Head for Imaging Informatics at Geisinger
Health System, Danville, PA. He is also a member of the editorial
board of this journal.
Currently, there are a handful of ways to create a radiology
report. For decades, the standard has simply been transcription,
coupled in more recent years with digital dictation. Another
option, structured reporting, is used by many radiologists in
mammography but is not widespread in general radiology. This
article will focus on speech recognition, which is being adopted by
more and more radiology departments, in both academic medical
centers and private practice.
One of the purported advantages of speech recognition is cost
savings. In reality, a speech recognition system may, at least in
part, shift costs rather than save them. This is because instead of
paying a transcriptionist, a radiologist spends time editing and
typing. However, without question, improved turnaround time is an
advantage of speech recognition, as has been documented in a number
of studies.
One of the disadvantages of speech recognition is a time
penalty. A majority of radiologists report spending more time
creating and finalizing reports using speech recognition
as compared with conventional dictation, with a resulting decrease
in overall radiologist productivity.
However, the most serious problem with speech recognition is its
potential to distract the radiologist from viewing images. If the
radiologist’s eyes are on the dictation screen rather than on
images, the risk of error increases.
Acceptance of speech recognition by radiologists has been
complicated by a misalignment of incentives. Radiologists only
indirectly benefit from the advantages of speech
recognition. If cost savings help the department to stay under
budget, radiologists might receive a bonus, but it will be slow in
coming and will be shared by many others. Similarly, improved
turnaround time will help to achieve departmental goals, but
diagnostic accuracy and productivity are more important to most
radiologists.
On the other hand, the disadvantages of speech recognition fall
directly on radiologists, as they suffer the potential for a time
penalty, productivity decrease, and distraction from image viewing.
Administrators and information technology staff must pay attention
to this misalignment of incentives and find a solution to
it if they want radiologists to adopt speech recognition.
The following case studies illustrate both the success and, in
some ways, the failure of speech recognition.
The first case study involves Chestnut Hill Hospital,
a small community hospital in Philadelphia, PA, with a general
radiology practice of 4 to 5 radiologists reading 100,000
examinations annually. In 1998, we installed speech recognition,
shortly after hardware and software advances made it practical for
radiology use. The next year we installed a picture archiving and
communications system (PACS) and, in early 2000, we integrated the
two systems.
All of the radiologists agreed to implement speech recognition.
We discontinued using a transcriptionist after about a week. What
followed was 4 weeks of difficulty. First, we ordered the
wrong microphones for data entry (ones without a barcoder). We
weren’t proficient in using the product as it was
designed. Macro techniques were still in their infancy. The
navigation controllers we use today were not available. And,
initially, there was no integration between the PACS and speech
recognition.
Nonetheless, report turnaround time decreased from approximately
72 hours to 20 to 24 hours immediately after implementation of
speech recognition (Figure 1). There was no further
significant reduction in turnaround time during the
first 6 months after installation of the PACS. However,
after we streamlined workflow to make the best use of the
PACS, we did see a major drop in turnaround time to an average of
<4 hours for all studies.
A major change in workflow involved the radiologists’
work schedules. We were accustomed to leaving the hospital at 5 pm
each day, when the film library stopped distributing
hardcopy studies to be read. Several months after installation of
PACS we realized that, since PACS produced images hour after hour,
we could reconfigure our work schedule. All but one
radiologist began to leave the hospital at 4 pm, while the on-call
radiologist stayed until 7 pm. This new schedule consisted of the
same total number of work hours, but resulted in much better
Large medical center
At Geisinger Medical Center, we have digital dictation and use
structured reporting for mammography. We installed speech
recognition in the third quarter of 2004, and radiologists were
encouraged, rather than given a mandate, to use it. Therefore, at
each workstation we still have both a speech recognition microphone
and software, and a conventional digital dictation system.
An interesting pattern developed. After 6 months, half of our
radiologists were using speech recognition 80% to 90% of the time,
and the other half were using it ≤30% of the time (Figure 2). This
was a bit surprising. It is more common to see a bell-shaped usage
curve, with most of the radiologists accepting speech recognition,
a handful really embracing it, and a handful really struggling with
it.
What happened at Geisinger? We did get buy-in from the
radiologists initially, but we may have had unrealistic
expectations of accuracy that led to frustration with the product
after implementation. A training session with a trainer who knows
the software inside and out (and software that has become highly
accurate in recognizing that trainer’s speech) is far different from
a new user’s initial experience. In addition to problems with the
speech engine, we also had some disruption attributable to the PACS
and the radiology information system (RIS).
These were not the real reasons speech recognition was not
widely used at Geisinger, however. The main problem was that we did
not communicate a consistent message that the department would be
adopting this technology. We also had no defined endpoint
for eliminating digital dictation and have continued to support 2
separate reporting systems.
Table 1 outlines some of the steps that can be taken to ensure
adoption of speech recognition. First, provide meaningful
incentives to users. The incentives could include bonuses, extra
time off, or a reduction in productivity requirements for
radiologists who use speech recognition. Don’t allow use of the
system to be voluntary.
Have realistic expectations and plan for a drop in productivity.
Most studies show at least a 10% reduction in radiologist
productivity after implementation of speech recognition. There are
some exceptions, however. For example, at the community hospital
profiled earlier, my colleagues and I all felt strongly
that we were more efficient when using the integrated
PACS-speech recognition product.
The University of Pittsburgh showed at least a time-neutral
effect after installation of speech recognition using a hybrid
transcriptionist-radiologist editing process and some
modifications in workflow.
Massachusetts General Hospital (MGH) recently did a productivity
study of PACS and speech recognition in collaboration with New York
University. They were able to show that at MGH the use of PACS had
a positive impact on radiologist productivity, while speech
recognition had no statistically significant effect on
productivity.
Plan for training and ongoing support. Even experienced users
will have problems now and then. When new radiologists join the
department or locum tenens radiologists arrive, they will need
support on the system.
Set a deadline for the removal of conventional transcription and
transition to speech recognition. Consider using a hybrid model for
report editing. One way to use speech recognition is to have
radiologists dictate, edit, correct, and sign their own reports.
Another way is to have radiologists dictate reports, and then use
“back-end” transcription for editing. In this process a
transcriptionist listens to an audio file while viewing
the text and making corrections. Although this approach relieves
radiologists of clerical work, it reduces cost savings and results
in variable turnaround times. Consider combining these 2
approaches, so that radiologists who are proficient with
speech recognition can work independently and those who are
struggling or in a time bind can send reports to back-end
transcription.
Christiana Care Health System in Wilmington, DE, provides an
example of a hybrid model for report editing. All radiologists use
speech recognition but are given the choice of self-editing or
back-end transcription. Approximately half of the radiologists
choose to use self-editing 80% to 90% of the time, and about half
of the radiologists send their reports to back-end transcription.
Take steps to maximize efficiency. With speech
recognition, efficiency is determined by accuracy,
navigation, integration, and macro use. To maximize accuracy, it is
best to use a headset microphone. This will improve accuracy by
standardizing the distance and the position of the microphone in
relation to the mouth. With a hand-held microphone, approximately
half of the errors can be attributed to not holding the microphone
in the correct position.
Make corrections properly, either using the correction mode or,
in the case of some speech recognition systems, the vocabulary
editor. This is a way of training the system to learn each user’s
specific speech patterns.
Pay close attention to ambient noise. It may be helpful to have
the walls of the reading room covered in acoustic paneling and to
have acoustic tile or carpeting installed on the floor.
If ambient noise remains a problem, consider putting in a fan or
some device that generates “white noise.”
Speech recognition requires not just dictation but navigation
through the text and various screens of the speech recognition
system, as well as simultaneous navigation through the PACS. It is
important to consider how to make this process seamless through use
of a mouse, a microphone with programmable buttons, or some other
device.
Integration is another critical factor in efficiency.
It is fairly easy to use a stand-alone speech recognition system.
It is much more difficult to achieve interoperability
between the speech recognition system and the PACS or RIS. The
interface with the RIS needs to be bidirectional, enabling the
accession number to pass from the RIS to the speech recognition
system, and in the other direction, for text and any other
information to pass back to the RIS.
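As a rough illustration of what "bidirectional" means here, the
sketch below passes an accession number and demographics from a
stand-in RIS to the dictation step, and passes the finished text back.
It is a toy model only: MockRIS, Order, and dictate_and_file are
hypothetical names invented for this example, not any vendor's
interface, and a production interface would typically exchange
standardized messages rather than call in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    accession_number: str
    patient_name: str
    exam: str


@dataclass
class MockRIS:
    """Hypothetical in-memory stand-in for a RIS (not a real vendor API)."""
    orders: dict = field(default_factory=dict)
    reports: dict = field(default_factory=dict)

    def get_order(self, accession_number: str) -> Order:
        # Direction 1: RIS -> speech recognition (accession number, demographics)
        return self.orders[accession_number]

    def store_report(self, accession_number: str, text: str) -> None:
        # Direction 2: speech recognition -> RIS (report text)
        self.reports[accession_number] = text


def dictate_and_file(ris: MockRIS, accession_number: str, dictated_text: str) -> str:
    """Build a report header from RIS data, then file the report back to the RIS."""
    order = ris.get_order(accession_number)
    header = f"{order.exam} ({order.accession_number}), {order.patient_name}"
    report = f"{header}\n{dictated_text}"
    ris.store_report(order.accession_number, report)
    return report
```

Because the header is filled from the order, the radiologist never
retypes demographic information, and the report is filed under the
same accession number that opened the case.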
Even simple integration algorithms can eliminate unnecessary
tasks. Without them, it is necessary to first open a case
in the PACS, then open dictation in speech recognition, and then
add demographic information. Integration distills the workflow to
viewing images, dictating, and signing the report. Once a report is
signed, the case closes and the next case opens automatically.
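The integrated workflow described above can be sketched as a simple
loop over a worklist. This is an assumption-laden illustration, not a
description of any product: read_worklist and the dictionary keys are
hypothetical, and the dictate callback stands in for the radiologist
viewing the images and dictating.

```python
from collections import deque


def read_worklist(cases, dictate):
    """Work through a list of cases with no manual open/close steps.

    `cases` is an iterable of dicts with "accession" and "patient" keys
    (hypothetical field names). `dictate` is a callable that returns the
    report text for one case.
    """
    queue = deque(cases)
    signed_reports = []
    while queue:
        case = queue.popleft()   # the next case opens automatically
        text = dictate(case)     # view images and dictate
        signed_reports.append({
            "accession": case["accession"],  # carried over, not retyped
            "patient": case["patient"],      # carried over, not retyped
            "report": text,
            "signed": True,                  # signing closes the case
        })
    return signed_reports
```

The point of the sketch is what is absent: there is no step for
opening the dictation window or entering demographics, because those
values travel with the case.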
Macros and templates are essentially canned reports that the
radiologist can pull up anytime and modify. With speech
recognition, the use of macros magnifies the time
savings. Not only does it save time in dictation, it saves time in
Figure 3 shows the modification of a macro for a CT
scan of the abdomen and pelvis. The macro states: “Aorta is normal
in size.” To modify that text, the user verbally selects the
sentence and substitutes: “There is an aneurysm of the lower
abdominal aorta measuring 4.8 centimeter in diameter.” The user
also changes the impression from: “No significant
abnormality in CT scan of abdomen and pelvis” to “Abdominal aortic
aneurysm. No other significant abnormality….”
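In data terms, a macro edit of this kind is a sentence substitution
within a stored template. The sketch below is a hypothetical
illustration using the sentences quoted above; replace_sentence and
the two-section layout are assumptions made for the example, and a
real system would perform the selection by voice command rather than
by function call.

```python
# Normal-report macro, using only the sentences quoted in the text.
MACRO = {
    "findings": "Aorta is normal in size.",
    "impression": "No significant abnormality in CT scan of abdomen and pelvis",
}


def replace_sentence(report: dict, section: str, old: str, new: str) -> dict:
    """Return a copy of the report with one sentence in one section replaced."""
    if old not in report[section]:
        raise ValueError(f"{old!r} not found in section {section!r}")
    updated = dict(report)  # leave the stored macro untouched
    updated[section] = report[section].replace(old, new, 1)
    return updated


# The two edits from the example: one finding and the impression.
edited = replace_sentence(
    MACRO,
    "findings",
    "Aorta is normal in size.",
    "There is an aneurysm of the lower abdominal aorta "
    "measuring 4.8 centimeter in diameter.",
)
edited = replace_sentence(
    edited,
    "impression",
    "No significant abnormality in CT scan of abdomen and pelvis",
    # The source truncates this impression; the ellipsis is kept as quoted.
    "Abdominal aortic aneurysm. No other significant abnormality….",
)
```

Everything the radiologist did not select stays exactly as it was in
the macro, which is why only the substituted text is new.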
All of the radiologist’s edits are done verbally, without the
eyes ever leaving the images. Not only is this method much faster
than dictating an entire report, the radiologist has only 2 phrases
Effective use of speech recognition can yield major improvements
in report turnaround time. This technology is not always well
accepted by radiologists, however, in part because it can reduce
productivity, at least initially.
To encourage radiologists to adopt speech recognition, it is
essential to offer meaningful incentives. It is also helpful to
identify departmental champions who can generate excitement about
the new technology, to set a firm date for discontinuing
conventional transcription, and to maximize efficiency by
improving accuracy, streamlining navigation, integrating speech
recognition with the PACS and RIS, and taking advantage of macro