Speech recognition (SR) systems are now a practical method for the creation of radiology reports and offer both accuracy and user-friendliness. This article compares traditional report creation with the SR method, discusses the benefits and problems associated with the use of SR, and provides recommendations for using an SR system.
Dr. Herman is an Associate Professor, Toronto General Hospital, University
Health Network, Toronto, Ontario, Canada. Dr. Herman is also the
Chief Medical Officer of Merge eFilm, Milwaukee, WI.
Portions of the material in this article were presented by
Dr. Herman in "Introduction to Speech Recognition" at the 2003
SCAR meeting, Boston, MA.
In recent years, speech recognition (SR) systems have advanced
to the point that they are now a practical method of creating
radiology reports. More and more departments are beginning to use
this technology, and it is increasingly prominent at trade shows,
both as a seminar topic and as a marketed product.
This article will compare two report creation methods: the
traditional method and the SR method. It will also discuss the
benefits and problems associated with the use of SR and provide
recommendations for using an SR system. The use of SR addressed in
this article is the conversion of speech into text, as opposed to
the use of SR to control computer applications.
Creation of reports
Traditional method
Traditionally, the process of report creation begins when the
radiologist dictates the case, creating an audio report (Figure 1).
This is passed on to the transcriptionist, who types the dictated
material, creating a preliminary report. Next, the preliminary
report is reviewed by the radiologist, who may or may not edit it,
and who then accepts the report, which produces the final report
that is available for clinicians to review. It is well known that
there are often delays, sometimes of several days, between the time
the radiologist dictates the case and the time it is transcribed.
To compensate for this delay, some departments make the audio
report available to the clinician. Often, there is also a
significant delay between the time that the transcriptionist types
the report and the time that the radiologist reviews and accepts
it. To compensate for this, many departments allow clinicians to
review the preliminary report.
The standard method of creating reports is therefore associated
with several problems. In addition to the delays in getting the
final report, the radiologist may not remember the details of the
case when reviewing the preliminary report. He or she may then need
to re-review the images, which may mean pulling them from the film
library or recalling them on the picture archiving and
communication system (PACS). Also, many radiologists review the
report only to check for grammatical and spelling errors. In
addition, in some instances, the radiologist actually reports the
case twice: the first time creating a quick "verbal" report and
later creating what will become the final report (Figure 1).
Finally, the traditional method of report creation requires that
the radiology department use a typing pool composed of either
employees of the department or an outsourced service.
Report creation with speech recognition
With SR, the radiologist can dictate the case, edit it (if
necessary), and accept it all at once, which makes the final report
available almost immediately (Figure 2). Therefore, the clinician
can review this report sooner than would have been possible in the
traditional reporting method. This can potentially lead to better
patient care, since the patient can also move on to the next step
in their workup or begin treatment sooner. Also, it can lead to a
more accurate report because it is completed while the radiologist
is reviewing the images, before the details of the case might be
forgotten. In addition, there is less chance of the report getting
lost in the system. Finally, this process can lead to a more
satisfied referring physician, since the report is available more
quickly, as well as a more satisfied radiologist, since there is a
sense of completion in knowing that the report will not have to be
reviewed again.
Benefits of speech recognition
Some of the many benefits of SR have been mentioned above. The
two primary benefits are faster completion of reports and a
reduction in the number of transcriptionists required by the
department. These benefits are described in more detail below.
Another benefit is the virtual elimination of calls for preliminary
reports.[1] In addition, radiology staff should spend less time
looking for film, since the referring physician will have the
report more quickly. Finally, SR allows the radiologist more
control over the dictation process: the radiologist does not rely
on a transcriptionist and does not need to create a separate
"verbal" report before completing a "final" report later.
Rapid report creation
A number of studies have assessed how much more quickly reports
are completed using SR (Table 1). One study found that the mean
report turnaround time (ie, from examination completion to report
transcription) improved from 87.8 to 43.6 hours.[1] The researchers
noted that report availability at 24 hours increased from 10.5% to
62.5%. In another study, report creation time fell from
approximately 2 to 4 hours with transcriptionists to <5 minutes
with SR.[2] The researchers noted that with time, as the users
adapted to the system and vice versa, this time fell further to
<3.5 minutes. They also stated that turnaround became more of an
issue with the use of PACS: images were now available to clinicians
in 5 to 15 minutes, and because the department wanted a report to
accompany the study, rapid report availability was mandatory. A
third study reported a 10-fold improvement in report turnaround
times.[3]
Another study investigated the use of these systems in a
teaching hospital, where cases are first dictated by residents and
then passed to the staff radiologist for final acceptance.[4]
The researchers noted that when attending physicians dictated
studies themselves, 65% of reports were completed in <15 minutes
and 90% in <1 hour. With the traditional use of
transcriptionists, the mean report turnaround time was 30 hours.
However, when residents reported studies, the final report was
available in <1 hour only 15% of the time; 90% of reports were
available within 5 hours. Clearly, the delay between the time that
the resident completed the report and the time that the attending
radiologist signed off on it was significant.
Rapid report completion with SR has also been documented outside
of radiology. For example, a study of reporting in the emergency
department noted an improvement in report creation time from a mean
of 39.6 minutes with transcriptionists to 3.65 minutes with SR.[5]
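As a rough illustration of the turnaround figures quoted above, the
relative reduction in time can be computed directly. The following
Python sketch uses only the numbers reported in the text; the function
name `relative_reduction` and the dictionary labels are illustrative
choices, not part of any cited study.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a duration shrank (0.5 means it was halved)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Figures quoted in the text. The studies measured different intervals
# in different units, so only the ratios are comparable.
studies = {
    "radiology report turnaround (hours)": (87.8, 43.6),
    "emergency department report creation (minutes)": (39.6, 3.65),
}

for label, (before, after) in studies.items():
    pct = 100 * relative_reduction(before, after)
    print(f"{label}: {before} -> {after} ({pct:.1f}% shorter)")
```

With these inputs, the radiology turnaround time is about 50% shorter
and the emergency department report creation time is about 91%
shorter.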
Fewer transcriptionists
Speech recognition systems appear to reduce the number of
transcriptionists used by the department. Two studies have
documented that departments save money after implementing these
systems. The first describes savings of $100,000.[6] The second
reports that the department saved $1.7 million in the first 5
years.[7]
Problems with speech recognition
There are two main problems associated with the use of SR
systems: accuracy is lower than that of transcriptionists, and
radiologists spend more time creating reports. These two issues
will be discussed below.
Accuracy
The accuracy of SR systems has been addressed in many studies.
These reveal, in general, accuracies in the 90% to 100% range.
Analyzing all dictated words, three studies found accuracies of 93%
to 97%,[1] 95% to 100%,[4] and approximately 90%,[8] respectively.
In a study involving multiple speakers of different
nationalities, it was noted that the accuracy rate of native
English speakers (90.3%) was slightly higher than that of
non-native English speakers (88.4%).[8]
There were no gender differences in accuracy rates, nor were there
any differences among the various imaging modalities.
The authors further analyzed the errors that the system made
according to different criteria. They separated errors that were
clinically significant from those that were not. For example, if
the speaker dictated "femur" but the system typed "finger," this
was clinically significant. If the speaker dictated "a" but the
system typed "an," this was not. Also, they specifically looked for
clinically significant errors that tended to be difficult to
detect. For example, if the speaker dictated "There was no evidence
of a pneumothorax," but the system typed "There was evidence of a
pneumothorax," this was called a significant subtle error. They
noted an overall
error rate of 10.3% (ie, accuracy of approximately 90%). The
clinically significant error rate was 7.8%, and the significant
subtle error rate was 1.2%.
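To make the idea of a significant subtle error concrete, the following
Python sketch computes a word-level accuracy for the pneumothorax
example above. It assumes accuracy is measured as one minus the
word-level edit distance divided by the number of dictated words; this
is a common convention, and the cited studies may have counted errors
differently. The function names are hypothetical.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the reference into the hypothesis."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,          # reference word was dropped
                curr[j - 1] + 1,      # extra word was inserted
                prev[j - 1] + cost,   # words match or were substituted
            ))
        prev = curr
    return prev[-1]


def word_accuracy(dictated: str, recognized: str) -> float:
    """Assumed metric: 1 - (word errors / dictated words).
    Can be negative if the recognized text is much longer."""
    ref = dictated.lower().split()
    hyp = recognized.lower().split()
    if not ref:
        raise ValueError("dictated text must contain at least one word")
    return 1 - word_edit_distance(ref, hyp) / len(ref)


dictated = "There was no evidence of a pneumothorax"
recognized = "There was evidence of a pneumothorax"

errors = word_edit_distance(dictated.lower().split(),
                            recognized.lower().split())
print(f"word errors: {errors}")
print(f"word accuracy: {100 * word_accuracy(dictated, recognized):.1f}%")
```

In this example a single dropped word out of seven leaves a word-level
accuracy of about 86%, yet it reverses the meaning of the sentence,
which is why such errors are hard to catch when skimming a report.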
Similar findings have been noted outside of radiology. For
example, in an emergency department study, the accuracy rate of SR
reporting was 98.5%, compared with 99.7% using
transcriptionists.[5] The authors reported making 2.5 corrections
per chart with SR reporting, as opposed to 1.2 corrections with
transcriptionists.
Radiologist time
A major drawback of SR systems is that some work traditionally
done by transcriptionists is shifted to the radiologist. There are
a number of reasons why this is a drawback. Radiologists don't want
to be editors. Since SR systems are generally less accurate than
transcriptionists, more editing needs to be done using SR systems.
Also, radiologists need to read their reports more meticulously
than usual, especially considering the subtle mistakes that occur
(as noted above). Traditionally, many radiologists read only
certain sections carefully (eg, the impression), but when using SR,
they must read the entire report. Finally, when dictating,
radiologists need to be careful about so-called dysfluencies (for
example, stammering or slurring, or speaking such sounds as "um" or
"uh").[1] Some of the newer SR systems can be trained to ignore the
latter sounds, however.
Some authors noted that the above points are actually more
significant than they first appear.[9] For example, they suggest
that since the radiologist will now spend more time editing reports
than previously, he or she will spend relatively less time looking
at images. In addition, there is a subtle change in focus in the
radiologist's mind from image interpretation to thinking about how
the SR system performed.[9]
Two studies have investigated the specific increase in the time
the radiologist spends creating reports. In one study, the average
report creation time was 74 seconds using transcriptionists but
increased to 162 seconds with SR.[6] The authors remarked that this
increase in time led to a loss of staff morale. However, the SR
system used in this study appears to have had many problems, so
these results are likely not predictive of what can be expected
from current systems. For example, the authors described many
system crashes that required the user to reboot the computer, long
waits while the system saved files, and many words that required
individual training. These delays were factored into the calculated
SR time.
In another study, dictation time increased from 180 seconds to
203 seconds using SR (Table 2).[10] Editing time increased from 146
seconds to 176 seconds. Therefore,
total report creation time increased from 326 seconds using
transcriptionists to 379 seconds using SR (Figure 2).
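The totals above are simply the sum of dictation and editing time. A
short Python sketch of that arithmetic follows; the class and variable
names are illustrative, and the 20-report session is a hypothetical
workload chosen only to show scale, not a figure from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportTiming:
    """Per-report radiologist time, in seconds, as quoted in the text."""
    dictation_s: int
    editing_s: int

    @property
    def total_s(self) -> int:
        return self.dictation_s + self.editing_s


transcription = ReportTiming(dictation_s=180, editing_s=146)
speech_recognition = ReportTiming(dictation_s=203, editing_s=176)

# Extra radiologist time per report when using SR.
extra_s = speech_recognition.total_s - transcription.total_s
extra_pct = 100 * extra_s / transcription.total_s

# Hypothetical workload, used only to illustrate the cumulative effect.
reports_per_session = 20
extra_minutes = extra_s * reports_per_session / 60

print(f"transcription total: {transcription.total_s} s")
print(f"SR total: {speech_recognition.total_s} s")
print(f"extra per report: {extra_s} s ({extra_pct:.1f}%)")
print(f"extra per {reports_per_session} reports: {extra_minutes:.1f} min")
```

Under these figures SR adds 53 seconds per report, roughly a 16%
increase, which amounts to a little under 18 minutes over the
hypothetical 20-report session.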
Presumably to compensate for this increased time, some authors
have noted that radiologists tended to shorten their reports.[4] In
one study, the mean report length was noted to decrease from 95 to
60 words when using SR.[1]
It is important to keep in mind the perspective of others with
respect to this point. Clearly, radiologists are concerned about
this extra work. However, radiology administrators may not see the
problem in the same way. For example, they may believe that this
shifting of the editing function to earlier in the process is "an
efficient reallocation of total work rather than additional work."[11]
Recommendations
Based on the experience of the many departments using SR, these
systems are definitely usable now and worthy of consideration by
almost any department. Although this article will not address costs
of the systems, anyone considering a purchase should perform a
cost/benefit analysis for a specific department.
In the planning stages, it is important to include all of the
pertinent stakeholders, including radiology business managers,
information technology personnel, and representative radiologists.
It is important that the department have a strong chief who is a
clear believer in the benefits of using SR. The technical aspects
of the system must be optimized. For example, the network bandwidth
must be adequate and the PCs must meet the requirements of the
system vendor. There must be integration with the department's
radiology information system and ideally with the PACS as well.
The SR implementation will be much more likely to be successful
if the users (ie, the radiologists) are motivated to make it a
success. In this regard, it will help if they perceive that they
are being rewarded in some way for using it. For example,
radiologists may feel rewarded if their business saves money
(depending, of course, on their financial arrangement with the
department). They may feel a sense of satisfaction if they are
clearly shown that they are providing better service to their
referring physicians. The radiologists need to be reminded of the
new sense of completion they now have after dictating and accepting
the report in one step, knowing they will not need to see the
report again.
It would be helpful if implementation could begin in an area of
the department where specific individuals with positive attitudes
work. This would increase the chances of a successful rollout.
Also, these radiologists would become "champions" for the rest of
the department. Each user needs to be provided with adequate
training in the use of the SR system. Even more important, adequate
support must be available at a moment's notice. In addition, each
radiologist should enroll in the system. Enrollment refers to the
user training the system to his or her specific voice, prior to
using the system in clinical practice. This will improve
recognition accuracy, sometimes very significantly, and will
improve overall user satisfaction. Users should be trained to speak
approximately 10% more slowly than they normally do, as this can
improve the recognition rate. Also, each user should be shown how
to use macros and templates efficiently to improve report creation
time.
When starting to use the system, it is important that
radiologists not be in a stressful situation so they can take their
time getting familiar with it. Therefore, it is highly recommended
that they be relieved of their normal clinical duties during their
first day or two of use. For example, if they normally would be
expected to dictate 20 CT reports in the morning, they should give
15 of these to colleagues and have only 5 to report themselves.
This would allow them to take their time and not feel the pressure
to complete their studies.
Finally, when the system is rolled out, it must be made clear to
the radiologists that there is no going back to transcriptionists.
Otherwise, they will not dedicate themselves to making the system
work and will never become familiar enough with it to gain the
required sense of comfort.
Conclusion
Speech recognition systems are now viable options for
the majority of users, in terms of both accuracy and
user-friendliness. The benefits of improved report turnaround time
and cost savings must be weighed against the increased time
radiologists must spend editing reports.