The author describes his experience with speech recognition software and suggests ways in which other radiologists can integrate this still developing technology into their routine practices.
Dr. Tobin is an Attending Radiologist at the Metropolitan Hospital Center, New York, NY.
As a radiologist practicing in a busy inner-city hospital, I
would like to describe my experience with the technology of speech
recognition software and suggest ways in which other radiologists
can integrate this still developing technology into their routine
practices.
Speech recognition software automatically transcribes words
spoken through a microphone into computer-generated text. This
software has been advocated as a replacement for human
transcriptionists. However, I have found that its word-recognition
error rate adds a significant amount of time that must be spent
editing reports manually.
This article is for the benefit of my fellow radiologists who
are destined, through the use of speech recognition software, to
become typists, proofreaders, editors, and word processors, as well as
less productive image interpreters. I created a
"(non)recognition dictionary" (Table 1) as a guide for those
clinicians who must now read radiology reports that were
computer-generated by my speaking into a microphone.
Things were not always so. Formerly, the radiologist would
dictate a report by telephone into a radiology information system
(RIS); subsequently, a typist would listen to the recording and
type up a report as a permanent record. As pointed out by
detractors of this method, a signed, written report could take 24
to 72 hours to generate. The fact is, however, that once dictated,
results would immediately be available to clinicians by telephone
and, of course, the radiologist interpreting the images would be
available as well.
Now, the typists are gone and the radiologist must function as
transcriber and proofreader, without delaying image interpretation.
If the speech recognition software were able to transcribe a
radiologist's reports as well as a typist, all would be well. But
in my experience, this has not been the case. Unlike a typist, who
becomes accustomed to my voice and the words I use, the software,
I have found, does not.
Further, correcting these software-generated reports is not a
trivial process. This is because speech recognition errors are not
misspellings, which are easy to detect, but rather, inappropriate
word substitutions, which can be surprisingly difficult to
identify, even during careful re-reading. These word-substitution
mistakes are often amusing, as seen in Table 1, but if left
uncorrected they can change the meaning of a report.
Using speech recognition
When you start using speech recognition software, you first
"train" it to your voice by reading one or more selections offered
by the program. Later, if the program types a different word from
the one you said, you can, at least in theory, train the software
by typing the correct word and then pronouncing it clearly when
prompted by the computer.
As soon as I started working with speech recognition, I found
there were many words I used that the software failed to recognize.
Of course, the software didn't "know" that it "misunderstood" me,
and so it typed what it "thought" I said, no matter how ridiculous
or misleading.
The first thing I have to do is read my dictation very carefully
and try to identify misrecognized words. As my (non)recognition
dictionary shows, these incorrect words are often homophones or
near-homophones of the dictated word (Table 1). I then attempt to
correct these errors by redictating each misunderstood word, paying
careful attention to my pronunciation, thus giving the system a
second chance to get it right. Sometimes, the software rises to the
challenge and types the correct word, but other times, it does not.
For example, the speech recognition software continually refuses to
type the word "phalanges," insisting instead on "phalanx east" or
"phalanx cheese." Similarly, no matter how carefully I pronounce the
word "hypo," the software types "high pole."
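A personal (non)recognition dictionary can also double as a proofreading aid. The following Python sketch is hypothetical and is not a feature of any particular speech recognition product; it assumes the finished report is available as plain text, and the dictionary name NONRECOGNITION and the function flag_suspects are invented for illustration. It scans a report for phrases the software has substituted before (using the examples above) and lists each one with the word probably intended, leaving the decision to the radiologist.

```python
import re

# Hypothetical personal (non)recognition dictionary: each key is a phrase the
# software has typed in place of the word the radiologist actually said.
NONRECOGNITION = {
    "phalanx east": "phalanges",
    "phalanx cheese": "phalanges",
    "high pole": "hypo",
    "clinton": "colon",
    "pa": "period? (PA may also be a legitimate projection)",
}


def flag_suspects(report: str, dictionary: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (character offset, phrase as typed, likely intended word) for
    every dictionary phrase found in the report, sorted by position.

    Matching is case-insensitive and limited to whole words. Nothing is
    replaced automatically, because a flagged phrase may be correct in context.
    """
    hits = []
    for typed, intended in dictionary.items():
        pattern = r"\b" + re.escape(typed) + r"\b"
        for m in re.finditer(pattern, report, flags=re.IGNORECASE):
            hits.append((m.start(), m.group(0), intended))
    return sorted(hits)


report = ("Fracture of the phalanx cheese of the fifth toe. "
          "Abundant feces in the Clinton. PA and lateral views obtained")
for offset, typed, intended in flag_suspects(report, NONRECOGNITION):
    print(f"{offset:4d}  {typed!r} -> {intended}")
```

Flagging rather than replacing is deliberate: "PA" is also a legitimate abbreviation for a posteroanterior projection, so an automatic substitution would introduce exactly the kind of meaning-changing error the dictionary is meant to catch.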
Continually misinterpreted words, such as "phalanges," require
additional "training." I did this by invoking the
software's dictionary and recording the pronunciation of such words
next to their typed equivalents. However, with the software package
we use, changes made to the dictionary are not permanent until one
exits the program. Thus, if the program crashes during dictation,
all corrections made to the dictionary are lost and have to be
repeated.
Training the software to recognize my pronunciation is not
always successful. After multiple trainings and continual use, the
speech recognition software still types "PA" when I say "period" to
end a sentence. I have also had no success teaching the program to
type "ultrasound"; I now use the word "sonography." In a real sense,
the speech recognition software has trained me to speak, rather than
my having trained it.
Part of the problem, we are told, is that the currently
available speech recognition software is not "context-sensitive."
In other words, the software doesn't know which words make sense in
the context of a radiology report. For example, if the speech
recognition software were more context-sensitive, then, presumably,
it would not type "Clinton" instead of "colon" in a report
describing an abdominal radiograph of a patient with abundant
feces.
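To make "context-sensitive" concrete, here is a toy Python sketch, not a description of how any commercial product works. It assumes the recognizer has already narrowed one spoken word to a few sound-alike candidates, and it picks the candidate that co-occurs most often with words already in the report. The COOCCURRENCE counts and the pick_candidate function are invented for illustration; a real system would derive such statistics from a large body of text with a proper language model.

```python
# Invented co-occurrence counts: how often each candidate word appears in the
# same report as a given context word. Purely illustrative numbers.
COOCCURRENCE = {
    "colon": {"abdominal": 120, "feces": 95, "radiograph": 60},
    "Clinton": {"president": 80, "radiograph": 1},
}


def pick_candidate(candidates: list[str], context: list[str]) -> str:
    """Choose the candidate whose summed co-occurrence with the context words
    is highest. max() keeps the first of any tied candidates, so with no
    useful context the first (e.g., the acoustically preferred) guess wins."""
    def score(word: str) -> int:
        counts = COOCCURRENCE.get(word, {})
        return sum(counts.get(c, 0) for c in context)

    return max(candidates, key=score)


context = ["abdominal", "radiograph", "feces"]
print(pick_candidate(["Clinton", "colon"], context))  # prints "colon"
```

With "abdominal," "radiograph," and "feces" already in the report, "colon" wins easily; with no context at all, the sketch simply falls back to the first candidate, which mirrors the context-blind behavior described above.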
Transcriptionists would be unlikely to confuse "Clinton" and
"colon." Their errors are usually spelling misteaks...er, mistakes...,
with the meaning of the misspelled word evident to the reader. In
my experience, the best transcriptionists question possible
inconsistencies in a report. Human beings have the ability to make
judgments, whereas computer software does not.
Also, transcriptionists will not type words or sentences that
sound garbled to them, preferring to insert question marks. When
words "sound garbled" to speech recognition software, it will type
gibberish, which is harder to find and, therefore, easier to miss
during proofreading.
After more than a year of continuous use, I can report that,
because of erroneous word substitution, speech recognition slows
my dictation and thereby decreases my productivity. I would
estimate my productivity loss at 20% to 25%, depending on the
imaging modality and the complexity of the report. Normal chest
radiographs, eg, are quick dictations that can be saved as
"normals" both by speech recognition and by transcriptionists.
Little time is gained or lost on these. Other examinations, such as
breast ultrasounds (I mean, sonograms), are more difficult for me
to dictate with speech recognition when there are multiple
cysts or solid areas in each breast, each of which I try to
characterize by size, shape, position, orientation, echotexture,
and the like.
Recommendations
Until speech recognition becomes more accurate, I suggest that
radiologists required to use it do the following:
1. Insist on being involved in planning and implementing
speech recognition.
We all feel better if we have some control over what happens to us.
The more we are included in the decision-making process, the less
likely we are to feel resentful. Having a good attitude about
speech recognition is important because, as I have stressed, the
technology is far from perfect.
2. Insist on the latest software.
The current implementation of speech recognition is "continuous,"
meaning that you can dictate reports at your normal rate of
talking. This technology replaces "discrete" speech software, which
required pausing between words. Vendors also offer bug fixes and
minor upgrades from time to time.
3. Insist on sufficiently powerful hardware.
Current speech recognition software benefits from powerful
processors and large amounts of random access memory (RAM).
4. Insist on a reliable vendor who can install and maintain a
complex system.
Issues such as bandwidth and integration of
speech-recognition-generated reports with the RIS, the hospital
information system (HIS), and the picture archiving and
communication system (PACS) require specialized knowledge. It is
much more involved than installing speech recognition software on
your laptop at home. Also, you really need technical-support people
who know what to do when things go wrong.
5. Insist on training.
Speech recognition takes getting used to, and good training can
really jump-start a new user. People differ widely in their
capacity to adjust to speech recognition and some people need extra
help. No one should have to enter into this technology cold.
6. Insist on a quiet place to dictate.
I don't care what anyone has to say about this one. Noise, if loud
enough, will wind up on your computer screen as gibberish. As human
beings, we can focus on a single conversation amid a sea of noise
and other conversations. Software is much less discriminating.
Noise-canceling microphones are a real advance.
7. Compose your thoughts prior to dictating.
This will help to eliminate the hesitations, ie, the "errs" and
"ems," that are common in everyday speech. Put bluntly, you cannot
"think" into the microphone, which may be a major change for many
of us.
8. Keep your dictations concise.
The more you say, the more errors the software can make.
9. Create easily modifiable report templates, if your
software allows it.
This can be a real time-saver, similar to the "standard normals" we
used with the transcriptionists. The most satisfied users, in my
experience, are those who create a lot of macros to cover many
different normal and abnormal findings and then modify them
slightly for individual dictations. However, the number of macros
quickly rises as you cover more modalities and as your studies and
your patients become more complex. Then you are faced with multiple
searches for multiple macros, and the issues of naming and
organizing your macros become major. You'll see what I mean!
10. Finally, be realistic about the limitations of speech
recognition software.
It is a small miracle that it works at all.
Conclusion
Based on my experience as a radiologist and a long-time computer
user, I find speech recognition software, at present, to be
error-prone, often producing time-wasting, if humorous, word
substitutions, some of which change the meaning of my reports.
Proofreading reports generated by speech recognition is fatiguing,
can let errors slip through (leading to inaccurate reports), and
distracts me from image interpretation.
Correcting speech recognition errors and training the software
to recognize words is frustrating and tedious. It is an ongoing
process and not something that ends a week or two after initial
use. Some words require multiple training sessions, and other
words, despite my attempts at training, remain unrecognizable to my
software. Over time, I have learned which words and combinations of
words to avoid. I have not yet reached the extreme of one of my
colleagues who has tried to reduce the frequency of nonrecognition
errors by laboriously deleting all the words in the application's
dictionary that he felt he was unlikely to use.
Because of current deficiencies in speech recognition software,
I am less productive in generating reports. Although using
standard, predictated reports (macros) and a shortened, more
concise style of dictation have been helpful, speech recognition
remains inefficient for me. Sometimes my frustration level with the
software becomes so high that I resort to typing part of my
reports, which is an option within the application.
Although speech recognition is touted as cost-saving compared with
a typing service, its true cost (setting aside the
fixed cost of hardware, software, and networking) must include the
loss of radiologist productivity and the cost of additional
technical support required to keep the speech recognition system
functioning. Since our administration introduced PACS and speech
recognition, several computer specialists have been added to the
payroll and others have been reassigned to application
support.
There are hidden costs as well because the speech recognition
technology competes for dollars with other capital projects. Some
radiology administrators have confided to me that they are anxious
to adopt speech recognition because they believe it will increase
their bottom line but are reluctant to make other needed capital
purchases that might not. Producing reports in record turnaround
time for studies acquired on technologically dated imaging systems
is an irony that belies the argument that speech recognition is
being implemented primarily for patient care.
Speech recognition is the wave of the future because patient
records will, undoubtedly, become digital. So, all physicians, not
just radiologists, will be typing, dictating, or using some
combination of these and other digital input technologies, as the
electronic patient record replaces the thick multivolume patient
chart, which is sometimes difficult to locate and occasionally
challenging to decipher.
There are those who believe that radiologists will eventually
generate reports with a "structured reporting" system, in which
they select findings from a list of possibilities contained in a
standard lexicon, with the computer generating the report. There
are authors (eg, Langlotz) who claim that "structured reporting"
can be faster than either conventional dictation or speech
recognition. For more information on and insight into this subject,
you may refer to the list of suggested readings at the end of this
article.
In the early 1990s, I purchased a program called VoRecOne
(Impulse, Inc., Minneapolis, MN) for the Amiga computer. When I
found that the program had difficulty distinguishing "hello" from
"help," I decided that there were other, more promising applications
to try. Since that time, speech recognition technology has made
remarkable progress, and there is every hope that with faster
computers and more sophisticated algorithms, the technology will
become context-sensitive and the software will "know" that feet
have five "toes" and not five "towels." That time has not yet arrived.
In the meantime, I hope you enjoyed the misinterpretations our
speech software has provided, and I encourage you to make your own
dictionary.
Suggested readings
Gale B, Safriel Y, Lukban A, et al. Radiology report production times: Voice recognition vs. transcription. Radiol Manage. 2001;23:18-22.
Hayt DB, Alexander S. The pros and cons of implementing PACS and speech recognition systems. J Digit Imaging. 2001;14:149-157.
Heilman RS. Voice recognition transcription: Surely the future but is it ready [editorial]? RadioGraphics. 1999;19:2.
Klevans RL, Rodman RD. Voice Recognition. Norwood, MA: Artech House, Inc.; 1997.
Langlotz CP, Meininger L. Enhancing the expressiveness and usability of structured image reporting systems. Proc AMIA Symp. 2000:467-471.
Ramaswamy MR, Chaljub G, Esch O, et al. Continuous speech recognition in MR imaging reporting: Advantages, disadvantages, and impact. AJR Am J Roentgenol. 2000;174:617-622.
Rodman RD. Computer Speech Technology. Norwood, MA: Artech House, Inc.; 1999.