Dr. Shrestha is the Vice President of Medical Information
Technology, University of Pittsburgh Medical Center, Pittsburgh, PA, and
the Medical Director, Interoperability & Imaging Informatics.
Radiologists have struggled, and had fun, with voice recognition since long before Siri was born.1
While it is great that voice recognition has gone mainstream, making its way into our cars and smartphones in addition to call centers and help desks, radiology vendors are starting to leverage the technology to catapult the radiology workflow to the next level. The reality of clinical documentation today is that radiology reporting is about more than decreased report turnaround time and transcription cost savings. Voice recognition technology, used the right way, has the potential to streamline the radiology workflow and make the readflow process more intelligent, meaningful, and accurate. The goal in many instances is a more cohesive radiology workflow.
Consumer-driven acceptance
This has been a tremendous year for voice recognition in general, and healthcare in particular. We are seeing rapid adoption of applications for both smartphones and tablets, as well as for other everyday devices,
such as vehicles that have embraced voice recognition in interesting
ways. Over the years, we have seen several companies, including Google,
IBM, and Microsoft, develop their own voice recognition technologies.
But it was really when Apple infused their marketing wizardry to Siri
that things started to take off amongst general consumers. Some see Siri
as an entertaining foray into mainstream adoption of voice recognition
in the mobile arena. We have started to see a wave of voice-enabled
technologies coming into our cars, tempting us to use voice as a primary
interface to find the nearest gas pump or to switch audio playlists.
This summer, Nuance introduced Nina,2
a collection of voice-enabled personal assistant technologies that
brings voice biometrics, speech recognition, and natural language
recognition to the masses. Already, we are starting to see these
technologies take us beyond simple speech commands to more natural
‘conversations’ with contextual dialogue. It seems clear that voice
recognition is now being led by a new wave of consumer-driven acceptance.

We are starting
to see a healthy penetration of voice recognition technology into our
workflow in healthcare, and we seem to just be at the beginning of a
revolution in clinical document capture and enhanced clinical efficiency
enabled by speech. Many physicians have been toying around with
out-of-the-box voice recognition software, such as Nuance’s Dragon
software, and its more expensive medical edition, to voice-enable a more convenient documentation process on top of their regular
electronic medical record (EMR) systems. We are seeing a positive move
towards tighter integration with the EMRs, beyond traditional
speech-to-text dictation for SOAP (subjective, objective, assessment,
plan) notes and basic voice commands that drive specific functions and templates within the EMR.
Montrue Technologies (Ashland, OR)
recently used Nuance’s mobile medical software development kit (SDK) and
developed an iPad application that lets physicians dictate notes in a
remarkably accurate manner.3
Radiology has been an
early adopter of voice recognition in healthcare, albeit sometimes
reluctantly. We had started to see the potential to use speech-to-text
for our reporting needs even back in the 1990s.4 The initial impetus was to reduce report turnaround time (TAT), and some reported dramatic TAT improvements, from days and hours down to minutes.
Continued implementation and greater acceptance
of voice recognition technology in radiology then was driven more by
cost equations. These technologies eliminated or dramatically decreased
transcription costs. But clearly, this in itself was (or is) not a
sustainable reason to switch from regular transcription services to voice recognition.
We started seeing academic institutions adopt voice
recognition in their reporting workflow, with private practice radiology
groups either completely embracing voice or choosing not to touch it at
all. Many initially saw the push for voice recognition in radiology
reporting as a cost shifting exercise that did not make sense, with
expensive radiologists playing the role of transcriptionists.5

The focus driving further adoption soon became the quality of the reports
being generated. Initial voice recognition accuracy rates were low
compared to today’s rates. The need to redictate words, dates, phrases
or entire paragraphs was frustrating. Many radiologists gave up on voice
recognition software and opted instead to send all dictations to
traditional transcriptionists. This hybrid workflow is still in widespread use.

However, the accuracy rates of voice recognition
technologies have continued to improve over the last decade. In some
instances, we are seeing exponential improvements yearly—and this is
most evident when the entire speech enablement workflow is taken into
consideration, including dramatic improvements in the quality of the
speech microphones, background noise reduction algorithms, faster
processing capabilities enabled through cloud-based delivery, as well as
natural language processing and related technologies that better
comprehend the incredible variety of sounds and medical jargon, in
widely varying accents and context.
Beyond mere typing with your tongue
While the initial focus of voice recognition applications was primarily speech-to-text enablement, the current wave of adoption is driven by intelligence built around the speech-driven input.
Whether driven by natural language processing technologies or more
rudimentary logic around templating and dictation macros, the drive is
to enable the clinician to be more efficient and to improve the overall
quality of the documents being generated.
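As a simple illustration of the dictation-macro logic described above, the following sketch expands spoken trigger phrases into boilerplate report text. The macro names and template wording are hypothetical, not any vendor's actual implementation:

```python
# Minimal sketch of dictation-macro expansion: recognized trigger
# phrases are replaced with full boilerplate report text. The macro
# names and template wording here are illustrative only.
MACROS = {
    "normal chest": (
        "The lungs are clear. The cardiomediastinal silhouette is "
        "within normal limits. No acute osseous abnormality."
    ),
    "no acute findings": "No acute cardiopulmonary findings.",
}

def expand_macros(dictated_text: str) -> str:
    """Replace any recognized macro trigger with its template text."""
    result = dictated_text
    for trigger, template in MACROS.items():
        result = result.replace(trigger, template)
    return result

print(expand_macros("FINDINGS: normal chest"))
```

Real products layer far more logic on top (cursor-aware fields, pick lists, study-type context), but the efficiency argument is the same: one utterance yields many words of consistent text.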
Voice-enabled structured reporting6
addresses issues around variations in the content of the reports and
allows for a more comprehensible clinical communication. Mammography and
cardiology have been at the forefront in structured reporting for a
number of years. Use of Breast Imaging Reporting and Data System
(BI-RADS) categories in mammography reporting has reduced variability
and improved clarity of communication between the radiologists and
clinicians. Medical vocabulary and semantics can make or break the
natural language processing around creating the radiology report,
especially around structured reporting. With the maturity of RadLex, a comprehensive lexicon of radiology terms, the
process of creating meaningful structured reports can now be put on
steroids. RadLex unifies and supplements other lexicons and standards,
such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT)
and Digital Imaging and Communications in Medicine (DICOM).
Aiding radiologists are organized initiatives promoting best practices in
radiology reporting. The Radiological Society of North America (RSNA)
has established a Radiology Reporting Committee (RSNA Radiology
Reporting Initiative), sponsoring a forum of radiologists, imaging informatics experts, and industry executives that promotes the improvement and adoption of standardized report templates; over 100 best-practice templates are currently freely available.
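To make the structured-reporting idea concrete, here is a minimal sketch of capturing a mammography report as structured data, with the BI-RADS assessment held as a constrained field rather than free text. The field names and layout are simplified for illustration and are not an actual RSNA template schema:

```python
# Sketch of a structured mammography report: the BI-RADS assessment
# is a validated, queryable field instead of free-text prose.
# Field names are illustrative, not a real template schema.
from dataclasses import dataclass

VALID_BIRADS = {0, 1, 2, 3, 4, 5, 6}  # BI-RADS assessment categories

@dataclass
class MammographyReport:
    indication: str
    findings: str
    birads_category: int

    def __post_init__(self) -> None:
        if self.birads_category not in VALID_BIRADS:
            raise ValueError(f"Invalid BI-RADS category: {self.birads_category}")

report = MammographyReport(
    indication="Screening",
    findings="No suspicious mass, calcification, or distortion.",
    birads_category=1,
)
print(report.birads_category)
```

The point of the constraint is exactly the variability reduction described above: downstream systems can act on `birads_category` without parsing narrative text.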
Back to front and center
There is tremendous promise in further defining and leveraging the synergies between voice recognition and natural language processing (NLP)
technologies. Radiology and hospital information management (HIM)
divisions in the provider organizations have been using NLP-driven
backend applications to automate analysis of radiology reports and
review missed billing opportunities and report quality. However, we are
seeing enhanced NLP technologies now coming to the front of the
workflow, aiding clinicians in real-time as they create the clinical
document. This has the capability of dramatically improving the quality
and clinical accuracy of the note or report being generated and
streamlining efficiencies in the clinical workflow process. These
technologies allow for automated intelligent processes, such as
correlation of the recorded structured data items to histopathologic findings.
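As a toy illustration of front-end NLP aiding the clinician in real time, the sketch below flags possible critical-finding phrases as dictated text arrives. The phrase list and the simple negation check are deliberately simplistic assumptions; production systems use far richer language models:

```python
import re

# Toy real-time check: flag phrases that may represent critical
# findings as dictated text streams in. The phrase list is
# illustrative only, and negation handling is deliberately naive.
CRITICAL_PHRASES = [
    "pneumothorax",
    "pulmonary embolism",
    "free air",
    "aortic dissection",
]

def flag_critical_findings(report_text: str) -> list[str]:
    """Return critical phrases found, skipping simple 'no X' negations."""
    hits = []
    text = report_text.lower()
    for phrase in CRITICAL_PHRASES:
        # Negative lookbehind skips mentions such as "no pneumothorax".
        pattern = rf"(?<!no )\b{re.escape(phrase)}\b"
        if re.search(pattern, text):
            hits.append(phrase)
    return hits

print(flag_critical_findings("Large right pneumothorax. No pulmonary embolism."))
```

Even this crude version hints at the value of checking the note while it is being created, rather than mining it after signing.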
Enabling better workflow
The end result of
every case that a radiologist interprets is, quite basically, a report.
Voice recognition and related technologies need to save radiologists
time when possible and aid in the workflow. Many radiologists, bearing scars from earlier iterations of voice recognition technologies that were less than optimized for their workflow, are highly
sensitized to the introduction of any technology that could possibly
distract them from their core mission of caring for their patients and
interpreting the imaging studies to the best of their capabilities.7
The radiologist’s workflow, or readflow, is hence a critical
consideration in the development or implementation of any voice
recognition and related technologies.
Too many healthcare-related
applications are designed without consideration for important
parameters, such as user-centered design guidelines, usability,
automation, hand-eye coordination and radiologists’ flow in reading
studies and capturing the data within the report. Any time spent looking at dropdowns, menu options, and onscreen streaming text transcribed from voice is time away from core interpretation processes.

Also essential is related workflow, such as Critical Test Results Management
(CTRM), which entails both the clinical needs around reporting
significant clinical findings and the regulatory needs around ensuring closed-loop, complete communication of those findings.
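A closed-loop communication requirement of this kind can be sketched as a tiny state object: the loop stays open until the ordering clinician acknowledges receipt. The class, states, and field names below are illustrative assumptions, not a description of any particular CTRM product:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Illustrative closed-loop tracking for one critical test result:
# the loop is considered open until acknowledgment is recorded.
@dataclass
class CriticalResultNotification:
    accession: str
    finding: str
    ordering_clinician: str
    sent_at: datetime = field(default_factory=datetime.utcnow)
    acknowledged_at: Optional[datetime] = None

    def acknowledge(self) -> None:
        """Record the ordering clinician's acknowledgment."""
        self.acknowledged_at = datetime.utcnow()

    @property
    def loop_closed(self) -> bool:
        return self.acknowledged_at is not None

note = CriticalResultNotification("ACC123", "pneumothorax", "Dr. Lee")
print(note.loop_closed)   # still open
note.acknowledge()
print(note.loop_closed)   # closed
```

The regulatory value is in the audit trail: every notification carries timestamps showing when it was sent and when the loop was closed.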
The sum of all of its parts
Radiologists have to allow for occasional (or constant!) interruptions to their
reading workflow for a variety of tasks, such as quick consults,
conversations with ordering physicians, and discussions with technologists.

A radiologist’s workspace consists not just of the
voice recognition reporting system, but also the picture archiving and
communication system (PACS), the radiology information system (RIS), and
other systems that may be used for 3-dimensional (3D) imaging and
advanced visualization as well as perhaps computer-aided detection
(CAD).8 Providers purchase these systems and then have to deal with the challenges of getting the integration between them right.

One of the single biggest things that could
happen in our imaging industry would be for the key PACS and 3D advanced
visualization vendors to work directly with the key voice recognition
vendors to streamline the workflow processes and integration challenges
through and through, without leaving the headaches to busy radiologists
and PACS administrators. This should be a defined process that happens
with every product version upgrade, on either side.
As voice recognition and related technologies continue to make the strides
we are seeing in the industry, it would be prudent to address the needs
of the clinical workflow as a unified imaging workspace. Loose
interfaces between critical radiology applications should give way to
tighter integration and streamlined bidirectional coordination between
these traditionally disparate applications. It is important to realize
that as much value as any one of these systems may provide to a specific
set of needs around that one system, the reality of today’s radiology
environment is one that calls for a patient centric workflow around the
imaging study being interpreted. The coordination between the many
systems that contribute in one way or another to the creation of the
radiology report will then get the focus that it deserves.
We have to allow the tremendous innovations we are seeing today around
voice recognition and natural language processing to enhance the
workflow of the radiologists, and consequently improve the quality of
both the radiology reports and the services being provided back
to the ordering clinicians.
1. Apple. Siri. http://www.apple.com/iphone/features/siri.html. Accessed September 2, 2012.
2. Nuance. Nina. Accessed September 2, 2012.
3. Phelps B. HIStalk interviews Brian Phelps, CEO, Montrue Technologies. Accessed September 12, 2012.
4. Mathie AG, Strickland NH. Interpretation of CT scans with PACS image display in stack mode. Radiology. 1997;203:207-209.
5. Pezzullo JA, Tung GA, Rogg JM. Voice recognition dictation: Radiologist as transcriptionist. J Digit Imaging. 2008;21:384-389.
6. Weiss DL, Langlotz CP. Structured reporting: Patient care enhancement or productivity nightmare? Radiology. 2008;249:739-747.
7. Langer SG. Radiology speech recognition: Workflow, integration and productivity issues. Curr Probl Diagn Radiol. 2002;31:95-104.
8. Pavlicek W, Muhm JR, Collins JM, et al. Quality of service improvements from coupling the digital chest unit with integrated speech recognition, information, and PACS. J Digit Imaging. 1999;12:191-197.