is a Professor of Radiology at Mount Sinai School of Medicine,
and the Director of Breast Imaging, The Mount Sinai Hospital, New York.
Screening mammography has been associated with a 50% reduction
in breast cancer deaths among 40- to 69-year-old women in 2 Swedish
Our goal should be to achieve, and even exceed, such benefits for
all women who are screened. While we strive to improve on current
mammographic technology through research into new imaging methods,
such as digital mammography, much can be achieved through optimal
application of current techniques through quality control. The
American College of Radiology Mammography Accreditation Program
(ACR MAP) and the Mammography Quality Standards Act (MQSA)
represent successful efforts in this direction. Initiatives such as
educational and self-assessment programs that raise interpretive
skills are equally important. However, there is evidence that some
potentially detectable breast cancers may not always be detected
even when high-quality images are interpreted by experienced
radiologists.
What are the mammographic characteristics of missed cancers?
Missed cancers are cancers that were potentially detectable at
screening but were missed by the radiologist. Birdwell et al
used the term "missed cancers" for those that were appreciated by
the majority of radiologists on blinded retrospective review of
prior mammograms. Among the missed cancers in their study, 51% were
in breasts that were fatty or contained scattered fibroglandular
densities, 30% were seen as calcifications alone, 21% as a mass
with calcifications, and 47% as a noncalcified mass. Fifty-four
percent of the masses were 11 mm or larger in size and 54% of the
calcification cases were in areas larger than 11 mm. These
characteristics suggest that many missed cancers could have been
detected by improved interpretation and/or computer-aided detection
(CAD).
How often are breast cancers missed at screening due to observer error?
Several studies have sought to determine how frequently
nonpalpable cancers detected at screening mammography can be
identified in retrospect on a prior mammogram. Harvey et al
at the University of Arizona evaluated previous mammograms in 73
patients in whom nonpalpable breast cancers were detected on
subsequent mammograms. Reviews were performed two ways: 1) blinded
(without knowledge that cancer had been detected on a later
examination); and 2) nonblinded (side-by-side comparison of earlier
and later studies).
Blinded reviews were categorized as positive if biopsy was
recommended or if additional views were requested of the area where
the cancer was finally detected. On such reviews, the
interpretation was positive in 41% (30/73) of patients. Because it
is unknown whether additional views would have led to a biopsy
recommendation, it is possible that this classification may have
overestimated the number of cancers that were missed due to
observer error. Additionally, when blinded reviews do not mix
cancer cases with enough normal and benign cases, observers may be
more suspicious about mammographic findings than they would be
under everyday circumstances.
A subsequent nonblinded retrospective review found evidence of
cancer in 25 of the 43 patients for whom blinded review had been
negative. Because nonblinded reviews give observers the advantage
of hindsight, such studies may overestimate the number of cancers
that are prospectively identifiable even by the best readers.
Nevertheless, 75% (55/73) of cancers were positive on either
blinded or nonblinded review of previous mammograms. A similar
nonblinded review by van Dijck et al
found that 57% (25/44) of screen-detected breast cancers and 46%
(18/40) of interval cancers from a program in Holland could be
identified retrospectively on a previous mammogram.
Retrospective reviews where readers are blinded and where cancer
cases are mixed with an adequate number of normal and benign cases
provide more realistic estimates. One such review of breast cancers
missed during routine screening in North Carolina was conducted by
Yankaskas et al.
Four community-based radiologists experienced in mammography
performed independent, blinded, retrospective reviews of the
screening mammograms of 339 asymptomatic women. These included 93
women who developed breast cancer within 1 year of a negative
screening mammogram and 246 women in whom no breast cancer
developed during that year. Using the majority interpretation of
the 4 radiologists, the authors found that 42% of the 93
false-negative mammograms would have been worked up while the
average work-up rate for the 246 true-negative mammograms was 13%.
The authors subtracted the work-up rate for true-negative mammograms
from the work-up rate for false-negative mammograms (42% minus 13%)
to estimate that 29% of false-negative
mammograms could have been detected at screening. Similar results
were obtained in a blinded retrospective study by Vitak et al
in Sweden in which 2 external reviewers identified 25% of missed
cancers for further work-up.
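The subtraction used by Yankaskas et al can be written out explicitly. The following Python sketch is illustrative only: the variable and function names are assumptions introduced here, and the inputs are simply the two work-up rates quoted above.

```python
# Work-up rates quoted above for the Yankaskas et al review
# (majority interpretation of 4 radiologists).
workup_rate_false_negative = 0.42  # 93 mammograms followed by cancer within 1 year
workup_rate_true_negative = 0.13   # 246 mammograms with no cancer during that year


def excess_workup_rate(cancer_rate: float, normal_rate: float) -> float:
    """Work-up rate in cancer cases beyond the background rate in normal cases."""
    return cancer_rate - normal_rate


estimate = excess_workup_rate(workup_rate_false_negative, workup_rate_true_negative)
print(f"Estimated share detectable at screening: {estimate:.0%}")
```

Subtracting the background rate corrects for the fact that reviewers flag some mammograms for work-up even when no cancer is present.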
Warren Burhenne et al
found that among 427 breast cancers detected by screening
mammography at 13 facilities in the United States, 67% (286/427)
were visible at nonblinded retrospective review of prior
mammograms. On blinded retrospective assessment, panels of 5
radiologists who independently reviewed these prior mammograms,
mixed with normal cases, judged that 27% (115/427) would
have required biopsy or additional imaging.
In summary, 4 separate studies involving blinded retrospective
reviews of screening mammograms where breast cancer subsequently
developed found lesions requiring work-up or biopsy in 25% to 41%
of cases. Three nonblinded retrospective reviews identified 57% to
75% of missed cases (Table 1).
How may CAD affect detection rates and stage at diagnosis?
Warren Burhenne et al
used missed cancer cases to evaluate the potential of CAD to
increase mammographic detection rates. Among 115 breast cancers
identified on blinded retrospective review of mammograms performed
at least 9 months prior to the actual date of detection, the
authors found that CAD detected 77% (89/115).
A study by Thurfjell et al
suggested that even some experienced radiologists may benefit from
CAD. Three radiologists interpreted 120 mammographic examinations
from the first screening round in Uppsala, Sweden. Among these 120
cases, 32 cancers had been detected at the first screening round,
10 cancers surfaced clinically during the subsequent interval
between screens, and 32 cancers were not detected until the second
screening round. Forty-six cases were normal at both screening
rounds. Thus, the material spanned a wide range, from obvious
cases to those that were very subtle and escaped detection at the
first screening round. The CAD system correctly marked 37 cancers:
30 of the 32 cancers that had been detected in the first screening
round and 7 of the 32 cancers that were not detected until the
second screening round.
On retrospective review of these 120 cases, one radiologist, an
expert screener with 30 years' experience in mammography including
15 years in mass screening, detected 44 cancers without the aid of
CAD. These included all 37 cancers that were marked by CAD. The
second radiologist, with 5 years' experience in mammography
including 2 years performing mass screening, detected 42 cancers
alone and 43 cancers when aided by CAD. The third radiologist had 7
years of experience in mammography. She detected 35 cancers without
CAD and 38 cancers with CAD prompting. Thus, the additive value of
CAD seemed to vary among radiologists and to depend on their
experience and skill.
Based on results from these retrospective studies of Warren
Burhenne et al
and Thurfjell et al,
clinical evaluation of CAD has now progressed to prospective
studies. Freer and Ulissey
performed prospective interpretation of 12,860 mammograms over a
1-year period using the ImageChecker M1000 version 2.0 (R2
Technology, Inc., Los Altos, CA) at their private practice office
in Texas. Among these screening studies, 3437 (27%) represented a
baseline evaluation while the remaining 9423 (73%) had a previous
mammogram available for comparison. Each mammogram was read twice
by a radiologist, first without CAD and then after CAD input. With
CAD, the detection rate increased from 0.32% (41/12,860) to 0.38%
(49/12,860). This was a 19.5% increase in the number of cancers
detected, while the proportion of detected malignancies that were
early stage (0 and 1) rose from 73% to 78%.
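The Freer and Ulissey detection figures follow directly from the reported counts. The Python sketch below is a minimal illustration with variable names chosen here for convenience. It expresses the detection rate per 1000 women screened and computes the relative increase in cancers detected.

```python
# Counts reported by Freer and Ulissey for 12,860 screening mammograms.
screened = 12_860
cancers_without_cad = 41
cancers_with_cad = 49

# Detection rate expressed per 1000 women screened.
rate_without_per_1000 = 1000 * cancers_without_cad / screened
rate_with_per_1000 = 1000 * cancers_with_cad / screened

# Relative increase in the number of cancers detected once CAD was used.
relative_increase = (cancers_with_cad - cancers_without_cad) / cancers_without_cad

print(f"Without CAD: {rate_without_per_1000:.1f} per 1000")
print(f"With CAD:    {rate_with_per_1000:.1f} per 1000")
print(f"Relative increase in cancers detected: {relative_increase:.1%}")
```

The 8 additional cancers amount to roughly 0.6 more detections per 1000 women screened, which is the same gain expressed in absolute rather than relative terms.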
What is the current detection sensitivity of CAD for
masses and calcifications?
The ability of CAD to identify breast calcifications and masses
has been evaluated in multiple clinical studies. Table 2 lists the
materials, methods, and results for the most recent evaluations.
In each study, CAD was more sensitive for
calcifications than for masses. In the study of Birdwell et al,
CAD had a higher sensitivity for masses containing calcifications
than for noncalcified masses, 83% (20/24) versus 67% (36/54),
for an overall sensitivity of 71% (56/78) across all masses.
Some investigators, such as Nakahara,
used CAD to perform a retrospective review of cancers that were
detected by radiologists in Japan. Other investigators, such as te
utilized retrospective review of cancers that were missed by
radiologists at screening in the Netherlands. Because missed
cancers are more subtle, CAD may perform less well.
Comparison of results from Warren Burhenne et al
and Birdwell et al
illustrates that CAD sensitivity depends on how missed cancers are
defined. Both groups obtained their missed cancers from the
same pool of mammograms. Burhenne and colleagues
included missed cancers identified on nonblinded
retrospective review of the prior studies, whereas Birdwell and
colleagues included only cancers identified on blinded
retrospective review of the prior studies. As such, Burhenne
evaluated a larger number of missed cancers (286 vs. 115) and found
a lower detection sensitivity of CAD for calcifications (79% vs.
86%) and masses (48% vs. 71%) than did Birdwell.
In the study by Freer and Ulissey,
the radiologists first interpreted screening mammograms without the
benefit of CAD and then re-read each case with CAD prompting.
CAD marked all 22 cancers seen as
calcifications. Among these 22 cancers, radiologists initially
detected 15 and then an additional 7 after CAD prompting. Among 27
cancers that appeared as masses, the radiologists originally
detected 26 without help from CAD and detected an additional
case after CAD review. CAD marked 18
malignant masses and missed 9. Thus, although CAD improved the
cancer detection rate, prospective CAD sensitivity was 67% (18/27)
for masses versus 100% (22/22) for malignant calcifications.
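The Freer and Ulissey counts can be tabulated by lesion type to separate what CAD marked from what the radiologists detected. The Python sketch below is illustrative only. The dictionary layout and function names are assumptions made here and are not part of the study.

```python
# Counts reported by Freer and Ulissey, grouped by mammographic appearance.
cancers = {
    "calcifications": {"total": 22, "reader_alone": 15, "added_after_cad": 7, "cad_marked": 22},
    "masses": {"total": 27, "reader_alone": 26, "added_after_cad": 1, "cad_marked": 18},
}


def cad_sensitivity(lesion: str) -> float:
    """Fraction of cancers of this type that the CAD system marked."""
    group = cancers[lesion]
    return group["cad_marked"] / group["total"]


def reader_sensitivity_before_cad(lesion: str) -> float:
    """Fraction of cancers of this type detected before CAD marks were shown."""
    group = cancers[lesion]
    return group["reader_alone"] / group["total"]


for lesion in cancers:
    print(
        f"{lesion}: CAD {cad_sensitivity(lesion):.0%}, "
        f"radiologist before CAD {reader_sensitivity_before_cad(lesion):.0%}"
    )
```

Laid out this way, the counts show that CAD contributed most where unaided radiologists were weakest: 7 of the 8 additional cancers were calcifications.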
A study by Vyborny et al
assessed the effect of spiculation on CAD. Malignant masses were
labeled as either spiculated or nonspiculated by 3 radiologists
separately. Masses considered spiculated by 0, 1, 2, and all 3
radiologists were termed "not spiculated," "possibly spiculated,"
"spiculated," and "clearly spiculated," respectively. Among 677
malignant masses, 14% (92/677) were considered not spiculated,
12.8% (87/677) were possibly spiculated, 18.2% (123/677) were
spiculated, and 55.4% (375/677) were clearly spiculated. CAD
results were then compared with radiologist ratings for
spiculation. The CAD system marked 86% (322/375) of clearly
spiculated masses, 72% (89/123) of spiculated masses, 61% (53/87)
of possibly spiculated masses, and 53% (49/92) of masses that were
not spiculated.
In summary, 74% (498/677) of breast masses were considered
spiculated or clearly spiculated, that is, rated as spiculated by
at least 2 of the 3 radiologists, and 26% (179/677) were considered
not spiculated or possibly spiculated. When these groupings were
used, CAD marked 82.5%
(411/498) of masses in the combined spiculated and clearly
spiculated categories and 57% (102/179) of masses classified as
either possibly spiculated or not spiculated.
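The pooled figures from Vyborny et al can be reproduced from the per-category counts. The Python sketch below is illustrative only. Keying the categories by the number of radiologists who rated a mass as spiculated is a convenience adopted here, as are the names used.

```python
# Counts reported for Vyborny et al, keyed by how many of the 3 radiologists
# rated the mass as spiculated: (masses in category, masses marked by CAD).
counts_by_votes = {
    0: (92, 49),    # "not spiculated"
    1: (87, 53),    # "possibly spiculated"
    2: (123, 89),   # "spiculated"
    3: (375, 322),  # "clearly spiculated"
}


def pooled_sensitivity(vote_levels):
    """Return (marked, total, marked / total) pooled over the given vote levels."""
    total = sum(counts_by_votes[v][0] for v in vote_levels)
    marked = sum(counts_by_votes[v][1] for v in vote_levels)
    return marked, total, marked / total


for label, levels in [("2 or 3 votes", (2, 3)), ("0 or 1 vote", (0, 1))]:
    marked, total, sensitivity = pooled_sensitivity(levels)
    print(f"{label}: {marked}/{total} = {sensitivity:.1%}")
```

Pooling the counts, rather than averaging the four category percentages, weights each category by its size. This matters because more than half of the masses fall in the clearly spiculated group.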
Vyborny et al
also assessed these malignant masses for their subtlety apart from
spiculation or nonspiculation. Among the malignant masses, 13%
(88/677) were considered subtle, 23% (154/677) were moderately well
visualized and 64% (435/677) were obvious. Among these 3 groups,
CAD marked 45% (40/88), 61% (94/154), and 87% (379/435),
respectively. Thus, sensitivity of CAD was highest for masses that
were obvious and lowest for masses that were subtle. The authors
also evaluated the effect of breast density on CAD and found that
CAD performance appeared to be independent of density.
In summary, detection sensitivity of CAD is higher for
calcifications than for masses, and is lowest for masses in which
spiculation is least evident or that appear subtle to
the radiologist.
How does CAD affect screening recall rates, follow-up
recommendations, and biopsy results?
Successful clinical application of CAD to screening mammography
requires increased detection rates without any excessive increase
in interpretation time, recall rates, or false-positive biopsies.
Current detection algorithms are heavily weighted toward
sensitivity, thereby sacrificing the specificity of any
computer-generated mark on the mammogram. Based on
observational-judgmental skills that no computer can yet duplicate,
the radiologist can act on or ignore any marks that the computer
has made to indicate possible masses or calcifications.
In their separate studies of CAD, Freer and Ulissey
and Birdwell et al
found that the computer made 2.8 and 2.9 marks, respectively, per
4-view screening mammogram on locations that were not cancer. Of
the marks made by the computer in the Freer and Ulissey study,
97.4% were dismissed without the need for additional mammographic
views. Although no study has yet addressed the potential effect of
CAD on interpretation time, it would seem that any effect would be
small.
The recall rate is the percentage of patients asked to return
for additional imaging work-up after batch interpretation of their
screening mammogram. Recall rates that are too high result in
patient inconvenience and anxiety as well as increased cost and
inefficiency of the screening process. Excessive recall rates
represent a disincentive for clinicians to advise screening,
patients to undergo screening, radiologists to perform screening,
and medical care payors to support screening.
If, however, recall rates are too low, some subtle cancers may
be missed and benign lesions may undergo unnecessary biopsy when
supplementary views and ultrasound are not performed to provide
more definitive evaluation of findings detected at screening.
The American College of Radiology (ACR) recommends that the
screening recall rate be <10%.
Because previous films are available for comparison, recall rates
for periodic screening can be lower than those for initial
screening.
Yankaskas et al
estimated that a recall rate of 4.9% to 5.5% represents the best
trade-off between sensitivity and positive predictive value.
Two clinical studies suggest that the effect of CAD on recall
rates is very slight. Warren Burhenne et al
calculated the recall rates of 14 radiologists at 5 facilities over
a 4-month minimum period prior to CAD. Their average recall rate
for screening studies was 8.3% (1961/23,682). The same radiologists
with the aid of CAD for a 4-month minimum period after installation
had a recall rate of 7.6% (1126/14,817).
In the study by Freer and Ulissey,
each of 12,860 screening mammograms was first interpreted without
CAD and then immediately after CAD. Recall rates for these
readings were 6.5% (830/12,860) and 7.7% (986/12,860), respectively.
They also found a slight increase in the number of patients placed
in BI-RADS category 3, who were asked to return for short interval
follow-up. The proportion of patients placed in the "probably
benign" category was 1.9% (257/12,860) prior to CAD and 2.3%
(298/12,860) when studies were interpreted with CAD.
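The recall rates quoted for the two studies can be recomputed from their counts. The Python sketch below is illustrative only, and the labels and function name are assumptions made here. The two Warren Burhenne et al figures come from different time periods and different numbers of examinations, whereas the Freer and Ulissey figures are paired readings of the same 12,860 mammograms.

```python
# (patients recalled, screening mammograms read) as quoted above.
recall_counts = {
    "Warren Burhenne et al, before CAD": (1961, 23_682),
    "Warren Burhenne et al, with CAD": (1126, 14_817),
    "Freer and Ulissey, without CAD": (830, 12_860),
    "Freer and Ulissey, with CAD": (986, 12_860),
}


def recall_rate(recalled: int, screened: int) -> float:
    """Fraction of screening examinations that led to a recall."""
    return recalled / screened


rates = {label: recall_rate(*counts) for label, counts in recall_counts.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")

# Change in percentage points for the paired Freer and Ulissey readings.
paired_change = 100 * (
    rates["Freer and Ulissey, with CAD"] - rates["Freer and Ulissey, without CAD"]
)
print(f"Freer and Ulissey change: {paired_change:+.1f} percentage points")
```

All four rates fall below the 10% ceiling recommended by the ACR.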
The positive predictive value (PPV) is commonly defined as the
percentage of biopsies performed as a result of a positive
mammographic examination that resulted in a diagnosis of cancer. A
PPV that is too low indicates an excessive rate of false positive
biopsies. A PPV that is too high suggests that some cancers having
an atypical appearance for malignancy are being ignored. An
appropriate value for PPV will depend on factors such as age,
breast cancer risk, and clinical signs and symptoms, which vary
from one practice population to another. Therefore, the ACR
recommends a PPV in the 25% to 40% range.
Freer and Ulissey
found that CAD had no effect on the PPV of 38% at their center.
Although more studies are needed for confirmation, initial
investigations indicate that CAD can increase screening detection
rates with no undue effect on rates for callback, short-term
follow-up, or PPV.