Clinical evaluation of computer-aided detection: Screening mammography


View content online at: http://www.appliedradiology.com/Issues/2001/11/Supplements/Clinical-evaluation-of-computer-aided-detection--Screening-mammography.aspx

Abstract: Screening mammography has been associated with a 50% reduction in breast cancer deaths among 40- to 69-year-old women in 2 Swedish counties. 1 Our goal should be to achieve, and even exceed, such benefits for all women who are screened.

Dr. Feig is a Professor of Radiology at Mount Sinai School of Medicine, and the Director of Breast Imaging, The Mount Sinai Hospital, New York, NY.

Screening mammography has been associated with a 50% reduction in breast cancer deaths among 40- to 69-year-old women in 2 Swedish counties. 1 Our goal should be to achieve, and even exceed, such benefits for all women who are screened. While we strive to improve on current mammographic technology through research into new imaging methods, such as digital mammography, much can be achieved through optimal application of current techniques through quality control. The American College of Radiology Mammography Accreditation Program (ACR MAP) and the Mammography Quality Standards Act (MQSA) represent successful efforts in this direction. Initiatives such as educational and self-assessment programs that raise interpretive skills are equally important. However, there is evidence that some potentially detectable breast cancers may not always be detected even when high-quality images are interpreted by experienced radiologists.

What are the mammographic characteristics of missed cancers?

Missed cancers are cancers that were potentially detectable at screening but were missed by the radiologist. Birdwell et al 2 used the term "missed cancers" for those that were appreciated by the majority of radiologists on blinded retrospective review of prior mammograms. Among the missed cancers in their study, 51% were in breasts that were fatty or contained scattered fibroglandular densities, 30% were seen as calcifications alone, 21% as a mass with calcifications, and 47% as a noncalcified mass. Fifty-four percent of the masses were 11 mm or larger in size and 54% of the calcification cases were in areas larger than 11 mm. These characteristics suggest that many missed cancers could have been detected by improved interpretation and/or computer-aided detection (CAD).

How often are breast cancers missed at screening due to observer performance?

Several studies have sought to determine how frequently nonpalpable cancers detected at screening mammography can be identified in retrospect on a prior mammogram. Harvey et al 3 at the University of Arizona evaluated previous mammograms in 73 patients in whom nonpalpable breast cancers were detected on subsequent mammograms. Reviews were performed two ways: 1) blinded (without knowledge that cancer had been detected on a later examination); and 2) nonblinded (side-by-side comparison of earlier and later studies).

Blinded reviews were categorized as positive if biopsy was recommended or if additional views were requested of the area where the cancer was finally detected. On such reviews, the interpretation was positive in 41% (30/73) of patients. Because it is unknown whether additional views would have led to a biopsy recommendation, this classification may have overestimated the number of cancers that were missed due to observer error. Additionally, when blinded reviews do not mix cancer cases with enough normal and benign cases, observers may be more suspicious about mammographic findings than they would be under everyday circumstances. 4

A subsequent nonblinded retrospective review found evidence of cancer in 25 of the 43 patients for whom blinded review had been negative. Because nonblinded reviews give observers the advantage of hindsight, such studies may overestimate the number of cancers that are prospectively identifiable even by the best readers. Nevertheless, 75% (55/73) of cancers were positive on either blinded or nonblinded review of previous mammograms. A similar nonblinded review by van Dijck et al 5 found that 57% (25/44) of screen-detected breast cancers and 46% (18/40) of interval cancers from a program in Holland could be identified retrospectively on a previous mammogram.

Retrospective reviews where readers are blinded and where cancer cases are mixed with an adequate number of normal and benign cases provide more realistic estimates. One such review of breast cancers missed during routine screening in North Carolina was conducted by Yankaskas et al. 6 Four community-based radiologists experienced in mammography performed independent, blinded, retrospective reviews of the screening mammograms of 339 asymptomatic women. These included 93 women who developed breast cancer within 1 year of a negative screening mammogram and 246 women in whom no breast cancer developed during that year. Using the majority interpretation of the 4 radiologists, the authors found that 42% of the 93 false-negative mammograms would have been worked up while the average work-up rate for the 246 true-negative mammograms was 13%. The authors subtracted the work-up rate for true-negative mammograms from the work-up rate for false-negative mammograms to estimate that 29% of false-negative mammograms could have been detected at screening. Similar results were obtained in a blinded retrospective study by Vitak et al 7 in Sweden in which 2 external reviewers identified 25% of missed cancers for further work-up.
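The subtraction used by Yankaskas et al can be written out as a minimal sketch. The function and variable names are illustrative assumptions and do not come from the study; only the two percentages are taken from the text above.

```python
def excess_workup_pct(cancer_workup_pct: float, normal_workup_pct: float) -> float:
    """Work-up rate among later-cancer cases minus the background
    work-up rate among cases that stayed cancer-free."""
    return cancer_workup_pct - normal_workup_pct


# Percentages reported in the text (majority reading of 4 radiologists).
false_negative_workup_pct = 42.0  # 93 mammograms followed by cancer within 1 year
true_negative_workup_pct = 13.0   # 246 mammograms with no cancer within 1 year

potentially_detectable_pct = excess_workup_pct(
    false_negative_workup_pct, true_negative_workup_pct
)
print(f"Estimated potentially detectable: {potentially_detectable_pct:.0f}%")
```

Subtracting the background rate removes the share of work-ups that reviewers would have requested anyway, whether or not a cancer was present.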

Warren Burhenne et al 8 found that among 427 breast cancers detected by screening mammography at 13 facilities in the United States, 67% (286/427) were visible at nonblinded retrospective review of prior mammograms. At blinded retrospective assessment of these prior mammograms, panels of 5 radiologists independently reviewing these cases enriched with normal cases found that 27% (115/427) would have required biopsy or additional imaging.

In summary, 4 separate studies involving blinded retrospective reviews of screening mammograms where breast cancer subsequently developed found lesions requiring work-up or biopsy in 25% to 41% of cases. Three nonblinded retrospective reviews identified 57% to 75% of missed cases (Table 1).

How may CAD affect detection rates and stage at detection?

Warren Burhenne et al 8 used missed cancer cases to evaluate the potential of CAD to increase mammographic detection rates. Among 115 breast cancers identified on blinded retrospective review of mammograms performed at least 9 months prior to the actual date of detection, the authors found that CAD detected 77% (89/115).

A study by Thurfjell et al 9 suggested that even some experienced radiologists may benefit from CAD. Three radiologists interpreted 120 mammographic examinations from the first screening round in Uppsala, Sweden. Among these 120 cases, 32 cancers had been detected at the first screening round, 10 cancers surfaced clinically during the subsequent interval between screens, and 32 cancers were not detected until the second screening round. Forty-six cases were normal at both screening rounds. Thus, the material contained a wide range from obvious cases to those that were very subtle and escaped detection at the first screening round. The CAD system correctly marked 37 cancers, 30 of the 32 cancers that had been detected in the first screening round and 7 of the 32 cancers that were not detected until the second screening round.

On retrospective review of these 120 cases, one radiologist, an expert screener with 30 years' experience in mammography including 15 years in mass screening, detected 44 cancers without the aid of CAD. These included all 37 cancers that were marked by CAD. The second radiologist, with 5 years' experience in mammography including 2 years performing mass screening, detected 42 cancers alone and 43 cancers when aided by CAD. The third radiologist had 7 years of experience in mammography. She detected 35 cancers without CAD and 38 cancers with CAD prompting. Thus, the additive value of CAD seemed to vary among radiologists and to depend on their experience and skill.

Based on results from these retrospective studies of Warren Burhenne et al 8 and Thurfjell et al, 9 clinical evaluation of CAD has now progressed to prospective studies. Freer and Ulissey 10 performed prospective interpretation of 12,860 mammograms over a 1-year period using the ImageChecker M1000 version 2.0 (R2 Technology, Inc., Los Altos, CA) at their private practice office in Texas. Among these screening studies, 3437 (27%) represented a baseline evaluation while the remaining 9423 (73%) had a previous mammogram available for comparison. Each mammogram was read twice by a radiologist, first without CAD and then after CAD input. Due to CAD, the detection rate increased from 3.2 per 1000 (41/12,860) to 3.8 per 1000 (49/12,860). This represents a 19.5% increase in the number of cancers detected. The proportion of detected malignancies that were early stage (0 and 1) rose from 73% to 78%.
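The detection figures from Freer and Ulissey can be re-derived from the raw counts. With 41 and 49 cancers among 12,860 examinations, the rates work out to roughly 3.2 and 3.8 per 1000 examinations. The sketch below is illustrative only; the function and variable names are assumptions and are not taken from the study.

```python
def rate_per_1000(events: int, examinations: int) -> float:
    """Events per 1000 screening examinations."""
    return 1000 * events / examinations


def relative_increase_pct(before: int, after: int) -> float:
    """Percentage increase of `after` relative to `before`."""
    return 100 * (after - before) / before


EXAMINATIONS = 12_860      # screening mammograms read in the study
CANCERS_WITHOUT_CAD = 41   # detected on the first (unaided) reading
CANCERS_WITH_CAD = 49      # detected after CAD prompting

unaided_rate = rate_per_1000(CANCERS_WITHOUT_CAD, EXAMINATIONS)
aided_rate = rate_per_1000(CANCERS_WITH_CAD, EXAMINATIONS)
increase = relative_increase_pct(CANCERS_WITHOUT_CAD, CANCERS_WITH_CAD)

print(f"Without CAD: {unaided_rate:.1f} per 1000")
print(f"With CAD:    {aided_rate:.1f} per 1000")
print(f"Relative increase in cancers detected: {increase:.1f}%")
```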

What is the current detection sensitivity of CAD for masses and calcifications?

The ability of CAD to identify breast calcifications and masses has been evaluated in multiple clinical studies. Table 2 lists the materials, methods, and results for the most recent evaluations. For each study, the sensitivity of CAD is always better for calcifications than for masses. In the study of Birdwell et al, 2 CAD had a higher sensitivity for masses containing calcifications than for noncalcified masses, i.e., 83% (20/24) versus 67% (36/54), yielding an average 71% (56/78) sensitivity for all masses.

Some investigators, such as Nakahara, 11 used CAD to perform a retrospective review of cancers that were detected by radiologists in Japan. Other investigators, such as te Brake, 12 utilized retrospective review of cancers that were missed by radiologists at screening in the Netherlands. Because missed cancers are more subtle, CAD may perform less well on them.

Comparison of results from Warren Burhenne et al 8 and Birdwell et al 2 illustrates that CAD sensitivity depends on how missed cancers are defined. Both investigators obtained their missed cancers from the same pool of mammograms. Burhenne and colleagues 8 included missed cancers identified on nonblinded retrospective review of the prior studies, whereas Birdwell and coworkers 2 included only cancers identified on blinded retrospective review of the prior studies. As such, Burhenne evaluated a larger number of missed cancers (286 vs. 115) and found a lower detection sensitivity of CAD for calcifications (79% vs. 86%) and masses (48% vs. 71%) than did Birdwell.

In the study by Freer and Ulissey, 10 the radiologists first interpreted screening mammograms without the benefit of CAD and then re-read each case with CAD prompting. The CAD system marked all 22 cancers seen as calcifications. Among these 22 cancers, radiologists initially detected 15 and then an additional 7 after CAD prompting. Among 27 cancers that appeared as masses, the radiologists originally detected 26 without help from CAD and corroborated an additional case after CAD review. The CAD system marked 18 malignant masses and missed 9. Although CAD led to improved cancer detection rates, prospective CAD sensitivity was 67% for masses versus 100% for malignant calcifications.

A study by Vyborny et al 13 assessed the effect of spiculation on CAD. Malignant masses were labeled as either spiculated or nonspiculated by 3 radiologists separately. Masses considered spiculated by 0, 1, 2, and all 3 radiologists were termed "not spiculated," "possibly spiculated," "spiculated," and "clearly spiculated," respectively. Among 677 malignant masses, 13.6% (92/677) were considered not spiculated, 12.8% (87/677) were possibly spiculated, 18.2% (123/677) were spiculated, and 55.4% (375/677) were clearly spiculated. CAD results were then compared with radiologist ratings for spiculation. The CAD system marked 86% (322/375) of clearly spiculated masses, 72% (89/123) of spiculated masses, 61% (53/87) of possibly spiculated masses, and 53% (49/92) of masses that were not spiculated.

In summary, 74% (498/677) of breast masses were considered spiculated by at least 2 of the 3 radiologists (the spiculated and clearly spiculated categories), and 26% (179/677) were not (the not spiculated and possibly spiculated categories). When these groupings were used, CAD marked 82.5% (411/498) of masses in the combined spiculated and clearly spiculated categories and 57% (102/179) of masses classified as either possibly spiculated or not spiculated.
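The pooled figures quoted above follow directly from the per-category counts reported by Vyborny et al. The sketch below shows that pooling step; the dictionary layout and function name are illustrative assumptions, while the counts are those given in the text.

```python
# category -> (masses marked by CAD, total masses in category)
cad_marks_by_category = {
    "not spiculated": (49, 92),
    "possibly spiculated": (53, 87),
    "spiculated": (89, 123),
    "clearly spiculated": (322, 375),
}


def pooled_sensitivity(categories):
    """Return (marked, total, percent) summed over the named categories."""
    marked = sum(cad_marks_by_category[name][0] for name in categories)
    total = sum(cad_marks_by_category[name][1] for name in categories)
    return marked, total, 100 * marked / total


# Spiculated according to at least 2 of the 3 radiologists.
majority_spiculated = pooled_sensitivity(["spiculated", "clearly spiculated"])
# Spiculated according to at most 1 of the 3 radiologists.
majority_not_spiculated = pooled_sensitivity(["not spiculated", "possibly spiculated"])

for label, (marked, total, pct) in [
    ("spiculated or clearly spiculated", majority_spiculated),
    ("not or possibly spiculated", majority_not_spiculated),
]:
    print(f"{label}: {marked}/{total} = {pct:.1f}%")
```

Pooling sums the numerators and denominators before dividing, so larger categories carry more weight than they would if the category percentages were simply averaged.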

Vyborny et al 13 also assessed these malignant masses for their subtlety apart from spiculation or nonspiculation. Among the malignant masses, 13% (88/677) were considered subtle, 23% (154/677) were moderately well visualized and 64% (435/677) were obvious. Among these 3 groups, CAD marked 45% (40/88), 61% (94/154), and 87% (379/435), respectively. Thus, sensitivity of CAD was highest for masses that were obvious and lowest for masses that were subtle. The authors also evaluated the effect of breast density on CAD. However, CAD performance appeared to be independent of density.

In summary, detection sensitivity of CAD is highest for calcifications and lowest for masses, especially those in which spiculation is least evident and when the mass appears subtle to radiologists.

How does CAD affect screening recall rates, follow-up recommendations, and biopsy results?

Successful clinical application of CAD to screening mammography requires increased detection rates without any excessive increase in interpretation time, recall rates, or false-positive biopsies. Current detection algorithms are heavily weighted toward sensitivity, thereby sacrificing the specificity of any computer-generated mark on the mammogram. Based on observational-judgmental skills that no computer can yet duplicate, the radiologist can act on or ignore any marks that the computer has made to indicate possible masses or calcifications.

In their separate studies of CAD, Freer and Ulissey 10 and Birdwell et al 2 found that the computer made 2.8 and 2.9 marks, respectively, per 4-view screening mammogram on locations that were not cancer. Of the marks made by the computer in the Freer and Ulissey study, 10 97.4% were dismissed without the need for additional mammographic views. Although no study has yet addressed the potential effect of CAD on interpretation time, it would seem that any effect would be relatively small.

Recall rates refer to the percent of patients asked to return for additional imaging work-up after batch interpretation of their screening mammogram. Recall rates that are too high result in patient inconvenience and anxiety as well as increased cost and inefficiency of the screening process. Excessive recall rates represent a disincentive for clinicians to advise screening, patients to undergo screening, radiologists to perform screening, and medical care payors to support screening.

If, however, recall rates are too low, some subtle cancers may be missed and benign lesions may undergo unnecessary biopsy when supplementary views and ultrasound are not performed to provide more definitive evaluation of findings detected at screening.

The American College of Radiology (ACR) recommends that the screening recall rate be <10%. 14 Due to availability of previous films for comparison, recall rates for periodic screening can be lower than those for initial screening. 15 Yankaskas et al 16 estimated that a recall rate of 4.9% to 5.5% represents the best trade-off between sensitivity and positive predictive value.

Two clinical studies suggest that the effect of CAD on recall rates is very slight. Warren Burhenne et al 8 calculated the recall rates of 14 radiologists at 5 facilities over a 4-month minimum period prior to CAD. Their average recall rate for screening studies was 8.3% (1961/23,682). The same radiologists with the aid of CAD for a 4-month minimum period after installation had a recall rate of 7.6% (1126/14,817).

In the study by Freer and Ulissey, 10 each of 12,860 screening mammograms was first interpreted without CAD and then immediately after CAD. Average recall rates for these readings were 6.5% (830/12,860) and 7.7% (986/12,860), respectively. They also found a slight increase in the number of patients placed in BI-RADS category 3, who were asked to return for short interval follow-up. The proportion of patients placed in the "probably benign" category was 1.9% (257/12,860) prior to CAD and 2.3% (298/12,860) when studies were interpreted with CAD.
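The recall rates from the two studies can be compared on a common footing by recomputing them from the counts and expressing the change in percentage points. This is a minimal sketch; the names are illustrative assumptions, and the 10% ceiling is the ACR recommendation cited above.

```python
ACR_RECALL_CEILING_PCT = 10.0  # ACR recommends a screening recall rate below 10%


def recall_rate_pct(recalled: int, screened: int) -> float:
    """Percentage of screened patients recalled for additional imaging."""
    return 100 * recalled / screened


# study -> ((recalled, screened) without CAD, (recalled, screened) with CAD)
recall_counts = {
    "Warren Burhenne et al": ((1961, 23_682), (1126, 14_817)),
    "Freer and Ulissey": ((830, 12_860), (986, 12_860)),
}

recall_summary = {}
for study, (without_cad, with_cad) in recall_counts.items():
    before = recall_rate_pct(*without_cad)
    after = recall_rate_pct(*with_cad)
    recall_summary[study] = (before, after, after - before)
    within_acr = after < ACR_RECALL_CEILING_PCT
    print(
        f"{study}: {before:.1f}% -> {after:.1f}% "
        f"({after - before:+.1f} points), below ACR ceiling: {within_acr}"
    )
```

In both studies the recall rate with CAD stays below the ACR ceiling, with a change of roughly one percentage point in either direction.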

The positive predictive value (PPV) is commonly defined as the percentage of biopsies performed as a result of a positive mammographic examination that resulted in a diagnosis of cancer. A PPV that is too low indicates an excessive rate of false positive biopsies. A PPV that is too high suggests that some cancers having an atypical appearance for malignancy are being ignored. An appropriate value for PPV will depend on factors such as age, breast cancer risk, and clinical signs and symptoms, which vary from one practice population to another. Therefore, the ACR recommends a PPV in the 25% to 40% range. 14 Freer and Ulissey 10 found that CAD had no effect on the PPV of 38% at their center.

Conclusion

Although more studies are needed for confirmation, initial investigations indicate that CAD can increase screening detection rates with no undue effect on rates for callback, short-term follow-up, or PPV.