Hybrid AI-Radiologist Mammography Strategy Cuts Workload by 38% Without Sacrificing Accuracy

Published Date: August 19, 2025
By News Release

A large retrospective study has demonstrated that a hybrid reading approach combining radiologists and artificial intelligence (AI) can reduce radiologist workload in breast cancer screening by nearly 40%, while maintaining recall and cancer detection rates comparable to standard practice.

The research, published in Radiology by the Radiological Society of North America (RSNA), analyzed over 41,000 screening mammograms from more than 15,000 women enrolled in the Dutch National Breast Cancer Screening Program between 2003 and 2018. Results showed that a hybrid AI strategy could reduce radiologists’ workload by 38% without compromising diagnostic performance.

“Although the overall performance of state-of-the-art AI models is very high, AI sometimes makes mistakes,” explained study co-author Sarah D. Verboom, M.Sc., a doctoral candidate at Radboud University Medical Center. “Identifying exams in which AI interpretation is unreliable is crucial to allow for and optimize use of AI models in breast cancer screening programs.”

The proposed strategy relies on combining AI’s probability of malignancy (PoM) score with an uncertainty estimate. When AI confidently determines an exam as normal or confidently recommends recall, its decision is accepted. However, if the AI model is uncertain, the case is routed to radiologists for double reading.
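
The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `route_exam`, the `Decision` labels, and both threshold values are hypothetical, since the article does not report the cutoffs the researchers used.

```python
from enum import Enum


class Decision(Enum):
    AI_NORMAL = "AI: normal, no recall"
    AI_RECALL = "AI: recall"
    DOUBLE_READ = "radiologists: double reading"


def route_exam(pom, uncertainty, recall_threshold=0.5, uncertainty_threshold=0.3):
    """Route one screening exam.

    pom: AI probability of malignancy, in [0, 1].
    uncertainty: any uncertainty estimate where larger means less certain.
    Both thresholds are illustrative assumptions, not values from the study.
    """
    # Uncertain AI output: defer the exam to radiologist double reading.
    if uncertainty > uncertainty_threshold:
        return Decision.DOUBLE_READ
    # Confident AI output: accept the AI decision in either direction.
    if pom >= recall_threshold:
        return Decision.AI_RECALL
    return Decision.AI_NORMAL


print(route_exam(pom=0.02, uncertainty=0.05).value)
print(route_exam(pom=0.40, uncertainty=0.60).value)
```

The uncertainty check comes first so that a high or low PoM score is only trusted when the model is also confident about it.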

The team tested several uncertainty measures and found that the entropy of the mean PoM score produced cancer detection and recall rates nearly identical to those of radiologists’ double reading: 6.6 cancers detected and 23.7 recalls per 1,000 exams, compared with 6.7 cancers and 23.9 recalls per 1,000 exams for radiologists.
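
To give a sense of what "entropy of the mean PoM score" could look like in code, here is a hedged sketch. It assumes the mean is taken over several PoM predictions for the same exam (for example from multiple models or repeated passes) and that the entropy is the binary Shannon entropy of that mean. The article does not specify either detail, so both are assumptions, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain outcome carries no entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_of_mean_pom(pom_scores):
    """Average several PoM predictions for one exam, then take the entropy.

    pom_scores: non-empty list of probabilities in [0, 1]. Where these
    predictions come from is an assumption of this sketch.
    """
    if not pom_scores:
        raise ValueError("pom_scores must not be empty")
    mean_pom = sum(pom_scores) / len(pom_scores)
    return binary_entropy(mean_pom)


# A mean PoM near 0 or 1 gives low entropy (confident);
# a mean PoM near 0.5 gives entropy close to 1 bit (uncertain).
print(entropy_of_mean_pom([0.01, 0.02, 0.03]))
print(entropy_of_mean_pom([0.40, 0.60]))
```

Under this reading, exams whose entropy exceeds a chosen cutoff would be the ones sent to radiologists for double reading.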

Although AI deferred most cases to radiologists because of uncertainty, it classified 38% of exams as certain, reducing radiologist workload to 61.9% of the standard double-reading workload. Importantly, when AI was confident, its performance improved: the area under the curve (AUC) reached 0.96 compared with 0.87 overall, and its sensitivity closely matched that of radiologists (85.4% vs. 88.9%). Younger women with denser breasts were more likely to fall into the “uncertain” category requiring human review.

Verboom stressed that the real innovation lies in quantifying AI confidence. “The key component of our study isn’t necessarily that this is the best way to split the workload, but that it’s helpful to have uncertainty quantification built into AI models,” she said. “I hope commercial products integrate this into their models, because I think it’s a very useful metric.”

If implemented in practice, the researchers noted, AI would make the recall decision for nearly 20% of women without radiologist input. While surveys indicate most women are open to AI in screening, many still prefer at least one radiologist review. A hybrid model where radiologists handle uncertain and recall cases may offer a more acceptable balance.

“The use of AI with uncertainty quantification can be a possible solution for workforce shortages and could help build trust in the implementation of AI,” Verboom said, adding that a prospective trial will be needed to confirm the efficiency gains in real-world practice.

The project is part of aiREAD, supported by the Dutch Research Council, Dutch Cancer Society, and Health Holland.