Three examples of patches that are false positives from the CNN (A, B, C), and three examples of patches that were true positives by the CNN (D, E, F), but were false negatives by most of the readers participating in the study.
Ever since computer-aided detection (CAD) software was introduced for mammography, the question has been whether the technology will ever replace the radiologist. An article published in the August issue of Medical Image Analysis describes deep learning software that outperformed a state-of-the-art mammography CAD system and performed comparably to the radiologists who participated in a performance study.
Lead author Thijs Kooi, a doctoral candidate at the Diagnostic Image Analysis Group (DIAG) of the Department of Radiology and Nuclear Medicine, and colleagues have been developing breast CAD systems that can function independently. They aim to improve how breast CAD systems detect suspicious masses and microcalcifications. In 2013, a DIAG team showed that a CAD system developed with input from radiologists, set at a high specificity and aimed at decision support for detecting malignant masses and architectural distortion, achieved sensitivity comparable to that of nine certified screening radiologists and three residents reading 200 digital screening mammograms.1 This CAD system incorporated context, symmetry, and the relation between two views of the same breast.
The current study tested software based on deep learning. Deep learning software can exploit increasingly large amounts of data. It can also reduce human bias, because features that were previously hardcoded based on radiologists’ knowledge are now learned directly from data. The software used a convolutional neural network (CNN) trained on a large dataset of about 45,000 breast images. It was compared to a state-of-the-art mammography CAD system that included all descriptors commonly applied in mammography. The CNN included a set of 74 features, which were categorized as pixel level, contrast, texture, geometry, and patient features. The article describes these features in detail, along with the experiments comparing the outcomes of the CNN with those of the commercial mammography CAD system.
The researchers also measured the performance of three experienced mammography readers on a patch level, providing the readers with the same information as the CNN system. The patches were shown on a display monitor at a resolution of 200 microns. The readers were given a slider and instructed to score each patch from 0 to 100 based on how suspicious they judged it to be. A total of 398 patches were used, combining malignant masses with an equal number of negatives that had been categorized as difficult to interpret. The authors reported that there was no significant difference between the CNN and any of the readers.
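To make the patch-level comparison concrete, here is a minimal sketch of how slider scores from a reader and output scores from a CNN could be put on a common footing. It assumes the comparison is made with the area under the ROC curve (AUC), which this article does not state. The function name `auc` and all scores and labels are made up for illustration and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (malignant, negative) patch pairs in which the malignant patch
    receives the higher score. Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative patch")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant mass, 0 = difficult negative.
labels = [1, 1, 1, 0, 0, 0]
reader_scores = [90, 70, 40, 50, 20, 10]            # 0-100 slider values
cnn_scores = [0.95, 0.60, 0.55, 0.70, 0.20, 0.05]   # network outputs

reader_auc = auc(reader_scores, labels)
cnn_auc = auc(cnn_scores, labels)
print(f"reader AUC = {reader_auc:.3f}, CNN AUC = {cnn_auc:.3f}")
```

Because AUC depends only on how the patches are ranked, a 0 to 100 slider and a 0 to 1 network output can be compared directly without rescaling. Deciding whether a difference between two such values is significant would require an additional statistical test, which this sketch does not include.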
The authors also analyzed the positives and negatives that the CNN misclassified. They said that benign abnormalities such as cysts and fibroadenomas, and normal structures such as lymph nodes or fat necrosis, represented the largest percentage of incorrectly identified masses. They noted that they had not included benign lesions in the training set and need to do so in the future. They pointed out that when breast CAD is used only as a second “reader”, this misclassification would be caught and overridden by an interpreting radiologist. However, it needs to be resolved for software designed to be used independently.
“The reader study illustrates the network is not far from the radiologists’ performance, but still substantially below the mean of the readers,” the authors concluded. Kooi told Applied Radiology that “the presented system works very well when working with regions of interest (ROI). However, this is not how radiologists read mammograms. Information such as discrepancies between left and right breast, and a comparison to previous exams should also be incorporated in the evaluation of a case. We are currently extending the deep learning algorithms to also look at asymmetries and temporal change that may be indicative of a malignancy.”
He added, “The presence of other findings within an image, such as microcalcifications and benign abnormalities, affect the radiologist decision.” Kooi said he is currently working on machine learning algorithms that can reason intelligently about different findings in an image and incorporate all information that clinicians use to make a decision about a case.
Prototype deep learning CAD to detect breast lesions reaching mammographer’s performance. Appl Radiol.