ABSTRACT
PURPOSE
To differentiate benign and malignant breast masses by extracting radiomic features from low-energy and recombined contrast-enhanced mammography (CEM) images and to evaluate the diagnostic performance of multiple machine learning classifiers.
METHODS
In this retrospective, single-center study, 145 patients who underwent CEM between February 2019 and January 2022 were included. Radiomic features were extracted from manually segmented regions of interest on low-energy and recombined images using an open-source workflow (ITK-SNAP and PyRadiomics). The dataset was split at the patient level into a training set (75%) and an independent test set (25%); within the training set, feature selection and model optimization were performed using 10-fold cross-validation. Diagnostic performance [as measured by area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value] was reported on the held-out independent test set.
RESULTS
Ensemble learning demonstrated the best performance for both image types. The highest accuracy and AUC were 91.8% and 0.978 for recombined images and 89.7% and 0.968 for low-energy images, respectively. For recombined images, ensemble learning yielded the highest sensitivity (91.8%), whereas neural networks achieved the highest specificity (95.8%). For low-energy images, ensemble learning reached the highest sensitivity (98.0%), and decision trees achieved the highest specificity (91.7%).
CONCLUSION
Radiomics analysis of CEM images can effectively differentiate between benign and malignant breast masses, potentially enhancing diagnostic accuracy in breast imaging.
CLINICAL SIGNIFICANCE
A radiomics workflow based on recombined CEM images and open-source tools may complement conventional CEM interpretation, improve non-invasive lesion characterization, and support further research toward clinically validated decision-support applications.
Main points
• Radiomics-based analysis of recombined contrast-enhanced mammography (CEM) images achieved high diagnostic performance in differentiating benign from malignant breast masses.
• An open-source workflow using ITK-SNAP and PyRadiomics provides a transparent and reproducible pipeline that can be implemented in radiology departments equipped with digital mammography systems.
• Shape- and texture-based radiomic features derived from CEM may serve as quantitative biomarkers to support lesion characterization and risk stratification research in breast imaging.
• Recombined CEM images outperformed low-energy images across multiple machine learning classifiers, highlighting the added value of contrast-enhanced information for lesion characterization.
Breast cancer is the most commonly diagnosed malignancy in women and a leading cause of cancer-related mortality.1 Early detection, prediction of treatment response, and estimation of prognosis are crucial for improving survival rates.2 Mammography remains the primary screening modality, reducing breast cancer mortality by approximately 30%.3 However, its sensitivity is limited in dense breasts.
Contrast-enhanced mammography (CEM) is a digital mammographic technique that provides functional and morphological information using iodinated contrast agents. The technique has been shown to have higher sensitivity than standard mammography and comparable performance to breast magnetic resonance imaging (MRI) while reducing false positives, and it is not affected by breast density.4 CEM has been increasingly used for lesion characterization, staging, and treatment monitoring.
In CEM, benign and malignant lesions are differentiated based on tumor shape, contour, contrast enhancement patterns, and kinetic characteristics. However, medical images contain quantitative data that are invisible to the human eye but can provide valuable diagnostic insights. Radiomics involves extracting and analyzing these high-dimensional features to characterize tissue properties. The radiomics workflow includes feature extraction through statistical, filtering, and morphological techniques, followed by feature selection to retain the most diagnostically relevant parameters. Machine learning algorithms then classify lesions as benign or malignant based on these features. Although radiomics has been widely studied in non-contrast mammography and MRI, its application in CEM is relatively new. Preliminary studies have demonstrated that radiomics analysis of CEM images can achieve accuracies ranging from 80% to 90% in tumor classification and holds promise for distinguishing subtypes, assessing invasiveness, and predicting tumor grade.5-9
This study extracts radiomic features from CEM images for benign–malignant mass differentiation and evaluates their diagnostic performance using machine learning algorithms.
Methods
Study population
A total of 145 patients with suspicious breast masses on CEM were retrospectively included, yielding 164 breast masses (73 benign and 91 malignant) for the final radiomics analysis. Patients who had contraindications to iodinated contrast agents or incomplete imaging data were excluded. Each patient with a malignant lesion contributed a single lesion (91 lesions in 91 patients); the number of lesions exceeded the number of patients because 58 patients in the benign group contributed 73 benign lesions (i.e., multiple lesions occurred only in the benign subgroup). Each lesion was segmented and analyzed as a separate lesion-level sample. The patient and lesion selection process is summarized in Figure 1. This retrospective study was approved by the Institutional Review Board of Karadeniz Technical University (approval number: 2022/121, date: June 2, 2022), and the requirement for informed consent was waived.
Imaging protocol
CEM was performed using a digital mammography unit (Senographe Essential, GE Healthcare, Buc, France). An intravenous contrast agent (1.5 mL/kg, 50–120 mL) was administered at 3 mL/s. Craniocaudal (CC) and mediolateral oblique (MLO) views of both breasts were acquired, starting approximately 2 minutes after contrast injection, and all views were completed within 6–7 minutes, generating low-energy and recombined images.
Radiomics analysis
Image assessment and segmentation
CEM images were evaluated using a dedicated mammography workstation by two radiologists with 20 and 3 years of experience in breast imaging, respectively. Lesion size and histopathological type were recorded for each patient.
All images were stored in DICOM format and processed using ITK-SNAP 3.8 (University of Pennsylvania, Philadelphia, PA, USA; www.itksnap.org), an open-source image segmentation tool. The radiologist with 3 years of experience manually segmented the lesions, ensuring that the region of interest (ROI) strictly encompassed the lesion itself (Figure 2). This segmentation was applied to all low-energy and recombined CC and MLO images.
Feature extraction and selection
Radiomic feature extraction was conducted using PyRadiomics (AIM-Harvard, Boston, MA, USA), an open-source Python package for radiomic feature extraction from two-dimensional and three-dimensional images. No pre-processing was applied before extraction. A total of 102 radiomic features were computed and grouped into the following feature classes:
• Shape-based features
• First-order statistical features
• Gray-Level Co-occurrence Matrix (GLCM)
• Gray-Level Run Length Matrix (GLRLM)
• Gray-Level Size Zone Matrix (GLSZM)
• Gray-Level Dependence Matrix (GLDM)
• Neighboring Gray-Tone Difference Matrix (NGTDM) (Supplementary Table 1)
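To illustrate what a texture class in the list above measures, the following minimal sketch builds a gray-level co-occurrence matrix for a small, already discretized toy image and derives the contrast feature from it. It is an illustration under stated assumptions, not the PyRadiomics implementation: the toy image, the mask, the single horizontal offset, and the function names glcm and glcm_contrast are all hypothetical, and PyRadiomics additionally discretizes gray levels and aggregates over several directions.

```python
from collections import Counter


def glcm(image, mask, levels, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for pixel pairs inside the mask.

    image  : 2D list of integer gray levels in [0, levels)
    mask   : 2D list of booleans marking the region of interest
    offset : (row step, column step) between the two pixels of a pair
    """
    d_row, d_col = offset
    n_rows, n_cols = len(image), len(image[0])
    counts = Counter()
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if not (0 <= r2 < n_rows and 0 <= c2 < n_cols):
                continue  # neighbor falls outside the image
            if not (mask[r][c] and mask[r2][c2]):
                continue  # both pixels must lie inside the ROI
            i, j = image[r][c], image[r2][c2]
            counts[(i, j)] += 1
            counts[(j, i)] += 1  # count both directions so the matrix is symmetric
    total = sum(counts.values())
    return [
        [counts[(i, j)] / total if total else 0.0 for j in range(levels)]
        for i in range(levels)
    ]


def glcm_contrast(p):
    """Contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    n = len(p)
    return sum(((i - j) ** 2) * p[i][j] for i in range(n) for j in range(n))


# Hypothetical 4 x 4 image with four gray levels; the whole image is treated as the ROI.
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
toy_mask = [[True] * 4 for _ in range(4)]

toy_glcm = glcm(toy_image, toy_mask, levels=4)
toy_contrast = glcm_contrast(toy_glcm)
```

Higher contrast values indicate more frequent large gray-level differences between neighboring pixels, which is one way heterogeneous enhancement within a lesion can be quantified.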
To reduce dimensionality and improve model efficiency, minimum redundancy–maximum relevance, ReliefF, and ANOVA algorithms were implemented in MATLAB R2022b (MathWorks, Inc., Natick, MA, USA). Each algorithm generated a ranking based on feature importance scores. The top 10 most significant features were selected by each algorithm (Figures 3 and 4). A total of 22 features for recombined images and 25 features for low-energy images were selected for analysis (Supplementary Table 2).
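The ranking-and-pooling idea can be sketched as follows. This is not the MATLAB code used in the study: only the ANOVA ranking is implemented here, the feature names and values are hypothetical, and merging the per-algorithm top-k lists by taking their union is an assumption inferred from the reported totals (22 and 25 features from three lists of 10).

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k(scores, k):
    """Names of the k highest-scoring features, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def pool_rankings(rankings):
    """Union of several top-k lists, keeping first-seen order and dropping duplicates."""
    pooled = []
    for ranking in rankings:
        for name in ranking:
            if name not in pooled:
                pooled.append(name)
    return pooled


# Hypothetical training-set features (0 = benign, 1 = malignant).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "shape_feature_a": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],    # separates the classes well
    "texture_feature_b": [1.0, 8.0, 3.0, 7.0, 2.0, 9.0],  # mostly noise
}
f_scores = {name: anova_f(vals, labels) for name, vals in features.items()}
anova_top = top_k(f_scores, k=1)

# Pooling with rankings from other (here hypothetical) selectors.
selected = pool_rankings([anova_top, ["texture_feature_b"], ["shape_feature_a"]])
```

To avoid information leakage, scores of this kind would be computed on training data only, consistent with the patient-level split described in the next section.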
Classification and model optimization
Supervised machine learning classifiers were developed to classify ROIs as benign or malignant. All analyses were performed using MATLAB R2022b. Prior to model development, the dataset was split at the patient level into a training set (75%) and an independent test set (25%) to prevent information leakage. When multiple benign lesions were present in the same patient, all lesions from that patient were kept within the same split. The independent test set was not used at any stage of feature selection, model tuning, or model selection.
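A patient-level split of this kind can be sketched as below. The study performed this step in MATLAB; the Python function split_by_patient, the patient identifiers, and the fixed random seed are hypothetical and serve only to show how all lesions from one patient are kept on the same side of the split. The sketch does not stratify by diagnosis, and whether the original split did so is not stated.

```python
import random


def split_by_patient(lesion_patient_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) lesion indices with no patient in both sets."""
    patients = sorted(set(lesion_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)  # randomize at the patient level, not the lesion level
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(lesion_patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(lesion_patient_ids) if p in test_patients]
    return train_idx, test_idx


# Hypothetical lesion table: one entry per lesion, some patients with several lesions.
lesion_patient_ids = [
    "P01", "P01", "P02", "P03", "P03", "P03",
    "P04", "P05", "P06", "P07", "P08",
]
train_idx, test_idx = split_by_patient(lesion_patient_ids)

train_patients = {lesion_patient_ids[i] for i in train_idx}
test_patients = {lesion_patient_ids[i] for i in test_idx}
```

Because the split is drawn over patients, the lesion-level proportions of the two sets only approximate 75% and 25% when patients contribute different numbers of lesions.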
The following classifiers were evaluated: ensemble learning, decision trees, naïve Bayes, support vector machines, and neural networks (Supplementary Table 3). Within the training set, 10-fold cross-validation was used for model optimization and selection. The final selected model was then trained on the full training set and evaluated on the held-out independent test set. Performance was assessed using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
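The threshold-dependent metrics listed above follow directly from the confusion matrix. The sketch below shows the standard definitions with malignant taken as the positive class; the function name diagnostic_metrics and the example labels are hypothetical and do not reproduce study data.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, PPV, and NPV from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # malignant lesions correctly identified
        "specificity": _ratio(tn, tn + fp),  # benign lesions correctly identified
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical test-set labels (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```

PPV and NPV depend on the proportion of malignant lesions in the evaluated sample, so values obtained in a cohort enriched for malignancy may not transfer to populations with a different prevalence.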
All steps of the radiomics workflow, including image acquisition, lesion segmentation, feature extraction, feature selection, and model training, are summarized in Figure 5.
Statistical analysis
Descriptive statistics for continuous variables were expressed as mean ± standard deviation, and categorical variables as numbers and percentages. The machine learning workflow, including patient-level data partitioning, cross-validation, and leakage prevention, is described in the “Classification and model optimization” section. Diagnostic performance metrics (AUC, accuracy, sensitivity, specificity, PPV, and NPV) were reported for the held-out independent test set.
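Unlike the other metrics, AUC is threshold independent: it equals the probability that a randomly chosen malignant lesion receives a higher classifier score than a randomly chosen benign lesion, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name roc_auc and the example scores are hypothetical, and the study itself obtained AUC values in MATLAB.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical malignancy scores (1 = malignant, 0 = benign).
y_true_auc = [1, 1, 1, 0, 0, 0]
scores_auc = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc_value = roc_auc(y_true_auc, scores_auc)
```

The pairwise form is quadratic in the number of samples, which is acceptable for test sets of the size used here; rank-based formulations give the same value more efficiently for larger datasets.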
Results
Patient characteristics
A total of 164 ROIs were analyzed. Of these, 44.5% (73) were benign, and 55.5% (91) were malignant. The mean age was 48.1 ± 9.5 years for benign cases and 49.9 ± 10.5 years for malignant cases. Invasive ductal carcinoma was the most common malignant diagnosis (57%), and stable follow-up masses were the most common benign findings (41%). The mean lesion size was 23.2 ± 20.6 mm in the benign group and 35.7 ± 21.5 mm in the malignant group (Tables 1 and 2).
Performance of radiomics analysis
All reported performance metrics represent results from the held-out independent test set (25% of patients). Model selection and optimization were performed within the training set using 10-fold cross-validation.
The classification performance of five different machine learning algorithms was evaluated. The confusion matrices and ROC curves for recombined images are illustrated in Supplementary Figure 1. Among all models, ensemble learning exhibited the best diagnostic performance, achieving an AUC value of 0.9783. The corresponding AUC values for the other classifiers were 0.9728 for neural networks, 0.9583 for support vector machines, 0.8793 for decision trees, and 0.8520 for naïve Bayes.
The classification results for low-energy images are depicted in Supplementary Figure 2. As with recombined images, ensemble learning exhibited the highest classification performance, with an AUC value of 0.9677. The AUC values for the other classifiers were 0.9418 for neural networks, 0.9332 for support vector machines, 0.8695 for decision trees, and 0.8818 for naïve Bayes.
The accuracy and AUC values for each classifier in both recombined and low-energy images are summarized in Table 3. The ensemble learning classifier achieved the highest accuracy, with values of 91.75% for recombined images and 89.69% for low-energy images. The performance of the other classifiers is outlined below:
• For recombined images, the accuracy rates were as follows: neural networks (89.85%), support vector machines (89.69%), decision trees (84.74%), and naïve Bayes (84.54%).
• For low-energy images, the accuracy rates were as follows: neural networks (87.63%), support vector machines (85.57%), decision trees (86.60%), and naïve Bayes (82.47%).
Table 4 summarizes the sensitivity, specificity, PPV, and NPV of each classifier. In recombined images, ensemble learning and support vector machines exhibited the highest sensitivity (91.83% and 91.82%, respectively). The highest specificity and PPV were obtained with the neural network classifier (95.83% and 95.34%, respectively). Ensemble learning achieved the highest NPV at 91.66%.
For low-energy images, ensemble learning exhibited the highest sensitivity (97.95%) and NPV (95.50%). The highest specificity (91.66%) and PPV (90.90%) were achieved using the decision tree classifier.
Discussion
CEM enhances the diagnostic accuracy of digital mammography by visualizing tumor neovascularity through iodinated contrast administration. The CEM technique enables the concurrent evaluation of morphological and enhancement characteristics, achieving a sensitivity range of 96%–100%, comparable to that of MRI, and improving the specificity of digital mammography from 42% to 87.7%.10, 11
Radiomics is an emerging field that uses extracted quantitative imaging biomarkers to enhance diagnostic accuracy and improve patient management. By analyzing texture, shape, and statistical features, radiomics provides an objective and reproducible assessment of tumor characteristics. Although most radiomics studies have focused on MRI, CEM offers a unique advantage by combining mammographic and contrast-enhanced information, positioning it as a promising modality for radiomics analysis.2, 5, 12
In the present study, 102 radiomic features, including first-order, shape-based, and texture-based metrics (GLCM, GLRLM, GLSZM, NGTDM, and GLDM), were evaluated. Compared with previous radiomics studies using CEM, our models achieved numerically higher AUCs and accuracies in differentiating benign from malignant lesions (Supplementary Table 4).5, 9, 13-17 The more comprehensive feature set used in this study may have contributed to these improved results.
Our findings indicate that recombined CEM images outperform low-energy images in distinguishing benign from malignant masses. Ensemble learning achieved the highest accuracy for both image types, whereas neural networks demonstrated the best specificity for recombined images, and decision trees demonstrated the best specificity for low-energy images.
Clinical relevance and practical applicability
The current study aims to bridge the gap between advanced image-based computational analysis and clinical decision-making in breast imaging. By using widely accessible open-source tools such as ITK-SNAP and PyRadiomics and integrating standard segmentation and feature selection algorithms, the proposed methodology can be feasibly replicated in radiology departments equipped with digital mammography infrastructure. The selected radiomic features, particularly shape- and texture-based descriptors, may aid in distinguishing benign from malignant lesions and complement routine CEM assessment. However, this retrospective study did not evaluate integration with established clinical assessment systems (e.g., BI-RADS), perform comparisons with radiologist interpretation, or conduct decision-threshold and clinical-utility analyses, meaning the potential impact on clinical decision-making (including biopsy decisions) should be interpreted cautiously. With additional multi-reader robustness testing, external validation in larger multicenter cohorts, and prospective clinical-utility assessment, this approach could serve as a foundation for future decision-support tools.
This retrospective, single-center study has several limitations. First, the sample size was modest, with an imbalanced distribution of benign and malignant lesions. Second, lesions were manually segmented by a single reader; thus, interobserver agreement and segmentation robustness (e.g., intraclass correlation coefficient-based feature stability) could not be assessed, which may affect reproducibility. Third, no external validation cohort was available, and the extremely high AUC values may partly reflect optimistic performance and potential overfitting in a limited dataset with high-dimensional radiomic features. Future research should incorporate larger, multicenter datasets, harmonization strategies, multi-reader or automated segmentation approaches, and external validation.
In conclusion, radiomics analysis of CEM—particularly recombined images—showed strong discrimination between benign and malignant breast masses within an open-source and reproducible workflow. These findings highlight the potential of integrating radiomics-derived quantitative biomarkers into artificial intelligence-based decision-support systems for breast lesion characterization. Future studies should evaluate model generalizability and robustness in larger cohorts by incorporating multi-reader segmentation robustness assessment and independent external multicenter validation and by benchmarking performance against standard clinical assessment frameworks.


