Hybrid Radiomics and Deep Learning for Multi-Parametric MRI-Based Classification of Benign and Malignant Breast Tumors: A Multi-Institutional Study
Abstract
Introduction: Breast cancer is a leading cause of cancer mortality in women worldwide. Multi-parametric MRI, which combines dynamic contrast-enhanced MRI (DCE-MRI), diffusion-weighted imaging (DWI), and T2-weighted imaging, provides valuable functional and morphological data for lesion characterization. However, diagnostic accuracy remains limited by overlapping imaging features and inter-observer variability, especially in equivocal (e.g., BI-RADS 4) cases.
Objectives: To develop and validate a hybrid machine learning model that fuses handcrafted radiomic features with deep learning representations from multi-parametric breast MRI to improve the classification of benign and malignant tumors.
Methods: We retrospectively analyzed 428 histopathologically confirmed breast lesions (218 malignant, 210 benign) from two tertiary institutions (2016–2022). All cases included DCE-MRI, diffusion-weighted imaging (b = 0, 800 s/mm²), and T2-weighted sequences. Lesions were manually segmented by expert radiologists. A total of 1,218 radiomic features were extracted and reduced to 87 non-redundant features. Concurrently, a 3D ResNet-18 model processed a 4-channel input (DCE peak/washout phases, ADC map, T2) to generate deep features. Three classifiers were evaluated: (1) radiomics + SVM, (2) deep learning (3D ResNet-18), and (3) a hybrid model fusing both feature types. Performance was assessed via five-fold stratified cross-validation using accuracy, sensitivity, specificity, F1-score, AUC, and Brier score, with statistical comparisons via DeLong’s test.
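The fusion and cross-validation steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the 512-dimensional deep feature vector is an assumed size, fusion by simple concatenation is an assumption, and logistic regression stands in for the hybrid classifier, which the abstract does not specify. Only the class counts (218 malignant, 210 benign), the 87 radiomic features, and the five-fold stratified scheme come from the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Class counts taken from the abstract; labels: 1 = malignant, 0 = benign.
n_malignant, n_benign = 218, 210
y = np.concatenate([np.ones(n_malignant, dtype=int),
                    np.zeros(n_benign, dtype=int)])

# Synthetic stand-ins for the two feature blocks (NOT real study data).
# 87 = reduced radiomic features (from the abstract);
# 512 = assumed length of the 3D ResNet-18 feature vector.
radiomic = rng.normal(size=(y.size, 87)) + 0.3 * y[:, None]
deep = rng.normal(size=(y.size, 512)) + 0.1 * y[:, None]

# Assumed fusion strategy: concatenate the two blocks per lesion.
fused = np.hstack([radiomic, deep])


def cross_validate(features, labels, n_splits=5, seed=0):
    """Return mean AUC and mean Brier score over stratified folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs, briers = [], []
    for train_idx, test_idx in skf.split(features, labels):
        # Scaling is fit on the training fold only, to avoid leakage.
        model = make_pipeline(StandardScaler(),
                              LogisticRegression(max_iter=1000))
        model.fit(features[train_idx], labels[train_idx])
        prob = model.predict_proba(features[test_idx])[:, 1]
        aucs.append(roc_auc_score(labels[test_idx], prob))
        briers.append(brier_score_loss(labels[test_idx], prob))
    return float(np.mean(aucs)), float(np.mean(briers))


mean_auc, mean_brier = cross_validate(fused, y)
print(f"mean AUC = {mean_auc:.3f}, mean Brier score = {mean_brier:.3f}")
```

Stratified splitting keeps the malignant-to-benign ratio roughly constant in every fold, which matters when sensitivity and specificity are reported per fold. Any feature selection step would likewise need to be refit inside each training fold to avoid optimistic bias.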
Results: The hybrid model achieved the highest performance (AUC = 0.947, 95% CI: 0.922–0.972; accuracy = 91.1%; sensitivity = 89.4%; specificity = 92.8%), significantly outperforming both the radiomics-only (AUC = 0.892, p = 0.008) and deep learning-only (AUC = 0.924, p = 0.021) approaches. Subgroup analysis revealed lower sensitivity for invasive lobular carcinoma (50.0%), consistent with known MRI limitations. The model generalized well across 1.5T and 3.0T scanners and performed strongly in the diagnostically ambiguous BI-RADS 4 category (accuracy = 87.7%). In comparative benchmarking on larger, multi-institutional data, the hybrid model achieved a higher AUC than prior state-of-the-art methods.
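As a quick consistency check on the reported hybrid-model figures, overall accuracy can be reconstructed as the class-weighted average of sensitivity and specificity. The sketch below assumes the metrics were pooled over all 428 lesions. If they were instead averaged across folds, small rounding differences would be expected.

```python
# Values taken from the abstract.
n_malignant, n_benign = 218, 210
sensitivity, specificity = 0.894, 0.928

# Accuracy = (true positives + true negatives) / all lesions, with the
# expected counts written in terms of sensitivity and specificity.
expected_accuracy = (
    sensitivity * n_malignant + specificity * n_benign
) / (n_malignant + n_benign)

print(f"{expected_accuracy:.1%}")  # 91.1%
```

The result agrees with the reported accuracy of 91.1%.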
Conclusions: The proposed hybrid radiomics–deep learning framework leverages complementary strengths of interpretable quantitative features and high-level spatial representations to achieve state-of-the-art classification performance in multi-parametric breast MRI. This approach holds significant promise as a clinical decision-support tool to reduce unnecessary biopsies and improve diagnostic confidence, particularly in equivocal cases. Future work will focus on prospective validation and integration of automated segmentation and molecular biomarkers.