ABSTRACT
PURPOSE
This study aimed to evaluate the potential of machine learning-based models for predicting carcinogenic human papillomavirus (HPV) oncogene types using radiomics features from magnetic resonance imaging (MRI).
METHODS
Pre-treatment MRI images of patients with cervical cancer were collected retrospectively. An HPV DNA oncogene analysis was performed based on cervical biopsy specimens. Radiomics features were extracted from contrast-enhanced T1-weighted images (CE-T1) and T2-weighted images (T2WI). A third feature subset was created as a combined group by concatenating the CE-T1 and T2WI subsets. Feature selection was performed using Pearson’s correlation coefficient and wrapper-based sequential-feature selection. Two models were built with each feature subset, using support vector machine (SVM) and logistic regression (LR) classifiers. The models were validated using a five-fold cross-validation technique and compared using Wilcoxon’s signed rank and Friedman’s tests.
RESULTS
Forty-one patients were enrolled in the study (26 were positive for carcinogenic HPV oncogenes, and 15 were negative). A total of 851 features were extracted from each imaging sequence. After feature selection, 5, 17, and 20 features remained in the CE-T1, T2WI, and combined groups, respectively. The SVM models showed 83%, 95%, and 95% accuracy scores, and the LR models revealed 83%, 81%, and 92.5% accuracy scores in the CE-T1, T2WI, and combined groups, respectively. The SVM algorithm performed better than the LR algorithm in the T2WI feature subset (P = 0.005), and the feature sets in the T2WI and the combined group performed better than CE-T1 in the SVM model (P = 0.033 and 0.006, respectively). The combined group feature subset performed better than T2WI in the LR model (P = 0.023).
CONCLUSION
Machine learning-based radiomics models based on pre-treatment MRI can detect carcinogenic HPV status with discriminative accuracy.
Main points
• Prediction of carcinogenic human papillomavirus (HPV) oncogenes enables the identification of high-risk patients and can be used as a prognostic marker.
• Machine learning-based radiomics models can predict carcinogenic HPV DNA status in cervical cancer.
• Similar accuracy rates from different algorithms show the feasibility of machine learning-based models.
Cervical cancer is the fourth most common female cancer and the second most common in women aged 15–44.1 The etiological factor in more than 95% of cervical cancer cases is human papillomavirus (HPV).2,3,4 Fifteen of more than 200 oncogene types are identified as high risk, and type-16 and -18 HPV infections are the most common in women with cervical cancer.5 In addition, several studies in the literature report that HPV DNA status is associated with treatment response, disease-free survival, and overall survival in patients with cervical carcinoma.6,7,8,9,10
Radiomics is a method for extracting quantitative features from medical images. Hundreds of radiomics features can be extracted from an image, including pixel-level patterns that are imperceptible to the human eye.11 Various studies have used radiomics features to predict tumor histopathology, stage, grade, and clinical outcomes in cervical cancer.11,12,13,14,15 Additionally, several machine learning-based models with high accuracy have been developed to predict HPV status in oropharyngeal cancer using radiomics features.16,17 However, no published study has investigated the prediction of HPV status in cervical cancer using radiomics features obtained from magnetic resonance imaging (MRI).
This study aimed to evaluate the potential value of machine learning-based models for predicting carcinogenic HPV oncogene types in cervical cancer by extracting radiomics features from MRI.
Methods
Ethics
This was a retrospective study conducted with the approval of our institutional ethics review board (approval number: 514.10/35). Informed consent was waived due to the retrospective nature of the study.
Patient eligibility
Patients admitted to our radiation oncology department between 2015 and 2018 with squamous cell carcinoma of the uterine cervix were enrolled in this study. Their clinical data (age, smoking history in years, and tumor stage) were reviewed. The tumor stage was determined by assessing lymph node involvement on positron emission tomography–computed tomography images and the presence of distant metastasis. Pre-treatment pelvic MRI images were evaluated. Patients without pre-treatment MRI images in our picture archiving and communication system and those whose images had prominent artifacts were excluded. Figure 1 summarizes the radiomics work pipeline.
The authors acknowledge that some of the patients’ data were used in another study investigating the correlation between radiotherapy response and HPV infection status.18
Gold standard
The gold standard for the study was HPV-DNA oncogene analysis performed with reverse-transcriptase polymerase chain reaction (rt-PCR) from cervical biopsy materials. A dedicated research laboratory performed the HPV-DNA oncogene analysis.
MRI technique
The two primary pelvic MRI sequences selected as radiomics input were sagittal T2-weighted images (T2WI) and a contrast-enhanced three-dimensional fast spoiled gradient echo sequence (CE-T1) (T1W high-resolution isotropic volume examination/liver acquisition with volume acquisition). MRI examinations were performed on two 1.5-T MRI scanners [Achieva 1.5-T (Philips Healthcare, Netherlands) and Signa Dx (GE Medical Systems, USA)] using phased-array body coils. The imaging parameters for sagittal T2WI were repetition time/echo time (TR/TE): 5,300/100 ms, field of view (FOV): 24 cm, matrix: 320 × 256, and slice thickness/slice gap: 5/2 mm. The parameters for CE-T1 were TR/TE: 4.1/1.1 ms, FOV: 32 cm, matrix: 288 × 192, and slice thickness/slice gap: 3/0.3 mm. The time delay was set at 40 s to achieve the late arterial phase.
Image preprocessing and feature extraction
The images were preprocessed with the N4ITK bias-field correction algorithm to reduce the intensity differences and noise caused by the different scanners.19 After preprocessing, pixels were resampled to 1 × 1 mm² with cubic B-spline interpolation, and a bin width of 3 was used for fixed-width gray-level discretization.20
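As a minimal sketch of fixed-bin-width discretization, the snippet below assigns each intensity to a bin of width 3 using the convention described in the PyRadiomics documentation (bin index = floor(x / w) − floor(min / w) + 1). The function name and the sample intensities are illustrative assumptions and are not taken from the study.

```python
import math

def discretize_fixed_bin_width(intensities, bin_width=3.0):
    """Map intensities to 1-based gray-level bins of a fixed width."""
    if not intensities:
        return []
    # Offset so that the lowest intensity in the region falls in bin 1.
    lowest_bin = math.floor(min(intensities) / bin_width)
    return [math.floor(x / bin_width) - lowest_bin + 1 for x in intensities]

# Hypothetical region-of-interest intensities (not study data).
roi = [10.0, 11.5, 12.9, 13.0, 18.2, 25.7]
bins = discretize_fixed_bin_width(roi, bin_width=3.0)
print(bins)
```

With a fixed bin width, the number of occupied bins depends on the intensity range of each lesion, in contrast to a fixed bin count.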
Segmentations were performed semi-automatically on sagittal T2WI and sagittally reconstructed CE-T1 images by two radiologists in consensus: one with more than 20 years of experience in abdominal radiology and one fourth-year resident. Axial CE-T1 images were consulted when needed for better orientation to the tumor. The largest cross-sectional area of each cervical tumor was segmented with the freely available 3D Slicer software (v.4.10.2) (Figure 2). A 2-mm shrinkage was applied to every segmented label to restrict feature extraction to the tumor texture. Six subgroups of radiomics features were extracted from the original and wavelet-filtered images by the PyRadiomics extension package included in the 3D Slicer software.21
Feature selection and data handling
For the stability of the machine learning models, two data preprocessing steps that strongly affect classification solvers22 were applied: standardization and discretization into 10 bins of uniform width.
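The two steps can be sketched as follows for a single feature column. This is an illustration under assumptions: the order (standardize first, then bin), the use of the population standard deviation, and the function names are choices made here, not details reported by the study.

```python
import statistics

def standardize(values):
    """Return z-scores (zero mean, unit variance) of a feature column."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

def discretize_uniform(values, n_bins=10):
    """Assign each value to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values
z = standardize(feature)
codes = discretize_uniform(z, n_bins=10)
print(z, codes)
```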
Feature selection reduces dimensionality and is required to avoid overfitting a model to high-dimensional data. A two-step process was followed. First, Pearson’s correlation coefficient was used to identify and drop redundant features: feature pairs with collinearity above the 0.7 threshold were detected, and the features that were highly collinear with other features were dropped.23 Second, the non-redundant features were used as input to a wrapper-based sequential feature selection algorithm with a support vector machine (SVM) as the learning estimator. The sequential selection was run in the backward direction (backward elimination) with five-fold cross-validation. In this wrapper method, multiple models with various feature subsets were trained on the training folds and tested on the remaining fold. The data were divided into five parts of approximately equal size; in each of the five rounds, one part served as test data and the remaining four as training data, with a different part held out in each round. Thus, the phenomenon of “double-dipping” was avoided.24 With backward elimination, the models were initialized with all the features included, and the least important features were then eliminated until the stopping conditions were satisfied.
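The first step, the correlation filter, might look like the sketch below. The text does not state which member of a collinear pair was dropped, so the greedy rule used here (keep a feature only if its absolute correlation with every already-kept feature is at most 0.7) is an assumption, as are the function names and the toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear association
    return cov / math.sqrt(vx * vy)

def drop_collinear(features, threshold=0.7):
    """Greedy filter: keep a feature only if |r| <= threshold with all kept ones."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns (not study data).
features = {
    "f_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f_b": [2.1, 3.9, 6.2, 8.1, 9.8],  # nearly a multiple of f_a
    "f_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_collinear(features)
print(selected)
```

The surviving features would then be passed to the wrapper step; scikit-learn's SequentialFeatureSelector with direction="backward" is one available implementation, although the tool used in the study is not named in the text.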
Model building
The selected features from the T2WI, CE-T1, and combined datasets were used as inputs to the machine learning models. To evaluate the feasibility of machine learning algorithms, two models based on different algorithms were built in Python (v.3). The first model used an SVM with hyperparameters C = 1.0 and kernel = “linear.” The second model used logistic regression (LR) with hyperparameters C = 1, solver = “liblinear,” and regularization penalty = “L2.” The performance of the models was evaluated using five-fold cross-validation.
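A minimal scikit-learn sketch of the two classifiers with the reported hyperparameters is given below. The data are synthetic stand-ins generated with make_classification, and the stratified shuffled folds, the fixed random seeds, and the scaler inside the pipeline are assumptions made for illustration; the study does not report these details.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a selected feature subset (41 cases, 5 features).
X, y = make_classification(n_samples=41, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

models = {
    "SVM": SVC(C=1.0, kernel="linear"),
    # The L2 penalty is the scikit-learn default for this solver.
    "LR": LogisticRegression(C=1.0, solver="liblinear"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, clf in models.items():
    # Scaling is fitted inside each training fold to avoid information leakage.
    pipeline = make_pipeline(StandardScaler(), clf)
    scores[name] = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy {scores[name].mean():.3f}")
```

Reusing the same fold object for both models keeps the per-fold scores paired, which is what a paired comparison between the two algorithms requires.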
Statistical analysis
Descriptive statistics are presented as numbers and percentages (n, %); non-normally distributed variables are shown as medians (interquartile range) and normally distributed variables as mean ± standard deviation. An independent-samples t-test or a Mann–Whitney U test was performed on the numeric variables after a normality analysis using the Kolmogorov–Smirnov test. Fisher’s exact test and the Fisher–Freeman–Halton exact test were performed when appropriate. Receiver operating characteristic (ROC) curves were plotted in Python using the scikit-learn library. The area under the ROC curve (AUC) was calculated with P values.
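For readers unfamiliar with the AUC, the sketch below computes it from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up illustrative values, and the function is a plain-Python stand-in rather than the library routine used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier outputs (not study data).
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.7, 0.2, 0.7]
auc = roc_auc(labels, scores)
print(auc)
```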
Comparisons between the models with LR and SVM algorithms were performed using Wilcoxon’s signed rank test. Comparisons of the models with different feature subsets were performed using Friedman’s test. The Dunn–Bonferroni post-hoc test was conducted if statistical significance was found. Statistical analysis was performed using SPSS software (v.23),25 and the statistical significance level was set at P < 0.05.
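To make the Friedman comparison concrete, the sketch below computes the test statistic, 12 / (n k (k + 1)) × Σ R_j² − 3 n (k + 1), for n paired blocks and k feature subsets. It assumes that per-fold accuracies are the paired observations, which the text does not specify, and it omits the tie correction that statistical packages apply; the accuracies are invented for illustration.

```python
def friedman_statistic(blocks):
    """Friedman chi-square for n blocks (rows) by k treatments (columns).

    Assumes no tied values within a row, so no tie correction is applied.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # Rank the treatments within this block, 1 = lowest value.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    total = sum(r * r for r in rank_sums)
    return 12.0 * total / (n * k * (k + 1)) - 3.0 * n * (k + 1)

# Hypothetical per-fold accuracies for three feature subsets (not study data).
folds = [
    [0.75, 0.88, 1.00],
    [0.78, 0.89, 0.90],
    [0.80, 1.00, 0.95],
    [0.76, 0.87, 0.99],
    [0.77, 0.86, 0.98],
]
stat = friedman_statistic(folds)
print(round(stat, 3))
```

Under the null hypothesis the statistic is approximately chi-square distributed with k − 1 degrees of freedom, although the approximation is rough for as few as five blocks.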
Results
Patients
Ninety-eight patients were initially identified. Fifty were excluded due to a lack of imaging and seven because of prominent artifacts in their images, leaving 41 patients for analysis. Twenty-six (63%) patients were positive for HPV-DNA oncogenes (types 16, 31, 45, or 52) according to the rt-PCR test. Fifteen (37%) patients were negative for HPV-DNA oncogenes. Table 1 summarizes the characteristics of the patients.
Feature extraction and selection
A total of 851 features were extracted from each of the CE-T1 and T2WI images. Features were grouped as follows: 14 (1.65%) shape, 18 (2.12%) first order, 14 (1.65%) gray-level dependence matrix, 24 (2.82%) gray-level co-occurrence matrix, 16 (1.88%) gray-level run-length matrix, 16 (1.88%) gray-level size-zone matrix, 5 (0.59%) neighboring gray-tone difference matrix, and 744 (87.43%) wavelet-derived texture features. A combined dataset was created by concatenating features from T2WI and CE-T1.
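The group shares follow directly from the counts, as the short check below shows. The abbreviated group names are labels chosen here. The final comment notes that 744 equals 8 × 93, which is consistent with eight wavelet sub-bands each carrying the 93 non-shape features; that reading is an observation about the arithmetic, not a statement made in the study.

```python
# Feature counts per group as reported for each imaging sequence.
counts = {
    "shape": 14,
    "first order": 18,
    "GLDM": 14,
    "GLCM": 24,
    "GLRLM": 16,
    "GLSZM": 16,
    "NGTDM": 5,
    "wavelet": 744,
}
total = sum(counts.values())
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Non-shape features on the original image: 18 + 14 + 24 + 16 + 16 + 5.
non_shape = total - counts["shape"] - counts["wavelet"]

for name, share in shares.items():
    print(f"{name}: {counts[name]} ({share}%)")
print(total, non_shape, counts["wavelet"] == 8 * non_shape)
```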
Pearson’s correlation coefficient determined 32, 49, and 75 features as non-redundant in CE-T1, T2WI, and the combined group, respectively. After the wrapper-based sequential feature selection step, the final feature subsets consisted of five features in CE-T1, 17 in T2WI, and 20 in the combined group. Table 2 and Figure 3 provide details of the selected features.
Classification performance
The SVM models had 83.10%, 95.20%, and 95.30% accuracy scores in the CE-T1, T2WI, and combined groups, respectively. The AUC values were 0.85 (95% CI: 0.71–0.99), 0.96 (95% CI: 0.93–1.00), and 0.98 (95% CI: 0.95–1.00), P = 0.001, for the CE-T1, T2WI, and combined groups, respectively.
Models with the LR algorithm had accuracy scores of 83.13%, 81.20%, and 92.50% in the CE-T1, T2WI, and combined groups, respectively. The AUC values were 0.83 (95% CI: 0.70–0.96), 0.94 (95% CI: 0.89–0.99), and 0.93 (95% CI: 0.85–1.00), P = 0.001, for the CE-T1, T2WI, and combined groups, respectively. Table 3 shows the detailed performance metrics, and Figure 4 presents the ROC curves of all the models in each test fold.
The SVM model with features from T2WI outperformed the LR model in Wilcoxon’s signed rank test (P = 0.005, Table 4). There was no significant difference between the performances of the SVM and LR models in the CE-T1 and combined groups (P = 1.000 each, Table 4).
In Friedman’s test, a significant difference was observed between the SVM models (P = 0.004). The SVM models showed better performance in the T2WI and combined groups than in the CE-T1 group individually (P = 0.033 and 0.006, respectively, Table 5). There was no statistically significant difference between the SVM models in T2WI and the combined group (P = 1.000, Table 5). When the performances of the LR models were compared, there was a significant difference in the results of Friedman’s test (P = 0.018). The combined group performed better than the T2WI group (P = 0.023, Table 5).
Discussion
In this study, we investigated the potential value of machine learning-based models with MRI radiomics analysis for predicting the carcinogenic HPV status of cervical cancers. Our study showed that a satisfactory predictive potential could be achieved with machine learning-based models. We built machine learning-based models with two different algorithms that work on different principles to reduce the possibility of overfitting and to test the feasibility of various models. Achieving similar accuracy rates from both algorithms shows the feasibility of machine learning-based models for predicting oncogenic HPV types.
In the literature, no study has been conducted that investigates the predictability of carcinogenic HPV status from pre-treatment MRI. Therefore, we were not able to compare our results with those of other studies.
Practical implications
The results of our study could be helpful in clinical practice. HPV plays a significant role in the development of cervical cancer, and many studies have investigated the impact of pre-treatment HPV status on prognosis. A recently published meta-analysis indicated that positive HPV DNA status favors a good prognosis in cervical cancer.26 The tests that detect HPV DNA are divided into nucleic acid hybridization assays, signal amplification assays, and nucleic acid amplification assays. HPV DNA is detected by rt-PCR and Hybrid Capture II tests. However, HPV DNA testing is not routinely performed in patients with cervical cancer, especially in low- and middle-income countries.27 Considering that cervical cancer mortality is highest in countries with a low socioeconomic status,28 the prediction of carcinogenic HPV DNA status from MRI could be an alternative to molecular HPV DNA tests.
Although the prognostic role of HPV in cervical cancer has been reported in a comprehensive meta-analysis,27 several studies have shown that HPV status does not have any prognostic significance.29,30,31,32 In addition to the studies showing that HPV negativity before treatment is a poor prognostic factor for disease-free survival and overall survival,8,9,31,33,34 one study has shown that the carcinogenic HPV subtype has prognostic significance.31 Based on these findings, it is clinically beneficial to detect HPV DNA status before treatment.
The present study investigated the prediction of HPV status using pre-treatment MRI images. However, the changes in HPV status after treatment have also been shown to impact prognosis. Persistent HPV positivity in patients after radiotherapy is a poor prognostic factor.6,7,10
Limitations and generalizability
Our study has several limitations. First, this was a retrospective study; because all the data were obtained from existing records, selection bias may have been introduced. Second, the images were obtained from two scanners, which may be challenging for machine learning models but reflects clinical practice. With the image preprocessing steps, we aimed to reduce the variation between scanners and protocols so that the machine learning-based models could generalize. Third, the segmentations were performed semi-automatically by two radiologists in consensus to increase segmentation accuracy; therefore, a reproducibility analysis could not be performed. Fourth, the authors segmented only the slice with the largest tumor cross-section, in a two-dimensional plane. Given tumor heterogeneity, volumetric segmentation may be a more precise method; however, it is time-consuming and less practical. Moreover, most studies on texture analysis in cervical cancer are based on the single-slice technique. Fifth, features from quantitative MRI maps, such as the apparent diffusion coefficient, could not be extracted due to a lack of diffusion-weighted imaging sequences in the imaging protocols.35 Finally, we did not split our data into separate training and test sets; instead, we used a five-fold cross-validation technique. Because our patient population was small, we could not afford to lose any information that could have been beneficial for training.
In conclusion, machine learning-based radiomics models based on pre-treatment MRI can detect carcinogenic HPV status with discriminative accuracy. The fact that HPV status, an essential prognostic factor in survival, can be predicted by MRI raises the issue of whether we can predict survival using MRI.