ABSTRACT
PURPOSE
To evaluate the predictive value of a combination model of Liver Imaging Reporting and Data System (LI-RADS)-based magnetic resonance imaging (MRI) and clinicopathologic features to identify atypical hepatocellular carcinoma (HCC) in LI-RADS category M (LR-M) observations.
METHODS
A total of 105 patients with HCC based on surgery or biopsy who underwent preoperative MRI were retrospectively reviewed in the training group from hospital-1 between December 2016 and November 2020. The LI-RADS-based MRI features and clinicopathologic data were compared between LR-M HCC and non-HCC groups. Univariate and least absolute shrinkage and selection operator regression analyses were used to select the features. Binary logistic regression analysis was then conducted to estimate potential predictors of atypical HCC. A predictive nomogram was established based on the combination of MRI and clinicopathologic features and further validated using an independent external set of data from hospital-2.
RESULTS
Of 113 observations from 105 patients (mean age, 61 years; 77 men) in the training set, 47 (41.59%) were classified as LR-M HCC. Following multivariate analysis, aspartate aminotransferase >40 U/L [odds ratio (OR): 4.65], alpha-fetoprotein >20 ng/mL (OR: 13.04), surface retraction (OR: 0.16), enhancing capsule (OR: 5.24), blood products in mass (OR: 8.2), and iso/hypoenhancement on delayed phase (OR: 10.26) were found to be independently associated with LR-M HCC. The area under the curve for the combined model-based nomogram was 0.95 in the training set (n = 113 observations) and 0.90 in the validation set (n = 53 observations).
CONCLUSION
The combined model incorporating clinicopathologic and MRI features demonstrated a satisfactory prediction result for LR-M HCC.
Main points
• This retrospective study of 113 Liver Imaging Reporting and Data System (LI-RADS) category M (LR-M) observations at dynamic contrast-enhanced magnetic resonance imaging (MRI) evaluated the value of a combined model incorporating LI-RADS-based MRI and clinicopathologic features for identifying LR-M hepatocellular carcinoma (HCC).
• In the combined model, aspartate aminotransferase >40 U/L [odds ratio (OR): 4.65], alpha-fetoprotein >20 ng/mL (OR: 13.04), surface retraction (OR: 0.16), enhancing capsule (OR: 5.24), blood products in mass (OR: 8.2), and iso/hypoenhancement on delayed phase (OR: 10.26) were independent predictors of LR-M HCC.
• The nomogram-based model had satisfactory performance to discriminate LR-M HCC from LR-M non-HCC (area under the curve: 0.95 for the training set and 0.90 for the validation set).
The Liver Imaging Reporting and Data System (LI-RADS) is a comprehensive, dynamic system that is constantly updated with user feedback, evolving knowledge, and technological advancements for patients with or at risk of hepatocellular carcinoma (HCC).1,2 In the most recent version, published in 2018 (v2018), the LI-RADS M category (LR-M) represents observations that are probably or definitely malignant but not specific to HCC. However, based on current LI-RADS data, approximately one-third of all LR-M lesions are categorized as HCC, approximately two-thirds are categorized as non-HCC malignancies, and approximately 5% are categorized as benign.1,3 HCC with atypical features in the LR-M category should be diagnosed early to determine treatment options, as the biological behavior and prognoses differ between HCC and non-HCC malignancies.1,4 However, more importantly, distinguishing HCC from non-HCC malignancies remains extremely challenging,1,5 especially under the assumption that the presence of any LR-M features indicates LR-M. Due to the partial overlap between LR-M HCC and LR-M non-HCC malignancies with respect to the pathological components, clinical presentations, and imaging features, a biopsy is required for diagnosis.5,6,7 Additionally, imaging is usually required for guidance.5
For the diagnosis of HCC, multimodality imaging in cross-sections, especially dynamic contrast-enhanced magnetic resonance imaging (MRI), is one of the most effective tools due to the diagnostic information obtained from different MRI sequences.1,5,7,8 The LR-M diagnosis criteria are composed of non-targetoid and targetoid masses. The latter represents intrahepatic cholangiocarcinoma (ICC), combined hepatocellular-cholangiocarcinoma, or HCC with atypical features. In addition, there are many other features, such as major and ancillary features, that favor HCC specifically or that are not included in LI-RADS.1 Therefore, if support can be found for LR-M HCC in numerous features not restricted to LR-M criteria, it may not be necessary for some patients with a high risk of LR-M malignancies to undergo a biopsy. In this way, it may be possible to optimize the discrimination of HCC from non-HCC in LR-M lesions and to avoid significant complications by invasive tissue sampling. Following the identification of discriminative features, relative models were developed based on a variety of feature sets. Previous studies have focused on the discrimination of LR-M categories with different imaging features.9,10,11,12,13 However, few studies have proposed a non-invasive and comprehensive contrast-enhanced MRI model for the status of LR-M HCC with serology tests that are reasonably priced and readily available.
Based on these gaps in the literature, this study aimed to evaluate the predictive value of a combined model of MRI and clinicopathological features for identifying atypical HCC in LR-M observations.
Methods
Training patients
The protocol for this retrospective study was approved for the two hospitals in the study by the Shanghai General Hospital Institutional Review Board [(2023) 171, 5/16/2023], and the requirement for informed consent was waived. A total of 375 consecutive patients were first identified from the first center [Hospital-1, Shanghai General Hospital-North (city center)] between December 2016 and November 2020. The inclusion criteria based on the LI-RADS v2018 diagnostic algorithm were as follows: (a) adult patients (≥18 years old), (b) patients with cirrhosis and/or chronic hepatitis B viral infection, (c) patients who had undergone a preoperative contrast-enhanced MRI within 3 weeks before surgery or biopsy, and (d) patients with LR-M features based on MRI.1 A total of 160 patients without eligible clinical and imaging data were excluded for the following reasons: (a) they had prior hepatic malignancies (n = 25), (b) important clinical data relating to them, such as levels of alpha-fetoprotein (AFP), carbohydrate antigen-199, carcinoembryonic antigen, and aspartate aminotransferase (AST), were not available (n = 78), (c) they had received oncological treatment before undergoing MRI (n = 50), or (d) their MRI images were of insufficient quality (n = 7), including 5 patients without an optimally timed arterial phase. Additionally, after imaging analysis, 110 patients were excluded for the following reasons: (e) they had coexisting LR-4 (probable HCC) and/or LR-5 (definite HCC) lesions (n = 78), because it could not be determined whether the LR-M lesions or the coexisting LR-4 and/or LR-5 lesions contributed to serum tumor marker levels, (f) they had tumor in vein (n = 30), or (g) they had cirrhosis due to a vascular disorder or diffuse nodular regenerative hyperplasia based on LI-RADS v2018 (n = 2). Ultimately, 105 patients were included in the study, and each patient was categorized into the LR-M HCC group (n = 43) or the non-HCC group (n = 62) (Figure 1).
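The patient flow described above can be checked with simple arithmetic. The following is a minimal sketch that only restates the counts reported in the text; the dictionary keys are shorthand labels introduced here for illustration and are not the authors' wording.

```python
# Patient-flow check for the training cohort, using the counts reported in the text.
identified = 375

# Exclusions before image analysis (reasons a-d).
clinical_exclusions = {
    "prior hepatic malignancy": 25,
    "missing clinical data": 78,
    "treatment before MRI": 50,
    "insufficient MRI quality": 7,
}

# Exclusions after image analysis (reasons e-g).
imaging_exclusions = {
    "coexisting LR-4 and/or LR-5": 78,
    "tumor in vein": 30,
    "vascular disorder or nodular regenerative hyperplasia": 2,
}

excluded_clinical = sum(clinical_exclusions.values())
excluded_imaging = sum(imaging_exclusions.values())
included = identified - excluded_clinical - excluded_imaging

# Group sizes reported for the included patients.
hcc_group, non_hcc_group = 43, 62

print(excluded_clinical, excluded_imaging, included)
print(hcc_group + non_hcc_group == included)
```

The totals agree with the reported figures: 160 and 110 exclusions, leaving 105 patients split into groups of 43 and 62.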
Magnetic resonance image acquisition
All abdominal MRI images were obtained on a 3.0-Tesla clinical scanner [Philips Ingenia (Philips Healthcare) or General Electric (GE) Discovery 750W (GE Healthcare)] using a body phased-array coil. The conventional abdominal MRI protocol consisted of the following sequences: T1-weighted (in-phase and out-of-phase), T2-weighted, and diffusion-weighted imaging (DWI) (b = 0, 500, 1,000 s/mm²). Corresponding maps of the apparent diffusion coefficient (ADC) were automatically calculated by the MRI system. For dynamic contrast-enhanced imaging, a three-dimensional gradient echo sequence with T1 high-resolution isotropic volume examination or liver acquisition with volume acceleration was performed before and after intravenous injection of gadopentetate dimeglumine. The contrast medium (Magnevist; Bayer Healthcare, Germany, 0.1 mmol/kg) was injected at a rate of 1–2 mL/sec followed by a flush with a maximum dose of 20 mL saline. Hepatic arterial (early and late), portal, and equilibrium phase images were obtained at 15–25, 60–80, and 180 sec after contrast medium injection, respectively. Hepatobiliary agents were not used for abdominal MRI. Detailed MRI scanner parameters are shown in Supplementary Table 1.
Imaging analysis
All MRIs were assessed using the same picture archiving and communication system (Pathspeed, GE Medical Systems Integrated Imaging Solutions, Prospect, IL). An analysis of the images was performed independently by two abdominal radiologists, X-X.H. and L.Z., who had 7 and 23 years of experience in hepatic imaging, respectively. They were both blinded to any outcome information of patients, and disagreements were resolved by discussion based on bookmarked images, which were used as a guide.
The MRI morphological features were evaluated according to the LI-RADS v2018, including major, ancillary, and LR-M signs. The threshold growth was not included because there was only one examination per patient in the analysis. Moreover, the MRI signal intensity was evaluated at T1-weighted, T2-weighted, DWI, and postcontrast phase for the whole observation. Furthermore, the enhancement pattern of each observation was evaluated at the postcontrast phase. To avoid the influence of variable internal nodules, compartments, or septations on signal intensity in mosaic architecture, the hyper/iso/hypo signal intensity was defined as >50% of the whole observation showing visually assessed hyper/iso/hypo signal in the dynamic enhancement MRI and DWI within an observation.
Model building
First, risk factors for LR-M HCC among the clinicopathologic and MRI features were screened using univariate analysis. Second, least absolute shrinkage and selection operator (LASSO) regression was used to further screen the selected variables and reduce the risk of overfitting. Additionally, variables with a prevalence of <5% or >95% were discarded, given their limited value in distinguishing different LR-M observations, to further limit overfitting. Finally, a binary logistic regression analysis was conducted with backward stepwise selection. Variables with P values <0.05 were recognized as potential risk factors for LR-M HCC, and corresponding models were simultaneously established (Figure 2).
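The prevalence rule in this pipeline (discarding features present in <5% or >95% of observations) can be illustrated as follows. This is a minimal sketch on hypothetical toy data, not the authors' code; the function names and feature names are assumptions introduced for illustration, and the univariate, LASSO, and stepwise steps are not reproduced here.

```python
def prevalence(values):
    """Fraction of observations in which a binary feature is present (coded 1)."""
    return sum(values) / len(values)


def filter_by_prevalence(features, low=0.05, high=0.95):
    """Keep binary features whose prevalence is neither below `low` nor above `high`."""
    return {
        name: values
        for name, values in features.items()
        if low <= prevalence(values) <= high
    }


# Hypothetical toy data for 40 observations: 1 = feature present, 0 = absent.
toy_features = {
    "feature_common": [1] * 39 + [0] * 1,     # prevalence 97.5%, discarded
    "feature_rare": [1] * 1 + [0] * 39,       # prevalence 2.5%, discarded
    "feature_balanced": [1] * 15 + [0] * 25,  # prevalence 37.5%, kept
}

kept = filter_by_prevalence(toy_features)
print(sorted(kept))
```

Features that are almost always present or almost always absent carry little discriminative information, which is the stated reason for removing them before fitting the logistic model.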
Validation patients
An independent retrospective validation set of 50 patients from the second center [Hospital-2, Shanghai General Hospital-South (Songjiang new city)] between December 2020 and March 2022 was used to verify the proposed predictive model. Patients were included and excluded using the same criteria as those in the training set (Figures 1, 2).
Statistical analysis
Descriptive statistics were given as mean ± standard deviation for normally distributed variables and median (min–max) for non-normally distributed variables after a normality analysis of continuous variables using the Shapiro–Wilk test. For the categorical variables, descriptive statistics were reported as numbers and percentages (n, %). Continuous variables were compared using Student’s t-test or the Mann–Whitney U test. Categorical variables were analyzed with the χ2 test or Fisher’s exact test where applicable. Univariate analysis and LASSO regression analysis were performed to identify the risk factors to discriminate LR-M HCC and LR-M non-HCC. Binary logistic regression analysis was then conducted to build clinicopathologic, MRI, and combined models. Receiver operating characteristic (ROC) analysis was finally performed with corresponding areas under the curve (AUCs) computed. Inter-observer agreement analysis for MRI features was performed using Cohen’s kappa statistics (slight, 0.00–0.20; fair, 0.21–0.40; moderate, 0.41–0.60; substantial, 0.61–0.80; perfect, 0.81–1.00). Values of P < 0.05 were considered statistically significant. All data analyses were performed using MedCalc software (MedCalc 20.022; MedCalc, Mariakerke, Belgium) and R software (version 3.4.1).
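As an illustration of the inter-observer agreement analysis, Cohen's kappa for two readers can be computed as below. This is a minimal sketch on hypothetical ratings, not the study data; the function names are assumptions, and the interpretation bands are the ones listed in the paragraph above.

```python
def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two raters scoring the same observations."""
    n = len(reader_a)
    labels = set(reader_a) | set(reader_b)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance from each reader's marginal frequencies.
    expected = sum(
        (reader_a.count(label) / n) * (reader_b.count(label) / n)
        for label in labels
    )
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Map kappa to the agreement bands used in this study."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "perfect"


# Hypothetical ratings of one binary MRI feature for 20 observations.
reader_a = [1] * 10 + [0] * 10
reader_b = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9

kappa = cohen_kappa(reader_a, reader_b)
print(round(kappa, 2), agreement_band(kappa))
```

In this toy example the readers agree on 17 of 20 observations, chance agreement is 0.5, and kappa is about 0.70. The sketch does not guard against the degenerate case in which chance agreement equals 1.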
Results
Clinicopathologic characteristics
A total of 105 patients (mean age, 61 ± 14 years; 77 men) with 113 liver observations constituted the training set, which comprised 47 (41.59%) LR-M HCC malignancies, 56 (49.56%) LR-M non-HCC malignancies, and 10 (8.85%) benign lesions. Seven patients were diagnosed by biopsy, and each patient had one observation. A total of 50 patients (mean age, 59 ± 13 years; 36 men) with 53 liver observations constituted the validation set. The training set comprised an HCC group (mean age, 57 ± 14 years; 31 men, 12 women) and a non-HCC group (mean age, 63 ± 13 years; 46 men, 16 women). There was no significant difference between the sex distributions of the two groups (P = 0.682), but there was a statistically significant difference in the ages of the two groups (P = 0.034). In the training set, hepatitis B virus infection was observed in most patients, whether in the HCC group [41 (95.35%)] or in the non-HCC group [57 (91.94%)], whereas other etiologies occurred rarely. AST levels >40 U/L and serum AFP levels >20 ng/mL were both significantly more frequent (P = 0.002, P < 0.001, respectively) in the HCC group [23 (48.94%); 27 (57.45%)] than in the non-HCC group [11 (16.67%); 9 (13.64%)]. Serum carcinoembryonic antigen levels ≤5 μg/mL were also more frequent in the HCC group [41 (87.23%)] than in the non-HCC group [45 (68.18%)] (P = 0.019). There were no significant differences in the remaining demographic variables between the two groups. Additionally, no variables were significantly different between the training and validation sets. An overview of the data is presented in Table 1.
Univariate analysis of magnetic resonance imaging features
The MRI features of the LR-M HCC and non-HCC groups are summarized in Table 2. Fifteen MRI features remained after univariate analysis. For the LR-M targetoid appearance, 14 (29.79%) cases had peripheral washout in the HCC group compared with 6 (9.09%) in the non-HCC group (P = 0.004), whereas only 3 (6.38%) cases had delayed central enhancement in the HCC group compared with 16 (24.24%) cases in the non-HCC group (P = 0.012). For LR-M nontargetoid appearance, marked diffusion restriction [11 (23.40%) cases], surface retraction [6 (12.77%) cases] and peritumoral bile duct dilatation [5 (10.64%) cases] were less frequent in the HCC group than in the non-HCC group [28 (42.42%), 32 (48.48%), and 29 (43.94%) cases, respectively] (P = 0.036, P < 0.001, P < 0.001, respectively). In regard to major features, capsular enhancement was more frequent in the HCC group [25 (53.19%) cases] than in the non-HCC group [15 (22.73%) cases] (P = 0.001). Regarding the ancillary features favoring HCC, all variables were significantly different between the two groups. For the signal intensity and enhancement pattern, washout or isoenhancement on the portal venous or delayed phase (DP) was present among 21 cases in the HCC group (44.68%) and only 4 cases in the non-HCC group (6.06%) (P < 0.001). Hyperenhancement was not significantly more common in the portal venous phase or DP in the HCC group than in the non-HCC group. The hyperintensity on DWIs constituted the majority of observations in both groups, with P = 0.016.
Feature selection
The results of the selection algorithm are detailed in Figure 2. A total of 19 variables related to clinicopathology and MRI met the criteria for univariate analysis. LASSO regression analysis in the training set then retained 14 variables with non-zero coefficients (λ: 0.017655622). Finally, two variables (non-enhancing capsule and signal on DWIs) were removed from the model because their prevalence was too high or too low.
Multivariate analysis
Detailed results are presented in Table 3. The diagnostic model of LR-M HCC based on only clinicopathological characteristics showed that both AST [odds ratio (OR): 6.72; 95% confidence interval (CI): 2.44, 18.49; P < 0.001] and AFP (OR: 11.19; 95% CI: 4.05, 30.90; P < 0.001) were significant risk factors for HCC. The second model based on only MRI features showed that surface retraction (OR: 0.11; 95% CI: 0.03, 0.40; P < 0.001), capsular enhancement (OR: 6.69; 95% CI: 2.13, 21; P = 0.001), blood products in mass (OR: 6.25; 95% CI: 1.7, 23; P = 0.006), and iso/hypoenhancement on DP (OR: 12.76; 95% CI: 3.67, 44.36; P < 0.001) were significant risk factors for HCC. The combined model consisting of clinicopathological and MRI factors showed that all of the abovementioned variables with different ORs and 95% CIs were associated with HCC (Figure 3). As a final step, a forest plot and nomogram were developed after identifying those factors.
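To show how odds ratios from a logistic model translate into a predicted probability, the sketch below uses the combined-model ORs reported in the abstract. The intercept is not reported in the text, so the value used here is purely hypothetical and the resulting probabilities are illustrative only; the variable and function names are assumptions as well.

```python
import math

# Combined-model odds ratios as reported in the abstract.
COMBINED_ODDS_RATIOS = {
    "ast_gt_40": 4.65,
    "afp_gt_20": 13.04,
    "surface_retraction": 0.16,
    "enhancing_capsule": 5.24,
    "blood_products_in_mass": 8.2,
    "iso_hypo_on_delayed_phase": 10.26,
}

# HYPOTHETICAL intercept: the model intercept is not given in the text.
ASSUMED_INTERCEPT = -2.0


def predicted_probability(present_features, intercept=ASSUMED_INTERCEPT):
    """Logistic-model probability for an observation with the listed binary features present."""
    # Each regression coefficient is the natural log of the corresponding odds ratio.
    log_odds = intercept + sum(
        math.log(COMBINED_ODDS_RATIOS[name]) for name in present_features
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


baseline = predicted_probability([])
with_afp = predicted_probability(["afp_gt_20"])
with_retraction = predicted_probability(["surface_retraction"])

print(baseline < with_afp, with_retraction < baseline)
```

An OR above 1 (for example, AFP >20 ng/mL) raises the predicted probability of LR-M HCC relative to baseline, whereas an OR below 1 (surface retraction) lowers it. A nomogram presents the same weighted sum graphically as points per feature.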
Diagnostic performance of different models from the training and validation sets
An assessment of diagnostic test results using ROC curve analysis was further performed to identify LR-M HCC for different models (Figure 4). The AUCs with 95% CIs were 0.81 (0.72, 0.88), 0.89 (0.81, 0.94), and 0.95 (0.89, 0.98) for the clinicopathological model, MRI model, and combined model in the training set, respectively. The AUCs with 95% CIs were 0.74 (0.61, 0.85), 0.88 (0.76, 0.95), and 0.90 (0.76, 0.97) for the clinicopathological model, MRI model, and combined model in the validation set, respectively. The corresponding sensitivities, specificities, positive predictive values, negative predictive values, positive likelihood ratios, negative likelihood ratios, and cut-off values are detailed in Table 4.
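For readers unfamiliar with the AUC, it equals the probability that a randomly chosen HCC observation receives a higher model score than a randomly chosen non-HCC observation, with ties counted as one half. The following is a minimal sketch of that rank-based definition on hypothetical scores; it is not the MedCalc or R procedure used in the study.

```python
def auc(scores_positive, scores_negative):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities, not study data.
hcc_scores = [0.9, 0.8, 0.6, 0.4]
non_hcc_scores = [0.7, 0.5, 0.3, 0.2]

print(auc(hcc_scores, non_hcc_scores))  # 13 of 16 pairs are ordered correctly
```

In this toy example 13 of the 16 possible pairs are ordered correctly, giving an AUC of 0.8125.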
Prediction of the nomogram and construction of external validation
A ROC curve was also drawn to assess the diagnostic accuracy of LR-M HCC in the validation set (Figure 4). The AUC value of the combined model [AUC (95% CI), 0.90 (0.76, 0.97)] was greater than that of both the clinicopathological model [0.74 (0.61, 0.85)] and the MRI model [0.88 (0.76, 0.95)] in the validation set, similar to the results mentioned earlier in the training set. Overall, the combined model had the strongest predictive value in both the training and validation sets, with a concordance index (C-index) of 0.948 and 0.899, respectively. As shown by the calibration plots (Figure 4), both the training and validation sets showed good consistency between the predictions and the actual observations. The clinical use of decision curve analysis for the LR-M HCC nomogram is presented in Figure 5. Ultimately, two examples of a nomogram application in practice are presented in Figure 6.
Discussion
Recently, various prognostic models for LR-M lesions have been described,11,12,13,14,15 but an ideal model combining clinicopathologic and MRI features for discriminating LR-M HCC from other observations has not been developed. In a previous study,12 targetoid tumors and enhancing capsules were combined to identify LR-M HCC, which showed high specificity (93.8%) but low sensitivity (76.6%). In this study, the authors established a nomogram-based combined model including AST, AFP, and MRI (surface retraction, enhancing capsule, blood products in mass, and iso/hypoenhancement on DP) features to classify LR-M HCC. The model had a high sensitivity (training, 93.6%; validation, 95%) for identifying LR-M HCC, with a specificity of 87.9% in the training set and 75.8% in the validation set. The nomogram for identifying LR-M HCC yielded satisfactory results in the training (C-index 0.948) and validation (C-index 0.899) datasets.
High AFP levels [OR: 13.04; 95% CI: (3.16, 53.9)] had the strongest association with LR-M HCC and had the highest weight in the nomogram-based model. AFP levels played an important role in distinguishing LR-M HCC from other observations in previous studies,16,17,18 and AFP expression was also higher in cytokeratin 19-positive patients with HCC, whose imaging features were more consistent with LR-M HCC.13,19 Thus, AFP levels may be used to identify LR-M HCCs, with the caveat that AFP levels are also high in patients with combined HCC-cholangiocarcinoma (cHCC-CCA). In our current study, cHCC-CCA comprised only 10.61% of LR-M non-HCC in the training set and only 6.06% in the validation set. This relatively small number of cHCC-CCA cases may have had an impact on the significance of AFP. Therefore, discrimination between LR-M HCCs and LR-M non-HCCs based on AFP levels remains to be further confirmed in a larger study. The AST levels [OR: 4.65; 95% CI, (1.09, 19.92)] were of minimal importance for our model, even though they were regarded as a predictor of LR-M HCC. It is possible that the microenvironment of the chronic inflammatory response of the liver and subsequent liver damage contributed to HCC,7,20,21 which resulted in clinically higher AST levels among patients with impaired hepatic function.
As the strongest contributor to the MRI model, iso/hypoenhancement on DP [OR: 10.26; 95% CI, (2.38, 44.22)] ranked second only to AFP levels for identifying LR-M HCC in the combined model. Previous studies showed that hyperintensity on DP was more common in ICC than in atypical HCC.6,15 These findings were similar to the authors’ findings that hyperintense lesions accounted for most LR-M non-HCC lesions (89.39%), of which more than half were ICC. The reason may be linked to the relatively abundant pathological fibrosis of ICC compared with LR-M HCC, which can mimic conventional HCC.22,23 In contrast, the sparse fibrosis in LR-M HCC contributes relatively little to the prolonged retention of extracellular gadolinium contrast agent, which results in isointensity or hypointensity on DP in LR-M HCC.
In addition to iso/hypoenhancement on DP, both enhancing capsule [OR: 5.24; 95% CI: (1.47, 18.64)] and surface retraction [OR: 0.16; 95% CI, (0.04, 0.62)] were correlated with LR-M HCC. Enhancing capsule suggested more fibrous tissue peripherally, which represented expansile growth in atypical HCC.14,22 In contrast, more than half (51.52%) of the non-HCC cases in the study were ICC cases, which contained a higher proportion of tumor cells peripherally, manifesting an uncommon capsule appearance.24 Although a small fraction of HCCs may mimic pathological findings of ICCs based on similar biliary differentiation,25,26 enhancing capsule still reliably predicted LR-M HCC. Conversely, surface retraction occurred less frequently [6/47 (12.77%)] in the LR-M patients with HCC. It is possible that surface retraction was frequently observed in mass-forming ICC with a relatively fibrotic component instead of HCC, as described in previous studies.15,27
Blood products in mass [OR: 8.2; 95% CI, (1.71, 39.22)] was associated with LR-M HCC. This feature accounted for 40.43% of LR-M HCC lesions, similar to a 50% proportion reported by Jiang et al.13 Another study indicated that blood products in mass may be useful for differentiating LR-M HCC from non-HCC malignancies.28 Usually, hemorrhage represents rapidly growing tumors with an increasing level of malignancy, and the tumor vasculature is correspondingly disrupted. Compared with conventional HCC, LR-M patients with HCC experienced a worse prognosis and were also characterized by abundant blood supply.19,25,29 This may explain why LR-M HCC cases had a significantly higher incidence of blood products than non-HCC cases with a relatively insufficient blood supply.
The study’s predictive model of LR-M HCC was developed using univariate, LASSO, and multivariate analysis, which effectively enabled feature selection. For the training cohort, the prediction model that contained six selected factors yielded an AUC of 0.95. The calibration curve results showed satisfactory agreement between the predicted LR-M HCC rates and observed probability. The validation of the nomogram-based model is crucial in avoiding overfitting and determining generalizability.30 Thus, our combined model was validated using external data. The AUC reached 0.90 for the validation set when distinguishing LR-M HCC, and the model demonstrated good calibration, with the bias-corrected curve close to the ideal curve. Additionally, the decision curve showed that the combined model provided more benefit for clinical decision-making within threshold probability ranges of 0.01–0.94 and 0.02–0.90 in the training and validation sets, respectively. By using the nomogram-based model, clinicians can accurately predict the HCC risk of individuals with LR-M observations.
Several limitations were identified in this study. First, it was done retrospectively. Second, a relatively small sample was used in the multivariate analysis; however, another study demonstrated that relaxing the rule of ten events for one variable in logistic regression was acceptable in certain contexts.31 Third, the authors could not evaluate MRI features in the transitional and hepatobiliary phases because gadoxetic acid-enhanced MR imaging was not performed due to medical insurance considerations. Fourth, a large prevalence of hepatitis B virus infection might limit the utility in Western countries. Fifth, there was a limited number of combined-type HCC-CCA lesions, which made it particularly challenging to differentiate LR-M observations. Sixth, the possibility of cHCC-CCA could not be excluded in patients diagnosed by biopsy, even though only seven patients were involved. Finally, it was not possible to perform quantitative measurements for ADC value and contrast-enhanced MRI parameters due to the use of different MRI scanners.
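Decision curve analysis compares the net benefit of acting on a model's predictions with the net benefit of treating every observation as HCC, across threshold probabilities. The sketch below uses the standard net-benefit formula on hypothetical data; it is not the authors' R code, and the function names and toy values are assumptions introduced for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling an observation positive when its predicted probability >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that calls every observation positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = HCC) and predicted probabilities, not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

model_nb = net_benefit(y_true, y_prob, threshold=0.5)
treat_all_nb = net_benefit_treat_all(y_true, threshold=0.5)
print(model_nb, treat_all_nb)
```

At a threshold of 0.5 the toy model yields three true positives and one false positive among eight observations, for a net benefit of 0.25, compared with 0 for the treat-all strategy. A model is considered useful over the range of thresholds where its curve lies above both the treat-all and treat-none (net benefit 0) references.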
In conclusion, the overall analysis of this combined nomogram-based model incorporating clinicopathologic and MRI items demonstrated a satisfactory prediction result for LR-M HCC, and data are easily available via routine blood tests and MRI examination. The model may have substantial clinical utility not only in terms of individualized risk estimation but also in terms of its clinical application for minimizing or eliminating the need for biopsy.