ABSTRACT
PURPOSE
The aim of this meta-analysis is to summarize the diagnostic accuracies of point shear wave elastography (pSWE) and two-dimensional (2D) SWE for esophageal varices (EV) and varices needing treatment (VNT).
METHODS
We conducted a systematic review and meta-analysis of diagnostic accuracy studies. We searched PubMed Central, SCOPUS, MEDLINE, Embase, and Cochrane databases for studies reporting the diagnostic accuracy of pSWE and 2D SWE for EV and VNT. The meta-analysis was performed with the STATA “Midas” package.
RESULTS
A total of 24 studies with 3867 patients were included in the review. Pooled sensitivities of pSWE were 91% (95% CI, 80%-96%) for EV and 94% (95% CI, 86%-97%) for VNT. Pooled sensitivities of 2D SWE were 78% (95% CI, 69%-85%) for EV and 79% (95% CI, 72%-85%) for VNT. Pooled specificities of pSWE were 70% (95% CI, 60%-78%) for EV and 59% (95% CI, 40%-75%) for VNT. Pooled specificities of 2D SWE were 79% (95% CI, 72%-85%) for EV and 72% (95% CI, 66%-77%) for VNT. We found significant heterogeneity for all elastography-based measurements, with significant chi-square test results and an I² statistic >75%.
CONCLUSION
Both pSWE and 2D SWE can diagnose EV and VNT with moderate diagnostic accuracy. Further large-scale, setting-specific longitudinal studies are required to establish the best modality.
Main points
• Point shear wave elastography (pSWE) and two-dimensional shear wave elastography (2D SWE) have been used to diagnose esophageal varices (EV) and varices needing treatment (VNT) in high-risk patients.
• Previous studies have assessed the diagnostic accuracy of these methods in general but have not compared the specific diagnostic accuracies of these types of elastography for EV and VNT.
• The aim of this meta-analysis is to summarize the available data on the diagnostic accuracies of pSWE and 2D SWE for EV and VNT.
• Both pSWE and 2D SWE can diagnose EV and VNT with moderate diagnostic accuracy.
The degree of portal hypertension has been associated with the development of complications such as hepatic encephalopathy, variceal bleeding, hepatorenal syndrome, and ascites.1 These complications are major causes of death in patients with liver cirrhosis (LC).1 Variceal bleeding, which is associated with the presence of esophageal varices (EVs), carries high rates of rebleeding and mortality.2, 3 Hence, variceal bleeding is a major concern in patients with LC. The prevalence of high-risk EVs in LC patients is approximately 15%–25%.4 Screening these high-risk patients can therefore help identify EVs early and reduce the risk of rebleeding and mortality.
Transient elastography (TE) was introduced as the first shear-wave imaging technique for assessing liver stiffness.5 TE has been used as a noninvasive technique for evaluating liver fibrosis; it estimates liver elasticity from the velocity of a low-frequency elastic wave propagated through the liver.5 Several studies have reported that liver stiffness measured by TE correlates positively with clinically significant portal hypertension and EV,6 because portal hypertension occurs as a direct consequence of fibrotic transformation of liver tissue.7 However, the main limitation of TE is that it cannot be used in patients with ascites or a high body mass index.8 Alanine aminotransferase levels can also influence the interpretation of TE and produce inaccurate results.
Real-time elastography devices may overcome some limitations of TE by allowing direct visualization of the liver and spleen. They have been shown to have higher applicability and similar accuracy in detecting clinically significant portal hypertension.9 Both real-time two-dimensional shear-wave elastography (2D SWE) and point shear-wave elastography (pSWE) are types of SWE that use acoustic radiation force impulse (ARFI) technology. In the past few years, several studies have used these real-time elastography methods to detect EV and varices needing treatment (VNT), and liver and spleen stiffness measurements assessed by these techniques have been used to predict EV. To the best of our knowledge, no study has summarized the evidence on the diagnostic accuracy of 2D SWE- or pSWE-based liver and spleen stiffness measurements for EV and VNT. The aim of the current study is to conduct a detailed literature search and summarize the outcomes of studies reporting the diagnostic accuracy of 2D SWE- and pSWE-based measurements for EV and VNT.
Inclusion and exclusion criteria
No restrictions on study design or participants were applied. Inclusion criteria were: studies conducted among patients with chronic liver conditions such as portal hypertension, liver cirrhosis with HVPG <10 mmHg, Child A, or advanced fibrosis; studies evaluating the accuracy of either of the two real-time shear wave elastography techniques, pSWE or 2D SWE, using either of the two measurements (liver or spleen stiffness) for the diagnosis of EV or VNT; and studies using upper gastrointestinal endoscopy as the reference standard for EV or VNT diagnosis.10 Exclusion criteria were: studies not reporting sensitivity or specificity data (or the values needed to calculate these parameters); and unpublished studies or grey literature.
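Eligibility hinged on whether sensitivity and specificity could be obtained. As a minimal sketch (not the authors' extraction procedure), both follow directly from a study's 2×2 table of index-test results against the endoscopic reference standard; the `TwoByTwo` class and the counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2x2 counts of an SWE cutoff against endoscopy."""
    tp: int  # varices on endoscopy, SWE positive
    fp: int  # no varices on endoscopy, SWE positive
    fn: int  # varices on endoscopy, SWE negative
    tn: int  # no varices on endoscopy, SWE negative

    def sensitivity(self) -> float:
        # Proportion of patients with varices that the test flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of patients without varices that the test clears.
        return self.tn / (self.tn + self.fp)


# Hypothetical study: 50 patients with EV, 50 without.
study = TwoByTwo(tp=45, fp=15, fn=5, tn=35)
print(f"sensitivity={study.sensitivity():.2f}, specificity={study.specificity():.2f}")
```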
Search strategy
We searched PubMed Central, EMBASE, MEDLINE, SCOPUS, CINAHL, and ScienceDirect databases for research papers published on SWE for EV or VNT diagnosis from inception to January 2021, with no language restrictions. Medical subject headings (MeSH) and free-text terms such as “Esophageal Varices”, “Shear wave Elastography”, “Two-dimensional Elastography”, “Point Source Elastography”, “Validation Studies”, “Diagnostic Accuracy Studies”, “Varices Needing Treatment”, “Spleen Stiffness” and “Liver Stiffness” were used. The bibliographies of the retrieved articles were also reviewed manually to make the search more comprehensive.
Selection of studies
Primary screening of titles, abstracts, and keywords was performed independently by two reviewers, who then downloaded the relevant full-text articles. Secondary screening of the full-text articles was performed by the same two reviewers to select articles satisfying the inclusion criteria of our review. Disagreements were resolved by discussion with a third, independent reviewer.
Data extraction
Data were extracted from the included full-text publications by the principal investigator using a pre-specified form and entered directly into STATA (StataCorp). The following data were extracted: first author and year of publication, country, setting, study participants, study design, sample size, type of SWE, type of measurement (liver stiffness/spleen stiffness), cutoff, average age, and sensitivity and specificity values. The quality of the entered data was checked by the second author before the analysis was run.
Risk of bias assessment
Risk of bias was assessed independently by two authors using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, which covers patient selection, conduct and interpretation of the index test and reference standard, and flow and timing.10 Discrepancies in grading, and the final decision on whether each study had a high, low, or unclear risk of bias, were resolved by the third author.
Statistical analysis
Data entry and analysis were performed using STATA version 14 (StataCorp). Pooled sensitivity and specificity were estimated separately for each type of SWE and for liver and spleen stiffness measurements using the bivariate meta-analysis method. Other diagnostic accuracy parameters included the positive and negative likelihood ratios (LRP and LRN) and the diagnostic odds ratio (DOR). These parameters were presented graphically as forest plots (pooled sensitivity and specificity), LR scattergrams (LRP and LRN), and Fagan plots (pre- and post-test probability). Final summary estimates were depicted using the summary receiver operating characteristic (sROC) curve.
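The likelihood ratios and DOR are simple functions of sensitivity and specificity: LRP = sens/(1 − spec), LRN = (1 − sens)/spec, and DOR = LRP/LRN. The sketch below applies these textbook definitions to the pooled pSWE estimates for EV reported in the Results. Because the reported summary values come from the fitted bivariate model rather than from rounded point estimates, the plug-in values here only approximate them (for example, a plug-in DOR of about 24 versus the reported 22).

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float, float]:
    """Return (LRP, LRN, DOR) for a sensitivity and specificity in (0, 1)."""
    lr_pos = sens / (1.0 - spec)   # factor by which a positive result raises the odds
    lr_neg = (1.0 - sens) / spec   # factor by which a negative result lowers the odds
    dor = lr_pos / lr_neg          # odds of a positive test in diseased vs non-diseased
    return lr_pos, lr_neg, dor


# Pooled pSWE estimates for EV: sensitivity 91%, specificity 70%.
lr_pos, lr_neg, dor = likelihood_ratios(0.91, 0.70)
print(f"LRP={lr_pos:.2f}, LRN={lr_neg:.2f}, DOR={dor:.1f}")
```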
Heterogeneity between studies was evaluated with the chi-square test and the I² statistic and represented graphically by the bivariate box plot. Meta-regression analysis was conducted to identify sources of heterogeneity; the covariates examined were study design, country, participants, measurement type, sample size, mean age, and quality of the individual studies. Deeks’ test and funnel plots were used to assess publication bias. Sensitivity analysis was performed to check the robustness of the pooled estimates.
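For reference, the I² statistic is derived from Cochran's Q, the inverse-variance-weighted sum of squared deviations of the study estimates from the pooled estimate: I² = max(0, (Q − df)/Q), with df equal to the number of studies minus one. The sketch below is a univariate illustration on hypothetical logit-sensitivities and variances; it is not the bivariate computation performed in STATA.

```python
def cochran_q_i2(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Cochran's Q and I^2 around the fixed-effect inverse-variance pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of the total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical logit-sensitivities from three studies, each with variance 1.
q, i2 = cochran_q_i2([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(f"Q={q:.1f}, I2={i2:.0%}")
```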
Study selection
A systematic search of five databases yielded 1478 studies. Of these, 149 were selected for full-text retrieval. An additional six articles were identified by hand-searching the reference lists of the selected studies. Finally, 24 studies with 3867 participants met the inclusion criteria and were included in our analysis (Figure 1).11-34
Characteristics of the studies included
The majority of the included studies (19 of 24) were prospective. More than half (13 of 24) were conducted in Asian countries, such as Korea, China, Japan, and India. The mean age of the participants ranged from 5.2 to 68.8 years. We analyzed data from 3867 patients, with sample sizes ranging from 34 to 468 patients, to evaluate the diagnostic accuracy of pSWE and 2D SWE and of each measurement type. Of the 24 included studies, 12 reported the diagnostic accuracy of pSWE and 12 reported that of 2D SWE. Nine studies reported EV as the outcome, seven reported VNT, and eight reported both EV and VNT. Most studies used upper gastrointestinal endoscopy as the reference standard (Table 1).
Risk of bias assessment
Figure 2 and Table 2 show the risk of bias across the QUADAS-2 domains. Seven of the 24 studies had a high risk of patient selection bias, and eight had a high risk of bias in the conduct and interpretation of the index test. Six studies had a high risk of bias in patient flow and the interval between index tests and reference standards, and four studies had a high risk of reference standard bias.
Diagnostic accuracy of pSWE for EV
The accuracy of pSWE-based measurements for diagnosing EV was reported in 10 studies,11,16,22-26,28-30 with a total of 14 pSWE-based measurements. The pooled sensitivity and specificity of pSWE-based measurements for diagnosing EV were 91% (95% CI, 80%-96%) and 70% (95% CI, 60%-78%), respectively, with an area under the ROC curve of 0.85 (95% CI, 0.77-0.90) (Figures 3a, 4a). The DOR was 22 (95% CI, 11-42), the LRP was 3 (95% CI, 2.3-3.8), and the LRN was 0.14 (95% CI, 0.07-0.27). In the LR scattergram (Figure 5a), the LRP and LRN fall in the right lower quadrant, indicating that pSWE-based measurements cannot be used for either confirmation or exclusion of EV. Fagan’s nomogram (Figure 6a) shows good clinical value of pSWE-based measurements for diagnosing EV, with post-test probabilities (positive, 72%; negative, 11%) that differ significantly from the pre-test probability (47%). There was significant between-study heterogeneity (chi-square P < .001; I² > 75%), which was further confirmed by the bivariate box plot (Figure 7a).
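Fagan’s nomogram is a graphical form of Bayes’ theorem on the odds scale: post-test odds = pre-test odds × LR. As a check, the sketch below recomputes the post-test probabilities for pSWE in EV from the reported LRP (3), LRN (0.14), and pre-test probability (47%); a small difference (about 73% versus the reported 72%) is expected from rounding of the reported LRP.

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Bayes' theorem on the odds scale, as visualised by Fagan's nomogram."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


pre = 0.47  # pre-test probability of EV reported for the pSWE studies
print(f"positive: {post_test_probability(pre, 3.0):.0%}")   # reported value: 72%
print(f"negative: {post_test_probability(pre, 0.14):.0%}")  # reported value: 11%
```

The same function reproduces the other nomograms to within rounding, for example 22% pre-test with LRP 2.3 gives about 39% for pSWE in VNT.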
Deeks’ test for publication bias was nonsignificant (P = .36), suggesting no evidence of publication bias, consistent with the symmetric funnel plot (Figure 8a). We next performed a meta-regression analysis to explore sources of heterogeneity using potential covariates (Figure 9a). None of the covariates was significant in the sensitivity model; the reference test standard was significant in the specificity model (P < .001) and the joint model (P = .01).
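Deeks’ test regresses each study’s log DOR on 1/√ESS, weighting by the effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of patients with and without the target condition; a slope that differs significantly from zero suggests funnel-plot asymmetry. The dependency-free sketch below returns the slope and its t statistic. Adding 0.5 to every cell before taking logs is one common convention and an assumption here, and the example tables are hypothetical.

```python
import math


def deeks_test(tables: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Deeks' asymmetry test on (tp, fp, fn, tn) tables from at least 3 studies.

    Returns the ESS-weighted least-squares slope of ln(DOR) on 1/sqrt(ESS)
    and its t statistic (k - 2 degrees of freedom for k studies).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_diseased, n_healthy = tp + fn, fp + tn
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        # Assumed 0.5 continuity correction guards against zero cells in ln(DOR).
        a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)
    # Weighted least squares fit of y = intercept + slope * x.
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (len(tables) - 2) / sxx)
    return slope, slope / se


tables = [(45, 15, 5, 35), (20, 10, 6, 30), (90, 40, 12, 110), (12, 4, 3, 15)]
slope, t_stat = deeks_test(tables)
print(f"slope={slope:.2f}, t={t_stat:.2f} on {len(tables) - 2} df")
```

With SciPy available, `2 * scipy.stats.t.sf(abs(t_stat), df=len(tables) - 2)` converts the statistic into a two-sided P value.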
Subgroup analysis was performed by measurement type. Of the 10 studies, four measured both liver and spleen stiffness, three measured liver stiffness only, and three measured spleen stiffness only to diagnose EV. The pooled sensitivity and specificity of pSWE-based liver stiffness measurements for diagnosing EV were 88% (95% CI, 69%-96%) and 68% (95% CI, 49%-82%), respectively, whereas those of pSWE-based spleen stiffness measurements were 92% (95% CI, 78%-97%) and 69% (95% CI, 61%-76%), respectively.
Diagnostic accuracy of pSWE for VNT
The accuracy of pSWE-based measurements for diagnosing VNT was reported in six studies,12, 15, 16, 24, 25, 29 with a total of seven pSWE-based measurements. The pooled sensitivity and specificity of pSWE-based measurements for diagnosing VNT were 94% (95% CI, 86%-97%) and 59% (95% CI, 40%-75%), respectively, with an area under the ROC curve of 0.91 (95% CI, 0.87-0.93) (Figures 3b, 4b). The DOR was 21 (95% CI, 7-58), the LRP was 2.3 (95% CI, 1.5-3.5), and the LRN was 0.11 (95% CI, 0.05-0.24). The LR scattergram (Figure 5b) placed the LRP and LRN in the right lower quadrant, indicating that pSWE-based measurements cannot be used for either confirmation or exclusion of VNT. Fagan’s nomogram (Figure 6b) showed good clinical value of pSWE-based measurements for diagnosing VNT, with post-test probabilities (positive, 39%; negative, 3%) that differ significantly from the pre-test probability (22%).
We also found significant between-study variability (heterogeneity), with a chi-square P < .001 and an I² > 75%; the bivariate box plot further confirmed this heterogeneity (Figure 7b). However, meta-regression could not be performed to explore the sources of heterogeneity because fewer than 10 studies reported this outcome. For the same reason, Deeks’ test and funnel plots could not be used to assess publication bias. Subgroup analysis by measurement type was performed for liver stiffness only, because fewer than four studies reported spleen stiffness. The pooled sensitivity and specificity of pSWE-based liver stiffness measurements for diagnosing VNT were 86% (95% CI, 75%-93%) and 59% (95% CI, 39%-77%), respectively.
Diagnostic accuracy of 2D SWE for EV
The accuracy of 2D SWE-based measurements for diagnosing EV was reported in seven studies,17, 20, 21, 27, 32-34 with a total of nine 2D SWE-based measurements. The pooled sensitivity and specificity of 2D SWE-based measurements for diagnosing EV were 78% (95% CI, 69%-85%) and 79% (95% CI, 72%-85%), respectively, with an area under the ROC curve of 0.85 (95% CI, 0.81-0.89) (Figures 3c, 4c). The DOR was 13 (95% CI, 8-22), the LRP was 3.7 (95% CI, 2.7-5.0), and the LRN was 0.28 (95% CI, 0.20-0.39). The LR scattergram (Figure 5c) placed the LRP and LRN in the right lower quadrant, indicating that 2D SWE-based measurements cannot be used for either confirmation or exclusion of EV. Fagan’s nomogram (Figure 6c) showed good clinical utility of 2D SWE-based measurements for diagnosing EV, with post-test probabilities (positive, 78%; negative, 21%) that differ significantly from the pre-test probability (49%).
We also found significant between-study variability (chi-square P < .001; I² > 75%), which was confirmed by the bivariate box plot (Figure 7c). Because fewer than 10 studies (seven) reported this outcome, meta-regression was not performed to explore the sources of heterogeneity, and Deeks’ test and funnel plots could not be used to assess publication bias. Subgroup analysis by measurement type was performed for liver stiffness only, because fewer than four studies reported spleen stiffness. The pooled sensitivity and specificity of 2D SWE-based liver stiffness measurements for diagnosing EV were 74% (95% CI, 67%-80%) and 80% (95% CI, 71%-86%), respectively.
Diagnostic accuracy of 2D SWE for VNT
The accuracy of 2D SWE-based measurements for diagnosing VNT was reported in nine studies,12, 15, 16, 24, 25, 29 with a total of 13 2D SWE-based measurements. The pooled sensitivity and specificity of 2D SWE-based measurements for diagnosing VNT were 79% (95% CI, 72%-85%) and 72% (95% CI, 66%-77%), respectively, with an area under the ROC curve of 0.81 (95% CI, 0.75-0.86) (Figures 3d, 4d). The DOR was 9 (95% CI, 5-16), the LRP was 2.8 (95% CI, 2.2-3.5), and the LRN was 0.29 (95% CI, 0.21-0.42).
The LR scattergram (Figure 5d) shows the LRP and LRN in the right lower quadrant, indicating that 2D SWE-based measurements cannot be used for either confirmation or exclusion of VNT. Fagan’s nomogram (Figure 6d) shows good clinical utility of 2D SWE-based measurements for diagnosing VNT, with post-test probabilities (positive, 53%; negative, 11%) that differ significantly from the pre-test probability (29%).
There was significant between-study variability (heterogeneity), with a chi-square P < .001 and an I² > 75%, which the bivariate box plot further confirmed (Figure 7d). Deeks’ test for publication bias was nonsignificant (P = .21), suggesting no evidence of publication bias, and the funnel plot was symmetric (Figure 8b). We performed a meta-regression analysis to explore sources of heterogeneity using potential covariates (Figure 9b). The patient selection domain was significant in the sensitivity model (P < .05), and the patient selection, index test, and reference standard domains were significant in the specificity model (P < .05). In the joint model, measurement type was a significant source of heterogeneity (P = .04).
Subgroup analysis was performed by measurement type. Nine studies measured liver stiffness to diagnose VNT, and four measured spleen stiffness. The pooled sensitivity and specificity of 2D SWE-based liver stiffness measurements for diagnosing VNT were 75% (95% CI, 66%-82%) and 72% (95% CI, 65%-78%), respectively, whereas those of 2D SWE-based spleen stiffness measurements were 87% (95% CI, 79%-92%) and 68% (95% CI, 62%-74%), respectively.
Sensitivity analysis
We performed additional sensitivity analyses for all outcomes to check the robustness of the results to study quality and to differing definitions of EV and VNT. Pooled estimates did not differ significantly between high- and low-quality studies, and they were similar irrespective of the definition of EV used.
Discussion
The main goal of this review was to evaluate the diagnostic performance of pSWE- and 2D SWE-based measurements for EV and VNT. A systematic literature search identified 24 studies (mostly prospective and with low risk of bias) that reported the accuracy of pSWE and 2D SWE for diagnosing EV and VNT. For both EV and VNT, pSWE had higher sensitivity, whereas 2D SWE had higher specificity. These findings agree with previous reviews of the diagnostic accuracy of ultrasound elastography.35, 36 The other diagnostic accuracy parameters were broadly similar: in the LR scattergrams, the LRP and LRN of both methods fell in the right lower quadrant, indicating that neither elastography-based method can be used to confirm or exclude EV or VNT.
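The LR scattergram judges a test by comparing its summary LRs with conventional informativeness thresholds. The sketch below assumes the commonly used cut points (LRP > 10 for confirmation, LRN < 0.1 for exclusion) and applies them to the four pooled LR pairs reported in the Results; all four land in the "no exclusion or confirmation" region, although pSWE for VNT (LRN 0.11) sits just above the exclusion threshold.

```python
def lr_quadrant(lr_pos: float, lr_neg: float,
                confirm_at: float = 10.0, exclude_at: float = 0.1) -> str:
    """Classify a test with assumed LR scattergram thresholds."""
    confirms = lr_pos > confirm_at   # strong enough to rule in
    excludes = lr_neg < exclude_at   # strong enough to rule out
    if confirms and excludes:
        return "exclusion and confirmation"
    if confirms:
        return "confirmation only"
    if excludes:
        return "exclusion only"
    return "no exclusion or confirmation"


pooled = {  # (LRP, LRN) as reported in the Results
    "pSWE, EV": (3.0, 0.14),
    "pSWE, VNT": (2.3, 0.11),
    "2D SWE, EV": (3.7, 0.28),
    "2D SWE, VNT": (2.8, 0.29),
}
for label, (lrp, lrn) in pooled.items():
    print(f"{label}: {lr_quadrant(lrp, lrn)}")
```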
The clinical utility of these measurements was reasonably acceptable, as indicated by Fagan’s nomogram, which showed a significant rise in the post-elastography probability compared with the pre-elastography probability. Consistent with a previous review of elastography-based measurements for EV and VNT,36 spleen stiffness had better diagnostic accuracy than liver stiffness for both types of elastography. However, further studies comparing the diagnostic performance of these elastography-based measurements are needed to identify the best parameters and modality for clinical practice.
Several techniques are available in addition to 2D SWE and pSWE, such as magnetic resonance elastography (MRE), transient elastography (TE), and noninvasive biomarkers. Shear wave elastography has several advantages over these techniques, such as lower cost, shorter examination time, and less stress for patients undergoing the procedure. However, our study found that its accuracy for ruling in or ruling out EV or VNT is moderate. Further large-scale longitudinal studies are needed to assess the diagnostic accuracy of these elastography types when spleen stiffness measurements are used, since only a few studies reported this parameter. Such studies may help optimize healthcare resources in clinical practice.
Our results should be interpreted with caution, given the quality of and methodological differences among the included studies. For example, we found significant between-study variability (significant chi-square test and I² statistic values). This heterogeneity may be attributable to the differing ethnicities of the study participants and to variable risk factors and clinical severity among patients across studies. Deeks’ test and funnel plots showed no evidence of publication bias for the two outcomes in which it could be assessed (pSWE for EV and 2D SWE for VNT).
The main strengths of our review are as follows: i) to our knowledge, this is the first meta-analysis assessing the diagnostic accuracy of two different types of elastography-based measurements for EV and VNT in patients with chronic liver disease; ii) the large number of studies and patients (24 studies with 3867 patients); and iii) the lack of significant publication bias in the outcomes where it could be assessed, which adds to the credibility of the results. However, our study had several limitations. We found significant between-study variability, which limits the inferences that can be drawn from the pooled findings. We tried to address this by performing meta-regression to identify sources of heterogeneity; however, meta-regression could not be performed for all outcomes because of the limited number of studies. For the same reason, we could not assess publication bias for most outcomes. Diagnostic accuracy also depends on other factors, such as participant ethnicity, the timing of assessment, and the severity of the liver condition, whose influence was not assessed in our study.
Conclusion
Both pSWE and 2D SWE may be used to identify patients at risk of developing EV and VNT with moderate-to-high diagnostic accuracy. Applying these noninvasive modalities could reduce the need for more invasive diagnostic procedures and potentially reduce the associated healthcare costs. Further large-scale, setting-specific longitudinal studies are required to establish the best modality for assessing patients admitted to tertiary care hospitals with chronic liver conditions.