The reproducibility of interventional radiology randomized controlled trials and external validation of a classification system

Assala Aslan; Christopher Stevens; Amro Saad Aldine; Ahmed Mamilly; Luis De Alba; Octavio Arevalo; Chaitanya Ahuja; Hugo H. Cuellar

doi:10.4274/dir.2023.222052

ABSTRACT

PURPOSE

The fragility index (FI) measures the robustness of randomized controlled trials (RCTs). It complements the P value by taking into account the number of outcome events. In this study, the authors measured the FI for major interventional radiology RCTs.

METHODS

Interventional radiology RCTs published between January 2010 and December 2022 relating to trans-jugular intrahepatic portosystemic shunt, trans-arterial chemoembolization, needle biopsy, angiography, angioplasty, thrombolysis, and nephrostomy tube insertion were analyzed to measure the FI and robustness of the studies.

RESULTS

A total of 34 RCTs were included. The median FI of those studies was 4.5 (range 1–68). Seven trials (20.6%) had a number of patients lost to follow-up that was higher than their FI, and 15 (44.1%) had a FI of 1–3.

CONCLUSION

The median FI, and hence the reproducibility, of interventional radiology RCTs is low compared with other medical fields. Some trials have a FI of 1, and their results should be interpreted cautiously.

Keywords:

Fragility index, interventional, radiology, RCT, reproducibility

Main points

• The fragility index measures the robustness (or fragility) of the results from a clinical trial that uses dichotomous outcomes, taking into account the number of events in each study arm.

• Several studies analyzed fragility indices for randomized controlled trials (RCTs) published in different medical fields, but this is the first study to perform that analysis for interventional radiology RCTs.

• The median fragility index of those studies was 4.5, with nearly half having a fragility index of 1–3, which is considerably lower than in other fields.

Randomized controlled trials (RCTs) serve as the gold standard and represent the highest level of evidence for determining optimal and effective treatment strategies in evidence-based medicine.¹ Therefore, numerous RCTs related to the interventional radiology field have been performed within the last decade. These trials usually assess the efficacy of an intervention against medical management or another intervention, with a dichotomized primary endpoint and a P value used to compare the outcomes.² However, little attention has been paid to the critical importance of the number of outcome events in each study arm.³⁻⁵

The fragility index (FI) measures the robustness (or fragility) of the results from a clinical trial with dichotomous outcomes.⁶ It is defined as the minimum number of patients in one group (usually the study group) whose event status would be required to change from an event to a non-event to change a statistically significant result to a non-significant result. It is considered an important tool in interpreting the results from clinical trials and may provide value in addition to the commonly reported P value, risk reductions, and confidence intervals. It also aids in determining when statistical significance in the trial may be lost because of a shift of a few additional events from the experimental group to the control group.¹ The larger the FI, the more robust and reproducible the trial is,⁶ whereas a low FI indicates that the study hinges on only a few events for statistical significance. Adeeb et al.² proposed a classification system for clinical trials based on the FI, number of patients lost to follow-up, and fragility quotient (FQ). The latter two factors are equally fundamental measures for the robustness of the studies, given that the patients lost to follow-up could potentially change the study’s outcome had they remained in the trial, particularly when the FI is low. The FQ is calculated by dividing the FI by the sample size to provide an adjusted FI value. The proposed classification stratifies trials into three groups: statistically robust (class I), intermediate (class II), and fragile (class III).²
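As a rough illustration of the two quantities described above, the sketch below computes the FQ as the FI divided by the sample size and flags a trial whose number of patients lost to follow-up exceeds its FI. The trial values and the names `TrialSummary`, `fragility_quotient`, and `loss_exceeds_fi` are hypothetical and are not taken from the included studies. The class thresholds of the classification are not reproduced here because they are not stated in this text.

```python
from dataclasses import dataclass


@dataclass
class TrialSummary:
    # Hypothetical container for the per-trial values discussed in the text.
    name: str
    sample_size: int
    fragility_index: int
    lost_to_follow_up: int


def fragility_quotient(trial: TrialSummary) -> float:
    """FQ = FI divided by the total sample size."""
    return trial.fragility_index / trial.sample_size


def loss_exceeds_fi(trial: TrialSummary) -> bool:
    """True when more patients were lost to follow-up than the FI."""
    return trial.lost_to_follow_up > trial.fragility_index


# Illustrative values only (assumed, not extracted from any real trial).
trials = [
    TrialSummary("Trial A", sample_size=200, fragility_index=4, lost_to_follow_up=0),
    TrialSummary("Trial B", sample_size=120, fragility_index=2, lost_to_follow_up=5),
]

for t in trials:
    print(t.name, round(fragility_quotient(t), 3), loss_exceeds_fi(t))
```

In this hypothetical example, the patients lost to follow-up in Trial B outnumber its FI, so their unknown outcomes alone could have been enough to change the statistical significance of the result.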

In this study, the authors aim to evaluate the FI for key interventional radiology RCTs over the last decade and externally validate Adeeb’s classification.

Methods

A systematic search for published interventional radiology RCTs between January 2010 and December 2022 was performed by two researchers using PubMed. Six main areas were selected: trans-jugular intrahepatic portosystemic shunt (TIPS), trans-arterial chemoembolization (TACE), angioplasty, needle biopsy, nephrostomy, and thrombolysis. The following terms were used to identify the studies: “trans jugular intrahepatic portosystemic shunt”, “trans arterial chemoembolization”, “angioplasty”, “needle biopsy”, “nephrostomy”, “thrombolysis”, “randomized controlled trial”, “interventional radiology”, and “clinical trial”.

Studies that showed no statistically significant difference in outcomes between study groups, studies where outcomes were not dichotomized, and studies that compared more than two groups were excluded (Figure 1). The data extracted included the publication year, methodology, primary endpoint, number of cases and events in each group, number of patients lost to follow-up, and the P value.

Given that the study did not involve human or animal subjects, institutional review board approval and patient consent were not required.

Fragility index

An online calculator, http://clincalc.com/Stats/FragilityIndex.aspx, was used to calculate the FI for the included trials.
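The calculation performed by such a calculator can be sketched as follows. This is a minimal illustration, not the calculator's actual implementation: it assumes a two-sided Fisher's exact test with a 0.05 threshold and converts non-events to events, one patient at a time, in the arm with the lower event rate until significance is lost. Which arm is modified, and whether the change is described as event to non-event or the reverse, depends on how the outcome is coded. The function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first arm.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of outcome changes needed to lose significance, or None."""
    if fisher_exact_two_sided(events_a, n_a - events_a,
                              events_b, n_b - events_b) >= alpha:
        return None  # not significant to begin with, so the FI is undefined
    # Assumption: modify the arm with the lower event rate.
    if events_a * n_b <= events_b * n_a:
        low, n_low, high, n_high = events_a, n_a, events_b, n_b
    else:
        low, n_low, high, n_high = events_b, n_b, events_a, n_a
    changes = 0
    while low < n_low:
        low += 1  # one non-event becomes an event
        changes += 1
        if fisher_exact_two_sided(low, n_low - low, high, n_high - high) >= alpha:
            return changes
    return None  # no non-significant table found (not expected for typical data)


# Hypothetical trial: 1/20 events in one arm versus 9/20 in the other.
print(fragility_index(1, 20, 9, 20))
```

For these hypothetical counts, changing the outcome of two patients in the lower-event arm is enough to raise the P value above 0.05, which corresponds to a FI of 2.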

Statistical analysis

The analysis was carried out using SPSS 26.0 (IBM Corp., Armonk, NY). Given the non-parametric data distribution, the Spearman test was used for correlation analysis between the FI, FQ, and trial characteristics. Numerical variables were presented as median (range), and a comparison was made between groups using the Mann–Whitney U and Kruskal–Wallis non-parametric tests. Statistical significance was defined as P < 0.050.
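To illustrate what the Spearman test computes, the sketch below ranks two variables (using average ranks for ties) and takes the Pearson correlation of the ranks. It is not the SPSS procedure used in the study, it does not compute a P value, and the FI and P values shown are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-trial values (assumed for illustration only).
fi_values = [1, 2, 3, 5, 8, 14]
p_values = [0.049, 0.040, 0.041, 0.020, 0.004, 0.001]
print(round(spearman_rho(fi_values, p_values), 2))
```

A coefficient close to -1 in such data would indicate that trials with smaller P values tend to have larger FI values.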

Results

A total of 34 clinical trials met the inclusion criteria and were included in this study (Supplementary Table 1).⁷⁻⁴⁰ The median FI was 4.5 (range 1–68). The median number of patients lost to follow-up was 0 (range 0–43). The number of patients lost to follow-up was higher than the FI in seven (20.6%) studies, and 15 (44.1%) trials had a FI of 1–3. The median FQ was 0.029 (range 0.01–0.34).

There was a negative correlation between the P value and FI (r = -0.78, P < 0.001), while there was a positive correlation between the FI and the sample size (r = 0.46, P = 0.007).

Studies related to angiography had the highest median FI (median 13, range 1–25) followed by TIPS (median 6, range 3–68), while TACE (median 4.5, range 2–14) and angioplasty (median 3, range 1–11) had the lowest FI. However, the difference was not statistically significant (P = 0.793).

Adeeb’s classification was based on the correlation between two quantities: the number of patients lost to follow-up minus the FI, and the FQ. The authors therefore examined that correlation in the present dataset to externally validate the classification system. Class I was associated with a significantly higher FQ (median 0.082), followed by class II (median 0.022), and then class III (median 0.015) (P = 0.009).

In this study, 12 (35.3%) RCTs were class I, 17 (50%) were class II, and five (14.7%) were class III, with no significant difference among clinical study types (P = 0.773).

Discussion

In this study, the median FI for main interventional radiology RCTs was 4.5. In seven (20.6%) studies, the number of patients lost to follow-up was higher than the FI, and 15 (44.1%) had a FI of 1–3. Approximately one-third of the trials fell under class I in the classification system proposed by Adeeb et al.², which the authors externally validated in this study, while 14.7% were considered statistically fragile (class III).

The FI has been reported for a number of medical and surgical RCTs, including cerebrovascular surgery,² critical care,⁴,⁴¹ nephrology,⁴² hand surgery,⁵ and cardiovascular trials.⁴³ To the authors’ knowledge, this is the first study to evaluate the FI for interventional radiology trials. Given that most RCTs in this field have relatively small sample sizes with limited outcome events, the sole reporting of the P value limits the clinician’s ability to determine the statistical fragility of the result and its clinical usefulness. Therefore, including FI analysis in these trials can help guide the interpretation and implementation of the results. A low FI indicates that a trial’s results are sensitive to even small changes in the data, suggesting that the findings may not be reliable. Conversely, the results are more robust and less sensitive to changes when the FI is higher. In previous studies, a positive correlation was found between the sample size and the FI, while there was a negative correlation with the P value.²⁻⁴ In the authors’ study, there was likewise a negative correlation between the FI and P value (r = -0.78) and a positive correlation between the FI and sample size (r = 0.46). The FQ provides more understanding of the stability of the trial’s results and the risk of false positives by standardizing the fragility of a trial to its sample size. A smaller FQ also indicates a less robust study outcome.⁴¹

Adeeb et al.² proposed a classification system that aimed to quantitatively assess the reproducibility of RCTs. RCTs are classified into three classes based on the relationship between the FI, sample size, and number of patients lost to follow-up. Class I studies (statistically robust) are likely to be reproducible and can reliably be incorporated into clinical guidelines. Class III studies (statistically fragile) are more likely to be overturned by future studies and, therefore, should be interpreted cautiously. Class II studies should be interpreted on an individual basis.² This classification provides a framework for evaluating the robustness and generalizability of trial results, highlighting the potential limitations of small or underpowered trials, and informing decisions about the use of these results in clinical practice and further research.

The median FI of interventional radiology RCTs is low compared to other surgical fields, with some trials having a FI of 1, meaning that if the outcome of only one patient in the study group had been different, the results would not have been statistically significant. Therefore, one should exercise caution when interpreting the results of those RCTs, especially when the sample size and event numbers are small and a high number of patients were lost to follow-up.

References

Tignanelli CJ, Napolitano LM. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA Surg. 2019;154(1):74-79.

Adeeb N, Terrell DL, Whipple SG, et al. The reproducibility of cerebrovascular randomized controlled trials. World Neurosurg. 2020;140:e46-e52.

Evaniew N, Files C, Smith C, et al. The fragility of statistically significant findings from randomized trials in spine surgery: a systematic survey. Spine J. 2015;15(10):2188-2197.

Ridgeon EE, Young PJ, Bellomo R, Mucchetti M, Lembo R, Landoni G. The fragility index in multicenter randomized controlled critical care trials. Crit Care Med. 2016;44(7):1278-1284.

Ruzbarsky JJ, Khormaee S, Daluiski A. The fragility index in hand surgery randomized controlled trials. J Hand Surg Am. 2019;44(8):698.e1-698.e7.

Dettori JR, Norvell DC. How fragile are the results of a trial? The fragility index. Global Spine J. 2020;10(7):940-942.

Orloff MJ, Vaida F, Haynes KS, Hye RJ, Isenberg JI, Jinich-Brook H. Randomized controlled trial of emergency transjugular intrahepatic portosystemic shunt versus emergency portacaval shunt treatment of acute bleeding esophageal varices in cirrhosis. J Gastrointest Surg. 2012;16(11):2094-2111.

Orloff MJ, Hye RJ, Wheeler HO, et al. Randomized trials of endoscopic therapy and transjugular intrahepatic portosystemic shunt versus portacaval shunt for emergency and elective treatment of bleeding gastric varices in cirrhosis. Surgery. 2015;157(6):1028-1045.

Sauerbruch T, Mengel M, Dollinger M, et al. Prevention of rebleeding from esophageal varices in patients with cirrhosis receiving small-diameter stents versus hemodynamically controlled medical therapy. Gastroenterology. 2015;149(3):660-668.

Luo X, Wang Z, Tsauo J, Zhou B, Zhang H, Li X. Advanced cirrhosis combined with portal vein thrombosis: a randomized trial of TIPS versus endoscopic band ligation plus propranolol for the prevention of recurrent esophageal variceal bleeding. Radiology. 2015;276(1):286-293.

Wang L, Xiao Z, Yue Z, et al. Efficacy of covered and bare stent in TIPS for cirrhotic portal hypertension: a single-center randomized trial. Sci Rep. 2016;6:21011.

Lv Y, Qi X, He C, et al. Covered TIPS versus endoscopic band ligation plus propranolol for the prevention of variceal rebleeding in cirrhotic patients with portal vein thrombosis: a randomised controlled trial. Gut. 2018;67(12):2156-2168.

Holster IL, Tjwa ET, Moelker A, et al. Covered transjugular intrahepatic portosystemic shunt versus endoscopic therapy + β-blocker for prevention of variceal rebleeding. Hepatology. 2016;63(2):581-589.

Bureau C, Thabut D, Jezequel C, et al. The use of rifaximin in the prevention of overt hepatic encephalopathy after transjugular intrahepatic portosystemic shunt: a randomized controlled trial. Ann Intern Med. 2021;174(5):633-640.

Meng YL, Hu HT, Li HL, et al. The clinical therapeutic effects of arsenic trioxide combined with transcatheter arterial chemoembolization in treating primary liver cancer with pulmonary metastases. Zhonghua Nei Ke Za Zhi. 2012;51(12):971-974.

Goode SD, Cleveland TJ, Gaines PA; STAG trial collaborators. Randomized clinical trial of stents versus angioplasty for the treatment of iliac artery occlusions (STAG trial). Br J Surg. 2013;100(9):1148-1153.

Metussin A, Patanwala I, Cross TJ. Partial hepatectomy vs. transcatheter arterial chemoembolization for resectable multiple hepatocellular carcinoma beyond Milan criteria: a RCT. J Hepatol. 2015;62:747-748.

Lee TY, Lin CC, Chen CY, et al. Combination of transcatheter arterial chemoembolization and interrupted dosing sorafenib improves patient survival in early-intermediate stage hepatocellular carcinoma: a post hoc analysis of the START trial. Medicine (Baltimore). 2017;96(37):e7655.

Hu HT, Yao QJ, Meng YL, et al. Arsenic trioxide intravenous infusion combined with transcatheter arterial chemoembolization for the treatment of hepatocellular carcinoma with pulmonary metastasis: Long-term outcome analysis. J Gastroenterol Hepatol. 2017;32(2):295-300.

Chang G, Xie LL, Li WY, et al. Application of oxaliplatin in combination with epirubicin in transcatheter arterial chemoembolization in the treatment of primary liver carcinoma. J Biol Regul Homeost Agents. 2017;31(2):459-464.

Ogasawara S, Chiba T, Ooka Y, et al. A randomized placebo-controlled trial of prophylactic dexamethasone for transcatheter arterial chemoembolization. Hepatology. 2018;67(2):575-585.

Geng W, Tian X, Fu X, et al. Early routine angioplasty versus selective angioplasty after successful thrombolysis in acute ST-segment elevation myocardial infarction. Coron Artery Dis. 2013;24(3):238-243.

Spreen MI, Martens JM, Hansen BE, et al. Percutaneous transluminal angioplasty and drug-eluting stents for infrapopliteal lesions in critical limb ischemia (PADI) trial. Circ Cardiovasc Interv. 2016;9(2):e002376.

Wang Q, Li K, He C, et al. Angioplasty with versus without routine stent placement for Budd-Chiari syndrome: a randomised controlled trial. Lancet Gastroenterol Hepatol. 2019;4(9):686-697.

Zamboni P, Galeotti R, Salvi F, et al. Effects of venous angioplasty on cerebral lesions in multiple sclerosis: expanded analysis of the brave dreams double-blind, sham-controlled randomized trial. J Endovasc Ther. 2020;27(1):1526602819890110. Erratum in: J Endovasc Ther. 2020;27(1):NP1.

Schroeder H, Werner M, Meyer DR, et al. Low-dose paclitaxel-coated versus uncoated percutaneous transluminal balloon angioplasty for femoropopliteal peripheral artery disease: one-year results of the ILLUMENATE European Randomized Clinical Trial (randomized trial of a novel paclitaxel-coated percutaneous angioplasty balloon). Circulation. 2017;135(23):2227-2236.

Haskal ZJ, Saad TF, Hoggard JG, et al. Prospective, randomized, concurrently-controlled study of a stent graft versus balloon angioplasty for treatment of arteriovenous access graft stenosis: 2-year results of the RENOVA study. J Vasc Interv Radiol. 2016;27(8):1105-1114.

Sharifi M, Bay C, Skrocki L, Rahimi F, Mehdipour M; “MOPETT” Investigators. Moderate pulmonary embolism treated with thrombolysis (from the “MOPETT” Trial). Am J Cardiol. 2013;111(2):273-277.

Xu B, Tu S, Song L, et al. Angiographic quantitative flow ratio-guided coronary intervention (FAVOR III China): a multicentre, randomised, sham-controlled trial. Lancet. 2021;398(10317):2149-2159.

Seeger J, Markovic S, Birkemeyer R, et al. Paclitaxel-coated balloon plus bare-metal stent for de-novo coronary artery disease: final 5-year results of a randomized prospective multicenter trial. Coron Artery Dis. 2016;27(2):84-88.

van Riet PA, Larghi A, Attili F, et al. A multicenter randomized trial comparing a 25-gauge EUS fine-needle aspiration device with a 20-gauge EUS fine-needle biopsy device. Gastrointest Endosc. 2019;89(2):329-339.

Metintas M, Yildirim H, Kaya T, et al. CT scan-guided Abrams’ needle pleural biopsy versus ultrasound-assisted cutting needle pleural biopsy for diagnosis in patients with pleural effusion: a randomized, controlled trial. Respiration. 2016;91(2):156-163.

Cho E, Park CH, Kim TH, et al. A prospective, randomized, multicenter clinical trial comparing 25-gauge and 20-gauge biopsy needles for endoscopic ultrasound-guided sampling of solid pancreatic lesions. Surg Endosc. 2020;34(3):1310-1317.

Wang D, Fu HJ, Xu HX, et al. Comparison of fine needle aspiration and non-aspiration cytology for diagnosis of thyroid nodules: a prospective, randomized, and controlled trial. Clin Hemorheol Microcirc. 2017;66(1):67-81.

Oh D, Kong J, Ko SW, et al. A comparison between 25-gauge and 22-gauge Franseen needles for endoscopic ultrasound-guided sampling of pancreatic and peripancreatic masses: a randomized non-inferiority study. Endoscopy. 2021;53(11):1122-1129.

Laquière A, Lefort C, Maire F, et al. 19 G nitinol needle versus 22 G needle for transduodenal endoscopic ultrasound-guided sampling of pancreatic solid masses: a randomized study. Endoscopy. 2019;51(5):436-443.

Moosanejad N, Firouzian A, Hashemi SA, Bahari M, Fazli M. Comparison of totally tubeless percutaneous nephrolithotomy and standard percutaneous nephrolithotomy for kidney stones: a randomized, clinical trial. Braz J Med Biol Res. 2016;49(4):e4878.

Thomalla G, Simonsen CZ, Boutitie F, et al. MRI-guided thrombolysis for stroke with unknown time of onset. N Engl J Med. 2018;379(7):611-622.

Berglund A, Svensson L, Sjöstrand C, et al. Higher prehospital priority level of stroke improves thrombolysis frequency and time to stroke unit: the Hyper Acute STroke Alarm (HASTA) study. Stroke. 2012;43(10):2666-2670.

Barlinn K, Tsivgoulis G, Barreto AD, et al. Outcomes following sonothrombolysis in severe acute ischemic stroke: subgroup analysis of the CLOTBUST trial. Int J Stroke. 2014;9(8):1006-1010.

Vargas M, Marra A, Buonanno P, Coviello A, Iacovazzo C, Servillo G. Fragility index and fragility quotient in randomized controlled trials on corticosteroids in ARDS due to COVID-19 and non-COVID-19 etiology. J Clin Med. 2021;10(22):5287.

Shochet LN, Kerr PG, Polkinghorne KR. The fragility of significant results underscores the need of larger randomized controlled trials in nephrology. Kidney Int. 2017;92(2):1469-1475.

Khan MS, Ochani RK, Shaikh A, et al. Fragility index in cardiovascular randomized controlled trials. Circ Cardiovasc Qual Outcomes. 2019;12(12):e005755.