ABSTRACT
PURPOSE
The fragility index (FI) measures the robustness of randomized controlled trials (RCTs). It complements the P value by taking into account the number of outcome events. In this study, the authors measured the FI for major interventional radiology RCTs.
METHODS
Interventional radiology RCTs published between January 2010 and December 2022 relating to trans-jugular intrahepatic portosystemic shunt, trans-arterial chemoembolization, needle biopsy, angiography, angioplasty, thrombolysis, and nephrostomy tube insertion were analyzed to measure the FI and robustness of the studies.
RESULTS
A total of 34 RCTs were included. The median FI of those studies was 4.5 (range 1–68). Seven trials (20.6%) had a number of patients lost to follow-up that was higher than their FI, and 15 (44.1%) had a FI of 1–3.
CONCLUSION
The median FI, and hence the reproducibility of interventional radiology RCTs, is low compared to other medical fields. Some trials have a FI of 1, and their results should be interpreted cautiously.
Main points
• The fragility index measures the robustness (or fragility) of the results from a clinical trial that uses dichotomous outcomes, taking into account the number of events in each study arm.
• Several studies analyzed fragility indices for randomized controlled trials (RCTs) published in different medical fields, but this is the first study to perform that analysis for interventional radiology RCTs.
• The median fragility index of those studies was 4.5, with nearly half having a fragility index of 1–3, which is considerably lower than in other fields.
Randomized controlled trials (RCTs) serve as the gold standard and represent the highest level of evidence for determining optimal and effective treatment strategies in evidence-based medicine.1 Therefore, numerous RCTs related to the interventional radiology field have been performed within the last decade. These trials usually assess the efficacy of an intervention against medical management or another intervention, with a dichotomized primary endpoint and a P value used to compare the outcomes.2 However, little attention has been paid to the critical importance of the number of outcome events in each study arm.3,4,5
The fragility index (FI) measures the robustness (or fragility) of the results from a clinical trial with dichotomous outcomes.6 It is defined as the minimum number of patients in one group (usually the study group) whose event status would be required to change from an event to a non-event to change a statistically significant result to a non-significant result. It is considered an important tool in interpreting the results from clinical trials and may provide value in addition to the commonly reported P value, risk reductions, and confidence intervals. It also aids in determining when statistical significance in the trial may be lost because of a shift of a few additional events from the experimental group to the control group.1 The larger the FI, the more robust and reproducible the trial is,6 whereas a low FI indicates that the study hinges on only a few events for statistical significance. Adeeb et al.2 proposed a classification system for clinical trials based on the FI, number of patients lost to follow-up, and fragility quotient (FQ). The latter two factors are equally fundamental measures of the robustness of a study, given that the patients lost to follow-up could have changed the study’s outcome had they remained in the trial, particularly when the FI is low. The FQ is calculated by dividing the FI by the sample size to provide an adjusted FI value. The proposed classification stratifies trials into three groups: statistically robust (class I), intermediate (class II), and fragile (class III).2
In this study, the authors aim to evaluate the FI for key interventional radiology RCTs over the last decade and externally validate Adeeb’s classification.
Methods
A systematic search for published interventional radiology RCTs between January 2010 and December 2022 was performed by two researchers using PubMed. Six main areas were selected: trans-jugular intrahepatic portosystemic shunt (TIPS), trans-arterial chemoembolization (TACE), angioplasty, needle biopsy, nephrostomy, and thrombolysis. The following terms were used to identify the studies: “trans jugular intrahepatic portosystemic shunt”, “trans arterial chemoembolization”, “angioplasty”, “needle biopsy”, “nephrostomy”, “thrombolysis”, “randomized controlled trial”, “interventional radiology”, and “clinical trial”.
Studies that showed no statistically significant difference in outcomes between study groups, studies where outcomes were not dichotomized, and studies that compared more than two groups were excluded (Figure 1). The data extracted included the publication year, methodology, primary endpoint, number of cases and events in each group, number of patients lost to follow-up, and the P value.
Given that the study did not involve human or animal subjects, institutional review board approval and patient consent were not required.
Fragility index
An online calculator, http://clincalc.com/Stats/FragilityIndex.aspx, was used to calculate the FI for the included trials.
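For readers who want to follow the arithmetic behind an FI calculation, the following is a minimal Python sketch and not the code of the calculator cited above. It assumes a two-sided Fisher exact test, a significance threshold of 0.05, and the common convention of converting non-events to events one patient at a time in the arm with the lower event rate until significance is lost. The calculator's internal procedure is not described here and may differ. The function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


def fragility_index(events_a: int, total_a: int,
                    events_b: int, total_b: int,
                    alpha: float = 0.05) -> int:
    """Number of outcome flips needed to lose significance (0 if none is needed).

    Assumption: flips are applied to the arm with the lower event rate,
    converting non-events to events one patient at a time.
    """
    # Make arm "a" the one with the lower event rate.
    if events_a * total_b > events_b * total_a:
        events_a, total_a, events_b, total_b = events_b, total_b, events_a, total_a
    flips = 0
    while events_a < total_a and fisher_exact_two_sided(
        events_a, total_a - events_a, events_b, total_b - events_b
    ) < alpha:
        events_a += 1  # one more patient switched from non-event to event
        flips += 1
    return flips


def fragility_quotient(fi: int, sample_size: int) -> float:
    """FQ as defined in the text: the FI divided by the total sample size."""
    return fi / sample_size


# Hypothetical trial: 1/100 events in one arm versus 10/100 in the other.
fi = fragility_index(1, 100, 10, 100)
fq = fragility_quotient(fi, 200)
print(fi, fq)
```

In this sketch a trial that is not significant to begin with returns an FI of 0, which mirrors the exclusion of non-significant trials from the analysis.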
Statistical analysis
The analysis was carried out using SPSS 26.0 (IBM Corp., Armonk, NY). Given the non-parametric data distribution, the Spearman test was used for correlation analysis between the FI, FQ, and trial characteristics. Numerical variables were presented as median (range), and a comparison was made between groups using the Mann–Whitney U and Kruskal–Wallis non-parametric tests. Statistical significance was defined as P < 0.050.
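The rank-based correlation used here can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study: it computes only the Spearman coefficient (no P value), assigns average ranks to ties, and uses hypothetical FI and sample-size values that are not taken from the included trials.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five trials (illustration only).
fi_values = [1, 3, 4, 8, 25]
sample_sizes = [40, 60, 55, 120, 300]
rho = spearman_rho(fi_values, sample_sizes)
print(round(rho, 3))
```

Because the coefficient depends only on ranks, it is unaffected by the skewed distribution of FI values, which is the reason a non-parametric test was chosen.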
Results
A total of 34 clinical trials met the inclusion criteria and were included in this study (Supplementary Table 1).7–40 The median FI was 4.5 (range 1–68). The median number of patients lost to follow-up was 0 (range 0–43). The number of patients lost to follow-up was higher than the FI in seven (20.6%) studies, and 15 (44.1%) trials had a FI of 1–3. The median FQ was 0.029 (range 0.01–0.34).
There was a negative correlation between the P value and FI (r = -0.78, P < 0.001), while there was a positive correlation between the FI and the sample size (r = 0.46, P = 0.007).
Studies related to angiography had the highest median FI (median 13, range 1–25) followed by TIPS (median 6, range 3–68), while TACE (median 4.5, range 2–14) and angioplasty (median 3, range 1–11) had the lowest FI. However, the difference was not statistically significant (P = 0.793).
Given that Adeeb’s classification was based on the correlation between the number of patients lost to follow-up minus the FI on the one hand and the FQ on the other, the authors examined that correlation in this independent set of trials to validate the classification system. Class I was associated with a significantly higher FQ (median 0.082), followed by class II (median 0.022), and then class III (median 0.015) (P = 0.009).
In this study, 12 (35.3%) RCTs were class I, 17 (50.0%) were class II, and five (14.7%) were class III, with no significant difference between the clinical study types (P = 0.773).
Discussion
In this study, the median FI for main interventional radiology RCTs was 4.5. In seven (20.6%) studies, the number of patients lost to follow-up was higher than the FI, and 15 (44.1%) had a FI of 1–3. Approximately one-third of the trials fell under class I in the classification system proposed by Adeeb et al.2, which the authors externally validated in this study, while 14.7% were considered statistically fragile (class III).
The FI has been reported for a number of medical and surgical RCTs, including cerebrovascular surgery,2 critical care,4,41 nephrology,42 hand surgery,5 and cardiovascular trials.43 To the authors’ knowledge, this is the first study to evaluate the FI for interventional radiology trials. Given that most RCTs in this field have relatively small sample sizes with limited outcome events, the sole reporting of the P value limits the clinician’s ability to determine the statistical fragility of the result and its clinical usefulness. Therefore, including FI analysis in these trials can help guide the interpretation and implementation of the results. A low FI indicates that a trial’s results are sensitive to even small changes in the data, suggesting that the findings may not be reliable. Conversely, the results are more robust and less sensitive to changes when the FI is higher. In previous studies, a positive correlation was found between the sample size and the FI, while there was a negative correlation with the P value.2,3,4 In the authors’ study, the FI likewise correlated negatively with the P value (r = -0.78) and positively with the sample size (r = 0.46). The FQ provides more understanding of the stability of the trial’s results and the risk of false positives by standardizing the fragility of a trial to its sample size. A smaller FQ also indicates a less robust study outcome.41
Adeeb et al.2 proposed a classification system that aimed to quantitatively assess the reproducibility of RCTs. RCTs are classified into three classes based on the relationship between the FI, sample size, and number of patients lost to follow-up. Class I studies (statistically robust) are likely to be reproducible and can reliably be incorporated into clinical guidelines. Class III studies (statistically fragile) are more likely to be overturned by future studies and, therefore, should be interpreted cautiously. Class II studies should be interpreted on an individual basis.2 This classification provides a framework for evaluating the robustness and generalizability of trial results, highlighting the potential limitations of small or underpowered trials, and informing decisions about the use of these results in clinical practice and further research.
The median FI of interventional radiology RCTs is low compared to other surgical fields. Some trials have a FI of 1, meaning that if only one patient in the study group had not reached the primary outcome, the results would no longer be statistically significant. Therefore, one should exercise caution when interpreting the results of those RCTs, especially when the sample size and event numbers are small and a high number of patients were lost to follow-up.