ABSTRACT
PURPOSE
To determine how radiology, nuclear medicine, and medical imaging journals encourage and mandate the use of reporting guidelines for artificial intelligence (AI) in their author and reviewer instructions.
METHODS
The primary source of journal information and associated citation data used was the Journal Citation Reports (June 2023 release for 2022 citation data; Clarivate Analytics, UK). The first- and second-quartile journals indexed in the Science Citation Index Expanded and the Emerging Sources Citation Index were included. The author and reviewer instructions were evaluated by two independent readers, followed by an additional reader for consensus, with the assistance of automatic annotation. Encouragement and submission requirements were systematically analyzed. The reporting guidelines were grouped as AI-specific, related to modeling, and unrelated to modeling.
RESULTS
Out of 102 journals, 98 were included in this study, and all of them had author instructions. Only five journals (5%) encouraged the authors to follow AI-specific reporting guidelines. Among these, three required a filled-out checklist. Reviewer instructions were found in 16 journals (16%), among which one journal (6%) encouraged the reviewers to follow AI-specific reporting guidelines without submission requirements. The proportions of author and reviewer encouragement for AI-specific reporting guidelines were statistically significantly lower compared with those for other types of guidelines (P < 0.05 for all).
CONCLUSION
The findings indicate that AI-specific guidelines are not commonly encouraged or mandated (i.e., with a required filled-out checklist) by these journals, compared with guidelines related to modeling and those unrelated to modeling, leaving substantial room for improvement. This meta-research study aims to raise awareness of AI reporting guidelines within the imaging community and to spark large-scale, collaborative efforts by all stakeholders to make AI research less wasteful.
CLINICAL SIGNIFICANCE
This meta-research highlights the need for improved encouragement of AI-specific guidelines in radiology, nuclear medicine, and medical imaging journals. This can potentially foster greater awareness among the AI community and motivate various stakeholders to collaborate to promote more efficient and responsible AI research reporting practices.
Main points
• Based on author and reviewer instructions, artificial intelligence (AI)-specific guidelines are not commonly encouraged, and they are not mandated for submission as filled-out checklists by radiology, nuclear medicine, and medical imaging journals.
• The proportions of author and reviewer encouragement of AI-specific reporting guidelines were statistically significantly lower compared with those for other types of guidelines.
• The collaboration of all stakeholders, including guideline developers, journal managers, editors, reviewers, authors, and funders, is needed to further encourage these guidelines to make AI research less wasteful.
Poor or suboptimal reporting of medical research is regarded as a significant and widespread issue that contributes to the waste of the scarce and valuable resources invested in research projects.1-5 In poorly reported studies, readers cannot assess the validity of the methods relative to existing knowledge, and thus the reliability and reproducibility of the findings.6 This hinders the clinical translation of promising research findings7 and their comparability with other publications for evidence synthesis or meta-analysis.8 Adherence to consensus-based reporting standards (i.e., reporting guidelines) is one of the principal methods for reducing the risk of poor reporting. To promote this, large-scale initiatives, such as the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network, were launched, and several reporting guidelines were developed and published in the literature.9, 10 Typically, these guidelines take the form of online or offline checklists, flowcharts, or explicit texts that instruct authors on how to report their research. Several studies have examined the effectiveness of adhering to reporting guidelines across various study types. They found that adherence is associated with improved manuscript quality in peer review,11 favorable reviewer ratings and editorial decisions,12 higher citation counts and a greater likelihood of publication in journals with a higher impact factor,13 and improved completeness and quality of the research.14-22
Similar to healthcare literature, medical artificial intelligence (AI) research faces poor or suboptimal reporting issues. With the massive growth of healthcare literature using AI, including medical imaging,23 the need for complete and structured reporting of prognostic and diagnostic studies that use machine learning algorithms or models has increased. An expanding body of research indicates that AI studies frequently fall short of expected reporting standards,24, 25 lacking sufficient details on modeling and its evaluation, and failing to adequately address potential sources of bias.26-32 Multiple specific guidelines relevant to AI studies have been developed to address these issues.25, 33-41 Examples of these guidelines include Checklist for AI in Medical Imaging (CLAIM),42, 43 Fairness, Universality, Traceability, Usability, Robustness, and Explainability-AI (FUTURE-AI),44 Minimum Information about Clinical AI Modelling (MI-CLAIM),45 CheckList for EvaluAtion of Radiomics research (CLEAR),46 and METhodological RadiomICs Score (METRICS).47 In addition, as a continuation of previous efforts, several guidelines are currently under development, such as Standards for the Reporting of Diagnostic Accuracy Studies-AI (STARD-AI) for AI-centered diagnostic test accuracy studies and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis-AI (TRIPOD-AI) for studies of AI-based diagnostic and prognostic prediction models.48, 49 The most widely recognized AI guidelines, together with those currently under development, are summarized in two seminal papers.50, 51
The availability of reporting guidelines and checklists has not yet resolved the problem of inadequate reporting. While editorial guidance advocating for transparent reporting is widespread and well-intentioned, authors frequently overlook or fail to adhere to these guidelines.52-57 These claims have recently been supported by a citation analysis of an AI checklist for medical imaging and a meta-research study on radiomics, both concerning the use of checklists and quality scoring tools for self-reporting (i.e., study authors reporting by completing the checklists themselves).26, 32 Journals can significantly impact the quality of reporting by encouraging or mandating responsible reporting practices, such as the use of reporting guidelines and checklists, in their author and reviewer instructions.58, 59 However, research on the encouragement of AI reporting guidelines by journals specialized in radiology, nuclear medicine, and medical imaging is scarce.60 Investigating this issue could yield valuable insights to foster higher-quality research within these journals.
This meta-research study aims to determine how these journals encourage and mandate (i.e., requiring a filled-out checklist) the use of AI reporting guidelines in their author and reviewer instructions by comparing reporting guidelines that are specific to AI, related to modeling, and unrelated to modeling.
Methods
Figure 1 presents the key study steps of this meta-research.
Dataset
The primary source of journals and associated citation data used was the Journal Citation Reports (June 2023 release for 2022 citation data; Clarivate Analytics, UK). This report was based on data obtained from the Web of Science (WoS) (Clarivate Analytics, UK).
Journals indexed in the WoS category “radiology, nuclear medicine, and medical imaging” that met the following criteria were included in this study: inclusion in the Science Citation Index Expanded (SCIE) or Emerging Sources Citation Index (ESCI) and placement within the first quartile (Q1; top 25% of journals in the list) or second quartile (Q2; journals in the top 25%–50% group) based on the 2022 Journal Impact Factor. Journals with a limited scope, specifically those publishing only review articles (i.e., no original research articles), were excluded, as they were not expected to publish articles using AI reporting guidelines.
Two readers, each in their third year of radiology residency and with prior experience conducting systematic reviews on reporting quality in AI or radiomics, accessed the author and reviewer instructions from the journals’ websites and saved them as PDF files. The task was distributed evenly among the readers, and they also reviewed each other’s resulting files. All author and reviewer instructions were accessed between September 4 and 7, 2023. In the case of multiple instructions, the most up-to-date version was selected.
To mitigate errors during the assessment of instructions, a custom Python script based on the PyMuPDF package was used to automatically annotate certain terms within the PDF documents. The terms covered AI, machine learning, reporting, guidelines, checklists, and their specific names or acronyms. The code and exact terms can be accessed at https://github.com/radiomic/PDFhighlighter.
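As a rough illustration of how such automatic annotation can be implemented (this is a minimal sketch, not the published script available at the repository above; the term list and file names below are abridged and hypothetical), the following Python example highlights occurrences of selected terms in a PDF using PyMuPDF:

```python
# Minimal sketch of automatic term highlighting with PyMuPDF (imported as "fitz").
# The term list is abridged and illustrative; the exact terms are in the cited repository.
import fitz  # PyMuPDF

TERMS = [
    "artificial intelligence", "machine learning", "reporting guideline",
    "checklist", "CLAIM", "TRIPOD", "CONSORT", "PRISMA", "STARD", "EQUATOR",
]

def highlight_terms(in_path: str, out_path: str, terms=TERMS) -> int:
    """Highlight every occurrence of the given terms and return the number of hits."""
    doc = fitz.open(in_path)
    hits = 0
    for page in doc:
        for term in terms:
            # search_for is case-insensitive and returns the bounding boxes of matches
            for rect in page.search_for(term):
                page.add_highlight_annot(rect)
                hits += 1
    doc.save(out_path)
    doc.close()
    return hits

# Hypothetical usage:
# n = highlight_terms("author_instructions.pdf", "author_instructions_annotated.pdf")
# print(f"{n} term occurrences highlighted")
```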
Evaluation of author and reviewer instructions
The author and reviewer instructions that were automatically annotated by the script were evaluated by the same readers who downloaded the instructions. All evaluations underwent a meticulous review process overseen by an additional reader possessing 8 years of expertise as a radiology specialist, complemented by over 5 years of research experience in machine learning, radiomics, and systematic reviews. Final decisions were reached by consensus among all readers.
The collected data primarily fell into two categories: encouragement of authors or reviewers, and the presence of submission requirements for filled-out checklists when encouragement was present. For an instruction to be rated as encouragement, it had to explicitly name the reporting guideline or make a direct reference to it. Encouragement was defined as any mention of specific guidelines; for instance, recommending that authors or reviewers adhere to, refer to, or use a guideline was considered encouragement, even if the guideline was not explicitly intended to be integrated into their workflow. General references to a central source or hub of guidelines or checklists, such as the EQUATOR Network website, were not regarded as specific encouragement in this work. To fulfill the submission requirement (i.e., mandating), this study sought a clear indication that the filled-out checklist had to be uploaded to the submission system as an integral part of the manuscript and peer review processes. The submission systems were investigated only when the submission requirements were unclear in the instructions. Checklists without an associated publication in a journal (i.e., checklists included in journal instructions without a digital object identifier) were not considered reporting guidelines.
Three types of reporting guidelines were analyzed: i) AI-specific reporting guidelines; ii) those related to modeling (e.g., diagnostic or prognostic modeling; may or may not be associated with AI or machine learning); and iii) those unrelated to modeling. AI-specific reporting guidelines and those related to modeling included those specified in two recent seminal articles.50, 51 For AI-specific guidelines (e.g., CLAIM, Consolidated Standards of Reporting Trials for AI (CONSORT-AI), Standard Protocol Items: Recommendations for Interventional Trials for AI (SPIRIT-AI), FUTURE-AI, MI-CLAIM), this study referred to the publication of Klontzas et al.50, which did not limit its scope to a specific data type. For guidelines related to modeling, including AI-specific ones (e.g., TRIPOD), this study referred to the paper of Klement and El Emam51, which primarily focused on structured data. Because these papers may have omitted relevant reporting guidelines, this study did not confine its criteria to those listed in them.
Statistical analysis
Statistical analysis was performed using Jamovi (version 2.2.5). The majority of the findings were presented through descriptive statistics, with percentages rounded to the nearest whole number. The inter-reader agreement analysis of the first two readers was conducted using Cohen’s kappa or percentage agreement, as appropriate. The following grading system was used to interpret Cohen’s kappa: kappa ≤0.00, no agreement; 0.00< kappa ≤0.20, slight; 0.20< kappa ≤0.40, fair; 0.40< kappa ≤0.60, moderate; 0.60< kappa ≤0.80, substantial; 0.80< kappa ≤1, almost perfect agreement. The distribution of quantitative variables was compared using either the Student’s t-test or the Mann–Whitney U test, depending on the statistical normality of the data. The chi-square test or Fisher’s exact test was employed to assess differences in the distribution of categorical variables across various citation variables between subjects. McNemar’s test with continuity correction was used for the same purpose within subjects. A P value of <0.05 was considered statistically significant.
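For illustration only, and not as a reproduction of the actual analysis (which was run in Jamovi), the sketch below shows how the main categorical tests and the agreement metric could be computed in Python; all counts, ratings, and table contents are hypothetical placeholders:

```python
# Illustrative sketch of the categorical tests and agreement metric used in this study.
# All numbers are hypothetical placeholders, not study data.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact
from statsmodels.stats.contingency_tables import mcnemar
from sklearn.metrics import cohen_kappa_score

# Between-subject comparison (e.g., encouragement status by index, SCIE vs. ESCI):
# rows = index, columns = [encouraged, not encouraged] (hypothetical counts).
table_between = np.array([[20, 46],
                          [10, 22]])
chi2, p_chi2, dof, expected = chi2_contingency(table_between, correction=True)
# Fall back to Fisher's exact test when expected cell counts are small.
p_between = fisher_exact(table_between)[1] if (expected < 5).any() else p_chi2

# Within-subject comparison (e.g., AI-specific vs. modeling-unrelated encouragement
# in the same journals): paired 2x2 table, McNemar's test with continuity correction.
table_within = np.array([[4, 1],
                         [71, 22]])
p_within = mcnemar(table_within, exact=False, correction=True).pvalue

# Inter-reader agreement for a binary rating (hypothetical ratings).
reader1 = [1, 0, 1, 1, 0, 0, 1, 0]
reader2 = [1, 0, 1, 0, 0, 0, 1, 0]
kappa = cohen_kappa_score(reader1, reader2)

print(p_between, p_within, kappa)
```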
Results
Baseline characteristics of journals
Out of 102 Q1 and Q2 radiology, nuclear medicine, and medical imaging journals indexed in SCIE and ESCI databases, 98 were included in this study. Four journals were excluded because they published only review articles. Of the journals included, 66 were from SCIE (Q1/Q2, 32/34), with a median 2022 impact factor of 3.9 (interquartile range: 2.4). The remaining 32 journals were from ESCI (Q1/Q2, 16/16), with a median 2022 impact factor of 2.25 (interquartile range: 1.9).
Instructions specific to authors were found for all 98 journals. However, specific instructions for reviewers or referees were found for only 16 journals (16%).
Analysis of author instructions
Table 1 summarizes the encouragement of authors to use reporting guidelines that are specific to AI, related to modeling, and unrelated to modeling, as well as the requirement of submission for these reporting guidelines.
Considering all 98 journals, only five journals (5%) encouraged the authors to follow AI-specific reporting guidelines. Table 2 presents the AI-specific guidelines recommended in these journals: CLAIM (n = 3), Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME) (n = 1), and Checklist for AI in Medical Physics (CLAMP) (n = 1).42, 61, 62 Of these, three (60%) required a filled-out checklist along with the submission.
In total, 30 journals (31% of 98) endorsed at least one reporting guideline related to modeling, including both general modeling guidelines and AI-specific ones: TRIPOD (n = 26), along with the three aforementioned AI reporting guidelines, namely CLAIM, PRIME, and CLAMP.42, 61, 62, 63 One journal encouraged two modeling-related guidelines (TRIPOD and CLAIM). Of the 30 journals, only four (13%) required a filled-out checklist along with the submission. Furthermore, only one of the journals, Ultrasound in Obstetrics and Gynecology, encouraged TRIPOD and mandated a filled-out checklist.
A total of 75 journals (77% of 98) encouraged at least one guideline unrelated to modeling. The frequency of the most well-known guidelines in these categories is as follows: CONSORT (n = 61), Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (n = 51), Animal Research: Reporting of In Vivo Experiments (ARRIVE) (n = 45), STARD (n = 44), and Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) (n = 42).64, 65, 66, 67, 68 Of these journals, 36 (48%) required a filled-out checklist along with the submission.
The level of encouragement for authors, both with and without submission requirements, regarding the naming of reporting guidelines, is summarized in Figure 2, alongside a comparison with that of reviewers.
Statistically significant differences were observed in the proportions of author encouragement among pairwise comparisons of AI-specific reporting guidelines, those related to modeling, and those unrelated to modeling (P < 0.001 for all). Notably, guidelines unrelated to modeling received the highest level of encouragement, followed by those related to modeling and then AI-specific guidelines.
There were no statistically significant differences in the distribution of author encouragement status concerning the journal index (i.e., SCIE vs. ESCI) and quartile (i.e., Q1 vs. Q2) (P > 0.05 for all).
Regarding the encouragement of reporting guidelines related to modeling in general, including AI-specific ones, as well as those unrelated to modeling, the inter-rater reliability analysis yielded almost perfect agreement, with Cohen’s kappa values ranging between 0.916 and 0.950.
Analysis of reviewer instructions
Table 1 summarizes the encouragement of reviewers to use reporting guidelines that are specific to AI, related to modeling, and unrelated to modeling, as well as the requirement of submission for these reporting guidelines.
Of the 16 journals that had instructions for reviewers, only one (6%), European Radiology, encouraged the reviewers to follow an AI-specific reporting guideline (CLAIM), which can also be regarded as a modeling-related guideline, without requiring a filled-out checklist to be submitted with the peer review.42 The primary purpose, however, was to check whether the authors had provided the checklist.
Regarding the guidelines unrelated to modeling, six journals (38% of 16) encouraged the reviewers to follow at least one of them. The guidelines most frequently recommended were CONSORT (n = 4) and PRISMA (n = 4), without requiring a filled-out checklist to be submitted with the peer review.67, 68
The summary of reviewer encouragement, both with and without submission requirements, regarding the naming of reporting guidelines, is depicted in Figure 2, alongside a comparison with that of the authors.
There was a statistically significant difference in the proportion of reviewer encouragement between AI-specific or modeling-related reporting guidelines and those unrelated to modeling (P < 0.025), with the latter being higher.
There were no statistically significant differences in the distribution of reviewer encouragement status against the journal index (i.e., SCIE vs. ESCI) and quartile (i.e., Q1 vs. Q2) (P > 0.05 for all).
For reviewers, the encouragement of reporting guidelines related to modeling in general, including AI-specific ones, as well as those not related to modeling, resulted in high inter-rater reliability, with percentage agreement values ranging between 79% and 93%.
Discussion
Overview
This meta-research investigated how radiology, nuclear medicine, and medical imaging journals encourage and mandate (i.e., requiring a filled-out checklist) the use of AI reporting guidelines in their author and reviewer instructions. The results were presented by comparing reporting guidelines that are specific to AI, related to modeling, and unrelated to modeling. It was found that only a very small number of journals encouraged (5%, 5/98) and mandated (3%, 3/98) the use of AI reporting guidelines (i.e., CLAIM, PRIME, and CLAMP) for authors. In addition, only one journal (6% of 16 available reviewer instructions) encouraged the reviewers to follow AI reporting guidelines (i.e., CLAIM), without any requirement of submission. Encouragement and the mandated use of AI-specific guidelines and those related to modeling in the journals were generally lower compared with those unrelated to modeling.
Previous related works
Only one recent study is closely related to this research, having systematically analyzed the endorsement of AI reporting guidelines in radiology journals.60 In their seminal study, Zhong et al.60 investigated the endorsement of 15 general reporting guidelines and 10 AI reporting guidelines. Of the 117 SCIE journals included, the authors found that CLAIM (1.7%, 2/117) was the only AI reporting guideline implemented, while the other nine AI reporting guidelines were not mentioned. The present study found that five (5%) out of the 98 journals encouraged AI-specific guidelines. The disparity in rates can be attributed to several methodological differences. First, the journals differed in their index sources. Second, our study encompassed half of the journals indexed in SCIE and ESCI (i.e., Q1–Q2), whereas Zhong et al.60 exclusively included all SCIE journals. Third, the AI reporting guidelines considered in these works were different: this study referenced two prior works and imposed no additional restrictions, provided that the guideline was in the form of a publication (i.e., not a custom checklist appearing only in journal instructions),50, 51 whereas the authors of the prior investigation restricted their assessment to 10 AI reporting guidelines. Nevertheless, both studies reached the same conclusion that the endorsement or encouragement to follow AI reporting guidelines in these journals was remarkably low; their main findings are complementary and mutually reinforcing.
Given the scarcity of literature on the encouragement of AI reporting guidelines in radiology, nuclear medicine, and medical imaging journals, it is worth discussing studies that are not specifically pertinent to AI but are nevertheless highly relevant to the encouragement and mandating of reporting guidelines. In a cross-sectional study, Malički et al.69 analyzed a representative sample of journal instructions for authors across multiple scientific fields, including health sciences. The instructions of 13% of journals suggested the use of reporting guidelines, while only 2% mandated their use. In addition, the authors found that journals in the health or life sciences, as well as those published by prominent publishers, were more likely to include reporting guidelines or standards in their author instructions. In a different study, Agha et al.70 investigated the impact of the mandatory implementation of reporting guidelines on the quality of reporting in a surgical journal. Compliance with STROBE, CONSORT, and PRISMA dramatically improved after the policy implementation. The authors observed that implementing a policy demanding the submission of a completed reporting checklist for observational research, randomized controlled trials, and systematic reviews can increase compliance, and they recommended similar approaches for other journals and study types. In another seminal study, Hirst and Altman examined the encouragement of reviewers to use reporting guidelines across 116 health research journals.59 They found that 41 (35%) of the journals offered reviewers online instructions and that nearly half of these instructions referred to reporting guidelines without providing clear guidance on how to use them.
Potential reasons for low rates
Considering the relevant works above and the present study, it is evident that journals do not frequently encourage or mandate AI reporting guidelines. The potential causes can only be speculated upon, as their analysis falls outside the scope of this study. The editorial teams of the journals may wrongly presume that researchers are aware of these fundamental aspects of rigorous and transparent reporting and that authors, not the journals, are entirely responsible for implementing them. The journals may also be hesitant to incorporate appropriate reporting practices through reporting guidelines, and they may be unwilling to address scientific misconduct and correct publication errors.71, 72, 73 The editors may also not want to unintentionally overburden authors with too many instructions. Even if journals encourage good reporting practices, researchers may be resistant to fundamental change. Furthermore, despite the validity of these tools, journals may not agree on the importance of reporting guidelines and may be hesitant to recommend their use in the absence of convincing proof of their effectiveness.
What are the next steps?
In light of the remarkable and exponential growth of AI research on medical imaging over the past decade,23 it is necessary to promote the highest-quality research. It would be advantageous to conduct additional research to establish the effectiveness of AI reporting guidelines, as such evidence would help persuade journals to encourage and mandate them. Hence, there is a need for further assessment of AI reporting guidelines to determine their optimal use: whether they should be incorporated into the study design, applied during ongoing research, used solely for reporting purposes after study completion, or implemented at the request of journals, among other possibilities. Enhancing our understanding of the factors that influence the dissemination and implementation of these tools and strategies is crucial for improving their efficacy and promoting their broader adoption. Future research should investigate the obstacles journals might experience when adopting such policy changes, as well as how automated tools could minimize their workload while ensuring adherence to these reporting guidelines. Furthermore, radiology, nuclear medicine, and medical imaging journals may collaborate to improve reporting standards for research. These group initiatives should also be supported by scientific organizations, universities, institutions, societies, and funding agencies. This would make it more difficult for authors receiving negative reviews due to inadequate reporting to turn to journals with more flexible reporting policies, and it could enhance the overall reporting quality of the scientific literature. In certain areas of medical research, such as rehabilitation and disability, journals have already established such collaborations.74 As of 2014, 28 prominent rehabilitation and disability journals had joined a group requiring adherence to reporting guidelines to increase the quality of research reporting, not just within their own journals but across their field of medicine and research. They jointly published an editorial announcing their agreement and urging authors to adhere to appropriate EQUATOR reporting guidelines when preparing articles for submission. They also requested that reviewers use reporting guidelines when evaluating submissions.74 A similar group effort is crucial to improve the overall reporting quality of AI research in radiology, nuclear medicine, and medical imaging journals.
Limitations
This study has a few limitations. First, it assumed that the instructions are the sole location where encouraged or mandated reporting guidelines can be found. However, some of the requirements editors place on authors and reviewers may not necessarily be outlined in the instructions. For instance, the submission systems of all the journals were not thoroughly analyzed to check whether they encouraged or requested the use of guidelines during the submission and/or review processes; it was presumed that this was not common practice, and submission systems were investigated only when the submission requirements were not clear in the instructions. Second, only Q1 and Q2 SCIE and ESCI journals indexed in the WoS were included due to their well-known high standards for indexing. Therefore, the sample is unlikely to represent the editorial standards of all journals. To diversify the journal characteristics, Q1 and Q2 ESCI journals were included instead of Q3 and Q4 SCIE journals. However, achieving a perfect representation of journals in terms of diversity should not be a major concern in an exploratory study focusing on a new area of reporting guidelines. Third, the journal instructions were double-checked for accuracy while being downloaded; nonetheless, due to the complex and multi-layered design of certain journal websites, some parts of the instructions may have been omitted. Additionally, although this study evaluated the automatically annotated content of the instructions through independent readings by two readers, with consensus reached through consultation with a third reader, some reporting guidelines recommended or required at submission may still have been missed. However, the impact of missing instructions or their content analysis is likely to be minor. Finally, the instructions were downloaded over a brief time frame (between September 4 and 7, 2023); any changes journals made to their instructions after this period would not be reflected in the results.
In conclusion, this meta-research study provides an overview of instructions for authors and peer reviewers across radiology, nuclear medicine, and medical imaging journals. It specifically examines the encouragement of AI-specific reporting guidelines and their submission requirements, comparing them with guidelines related to modeling and those unrelated to modeling. The findings indicate that AI-specific guidelines are not commonly encouraged or mandated (i.e., with a required filled-out checklist) by these journals, compared with other guidelines. To further encourage the use of these tools, all stakeholders, including guideline developers, journal managers, editors, reviewers, authors, and funders, need to collaborate. Given their position at the forefront of AI, if more of these journals enforce or encourage responsible reporting through guidelines, the value of published articles may increase and AI research may become less wasteful.
Conflict of interest disclosure
Burak Koçak, MD, is Section Editor in Diagnostic and Interventional Radiology. He had no involvement in the peer-review of this article and had no access to information regarding its peer-review. Other authors have nothing to disclose.