Editorial

Addressing the reporting chasm of artificial intelligence research: the DECIDE-AI reporting guidelines


The meteoric rise of artificial intelligence (AI) to the forefront of healthcare innovation has unearthed an array of avenues for surgical researchers to pursue. With applications throughout the surgical patient pathway, AI offers new-found support systems for clinical decision-making. Indeed, a growing number of technologies are entering clinical practice,1 with a recent review of randomised controlled trials of diagnostic prediction tools suggesting the potential benefits of AI that contemporary healthcare stands to realise.

However, the pathway to bedside translation for these technologies is variable. As aptly captured in a recent editorial, there are clear examples of AI technologies already approved for clinical use in the USA, both with and without evaluation through randomised controlled trials.2 This speaks to a wider problem of evaluation in AI innovation, where insufficient reporting in randomised controlled trials prompted the development of several reporting guidelines, including the Consolidated Standards of Reporting Trials-AI (CONSORT-AI) and Standard Protocol Items: Recommendations for Interventional Trials-AI (SPIRIT-AI) guidelines, which set out minimum reporting standards for clinical trials and protocols, respectively. Similar guidance exists for the initial stages of AI development, namely the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD-AI) guidelines for machine learning (ML) prediction models.3

Yet, when one looks at the process of AI translation, from in silico testing to clinical trial, an evaluation chasm becomes obvious: guidance is lacking for studies at stages 2a and 2b of the IDEAL (Idea, Development, Exploration, Assessment, Long-term study) framework. These stages cover the refinement of an intervention and preparation for larger clinical studies, which are influenced by operator factors, such as learning curves and training, and by health-system or organisational factors, such as integration into clinical workflows. Study design features, such as patient selection for both training and testing an intervention, and even the AI model itself, are crucial considerations prior to large-scale testing.

Vasey and colleagues have identified this gap in the reporting guidelines for evaluating AI-driven decision support systems and produced a guideline to support their early-stage evaluation. This was achieved through an international, two-round modified Delphi consensus process, producing a reporting guideline (DECIDE-AI) of 17 AI-specific items and 10 generic items to inform the reporting of early-stage clinical studies of AI-based decision support systems in healthcare.

The systems perspective taken by Vasey et al frames AI decision-support systems as complex interventions.4 This perspective clearly elucidates the importance of understanding the workflow or clinical process an intervention is intended to enter, alongside the setting in which the AI is evaluated. Reporting these details reveals the setting-specific, or even system-specific, nature of the evaluation, which may be important when judging an intervention's efficacy against the same clinical problem in alternative health systems or settings.

Furthermore, the emulation of human factors appraisal from aviation and the military is another strength of the DECIDE-AI guidelines, particularly as the augmentative nature of AI decision-support systems relies on human-computer interaction. In surgery, for example, it is evident that surgeons' learning curves influence clinical outcomes,5 meaning complex interventions that include AI-based tools must account for this during evaluation. Failing to do so may result in intervention failure in larger clinical trials, at cost to researchers and developers, but with perhaps greater cost to trial participants.

Considering these factors rigorously and systematically is an undoubted means of improving the bench-to-bedside translation of AI interventions. This is vital for two reasons: it pursues the principles of evidence-based medicine for the safe evaluation of a technology, testing and developing it in real-world health systems; and it enables more accurate determination of efficacy and effectiveness, progressing evaluation towards more realistic settings.

No reporting guideline can claim perfection, and Vasey and colleagues recognise the known limitations of the consensus they achieved from their spectrum of experts. Yet, it is clear that they have provided a robust foundation for the systematic and transparent reporting needed to guide the early-stage clinical evaluation of AI technologies. Recognising and improving the translation process's weaknesses will certainly aid the AI innovators of tomorrow, with clinical dividends to follow.
