
Beyond adoption: why surgical AI must meet Evidence, Ethics, and Equity standards before deployment

Rajarshi Mukherjee

Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Foundation Building, 765 Brownlow Hl, Liverpool L69 7ZX, UK; Liverpool University Hospitals NHS Foundation Trust, Aintree University Hospital, Lower Ln, Fazakerley, Liverpool L9 7AL, UK.

Belinda De Simone

Department of Emergency and General Minimally Invasive Surgery, Level I Trauma Center; Bufalini Hospital, AUSL Romagna, 47921 Cesena, Italy; Department of Theoretical and Applied Sciences, Campus University, Novedrate, 22060 Como, Italy.

Julio Mayol

Hospital Clinico San Carlos, Instituto de Investigación Sanitaria San Carlos, Universidad Complutense de Madrid. 28040, Madrid, Spain.

22 May 2026
https://doi.org/10.58974/bjss/azbc142
Correspondence General
BJSA
BJS Academy
0000-0000
BJS Foundation Limited
London, UK
Correspondence to: Mr. Rajarshi Mukherjee (email: rishim@liverpool.ac.uk)
Institute of Systems, Molecular and Integrative Biology and Liverpool University Hospitals
NHS Foundation Trust
University of Liverpool
Ashton Street
Liverpool
L69 3GE
UK
_____
BJS, https://doi.org/10.1093/bjs/znaf217, published 06 November 2025
_____
Dear Editor
Artificial intelligence (AI) is advancing rapidly across surgical care, yet its translation into routine practice remains inconsistent. A recent BJS article by Manzano Rodriguez et al. highlights critical structural barriers to adoption.1 These insights are timely and important, but they primarily address why surgical AI struggles to scale. A complementary question is equally urgent: under what conditions should surgical AI be deployed at all? We argue that any surgical AI system that fails to meet our ‘Triple E’ criteria for safe surgical AI (Evidence, Ethics, and Equity) should not be deployed.
First, evidence must extend beyond retrospective performance metrics. As highlighted by Manzano Rodriguez et al., current evaluation often relies on heterogeneous datasets and non-standardized metrics, limiting generalizability.1 However, even technically robust models may fail to improve clinical outcomes. Surgical AI operates within highly contextual environments shaped by patient variability, operative technique, and institutional infrastructure. Algorithms developed in narrow datasets frequently degrade when exposed to real-world complexity. Therefore, deployment should require prospective validation demonstrating impact on intraoperative decision-making, complication rates, and patient-centred outcomes. Staged evaluation frameworks, progressing from local observational deployment to multicentre pragmatic trials, offer a realistic pathway.2 Continuous post-deployment monitoring is also essential, as model drift is inevitable in evolving clinical systems.3
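As an illustration only, continuous post-deployment monitoring could be sketched as a rolling comparison of observed accuracy against the accuracy measured at validation. The class name DriftMonitor, the window size, and the tolerance below are hypothetical choices made for this sketch, not a method described in the cited literature; a real monitoring system would need clinically justified metrics and thresholds.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical sketch: flag a drop in rolling accuracy below a baseline."""

    def __init__(self, baseline_accuracy, window=50, tolerance=0.05):
        # baseline_accuracy: accuracy observed at validation (assumed known)
        # window: number of most recent cases kept for the rolling estimate
        # tolerance: permitted absolute drop before an alert is raised
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)

    def rolling_accuracy(self):
        """Fraction of correct predictions in the current window, or None."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def alert(self):
        """True only when the window is full and accuracy has dropped."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance

    def update(self, prediction, observed):
        """Record one case and report whether an alert is now active."""
        self.outcomes.append(1 if prediction == observed else 0)
        return self.alert()
```

In this sketch an alert would prompt human review and revalidation rather than any automatic change to the model.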
Second, ethics must be treated as a prerequisite rather than an adjunct. The integration challenges described in surgical AI are not purely technical but reflect deeper issues of accountability and trust. Surgeons remain responsible for decisions informed by algorithmic outputs, yet many AI systems function as opaque ‘black boxes’. This creates tension between innovation and informed consent.4 Automation bias further compounds risk, particularly in time-critical operative settings where clinicians may defer to algorithmic recommendations. Safe deployment therefore requires explainability, structured training in AI literacy, and clearly defined governance frameworks specifying responsibility across clinicians, institutions, and developers. Without this, adoption risks undermining professional agency and patient trust.
Third, equity remains underdeveloped in current discourse. The structural challenges identified by other researchers, including limited data sharing and reliance on selective datasets, risk embedding bias within AI systems.5 Algorithms trained predominantly in high-resource environments may perform poorly in under-represented populations or settings with different infrastructures. Furthermore, many AI applications depend on advanced imaging, digital ecosystems, and computational resources that are not universally available. Without deliberate mitigation, surgical AI may widen disparities in outcomes. Equitable deployment requires representative training datasets, mandatory subgroup performance reporting, and scalable implementation strategies that extend beyond technologically privileged environments.
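To make subgroup performance reporting concrete, the following minimal sketch computes sensitivity separately for each subgroup and the gap between the best and worst performing groups. The function names, the record layout, and the choice of sensitivity as the metric are assumptions made for illustration; they are not drawn from the cited references.

```python
from collections import defaultdict


def subgroup_sensitivity(records):
    """Return sensitivity (true-positive rate) for each subgroup.

    records: iterable of (subgroup, y_true, y_pred) tuples with binary labels.
    A subgroup with no positive cases is reported as None, not a guessed value.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    groups = set()
    for group, y_true, y_pred in records:
        groups.add(group)
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_positives[group] += 1
    return {
        g: (true_positives[g] / positives[g] if positives[g] else None)
        for g in groups
    }


def sensitivity_gap(report):
    """Difference between the highest and lowest reported sensitivity."""
    values = [v for v in report.values() if v is not None]
    if not values:
        return None
    return max(values) - min(values)
```

Reporting the per-group values alongside the gap keeps under-represented groups visible: a subgroup reported as None signals missing data rather than acceptable performance.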
Importantly, the barriers to adoption and the conditions for safe deployment are closely linked. Fragmentation of research communities, absence of benchmarking, and limited reproducibility are not merely obstacles to progress but also threats to safety, accountability, and fairness. Addressing these challenges requires not only collaboration, as proposed,1 but also clear thresholds for deployment. Furthermore, although agentic AI in surgery is the next frontier, it is imperative to remember that pathology is complex, surgeons are accountable, and patients are vulnerable. Autonomy must be earned rather than assumed: before deployment, validation and revalidation systems akin to those adopted for human surgeons must be in place.
The Triple E framework should therefore be understood as a minimum standard for surgical AI (Figure 1). Systems that demonstrate technical performance but lack prospective clinical benefit, transparent governance, or equitable applicability should not progress to routine use. Conversely, embedding these principles across the lifecycle of AI, from design and validation to implementation and monitoring, offers a pathway toward safe and sustainable integration. Surgical AI will not be defined by its computational sophistication alone, but by whether it improves decision-making, preserves professional accountability, and delivers benefit across diverse patient populations.
Figure 1
References
1. Manzano Rodriguez A, Snoek CGM, Schijven MP. Bridging the gap: exposing the hidden challenges towards adoption of artificial intelligence in surgery. BJS. 2025 Nov 6;112. DOI: https://doi.org/10.1093/bjs/znaf217
2. Marcus HJ, Ramirez PT, Khan DZ, et al. The IDEAL framework for surgical robotics: development, comparative evaluation and long-term monitoring. Nat Med. 2024;30:61–75.
3. Wong A, Sussman JB. Understanding Model Drift and Its Impact on Health Care Policy. JAMA Health Forum. 2025;6:e252724.
4. Embí PJ, Rhew DC, Peterson ED, Pencina MJ. Launching the Trustworthy and Responsible AI Network (TRAIN): A Consortium to Facilitate Safe and Effective AI Adoption. JAMA. 2025;333:1481–1482.
5. Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng. 2023 Jun;7:719–742.