Guest blog: An investigation of sample size calculations in surgical trials

Authors: Chloe Jacklin, Jeremy N Rodrigues, Joanna Collins, Jonathan Cook, Conrad J Harrison
Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK

We all recognise the importance of the number of participants in a randomised controlled trial (RCT). Too few participants increases the risk of statistical errors, in particular failing to detect a genuine difference between groups, while too many makes a trial overly expensive and, more worryingly, exposes participants to the risks of research unnecessarily1. To calculate the appropriate number of participants, trialists must decide on a target difference between the two intervention groups that would be considered meaningful. This decision is even more challenging when the outcome is a patient reported outcome measure (PROM) because, without context, PROM scores are difficult to interpret.

PROMs are defined as “a measurement of any aspect of a patient’s health that comes directly from the patient, without interpretation of the patient’s response by a physician or anyone else”2. Their use has grown in popularity and credibility3, not only because they promote patient-centred care but also because they have been recognised by governing and advisory bodies2,4. This is particularly relevant to surgery, where new initiatives to foster patient-centred research have been launched in response to criticisms of low-quality evidence5,6. It is therefore important that researchers, clinicians and funding bodies understand the principles of measurement science underlying PROMs and their use in sample size calculations.

The Difference ELicitation in TriAls (DELTA2) guidelines set out the required reporting items for sample size calculations and provide guidance on rigorous target difference determination1. The target difference should be the PROM’s minimal important difference (MID). A popular definition of the MID is “the smallest difference in score in the domain of interest which patients perceived as beneficial and which would mandate, in the absence of troublesome side-effects and excessive cost, a change in the patient’s management”7. There are several methods for estimating MIDs, and they vary in methodological rigour. Some are rather arbitrary and not patient-centred, such as using half of the standard deviation (equivalent to a Cohen’s d of 0.5); others are more robust, such as anchoring the PROM to a global change score. The optimal approach is to triangulate several good estimates of the MID8–10. The context-specific nature of MIDs must also be appreciated, because they balance the benefits and disadvantages of an intervention for a given population, treatment and follow-up duration9. An MID estimated in a different context may therefore compromise a trial’s results.
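To make the link between the target difference and the number of participants concrete, here is a minimal Python sketch of the standard normal-approximation formula for comparing mean scores between two equally sized arms: n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², where δ is the target difference and σ is the standard deviation of the outcome. This is our illustration rather than a method prescribed by DELTA2, and it assumes a continuous PROM score, the same standard deviation in both arms, a two-sided test and no allowance for dropout. The function name `n_per_arm` and the standard deviation of 20 points are hypothetical and are not taken from any trial in our sample.

```python
import math
from statistics import NormalDist


def n_per_arm(target_difference, sd, alpha=0.05, power=0.80):
    """Approximate participants needed per arm (hypothetical helper).

    Assumes a two-arm superiority trial with equal allocation, a continuous
    PROM score with the same standard deviation in both arms, a two-sided
    test at level `alpha`, and no adjustment for dropout.
    """
    if target_difference <= 0 or sd <= 0:
        raise ValueError("target_difference and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / target_difference) ** 2
    return math.ceil(n)  # round up to whole participants


# Hypothetical PROM with a standard deviation of 20 points
for delta in (10, 5):
    print(f"target difference {delta}: {n_per_arm(delta, sd=20)} per arm")
# target difference 10: 63 per arm
# target difference 5: 252 per arm
```

With this hypothetical standard deviation, a target difference of 10 points (half the standard deviation) requires 63 participants per arm, whereas halving the target difference to 5 points roughly quadruples the requirement to 252 per arm. Because the sample size scales with the inverse square of the target difference, the choice of MID, and the method used to estimate it, has direct consequences for trial cost and for the number of participants exposed to research.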

We used DELTA2 to appraise the sample size calculations in RCTs in which the intervention and/or comparator was a surgical intervention and a PROM was used in the sample size calculation. We looked at trials published in high-impact journals over the previous six years, because these are the most cited in their fields and have large international readerships of clinicians, academics and policy makers. A total of 57 trials were eligible, of which 51 had a superiority design.

We found that sample size calculations in high-profile surgical RCTs that used a PROM as their primary outcome fell short of contemporary DELTA2 standards. Problems included missing reporting items, relatively arbitrary methods for determining the target difference, unclear justification of the target difference, and the application of MIDs estimated in different contexts. Notably, our sample included trials supported by £28 million of UK public research funding whose target differences were poorly justified.

Our results may reflect the demand for prompt, pragmatic answers to clinical research questions, which favours convenient but suboptimal MIDs, and the desire for cost-effective trials, which favours larger target differences and therefore smaller sample sizes.

While we acknowledge the difficult balance between delivering timely answers to clinical questions and investing in measurement science, there are potential solutions. Recent advances in trial methodology may lead to improvements in target difference setting11–13. For example, adaptive trial designs allow trialists to refine trial-specific MIDs dynamically and adjust sample sizes accordingly. Funding bodies, research ethics committees and journals act as gateways to research and could drive improvements in RCT measurement quality by actively promoting alternative trial designs and enforcing careful target difference determination. Rigid budgets and the risk aversion of commissioners and funding applicants are potential obstacles; however, these need to be balanced against the risks to participants and the excess costs caused by poor sample size calculations.

Image source: Enlivity 2021 Creative Commons


Part of the charitable activity of the Foundation, BJS Academy is an online educational resource for current and future surgeons.

The Academy comprises five distinct sections: Continuing surgical education, Young BJS, Cutting edge, Scientific surgery and Surgical news. Although most of this content is open access, additional content is available to BJS subscribers and strategic partners.
