The 3am fracture: what AI can and can’t do on-call

The ED doctor is confident. The AI has cleared the radiograph. The patient has a tender anatomical snuffbox and a mechanism consistent with a scaphoid injury. The question you are answering — at 3am, on the phone — is whether you trust the algorithm.

This is not a hypothetical. AI-based fracture detection tools are deployed in emergency departments across the UK and internationally, used by ED doctors and radiologists as part of routine triage. The question of how much weight to give their output is a clinical decision that orthopaedic surgeons are increasingly part of.

What the evidence says

The headline figures are reasonable. A 2022 systematic review and meta-analysis in Radiology examined 42 studies of AI fracture detection across 55,061 images (Kuo et al., 2022). On internal validation, AI achieved pooled sensitivity of 92% and specificity of 91% — comparable to clinicians (91% and 92% respectively). On external validation — the more meaningful test — clinicians slightly outperformed AI (sensitivity 94% vs 91%, specificity 94% vs 91%), though the differences were not statistically significant. Fifty-two percent of included studies were judged to have high risk of bias.

In a real-world emergency department study testing three commercially available AI algorithms, sensitivity across the tools was similar, at 90 to 93% (Bousson et al., 2023). Specificity was not. One algorithm achieved 93%; another 70%. A specificity of 70% means roughly three in ten normal radiographs are flagged as potentially fractured, a material difference when the tool is influencing who gets called at 3am.
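The arithmetic behind that comparison can be sketched in a few lines. The two specificities are the ones quoted above; the figure of 100 normal radiographs is a hypothetical round number chosen for illustration, not a workload reported in the study.

```python
# Illustrative arithmetic only: the specificities come from the text above,
# but the count of 100 normal radiographs is an assumed round number.

def expected_false_positives(specificity: float, normal_radiographs: int) -> float:
    """Expected number of normal radiographs flagged as possibly fractured."""
    return (1.0 - specificity) * normal_radiographs


NORMAL_RADIOGRAPHS = 100  # hypothetical workload, not from the study

for specificity in (0.93, 0.70):
    flagged = expected_false_positives(specificity, NORMAL_RADIOGRAPHS)
    print(
        f"specificity {specificity:.0%}: about {flagged:.0f} of "
        f"{NORMAL_RADIOGRAPHS} normal radiographs flagged"
    )
```

On these assumptions the less specific tool generates about four times as many false alarms as the more specific one, even though their sensitivities are nearly identical.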

Where AI as an adjunct genuinely helps

Guermazi et al. (2021) tested AI assistance across 24 readers (radiologists, orthopaedists, emergency physicians, and others) in a multi-reader, multi-case study. AI assistance improved sensitivity by 10.4 percentage points (64.8% to 75.2%) without increasing reading time. The gain was significant across most body regions; for the shoulder and clavicle and for the spine, the improvement did not reach statistical significance.
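One way to read the sensitivity figures above is in terms of missed fractures rather than detected ones. The sketch below is illustrative arithmetic only: the two sensitivities are those quoted from the study, while the framing "per 100 true fractures" is an assumed denominator chosen for readability.

```python
# Illustrative arithmetic: sensitivities are quoted in the text above;
# "per 100 true fractures" is an assumed denominator for readability.

SENS_UNAIDED = 0.648  # reader sensitivity without AI assistance
SENS_AIDED = 0.752    # reader sensitivity with AI assistance


def missed_per_100(sensitivity: float) -> float:
    """Fractures missed for every 100 fractures actually present."""
    return (1.0 - sensitivity) * 100


absolute_gain_points = (SENS_AIDED - SENS_UNAIDED) * 100
relative_drop_in_misses = 1.0 - missed_per_100(SENS_AIDED) / missed_per_100(SENS_UNAIDED)

print(f"absolute gain: {absolute_gain_points:.1f} percentage points")
print(f"missed per 100 fractures, unaided: {missed_per_100(SENS_UNAIDED):.1f}")
print(f"missed per 100 fractures, aided: {missed_per_100(SENS_AIDED):.1f}")
print(f"relative reduction in missed fractures: {relative_drop_in_misses:.0%}")
```

Framed this way, the aided readers still miss roughly a quarter of fractures in this test set, which is why the gain is best read as an improvement rather than a safety net.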

The implication is straightforward: AI adds most value for less experienced readers, in fracture types it has been well trained on, in body regions where it performs well. It is not a universal upgrade.

The failure mode: automation bias

A 2025 study in European Radiology assessed a commercial AI fracture detection tool in a real-world paediatric emergency department (Ziegner et al., 2025). AI assistance improved overall diagnostic accuracy modestly. But in 2% of cases, readers who had initially identified a fracture correctly changed their answer after seeing the AI output — and were wrong to do so. The algorithm had cleared the radiograph. The reader deferred to it. The fracture had been there all along.

This is automation bias: the tendency to over-rely on an automated system’s output, particularly when it conflicts with your own clinical assessment. The same study found that radial condyle fractures were detected with only 68% sensitivity. This is a fracture type for which clinical examination and mechanism of injury carry significant weight regardless of what the algorithm says.

Two percent sounds small. In a busy ED overnight, with dozens of radiographs reviewed with AI assistance, it is not.
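A rough sketch of why: the 2% rate is the one quoted above, but the overnight workload of 50 radiographs is a hypothetical number, and treating each case as independent with the same 2% risk is a simplifying assumption. The study was also conducted in a paediatric department, so carrying its rate over to a general ED is itself an assumption.

```python
# Illustrative sketch. The 2% rate is from the text above; the overnight
# workload of 50 radiographs and the independence of cases are assumptions.

REVERSAL_RATE = 0.02        # correct fracture calls wrongly reversed, per case
RADIOGRAPHS_OVERNIGHT = 50  # hypothetical workload, not from the study


def expected_reversals(rate: float, n: int) -> float:
    """Expected number of wrongly reversed calls among n cases."""
    return rate * n


def prob_at_least_one(rate: float, n: int) -> float:
    """Chance of one or more wrong reversals, assuming independent cases."""
    return 1.0 - (1.0 - rate) ** n


expected = expected_reversals(REVERSAL_RATE, RADIOGRAPHS_OVERNIGHT)
chance = prob_at_least_one(REVERSAL_RATE, RADIOGRAPHS_OVERNIGHT)
print(f"expected wrong reversals overnight: {expected:.1f}")
print(f"chance of at least one: {chance:.0%}")
```

Under these assumptions the expectation is about one wrongly reversed call per night, with a better-than-even chance of at least one.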

Back to the anatomical snuffbox

The algorithm clearing the radiograph is not the point. The clinical picture is. AI fracture detection improves overall sensitivity for common fracture patterns, particularly for less experienced readers, but it does not examine the patient, does not know the mechanism, and cannot compensate for the roughly one third of radial condyle fractures it missed in the paediatric study (Ziegner et al., 2025).

When the AI clears a radiograph and clinical examination says otherwise, that is not reassurance. It is a reason for extra scrutiny.

That principle applies whether you are the consultant being called or the registrar making the call.

For background on how to evaluate AI studies and what external validation means in practice, see What do we actually mean when we say AI in orthopaedics?

References

Kuo RYL, Harrison C, Curran TA, et al. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology. 2022;304(1):50–62. https://doi.org/10.1148/radiol.211785
Guermazi A, Tannoury C, Kompel AJ, et al. Improving Radiographic Fracture Recognition Performance and Efficiency Using Artificial Intelligence. Radiology. 2021;302(3):627–636. https://doi.org/10.1148/radiol.210937
Bousson V, Attané G, Benoist N, et al. Artificial Intelligence for Detecting Acute Fractures in Patients Admitted to an Emergency Department: Real-Life Performance of Three Commercial Algorithms. Acad Radiol. 2023;30(10):2118–2139. https://doi.org/10.1016/j.acra.2023.06.016
Ziegner M, Pape J, Lacher M, et al. Real-life benefit of artificial intelligence-based fracture detection in a pediatric emergency department. Eur Radiol. 2025;35(10):5881–5890. https://doi.org/10.1007/s00330-025-11554-9
