What AI actually does in orthopaedic clinic: three tools, three honest verdicts

A patient sits in front of you for a hip replacement consultation. You’ve reviewed the X-rays, taken a history, and confirmed the indication. Before you write in the notes, an algorithm has already estimated their 90-day complication risk, flagged potential issues with bone stock, and generated a templating plan from their pre-operative imaging. These tools are not theoretical — they’re deployed in orthopaedic departments across the NHS and private sector right now. The more useful question is not whether to use them but whether they’re doing what vendors claim.

Here are three AI tools already operating in the orthopaedic clinic environment, with an honest assessment of each.


Pre-operative templating software

AI-assisted templating tools — products like TraumaCad (Brainlab) and similar PACS-integrated platforms — use image recognition to identify bone landmarks, estimate implant size, and generate templating overlays from plain radiographs or CT. The appeal is obvious: faster, more reproducible, and less dependent on individual surgeon experience in templating technique.

The evidence supports a meaningful role here. Buchan et al. (2024, Medical Engineering & Physics) evaluated robotic total hip arthroplasty (THA) planning software in a 199-patient cohort and found acetabular cup templating accuracy within ±1 implant size in 90.4% of cases, with equivalent performance across software platforms and surgical approaches. Similar accuracy has been reported for AI-based planning tools in total knee arthroplasty, where 3D AI planning has demonstrated advantages over conventional 2D templating for prosthesis sizing and alignment prediction.

The verdict: genuinely useful, particularly for higher-volume units and for trainees developing templating skill. The limitation is that the output is only as good as the image quality and the landmark identification. These systems assume an adequately exposed digital radiograph with a calibration marker, and that is not always what arrives in clinic. Check the input before trusting the output.
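The calibration marker matters because plain radiographs magnify anatomy by an amount that varies with patient habitus and positioning. A minimal sketch of the scaling arithmetic in Python, assuming a marker of known diameter placed at the level of the hip joint. The function names and the plausibility range are illustrative assumptions, not any vendor's API or a validated threshold:

```python
def magnification(marker_true_mm: float, marker_on_image_mm: float) -> float:
    """Ratio of the marker's size on the image to its real size."""
    if marker_true_mm <= 0 or marker_on_image_mm <= 0:
        raise ValueError("marker diameters must be positive")
    return marker_on_image_mm / marker_true_mm


def true_size_mm(on_image_mm: float, marker_true_mm: float,
                 marker_on_image_mm: float) -> float:
    """Scale an on-image measurement back to its real-world size."""
    return on_image_mm / magnification(marker_true_mm, marker_on_image_mm)


def magnification_plausible(factor: float, low: float = 1.05,
                            high: float = 1.35) -> bool:
    """Crude input check. The default range is an assumed placeholder,
    not a validated threshold; set it from local radiography practice."""
    return low <= factor <= high


# Hypothetical example: a 25 mm marker measures 30 mm on the image.
factor = magnification(25.0, 30.0)
head_mm = true_size_mm(60.0, 25.0, 30.0)
print(f"magnification {factor:.2f}, corrected head diameter {head_mm:.1f} mm")
```

In this made-up case a 60 mm on-image measurement scales back to 50 mm. If the marker is missing, or sits in a different plane from the joint, the factor is wrong and every templated size inherits the error.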


Perioperative risk prediction

Several platforms now embed ML-enhanced risk calculators into pre-operative assessment workflows, layering machine learning on top of existing datasets (ACS NSQIP, Hospital Episode Statistics, local registries) to predict outcomes like surgical site infection, 30-day readmission, prolonged length of stay, and mortality. In orthopaedics, these are increasingly applied to hip and knee arthroplasty, hip fracture, and elective spine.

The evidence here is more nuanced. Models built on large national datasets show reasonable discrimination at population level, with areas under the ROC curve (AUCs) typically in the 0.7–0.8 range for serious complication prediction in arthroplasty cohorts. But predictive performance degrades when models are applied to different populations, especially where the local patient profile or care pathway differs from the training data.
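One way to find out whether a model transfers is to score it against local outcomes before relying on it. A minimal sketch in Python, using only the standard library: `auc` measures discrimination as the share of event/non-event pairs the model ranks correctly, and `observed_vs_expected` compares the local event rate with the mean predicted risk. The eight-patient cohort is invented for illustration; a real audit needs far more cases and a proper calibration analysis.

```python
from itertools import product


def auc(predicted, outcomes):
    """Probability that a patient with the event was given a higher
    predicted risk than a patient without it (ties count as half)."""
    pos = [p for p, y in zip(predicted, outcomes) if y == 1]
    neg = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(pos, neg))
    return wins / (len(pos) * len(neg))


def observed_vs_expected(predicted, outcomes):
    """Local event rate versus the mean predicted risk."""
    observed = sum(outcomes) / len(outcomes)
    expected = sum(predicted) / len(predicted)
    return observed, expected


# Hypothetical local cohort: model-predicted risk and what actually happened.
predicted = [0.05, 0.10, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70]
outcomes = [0, 0, 1, 0, 0, 1, 0, 1]

local_auc = auc(predicted, outcomes)
observed, expected = observed_vs_expected(predicted, outcomes)
print(f"local AUC {local_auc:.2f}; observed {observed:.1%} vs expected {expected:.1%}")
```

A local AUC well below the published figure, or an observed rate far from the expected one, suggests the model does not fit the population in front of you.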

The verdict: useful for flagging high-risk patients who warrant closer pre-operative optimisation, but not accurate enough to substitute for thorough clinical assessment. The risk is in the framing — a “high risk” score from a machine learning model carries an authority that “this patient looks frail to me” doesn’t, even when the clinical judgement is more accurate. Use these tools to prompt questions, not to close them.


Patient-reported outcome prediction

A newer development — and one that will become increasingly common — is the use of pre-operative data to predict an individual patient’s post-operative PROM result. In hip and knee arthroplasty, centres now use ML models trained on registry data to generate predicted Oxford Hip Score or Oxford Knee Score trajectories at 6 and 12 months. The stated goal is to support shared decision-making: giving patients realistic expectations before they commit to surgery.

The evidence base is growing, but individual-level prediction remains the central challenge. Using NHS PROMs data covering over 130,000 hip and knee replacement procedures, Huber et al. (2019, BMC Medical Informatics and Decision Making; PMID 30621670) found that the best-performing machine learning models (extreme gradient boosting) achieved AUCs of around 0.78 for Oxford Score improvement in hip replacement, and around 0.70 for knee replacement. These represent reasonable population-level discrimination, but an AUC in this range still leaves wide uncertainty around any individual prediction: you cannot reliably tell one patient “you will improve” and another “you will not.”
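To see why an AUC of 0.70 to 0.78 cannot support individual promises, it helps to turn it into an error rate. The sketch below, in Python, assumes the model's scores are normally distributed with equal variance in patients who do and do not improve (a textbook simplification, not something reported by Huber et al.) and asks how often a cutoff midway between the two groups puts a patient on the wrong side.

```python
from statistics import NormalDist


def error_rate_at_midpoint(auc: float) -> float:
    """Share of each group misclassified by a cutoff halfway between the
    group means, under an equal-variance binormal assumption."""
    if not 0.5 <= auc < 1.0:
        raise ValueError("auc must be in [0.5, 1.0)")
    z = NormalDist()  # standard normal
    # Binormal model: AUC = Phi(d / sqrt(2)), where d is the gap between
    # the two group means in standard-deviation units.
    d = 2 ** 0.5 * z.inv_cdf(auc)
    # Each group sits d/2 from the cutoff, so its wrong-side tail is Phi(-d/2).
    return z.cdf(-d / 2)


hip_error = error_rate_at_midpoint(0.78)
knee_error = error_rate_at_midpoint(0.70)
print(f"hip: {hip_error:.0%} misclassified; knee: {knee_error:.0%} misclassified")
```

Under these assumptions, roughly three in ten hip patients and more than a third of knee patients land on the wrong side of the cutoff, which is why a population-level AUC should not be read as an individual forecast.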

The verdict: promising concept, but be cautious about the precision with which these tools are presented to patients. “The algorithm predicts you’ll score 42 out of 48 at one year” sounds precise in a way the underlying model doesn’t warrant. Use predicted outcomes to frame the conversation, not to set a number.


The underlying principle

What connects these three tools is a single observation: AI in clinic performs best when it augments a well-constructed clinical decision, and worst when it substitutes for one. The templating tool works when the X-ray is good and the surgeon understands what a templating plan is for. The risk tool works when the clinician understands which risk factors it captures and which it doesn’t. The outcome tool works when the clinician has already had the honest conversation about realistic expectations.

None of these is a replacement for knowing your patient. They’re a second check — and like any second opinion, they’re only useful if you know when to trust them.


References

  1. Buchan M, et al. Accuracy of artificial intelligence-assisted robotic total hip arthroplasty planning software: a consecutive cohort study. Med Eng Phys. 2024. https://doi.org/10.1016/j.medengphy.2024.104105
  2. Huber M, Kurz C, Leidl R. Predicting patient-reported outcomes following hip and knee replacement surgery using supervised machine learning. BMC Med Inform Decis Mak. 2019;19(1):3. PMID 30621670. https://doi.org/10.1186/s12911-018-0731-6
