Claude vs ChatGPT vs Gemini vs Grok: a surgeon’s honest comparison

I used ChatGPT for about eight months before I switched. I’m not evangelical about it (plenty of people use ChatGPT well), but two things happened in quick succession that made me want to reassess, and once I looked properly, I didn’t go back. This post is my honest take on the main options. It’s personal, and it’s based on what I actually use and why.

If you’re completely new to what these tools are and how they work, BBC Bitesize has a plain-language AI explainer worth reading first.

Why I moved away from ChatGPT

There were two things. The first was practical. When Anthropic launched the Claude desktop app and Cowork, it changed what was possible in a way that the browser-based ChatGPT interface at the time simply didn’t match. Cowork works with files on your computer — rather than uploading a document, having the AI process it in the cloud, and getting back a result, you’re building and editing files locally. That matters for tokens and cost: the AI is adapting something that already exists rather than generating from scratch every time, which is faster, cheaper, and more controllable. For the kind of work I do — adapting letter templates, working through referral structures, building documents I’ll use repeatedly — it was a different experience.

The second was the Pentagon. In early 2026, OpenAI signed a contract with the US Department of Defense that drew significant backlash, including from OpenAI’s own employees. Sam Altman later admitted it looked “opportunistic and sloppy”, and the contract was subsequently amended. For most consumers, the practical impact is probably minimal. But for a clinician embedding a company’s tools into their workflow, the values that company represents matter, and the episode raised questions I found difficult to ignore. Plenty of users felt the same: app store data suggested a significant shift toward Claude around that period.

I’m not making this a moral argument. Use whichever tool works for you. But I think it’s worth knowing the context behind the tools you rely on, and that’s now part of my context for ChatGPT.

Claude

Claude (Anthropic) is what I use. The reasons: it handles long documents well, it writes in a way that sounds human rather than corporate, it reasons carefully when given complex multi-part problems, and the desktop app changed how much I can actually do in a session. The Opus 4 series is the most capable tier. No image generation, no voice mode — those are real absences if you need them.

The thing I come back to most is the Cowork distinction. Tasks that used to involve uploading a document, waiting for a response, then copying the output somewhere else now happen locally and iteratively. It’s a different level of integration with how I actually work.

ChatGPT

ChatGPT (OpenAI) remains the most widely used AI tool and GPT-5.5 is genuinely capable across a broad range of tasks. It has things Claude doesn’t: image generation, voice mode, a larger plugin ecosystem. If you need multimodal tools or you’re already deeply embedded in OpenAI’s ecosystem, the case for staying is real. The writing quality has narrowed the gap with Claude significantly in 2026.

My honest assessment: it’s a strong tool run by a company that has had some public wobbles around its direction. Make your own call on what that means for you.

Grok

I want to be direct about Grok (xAI) because I think the clinical red flags are serious enough to warrant saying so plainly.

In August 2025, a formal call for the removal of Grok’s doctor and therapist modes was published after documented cases of significant misdiagnosis — including a textbook tuberculosis case misidentified as spinal stenosis, and a mammogram of a benign breast cyst misread entirely. These are not edge cases. They are the kind of errors that could cause real harm if a patient acted on them.

Separately, in January 2026, the Washington Post reported that users were prompting Grok to generate sexualised deepfake images of women and girls, including minors, by uploading photographs. Rolling Stone reported that Grok generated three million sexualised images in eleven days, including 23,000 involving children. UK regulators and the EU opened investigations. Australia’s online safety watchdog followed.

Grok may have practical uses — it has real-time access to X posts and can be useful for tracking live information. But the above is enough for me, as a clinician, to not embed it in anything related to my practice or recommend it to colleagues. Others will make different calls. I think the clinical misdiagnosis issue alone is sufficient reason for caution.

Gemini

Gemini (Google) is the tool I’ve used least and feel least qualified to review properly. That’s partly because I’m an Apple user and Gemini integrates most naturally with Android and Google Workspace — if your department runs on Google and your phone is Android, the native integration argument for Gemini is genuinely strong.

The one Google AI product I think is significantly underused in clinical and academic medicine is NotebookLM — a separate tool that lets you upload documents (papers, guidelines, notes) and have a conversation with them, generate summaries, or create audio overviews. It’s free, it’s excellent for literature review and exam preparation, and it deserves its own post. I’ll cover it properly in a future piece.

The short version

Claude is the tool I use and the one this site is built around. ChatGPT is a strong alternative with a broader feature set and some governance questions I think are worth being aware of. Gemini is a legitimate choice, particularly for Android and Google Workspace users, and NotebookLM is worth your attention separately. Grok has clinical red flags I’m not willing to overlook.

Pick one and learn it properly. The returns come from depth of use, not breadth of tools.

See also: Should I pay for Claude? Making sense of AI tiers — if you haven’t settled on a subscription yet, start there.

Want help choosing and setting this up for your workflow? Once you’re logged in as a member, you can book a 30 or 60-minute session — we’ll work through exactly which tool fits your situation, get it configured properly, and leave you with something you can actually use. Thirty minutes for a focused Q&A (£75); sixty minutes to build your setup together (£150).

References

NBC News. OpenAI alters deal with Pentagon as critics sound alarm over surveillance. 2026. nbcnews.com
CNBC. OpenAI’s Altman admits defense deal ‘looked opportunistic and sloppy’. March 2026. cnbc.com
Nabil.org. AI Chatbot Grok’s Doctor and Therapist Modes Pose Immediate Danger. August 2025. nabil.org
Washington Post. X users tell Grok to undress women and girls in photos. January 2026. washingtonpost.com
Rolling Stone. Grok Is Creating Sexualized Deepfakes of Celebrities and Children. 2026. rollingstone.com