ChatGPT: Digging deeper to find out if my job’s safe
This use of artificial intelligence is at the top of the hype cycle, but before it can make an impact in healthcare, a lot more work is needed.
By now, we’ve probably all heard the quip: "You might be out of a job soon." AI and natural language chatbots have swept the world, and they’re likely to change the face of healthcare.
To see this, you only need to visit any health conference at the moment: the stands are filled with the next AI evolution and show where the industry appears to be headed.
But what we are evolving toward, and how we take the best of AI to assist us, remains to be seen.
A battle with ChatGPT
When my boss quipped about whether AI would replace me, I challenged ChatGPT to a duel. Our battleground? Diagnosing type 2 diabetes.
The results were similar but had important distinctions. While ChatGPT offered third-year medical school textbook answers, my response was laced with nuanced human experience.
Despite feeling proud of my efforts, I scrolled through YouTube and TikTok reels and quietly lamented the prospect of my clinical role becoming obsolete.
One video, where a clinician gave ChatGPT a clinical scenario, piqued my interest. While the bot’s response had references, they were nowhere to be found in a subsequent search. ChatGPT had fabricated the references to back up its claims that chest pain in a 35-year-old woman taking an oral contraceptive pill was a result of costochondritis.
The correct answer is pulmonary embolism (PE), and missing it at screening could be a life-threatening error.
While ChatGPT had done well to identify the most statistically likely cause given the clinician’s wording, it had failed to apply clinical logic. In practice, the most lethal diagnoses are moved up the list of possible causes so they can be ruled out sooner as the explanation for a patient’s presentation.
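To make the contrast concrete, here is a deliberately simplified, hypothetical sketch. The condition names, likelihood figures and lethality flags are illustrative placeholders, not real clinical data or a real triage algorithm; the point is only the difference between ranking a differential purely by statistical likelihood and ranking it the way clinicians are trained to, with the diagnoses that are lethal if missed pushed up so they can be excluded first.

```python
# Hypothetical illustration only: toy numbers, not a clinical tool.
# Each candidate diagnosis carries an assumed likelihood and a lethality flag.
differential = [
    {"diagnosis": "costochondritis",    "likelihood": 0.60, "lethal_if_missed": False},
    {"diagnosis": "pulmonary embolism", "likelihood": 0.05, "lethal_if_missed": True},
    {"diagnosis": "anxiety",            "likelihood": 0.25, "lethal_if_missed": False},
]

# Ranking purely by likelihood, roughly what a statistical text generator leans on:
by_likelihood = sorted(differential, key=lambda d: d["likelihood"], reverse=True)

# Clinical ordering: anything lethal if missed is worked up and excluded first,
# and only then is the remainder considered in order of likelihood.
by_clinical_priority = sorted(
    differential,
    key=lambda d: (not d["lethal_if_missed"], -d["likelihood"]),
)

print([d["diagnosis"] for d in by_likelihood])         # costochondritis first
print([d["diagnosis"] for d in by_clinical_priority])  # pulmonary embolism first
```

The same candidate causes appear in both lists; only the ordering rule changes, and that ordering is exactly where the chatbot’s answer fell down.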
Caveats about the capabilities
Articles about ChatGPT giving more empathetic responses than doctors and passing the American medical licensing exams (USMLE Steps 1-3) have swept the Internet. But scratch a little deeper and they all still come with sizeable caveats.
Let's take the American medical exam, for example:
• It didn't sit an actual exam paper. Instead, all of the questions were sample questions from the USMLE website.
• The questions were drawn from different stages of the USMLE, with Step 1 and Step 3 questions combined into a single set, something that doesn't happen in the real USMLE.
• Questions containing images were removed, and a relative score was calculated on what remained, giving a misleading pass rate of around 60 percent. If the image questions were included, as they would be for a human candidate, the score would be closer to 40 to 50 percent (see the rough arithmetic sketched below).
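As a back-of-the-envelope illustration of that last point: the 20 percent share of image-based questions used here is an assumption for the sake of the example, not a figure from the study, but counting the excluded questions as incorrect shrinks the headline score into the range described above.

```python
# Back-of-the-envelope only; the image-question share is an assumed figure.
image_fraction = 0.20       # assumed proportion of questions that include images
text_only_score = 0.60      # reported score on the text-only subset

# A human candidate cannot skip the image questions, so count them as incorrect.
adjusted_score = text_only_score * (1 - image_fraction)

print(f"Headline score: {text_only_score:.0%}")   # 60%
print(f"Adjusted score: {adjusted_score:.0%}")    # 48%
```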
Factoring in the errors
Errors in healthcare are considered unacceptable, and ChatGPT already has a few under its belt.
These range from an early version recommending self-harm or suicide as a mental health treatment to fabricating references to back up its statements.
Humans use language to convey confidence or hesitance when answering a question, but ChatGPT seems to lack this communicative skill, so it can make dangerous suggestions while giving the impression that it is consistently trustworthy.
I’ve lost count of the number of times I’ve been with a patient whose words don’t match what their non-verbal cues are conveying. Almost all doctors know the experience of needing to push a little harder for the right answer.
The medical community envisions an AI companion that considers diagnoses, finds treatments and stays up to date with the latest disease pathology information: think Tony Stark’s "Jarvis" AI.
In reality, until ChatGPT can understand the nuances of human language, including our tendency to bend the truth and to say things without saying them, it’s going to fall short of an autonomous Jarvis.
But that’s not a shortcoming of ChatGPT – it’s a reflection that ChatGPT was never built to be health-specific.
Heavy lifting ahead
We all know the potential of artificial intelligence chatbots is huge. But we have to invest in lifting their health literacy and in having front-line clinicians feed into the development of these tools.
AI needs to learn rules around referencing and how to assess the calibre of research. AI needs to learn how humans convey information, and how humans say things without saying things.
Increasingly, big tech and consultancies are firing up their PR machines to claim that they will fix healthcare – but unfortunately, it’s not that simple.
Healthcare can’t be administered by people without training. There is no silver bullet. Building natural language AI to write fun copy, answer everyday questions and run information-based errands is completely different from building it for healthcare-related use cases.
It takes years of experience and specialist knowledge to work on the healthcare front line, and translating that into system-wide change is a task on a whole other scale, one that still needs healthcare experience.
Not just anyone can translate data and technology into positive health outcomes for people.
We need to be talking about who owns our healthcare companies and what their historic expertise in the sector is.
We need to be asking about how they are taking a patient-centred approach, and how they are involving front-line healthcare practitioners in decision making at a senior level.
No matter how talented your technical staff are, if they’re making key decisions without consulting experienced front-line healthcare professionals, it’s a recipe for disaster.
So, I won the duel with ChatGPT. But it was never a fair fight, because ChatGPT wasn’t made for healthcare. If I’m going to be out of a job any time soon, that’ll be because people like me spent time building and teaching AI how to be a healthcare professional.
Jamie Ioane, MD, is a clinical subject matter expert for Orion Health.