Google announces new generative AI search capabilities for doctors

Should we be scared or excited about this?

In all seriousness, there is a strong case for using an open-source LLM within our own systems (systems we run ourselves, where we retain absolute control over the data) for this use case.

But letting Google get their hands on all of our health data? That I’m not so sure of.

1 Like

Do you prefer a different vendor? If so, why, beyond personal preference? Superficially, this is a good use of generative AI: drawing together known information about a patient and their situation to support clinical care, say, through preparing a discharge summary. This process is currently fraught and inaccurate. So, use the tools to reduce stress and, in theory, improve the outcome for all concerned.

“Google Cloud does not access customer data or use it to train models, and the company said the new service is compliant with the Health Insurance Portability and Accountability Act, or HIPAA”
I do hospital medical triage of referral letters and comprehensive chronic pain assessments. I spend hours trying to find relevant patient data scattered all over the EMR. Interesting that the Mayo Clinic is not rushing in until the system has more evidence of efficacy. It would be interesting to set up a comparative trial of what we currently do against how the Google system performs.

3 Likes

I agree, this will be a game changer and should help to offset our resource shortages: reducing times to see GPs, shrinking waitlists, etc. If we look back in 20 years and the people saying ‘this will never catch on and has no place in medicine’ are right, I’ll be astounded.

Tools that can rapidly assimilate and assess a large amount of content about a patient and present the most relevant information back to clinicians are the holy grail of health IT. Generative AI and Large Language Models are a fantastic step towards this goal. But…they are just a step…

There was a bit of a flurry of excitement recently when some researchers took the latest GPT-4V vision model and threw some radiology images at it for interpretation. https://arxiv.org/pdf/2309.17421.pdf

It responded with what at first glance looked to be amazing accuracy - except it was dead wrong on one example and appears to have simply made stuff up on another. And that’s the elephant in the room: hallucinations.

There are lots of breathless descriptions of how LLMs can “pass the USMLE” with 80% accuracy. Sounds amazing. But it’s an interesting thought experiment to ask what that 80% actually means if you take the LLM and pretend it’s a real clinician.
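To make that thought experiment concrete, here is a rough back-of-the-envelope sketch. The patient volumes are invented illustrative assumptions, not real workload figures:

```python
# Back-of-the-envelope: what does "80% accurate" mean at clinical scale?
# All workload numbers below are illustrative assumptions only.
patients_per_year = 30 * 5 * 48   # 30 patients/day, 5 days/week, 48 weeks/year
accuracy = 0.80                   # the headline USMLE-style figure

expected_errors = patients_per_year * (1 - accuracy)
print(f"Encounters per year: {patients_per_year}")
print(f"Expected wrong answers at 80% accuracy: {expected_errors:.0f}")

# For comparison, the ~1% perceptual miss rate often quoted for radiologists:
print(f"Misses at a 1% error rate: {patients_per_year * 0.01:.0f}")
```

Even on these made-up numbers, 80% accuracy implies well over a thousand errors a year per clinician-equivalent, versus dozens at a 1% miss rate, which is why the headline pass rate is far less reassuring than it sounds.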

Let’s think about Radiology again and look at errors in this domain. It’s no secret that Radiologists miss things (approximately 1% of the time, in fact!), but these are typically errors of perception - good search methods and practice can reduce the rate of these errors, and AI is actually super good at helping with this! There are validated image-analysis AI models in production with FDA clearance that, for example, help radiologists identify lung nodules. Oh to have these deployed across NZ…

Errors of interpretation are far more concerning though, i.e. a Radiologist sees something but mis-diagnoses it through a lack of experience or knowledge - in the paper above this is the Jones fracture diagnosis. The Generative AI “sees” the fracture, but confidently mis-diagnoses the actual fracture type #awkward… (see section 4.1, page 33)

The subsequent image analysis on the same page is where things get super-concerning - the hallucination. The AI describes a nodule in the right upper lobe - yet the right upper lobe almost certainly isn’t even visible in the supplied CT slice (let alone a nodule) #superawkward

  • Errors where you miss seeing/perceiving some information = not great but hopefully a fixable thing through improved review process and experience.

  • Errors where you perceive something but attribute it incorrectly = more concerning and needs active addressing

  • Errors where you just make stuff up? That’s malpractice.

Generative AI/LLMs are super exciting, super useful (in some verticals) and super enticing in medicine. But I think they should be evaluated and assessed in the same manner as a new therapeutic. Phase 1, Phase 2 and Phase 3 trials with clear understanding of the risks, benefits and potential adverse side effects of their use.

Humans are hugely susceptible to trusting technology, particularly if it appears to be slick and well presented. Even the concept of an LLM that absorbs notes and summarises them should (in my opinion) raise massive red flags around the potential for end-users to implicitly trust the output because “a computer did it”.

3 Likes

We need to be careful with our expectations with tools like this. LLMs are not human intelligences or general intelligences, and their ability to do things that look like they would require understanding is deceptive.

But tools can be useful even if they’re not perfect. As I type this, I’m getting red underlines from the spelling checker. In my case, it’s more of a typing checker, but it highlights a lot of words because I’m not using American spelling. I leave it on though, because it also spots real errors.

2 Likes

I agree with your comments, Alastair. Extreme caution will be required with the early versions, and that caution should be able to ease as the models improve.
With the current statistics on physicians getting the problem and/or diagnosis wrong, AI should certainly be of benefit.
I look forward to the day when the AI is good enough to be the first port of call for diagnosis and triage, fast-tracking patients to the right part of the health system (and freeing up our GPs to do more specialised work).