Google Taps AI to Revamp Costly Health-Care Push Marred by Flops

jon_herries · July 30, 2024, 7:11pm

https://www.bloomberg.com/news/features/2024-07-30/google-sees-ai-as-the-key-to-a-health-care-revolution

Interesting story about using gen AI to write clinical notes, which says the notes were missing some information (medication allergies). It speaks to the need to check the outputs and not assume they are right.

I do wonder though, at what rate do clinicians forget to add important content to the record?

NathanK · July 31, 2024, 4:26am

I’m not sure that we perform better than generative AI in this instance; it would be very good to have a comparison.

My understanding is that most evaluations of Gen AI like this compare it with the human-written clinical record as a gold standard. In fact, that record is a tarnished silver standard: we miss things, misinterpret what patients say, add bits that we believed we heard (but didn’t), etc.

Perhaps we need a true gold standard (for example, a panel reviewing the video of consultations several times) so that we can truly compare the AI-generated and human-written records with one another.

DrJo · August 2, 2024, 2:58am

There are several obvious problems with current generative AI, and they have been looked at quite a lot now. They include confabulation (adding stuff that isn’t there), sycophancy (incorporating what the user wanted into the results, or emphasising such results inappropriately), and lack of reproducibility.

This was discussed recently in JAMA: https://jamanetwork.com/journals/jama/article-abstract/2814609

There’s another evaluation here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11221761/

We now also have a clear demonstration that in a proper clinical setting, these models perform poorly:

https://www.nature.com/articles/s41591-024-03097-1

The problems—as we might expect—got worse when they were provided with more information about the actual diagnosis, or with irrelevant information.

A long way to go, yet.

My 2c, Dr Jo.

Alistair · August 4, 2024, 10:13am

@DrJo so so true.

And then, let’s not forget that in a resource-constrained, time-poor, high-pressure environment we should anticipate a generous amount of clinician automation bias….

rradecki · August 6, 2024, 1:46am

Might also consider … in a resource-constrained, time-poor, high-pressure environment we should anticipate a generous amount of clinician forgetfulness or misremembering …

There are two tests worth doing – one of straight accuracy, comparing the transcript of the recording to the content of an AI-generated note … but also one adding in a comparison to the content of a clinician-generated note. It might also be worth monitoring any AI note-writing for unintended consequences or biases towards additional (or reduced!) downstream testing or referrals, depending on actions inspired by the flowery language an AI might use, even if the content were strictly accurate.
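
As a rough illustration of those two accuracy tests, here is a minimal Python sketch. It assumes reviewers have already reduced the transcript and each note to a set of normalised "facts", which is the hard part and is not shown. The names `NoteComparison` and `compare_note` and all of the example data are made up for illustration; none of it comes from a real study or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteComparison:
    omitted: frozenset   # facts in the transcript but missing from the note
    added: frozenset     # facts in the note with no support in the transcript
    recall: float        # share of transcript facts the note captured
    precision: float     # share of note facts supported by the transcript


def compare_note(transcript_facts, note_facts):
    """Compare one note against the transcript, treating each as a set of facts."""
    transcript = frozenset(transcript_facts)
    note = frozenset(note_facts)
    supported = transcript & note
    # Empty inputs are treated as a perfect score to avoid dividing by zero.
    recall = len(supported) / len(transcript) if transcript else 1.0
    precision = len(supported) / len(note) if note else 1.0
    return NoteComparison(transcript - note, note - transcript, recall, precision)


# Hypothetical, hand-annotated facts for a single consultation.
transcript = {
    "allergy: penicillin",
    "symptom: cough 3 days",
    "med: salbutamol",
    "plan: chest x-ray",
}
ai_note = {
    "symptom: cough 3 days",
    "med: salbutamol",
    "plan: chest x-ray",
    "symptom: fever",
}
clinician_note = {
    "allergy: penicillin",
    "symptom: cough 3 days",
    "plan: chest x-ray",
}

# Test 1: AI note against the transcript.
ai_result = compare_note(transcript, ai_note)
# Test 2: clinician note against the same transcript, for comparison.
clinician_result = compare_note(transcript, clinician_note)

for label, result in [("AI", ai_result), ("Clinician", clinician_result)]:
    print(label, "omitted:", sorted(result.omitted), "added:", sorted(result.added))
```

In this made-up case the AI note drops the allergy and adds a fever that was never mentioned, while the clinician note drops the medication. Exact set matching is a deliberate simplification; a real evaluation would need reviewers to decide when two differently worded statements count as the same fact.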