Interesting story about using generative AI to write notes, which found the note was missing some information (medication allergies). It speaks to the need to check the outputs rather than assume they are right.
I do wonder though, at what rate do clinicians forget to add important content to the record?
I’m not sure that we perform better than generative AI in this instance; it would be very good to have a comparison.
My understanding is that most evaluations of generative AI like this compare it with the human-written clinical record as a gold standard. In fact, that record is a tarnished silver standard: we miss things, misinterpret what patients say, add bits that we believed we heard (but didn’t), etc.
Perhaps we need a true gold standard (for example, a panel reviewing the video of consultations several times) so that we can truly compare AI-generated and clinician-written notes with one another.
There are several obvious problems with current generative AI models, and these have been looked at quite a lot now. They include confabulation (adding stuff that isn’t there), sycophancy (incorporating what the user wanted into the results, or emphasising such results inappropriately), and lack of reproducibility.
The problems, as we might expect, got worse when the models were provided with more information about the actual diagnosis, or with irrelevant information.
And then, let’s not forget that in a resource-constrained, time-poor, high-pressure environment we should anticipate a generous amount of clinician automation bias….
Might also consider … in a resource-constrained, time-poor, high-pressure environment we should anticipate a generous amount of clinician forgetfulness or misremembering …
There are two tests worth doing: one of straight accuracy, comparing the transcript of the recording to the content of an AI-generated note, and one that adds a comparison to the content of a clinician-generated note. It might also be worth monitoring any AI note-writing for unintended consequences, such as a bias towards additional (or reduced!) downstream testing or referrals, driven by the flowery language an AI might use even if the content were strictly accurate.
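To make the two-test idea concrete, here is a minimal sketch of how the comparison might be scored. It assumes reviewers have already reduced the transcript and each note to a set of discrete facts; that extraction step is the hard part and is not shown. The names (`score_note`, `NoteScore`) and the example facts are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteScore:
    omitted: frozenset       # reference facts missing from the note
    unsupported: frozenset   # note facts with no basis in the reference
    omission_rate: float     # share of reference facts the note left out
    unsupported_rate: float  # share of note facts not found in the reference


def score_note(reference_facts, note_facts):
    """Score one note against the reference (e.g. the reviewed transcript)."""
    reference = frozenset(reference_facts)
    note = frozenset(note_facts)
    omitted = reference - note
    unsupported = note - reference
    return NoteScore(
        omitted=omitted,
        unsupported=unsupported,
        omission_rate=len(omitted) / len(reference) if reference else 0.0,
        unsupported_rate=len(unsupported) / len(note) if note else 0.0,
    )


# Hypothetical facts for a single consultation (illustrative only).
transcript_facts = {
    "allergy: penicillin",
    "symptom: cough for 2 weeks",
    "smoker: no",
    "medication: salbutamol",
}
ai_note_facts = {
    "symptom: cough for 2 weeks",
    "smoker: no",
    "medication: salbutamol",
    "symptom: fever",  # not in the transcript, i.e. a confabulation
}
clinician_note_facts = {
    "allergy: penicillin",
    "symptom: cough for 2 weeks",
}

# Test 1: AI note against the transcript.
ai_score = score_note(transcript_facts, ai_note_facts)
# Test 2: clinician note against the same transcript, for comparison.
clinician_score = score_note(transcript_facts, clinician_note_facts)

print("AI note omitted:", sorted(ai_score.omitted))
print("Clinician note omitted:", sorted(clinician_score.omitted))
```

Scoring both notes against the same reference is the point: it puts the AI's missed allergy alongside whatever the clinician missed, rather than treating the clinician's note as automatically correct. Exact set matching is a simplification; in practice, deciding whether two statements are the same fact would need human judgement.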