An algorithmic approach to reducing unexplained pain disparities in underserved populations

This paper looks to be attracting plenty of attention. Re-posting here from Nature Medicine - “A machine-learning approach to measuring severe pain from osteoarthritis shows that reported disparities in knee pain in underserved populations can be reduced compared with the use of standard radiographic measures of disease severity”. https://www.nature.com/articles/s41591-020-01192-7

OK - many, many hours later.
First, I'm pretty certain that this is a paper meant to illustrate the concept that a well-designed algorithm, developed and validated with conscious effort to minimize the impact of data bias, institutional racism, and other pitfalls of poorly developed (and currently common) AI, can be used to improve the lot of humans (i.e. AI for Good).

I think these authors are careful enough that they would not propose this is ready for prime time, and specifically, I think they would want subsequent re-validation/calibration in any new setting that varied significantly from the development population. After all, Obermeyer was one of the authors on the Science article showing that a commonly used resource-allocation algorithm, due to careless design and data bias, was systematically biased against offering medical services to African Americans (AA) compared with otherwise medically equivalent Caucasians.

I have spent about 9-10 hours reading this (the clinical area is not my expertise), and I walked away with a few key observations:

- The data for surgical management of knee osteoarthritis compared to non-surgical treatment are remarkably weak (one small RCT, a number of retrospective uncontrolled reports, and orthopaedists' opinion - not that they would have a bias) for pain control and return to function.
- The diagnostic gold standard for knee OA (a Kellgren-Lawrence grade, KLG, of ≥ 2) was developed on 85 Lancashirians (Lancashirites? Lancurians?) in 1954, and was only really validated to ensure Kellgren and Lawrence agreed with each other on features of knee OA 83% of the time. KLG was not designed to correlate with symptoms or with observed articular damage - and it does so poorly.
- The KOOS (the pain and symptom score for knee OA) has not (that I can find) been validated as having similar test characteristics in AA. It has been validated in other ethnicities, showing similar but not identical sensitivity/specificity/reproducibility - so it is a little unclear how reliable the KOOS is in this population.
- This paper starts from the observation that, assuming the KOOS is valid, AA patients with a given KOOS are less likely to be referred for surgery. This likely arises from the inadequacy of the KLG as a valid marker of symptoms (see above), and from institutional and professional racism that leads the medical community to ask “why do AA patients complain about pain more when their x-ray looks fine” rather than “should an AA patient in an equivalent amount of pain to a non-AA patient have worse access to a treatment option that we think improves pain” (lots of literature about the former, much less about the latter).
- To correct this institutional racism, rather than relying on the medical community to take the steps to become less racist - to capture and report clinically meaningful outcomes and ensure equity across whatever metrics are appropriate (race/ethnicity, gender, orientation, socio-economic status …) - the authors have developed a black-box algorithm that offers an apparently quantized, objective prediction of pain based on non-humanly-detectable features of a plain x-ray of the knee. By providing this to radiologists, the radiologist may effectively be able to upgrade the imaging grade, which then works within the current decision models, which are structurally racist, to subvert them and force equity of access to surgery for symptomatic OA (see the sketch below).

I'm all for the above - I think this is a great example of how these tools could be very helpful. However, I think this should go hand-in-hand with frank discussions about institutional racism and concrete plans on how to resolve it. Purely technical solutions, in the absence of system change, are actually quite risky.

Second, from a more cynical space: effectively this validates the idea that the voice of a patient only matters if they are not AA - that rather than having a healthcare system that listens to the patient and responds to their needs, we replace that voice with a prediction based on data. For some doctors, the fact that you wouldn't have to listen to the patient any longer, just predict what they would say, would be highly appealing - but it seems just kind of wrong.
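To make the core quantitative claim concrete, here is a minimal sketch of the kind of disparity-decomposition comparison the paper runs: how much of the racial gap in reported pain survives adjustment for KLG versus adjustment for the algorithm's severity score. Everything here is an assumption for illustration - the file name, the column names (`koos_pain`, `klg`, `alg_severity`, `is_black`), and the plain-OLS setup are mine, not the authors' actual code or analysis spec.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per knee x-ray.
# koos_pain    - KOOS pain score
# klg          - Kellgren-Lawrence grade (0-4), standard radiographic severity
# alg_severity - the algorithm's severity/pain prediction from the x-ray
# is_black     - 1 for Black/AA patients, 0 otherwise
df = pd.read_csv("knee_oa_cohort.csv")  # assumed file, for illustration only

# Unadjusted racial gap in reported pain.
raw = smf.ols("koos_pain ~ is_black", data=df).fit()

# Gap remaining after adjusting for the standard radiographic grade.
adj_klg = smf.ols("koos_pain ~ is_black + C(klg)", data=df).fit()

# Gap remaining after adjusting for the algorithm's severity score.
adj_alg = smf.ols("koos_pain ~ is_black + alg_severity", data=df).fit()

def pct_explained(adjusted, raw_gap):
    """Share of the raw pain gap accounted for by the severity measure."""
    return 100 * (1 - adjusted.params["is_black"] / raw_gap)

raw_gap = raw.params["is_black"]
print(f"Raw pain gap:           {raw_gap:.2f}")
print(f"Explained by KLG:       {pct_explained(adj_klg, raw_gap):.1f}%")
print(f"Explained by algorithm: {pct_explained(adj_alg, raw_gap):.1f}%")
```

The paper's headline claim, in these terms, is that the gap remaining after the algorithm's score is much smaller than the gap remaining after KLG - i.e. the algorithm's reading of the x-ray accounts for far more of the reported pain disparity than the standard radiographic grade does.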
PS - thanks @Pieta_Brown for posting this - using it as a case study to walk through how to quantitatively critically appraise a prediction algorithm in the University of Otago Summer School course I am organizing with @Rochelle_Style - Unseen Algorithms.
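For anyone following along with the course, a first quantitative appraisal step is simply asking whether a model's errors look the same across subgroups. A minimal sketch, with hypothetical column names (`y_true`, `y_pred`, `group`) rather than anything from the paper:

```python
import pandas as pd

# Hypothetical held-out predictions, one row per patient.
df = pd.read_csv("predictions.csv")  # assumed file, for illustration only

summary = (
    df.assign(error=df["y_pred"] - df["y_true"])
      .groupby("group")["error"]
      .agg(n="count",
           bias="mean",                    # systematic over/under-prediction
           mae=lambda e: e.abs().mean())   # typical error size
)
print(summary)
```

A large `bias` for one subgroup relative to the others is exactly the kind of signal that would have flagged the resource-allocation algorithm in the Obermeyer Science paper.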

As a former physio who has had a bit to do with knee surgery and joint replacements: @matthew.strother, you have hit the nail on the head. There was some work done on this by Glyn Elwyn in Wales about 10 years ago.

His primary interest - Option Grids - is an interesting intersection between informed consent and PROMs.