OK - trying to get back into the habit of bringing to some external forum the stuff crossing my brain …

Here is a pre-publication paper from a former colleague (Gichoya), with big implications. Fundamentally, deep learning models (CNNs) were able to predict the socio-biological construct of “race” - and did so almost regardless of the conditions or objective outcome used to drive model development. On a first read of a reasonably technical paper: CNNs trained on reasonably large CXR databases could predict self-declared race with an AUC of ~0.95. Expanding training to non-CXR imaging led to similar results. Training a model on other primary objectives still produced a CNN with an AUC of ~0.85 for race. Considerable effort to ascertain the source(s) of the race prediction failed to find associations strong enough to drive it (e.g. things like location, labels, and image quality metrics did not explain the race prediction). Further, deliberately degrading image quality - such as filtering images essentially to the point of white noise - had minimal impact on the AUC for race. Essentially, the CNN predicted race, the researchers cannot ascertain how, and efforts to debias the CNN were unsuccessful.
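(For anyone not steeped in the metric: AUC is just the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal numpy sketch of what a ~0.95 vs a ~0.5 AUC looks like - synthetic scores standing in for real model outputs, nothing here is from the paper itself:)

```python
import numpy as np

def auc(y_true, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs where the
    positive case scores higher; ties get half credit."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic demo: well-separated scores give AUC near 1,
# uninformative scores give AUC near 0.5.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
informative = y + rng.normal(0, 0.3, size=1000)   # score tracks the label
noise = rng.normal(0, 1, size=1000)               # score ignores the label
print(auc(y, informative))  # high - the regime the paper reports
print(auc(y, noise))        # near 0.5 - chance level
```

The unsettling part of the paper is that the “informative” case above is what they got even after heavy image degradation.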

Holy shit.

So, lessons learned - key phrase of the day - “enchanted determinism” (from Crawford, Atlas of AI - good book) - "AI systems are seen as enchanted, beyond the known world, yet deterministic in that they discover patterns that can be applied with predictive certainty to everyday life … " This is a great example: the authors are experts, they have no idea how the model does what it does, and, refreshingly, rather than accept this and move on, they end with an excellent Conclusion (see paper) with key elements such as “We strongly recommend that all developers, regulators, and users who are involved with medical image analysis consider the use of deep learning models with extreme caution.” 2107.10356.pdf (3.81 MB)

Had a weird conversation about facial recognition the other day. I was talking about how it doesn’t work very well for many people who aren’t Caucasian-looking - and they were surprised…

Just started “The Alignment Problem” - pretty good - it has some great anecdotes from leading AI researchers in visual systems (who are non-Caucasian) about how their own projects won’t register their faces, so they have to use assistants or surrogates to test their own systems.

Hi Matthew - that’s a fascinating paper re: reading race. Do you think a “hand-crafted” approach to machine learning would be a possible solution? In my reading, for Anatomical Pathology, it makes a lot of sense re: interpretability and aligns better with the current approach pathologists take. Thanks again. Gavin

@GavinH uncertain what you mean by “hand-crafted” - if you mean feature engineering, then yes, it probably reduces the amount of unpredictable (pun intended) model behavior relative to CNNs or other more complex, less interpretable algorithmic approaches. That said, there is widespread evidence that you need to be careful you are not selecting surrogates for latent protected attributes (i.e. some non-race variables correlate strongly with race, and thus act as surrogates for it). I actually don’t know whether a literature exists around pathology and protected attributes - e.g. does race (I am finding “race” a frustrating term at the moment) correlate with diagnostic errors or other pre-existing bias? I cannot help but think the main risk would be similar to what is showing up with the dermatology diagnostic apps, where under-representation of melanotic skin in training datasets has led to a number of studies showing that the melanoma diagnostic apps have quite different performance on darker skin tones.
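To make the surrogate point concrete: one crude first pass is simply to screen each engineered feature for correlation with the protected attribute before it goes into a hand-crafted model. A toy numpy sketch - the feature matrix, the threshold, and the `flag_proxies` name are all made up for illustration, not from any published method:

```python
import numpy as np

def flag_proxies(X, protected, threshold=0.5):
    """Return {column index: correlation} for features whose absolute
    Pearson correlation with a binary protected attribute exceeds the
    threshold - a crude first-pass screen, not a guarantee of fairness."""
    p = (protected - protected.mean()) / protected.std()
    flags = {}
    for j in range(X.shape[1]):
        x = X[:, j]
        if x.std() == 0:          # constant feature: correlation undefined
            continue
        r = float(np.mean((x - x.mean()) / x.std() * p))
        if abs(r) >= threshold:
            flags[j] = r
    return flags

# Toy demo: column 0 is a near-copy of the protected attribute (a proxy),
# column 1 is independent noise.
rng = np.random.default_rng(1)
race = rng.integers(0, 2, size=500).astype(float)
X = np.column_stack([race + rng.normal(0, 0.1, 500),   # strong surrogate
                     rng.normal(0, 1, 500)])           # unrelated feature
print(flag_proxies(X, race))  # flags column 0 only
```

Of course, a linear screen like this would miss exactly the kind of distributed, nonlinear encoding the CXR paper describes - which is rather the point.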