OK - trying to get back into the habit of bringing to an external forum the stuff crossing my brain …
Here is a pre-publication paper from a former colleague (Gichoya), with big implications. Fundamentally, deep learning models (CNNs) were able to predict the socio-biological construct of “race” - and did so almost regardless of what conditions or objective outcome were used to drive model development. On a first read of a reasonably technical paper: CNNs trained on reasonably large CXR databases were able to predict self-reported race with an AUC of ~0.95. Subsequent expansion of training to non-CXR imaging led to similar results. Training a model on other primary objectives still produced a CNN with an AUC of ~0.85 for race. Considerable effort to ascertain the source(s) of the race prediction failed to find associations strong enough to drive it (e.g. things like location, labels, and image quality metrics didn’t explain the race signal). Further, deliberately manipulating image quality - such as filtering images essentially to the point of white noise - had minimal impact on the AUC for race. Essentially, the CNN predicted race, the researchers could not ascertain how, and efforts to debias the CNN were unsuccessful.
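For anyone less familiar with the headline numbers above: AUC (area under the ROC curve) is the metric the paper reports for race prediction. A minimal sketch of how that evaluation step looks in practice - the labels and scores here are invented toy data for a binary case, not anything from the paper:

```python
# Toy illustration of the AUC metric behind the ~0.95 headline number.
# y_true: self-reported race labels (toy binary encoding, invented data).
# y_score: a model's predicted probabilities for the positive class.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1])               # invented labels
y_score = np.array([0.1, 0.3, 0.2, 0.8, 0.7, 0.9])  # invented model scores

# AUC = probability a randomly chosen positive outranks a random negative.
auc = roc_auc_score(y_true, y_score)
print(auc)  # perfect separation in this toy data -> 1.0
```

An AUC of 0.5 means the model is no better than a coin flip at distinguishing the classes; the ~0.95 reported in the paper means the model ranks images by self-reported race almost perfectly.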
Holy shit.
So, lessons learned - key phrase of the day - “enchanted determinism” (from Crawford, Atlas of AI - good book): "AI systems are seen as enchanted, beyond the known world, yet deterministic in that they discover patterns that can be applied with predictive certainty to everyday life …" This is a great example, by experts, of having no idea how a model does what it does - and, pleasantly, rather than accept this and move on, they end with an excellent Conclusion (see paper) with key elements such as “We strongly recommend that all developers, regulators, and users who are involved with medical image analysis consider the use of deep learning models with extreme caution.” 2107.10356.pdf (3.81 MB)