A.I. Is Learning to Read Mammograms

https://www.nytimes.com/2020/01/01/health/breast-cancer-mammogram-artificial-intelligence.html

So I’m certain others have seen this, as well as the subsequent press reports; all that I have seen have been unrelentingly positive. However, I pulled the original article, and there are some issues worth noting:

  • The actual method is not published. The explanation, paraphrased: “this is complicated, and involves several developmental IP approaches, so we describe the concepts, and an expert could figure this out.” When did that become the standard?
  • The data is poorly described - really not at all - other than “representative of UK/US populations” - again, what the actual f***. It is the equivalent of a Table 1 in a pharmaceutical paper that simply says, “trust us - this population is like yours…”
  • The data is not easy to access. The training set, a UK NHS collection, is at least accessible if you request it (note: there is NO MENTION OF ANY SORT OF ETHICAL PROCESS TO UTILIZE THE DATA - not surprising, as this was published out of the DeepMind-NHS collaboration, which has had extensive previous press around the highly questionable ethics of that venture). Interestingly, you can see the data fields from the collection’s site: the only piece of information about the patients actually collected in that data set is age.
  • The “validation” data set, from the US, is behind a paywall. I haven’t looked up the cost yet, but that seems problematic for any effort to reproduce or check the results of this work.
  • Further, what is published reveals this is a bizarre data set. It is not “representative” of any normal population undergoing screening - the baseline rate of biopsy-proven malignancy is 22%, which would imply more than 1 in 5 women screened for breast cancer in this population have it. And the BI-RADS classification, which is usually roughly normally distributed around ~2.5, is >60% BI-RADS 5, meaning this is a very dense breast tissue population - e.g. young, or from some strange sub-grouping.

I am not an expert on AI, and Google is. However, I do a lot of critical appraisal of the medical literature, and if this were published about a drug, there would be an uproar about the obfuscation of data and methods, and the lack of facts. Further, by validating in a population with a 22% rate of positive findings, they have selectively weighted the design to bias towards great apparent sensitivity.
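To put a rough number on the prevalence issue (a back-of-the-envelope sketch of my own, not anything from the paper, using an entirely hypothetical sensitivity/specificity): even with the algorithm’s operating point held fixed, the headline predictive values look far better at a 22% case rate than at a realistic screening prevalence of ~1%.

```python
# Back-of-the-envelope: how prevalence alone shifts apparent performance.
# The sensitivity/specificity below are hypothetical placeholders,
# NOT figures from the DeepMind paper.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test at this operating point,
    applied to a population with the given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

sens, spec = 0.90, 0.90  # hypothetical operating point

for prev in (0.22, 0.01):  # enriched study population vs. typical screening
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"prevalence={prev:.0%}: PPV={ppv:.1%}, NPV={npv:.1%}")
# prevalence=22%: PPV ~72%, NPV ~97%
# prevalence=1%:  PPV ~8%,  NPV ~99.9%
```

Sensitivity and specificity can themselves drift with the spectrum of disease in an enriched set, which is the subtler problem, but even this purely mechanical effect shows why a 22%-prevalence validation set says little about performance in a screening service.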

I do think machine vision will deliver the first major successes in the application of AI in medicine. However, this work establishes an abysmal standard for transparency, design bias, and reproducible science.

Great work Matthew - appreciate your expertise (in critical appraisal) :slight_smile:

Very helpful, Matt, thank you

https://www.clinicalkey.com.au/#!/content/playContent/1-s2.0-S1546144020300284

Inconsistent Performance of Deep Learning Models on Mammogram Classification

Take-Home Points [from the paper]

  • Numerous deep learning models for automatic breast lesion classification on mammograms have reported exciting performance surpassing that of expert radiologists.
  • Our results demonstrate high variability in performance across the mammography data sets and models, which indicates that the high performance of deep learning models on one limited data set cannot be readily transferred to unseen external data sets with different data distribution.
  • Radiologists, as consumers of available AI products, should be aware of generalizability issues and ensure algorithm performance is validated at their own institutions before purchasing AI tools—even if FDA cleared.

Great article - it’s a good commentary on the state of this specific application, but also seems to reflect the state of the field. I do wonder about the statement on generalization and validation at their own institution. I suspect the approach requires more than a straight purchase: local re-training and auditing of results within the local context. Less purchase-and-plug-and-play, and more algorithmic access and maintenance as a service.
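As a sketch of what “auditing within the local context” could look like in practice (the file name, columns, and recall threshold here are all assumptions on my part, and the metrics are generic scikit-learn calls rather than any vendor’s actual API):

```python
# Hypothetical local audit of a purchased mammography AI tool.
# "local_audit.csv" is an assumed export with one row per screened exam:
#   exam_id, ai_score (vendor malignancy score), cancer_confirmed (0/1 by biopsy/follow-up)
import pandas as pd
from sklearn.metrics import roc_auc_score, confusion_matrix

df = pd.read_csv("local_audit.csv")

# Discrimination on the local case mix.
auc = roc_auc_score(df["cancer_confirmed"], df["ai_score"])

# Performance at the vendor's recommended recall threshold (assumed 0.5 here).
flagged = (df["ai_score"] >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(df["cancer_confirmed"], flagged).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"Local AUC: {auc:.3f}")
print(f"Local sensitivity: {sensitivity:.1%}, specificity: {specificity:.1%}")
# Re-run on each audit window (e.g. quarterly) and compare against the figures
# quoted in the vendor's submission or regulatory clearance.
```

The specific metrics matter less than the fact that the evaluation keeps running inside the service - which is exactly the “maintenance as a service” model rather than a one-off purchase.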

Did not generate nearly as much press as the DeepMind/Google paper, but it is of marginally better quality, with a substantially better description of the actual process. Really good accompanying commentary - the take-home from that is pretty much on par with the review @Manish_Kukreja_ADHB posted: promising tech, weird data used to train and validate, and a need for prospective validation against expert radiologists.

The paper itself has some points worth highlighting:

  • The data sets are more diverse - multiple countries, and large. However, for a screening population (the average true-positive rate should be 0.02 or less; here it is 0.3) there are a lot of cancers.
  • The datasets are poorly described - and not available.
  • The algorithm is IP and therefore unavailable for evaluation - other than a demo? A real demo would allow you to reference the algorithm against a well-annotated set of local mammograms and ascertain performance locally (that would be a great MoH exercise - to create NZ reference data sets to test things like this @jon_herries). A rough sketch of what that comparison could look like follows this list.
  • It took a whole lot of effort to annotate the Korean training dataset.
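On the local reference-set idea, a minimal sketch of how such a comparison might be run (every file and column name here is made up for illustration; it assumes a ground-truth outcome from biopsy or interval follow-up, plus the local radiologists’ actual recall decisions):

```python
# Hypothetical comparison of an AI score against local radiologist reads,
# both judged against ground truth, on a well-annotated reference set.
# Columns in "reference_set.csv" are illustrative only:
#   exam_id, outcome (0/1 ground truth), radiologist_recall (0/1), ai_score (continuous)
import pandas as pd
from sklearn.metrics import roc_auc_score

ref = pd.read_csv("reference_set.csv")

# Radiologist performance at their actual recall decisions.
cases, normals = ref.outcome == 1, ref.outcome == 0
rad_sens = (ref.radiologist_recall[cases] == 1).mean()
rad_spec = (ref.radiologist_recall[normals] == 0).mean()

# AI sensitivity at a threshold matched to the radiologists' specificity,
# so the comparison is like-for-like.
threshold = ref.loc[normals, "ai_score"].quantile(rad_spec)
ai_sens = (ref.ai_score[cases] >= threshold).mean()

print(f"Radiologists: sensitivity {rad_sens:.1%} at specificity {rad_spec:.1%}")
print(f"AI at matched specificity: sensitivity {ai_sens:.1%}")
print(f"AI AUC on the reference set: {roc_auc_score(ref.outcome, ref.ai_score):.3f}")
```

Matching on specificity is just one reasonable choice; the real point is that a locally curated, locally annotated set is what lets you ask the question at all.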

So I came across this article, and with some further digging found another link. Wondering if anyone is following this more closely, and relating it to any applications/research locally?

“Transparent AI platform shows radiologists its decision-making blueprint for diagnosing breast cancer” - https://www.healthimaging.com/topics/ai-emerging-technologies/transparent-ai-breast-cancer-shows-its-work
“The First AI Breast Cancer Sleuth That Shows Its Work” - https://pratt.duke.edu/about/news/ai-breast-cancer

Not local - but a decent meta-analysis of the commercial products, essentially highlighting that this is promising for marginal gains in accuracy, but that the quality of the studies supporting clinical implementation remains quite weak.