Can Patient Data Be Truly ‘De-Identified’ for Research?

A lawsuit against the University of Chicago Medical Center and Google asks whether data can be truly “de-identified” when sharing patient data for research purposes. The data included date stamps of when patients checked in and out of the hospital, as well as “copious free-text notes.” The lawsuit alleges that, through Google’s “prolific data mining … [the company] is uniquely able to determine the identity of almost every medical record released by the university.” https://www.careersinfosecurity.com/patient-data-be-truly-de-identified-for-research-a-12708

High certainty of de-identification is difficult to achieve, but there are techniques that do this better than the article implies. Different data elements need to be dealt with in specific ways to maintain the usefulness of the data while protecting patient privacy. Raw dates are potentially useful for re-identification, so one technique developed to deal with this is date shifting, or date perturbation. It is designed to maintain the relationship between dates for time-series analysis while preventing dates from being recognised as unique to a known person. There are techniques for dealing with free text as well (I can’t remember the details off the top of my head). This book is a good primer and is based on extensive research in the field.

Anonymizing Health Data - Case Studies and Methods to Get You Started
Luk Arbuckle, Khaled El Emam
Publisher: O’Reilly Media
Release Date: December 2013 Pages: 228
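
To make the date shifting idea above a bit more concrete, here is a minimal sketch in Python. It is my own illustration, not the method from the book or anything used in the lawsuit's dataset. The names `patient_offset`, `shift_dates`, and `MAX_SHIFT_DAYS`, the 365-day window, and the choice of a keyed hash (HMAC) to derive the offset are all assumptions made for the example. The one property it is meant to show is that every date for the same patient moves by the same amount, so the intervals between events survive.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumed shift window: each patient is moved by up to +/- 365 days.
MAX_SHIFT_DAYS = 365


def patient_offset(patient_id: str, secret_key: bytes) -> timedelta:
    """Derive one fixed offset per patient from a secret key (hypothetical helper).

    A keyed hash makes the offset repeatable for whoever holds the key,
    without storing a lookup table of offsets.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    # Map the first 8 bytes of the digest into [-MAX_SHIFT_DAYS, +MAX_SHIFT_DAYS].
    days = int.from_bytes(digest[:8], "big") % span - MAX_SHIFT_DAYS
    return timedelta(days=days)


def shift_dates(patient_id: str, dates: list[date], secret_key: bytes) -> list[date]:
    """Shift all of one patient's dates by the same offset, preserving intervals."""
    offset = patient_offset(patient_id, secret_key)
    return [d + offset for d in dates]


if __name__ == "__main__":
    key = b"example-key-keep-this-secret"  # placeholder; a real key must stay private
    visits = [date(2019, 3, 1), date(2019, 3, 4), date(2019, 6, 20)]
    shifted = shift_dates("patient-001", visits, key)
    for original, moved in zip(visits, shifted):
        print(original, "->", moved)
    # The gaps between consecutive visits are unchanged by the shift.
    print([(b - a).days for a, b in zip(shifted, shifted[1:])])
```

Note that shifting on its own only hides the calendar dates. Lengths of stay and gaps between visits are deliberately preserved, so a party that already knows a patient's visit pattern could still try to match on it, and in this sketch an offset of zero days is also possible.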
opengraphobject:[360467961847808 : https://www.careersinfosecurity.com/patient-data-be-truly-de-identified-for-research-a-12708 : title=“Can Patient Data Be Truly ‘De-Identified’ for Research?” : description=“A lawsuit against the University of Chicago Medical Center and Google seeking class action status points to the important privacy and security issues raised when”]

Thanks Simon. It will be interesting to see whether the Court grasps these concepts.

It seems that the problem is specifically that Google is the collaborator, since Google’s unique data collection and capabilities would make re-identification easier for them.

Collaborating with organisations that don’t have that power would not have this problem.

(Also, it is important to emphasise that this is very different from just using Google services where Google do not access your data. I.e. uploading data to Google is not the same as providing data so as to collaborate with Google.)