Like it or not, artificial intelligence (AI) is coming to our electronic health record (EHR) ecosystem.
This is going to have significant implications for clinical practice, perhaps as big as the shift from paper charts to EHRs.
So, to prepare for this event, I want to focus on just one very small aspect: the size of the data sets that will be extracted from EHRs to feed AI deep learning algorithms.
Let’s look at what is happening right now. Here’s a good recent example.
The May 2018 article Scalable and accurate deep learning with electronic health records, published in the Nature partner journal npj Digital Medicine, describes deep learning models that predict hospital admission outcomes, including in-hospital mortality, length of stay, and readmission rate.
The study was based on 216,221 hospital admissions.
On the upside, these models outperformed the traditional, clinically used prediction models. So far so good. But what and how much data did they use to create these models? 46,864,534,221 “data points.”
You read it right. Over 46 BILLION data points!
Let’s dive into this a little more.
For example, they extracted the raw data using the Fast Healthcare Interoperability Resources (FHIR) format. For those of you wondering, FHIR is becoming one of the standards for EHR interoperability. Heck, even Epic is jumping on the FHIR bandwagon!
Data points may consist of such things as a WBC from a lab record, a vital sign entered in the ER, a diagnostic code, or a single word in a nurse’s or radiologist’s note. Of note, these data points can also be sequenced in time, meaning the metadata attached to each data point (such as its timestamp) can be used as well.
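To make that concrete, here is a minimal, hypothetical sketch of what a single such data point, a WBC result with its timestamp, might look like as a simplified FHIR Observation, written as a Python dictionary. The field names follow the FHIR Observation resource, but the patient reference and values are invented for illustration and are not taken from the study.

```python
# A simplified, hypothetical FHIR Observation: one WBC lab value for one patient.
# Field names follow the FHIR Observation resource; the values are invented.
wbc_observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "6690-2",
                         "display": "Leukocytes [#/volume] in Blood"}]},
    "subject": {"reference": "Patient/example-123"},
    "effectiveDateTime": "2018-05-01T07:30:00Z",   # timestamp usable for time sequencing
    "valueQuantity": {"value": 11.2, "unit": "10*3/uL"},
}
```

Each field in a record like this can become one or more “data points” for a deep learning pipeline, and the timestamp is what lets a model order events in time.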
So, at the patient level, what does this mean? Pulling out the old calculator, on average, this particular study used roughly 217,000 data points per patient. 217,000. Per patient. That’s the number I want you to get. That’s where we are today.
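For anyone who wants to check that back-of-the-envelope figure, the division uses the two numbers reported in the paper:

```python
total_data_points = 46_864_534_221  # total data points reported in the study
admissions = 216_221                # hospital admissions in the cohort

print(f"{total_data_points / admissions:,.0f} data points per admitted patient")
# prints: 216,744 data points per admitted patient -> roughly 217,000
```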
But now I want to go a little further.
This particular study limited itself to using EHR/FHIR data. But look at who actually did the study: UCSF, University of Chicago, Stanford, and GOOGLE. In fact, the two lead authors, Alvin Rajkomar and Eyal Oren, are Google employees.
Why is it significant that this is a Google-driven study? Because it makes the next step obvious: don’t just use EHR/FHIR data for health prediction. Instead, why not also add a few hundred thousand more data points per person, drawn from their Google search history, Amazon purchase history, or Facebook post history?
This isn’t just conceptual. Think how much health information could be extracted from your Amazon purchase history, or your (or your spouse’s?) Safeway card grocery shopping history.
We should have no doubt that when Google, Amazon, Apple, Facebook, Microsoft, and IBM get fully and publicly into the healthcare space, they are not going to limit their AI data sets to EHRs. They are going to want to use everything.
Right now I am not weighing in on whether or not this is a good thing. (To weigh in on this right now wouldn’t just require me to define “the Good,” it would also require me to reflect on what is the right approach to reach the Good — well beyond the scope of this short piece.)
Rather, I am suggesting that the size of the data sets that are going to be extracted from our patients’ digital lives (now, 217,000 data points per patient, but in five years, who knows? 10 million?) is going to be so large that it will no longer be a matter of degree but a matter of kind.
Dr. Matthew Rehrl is a physician who has served in a C-Suite advisory role on social media within healthcare for over a decade. His current focus is the ethics of AI in healthcare. He reports no conflict of interest.
He can be found at matthewrehrl.com and @matthewrehrl.