Artificial intelligence (AI) has received a great deal of attention in medicine over the last few years, particularly in radiology. Already widely used to power technologies such as face detection and driverless cars, AI is now beginning to find applications in the medical field. For example, it can be used for the rapid diagnosis of life-threatening conditions such as stroke, where every minute counts. It could also serve as a safeguard to prevent medical errors. According to an article published in The BMJ by a team of researchers at Johns Hopkins, more than 250,000 deaths per year in the United States are due to medical errors, making them the third leading cause of death behind heart disease and cancer. AI can find patterns in data that are impossible for humans to detect, potentially leading to new or innovative therapies that could be targeted to an individual patient. AI applications may also soon find their way into the clinic, as the FDA recently permitted marketing of a device that automatically detects diabetic retinopathy.
Despite the promise of AI, however, there are a number of issues that could prevent this technology from reaching its true potential.
Here are just a few of them:
1. The Unrelenting Hype Machine
The Gartner hype cycle is often used to describe expectations for new technologies. After an initial innovation, there is an early “Peak of Inflated Expectations,” followed by a “Trough of Disillusionment” when the initial promises are not rapidly met and many are left disappointed. This is followed by a slower “Slope of Enlightenment,” during which actual progress is made, and a final “Plateau of Productivity” representing mainstream adoption of the technology. The dot-com bubble of the late 1990s is a classic example. Ultimately, however, medicine is bound by the limits of human physiology, which is completely agnostic to hype. If unrealistic expectations are set, disappointment will inevitably follow, even if the underlying technology is sound.
2. The Fear that AI Will Replace Physicians
Most applications of artificial intelligence currently fall under the category of Narrow AI, that is, AI suited to a specific task. The clinical practice of medicine, however, involves synthesizing disparate data sources, from laboratory results to patient and family considerations, a task that is not readily amenable to complete automation. In addition, these algorithms cannot detect anything they have not been trained to see, as illustrated in the sketch below. Moreover, when they fail, they sometimes do so in bizarre ways because of quirks in how they process data, producing errors that would be readily apparent to any human observer.
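To make this concrete, here is a minimal, hypothetical sketch using synthetic two-dimensional data and scikit-learn (not any particular medical model): a classifier trained to separate only two classes still returns a confident answer for an input unlike anything it has seen, because it has no way to say “none of the above.”

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data: two well-separated synthetic classes in two dimensions.
X = np.vstack([rng.normal(-2.0, 1.0, size=(200, 2)),
               rng.normal(2.0, 1.0, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

model = LogisticRegression().fit(X, y)

# An input unlike anything in the training data: the model still returns a
# near-certain probability for one of the only two classes it knows about.
outlier = np.array([[50.0, 50.0]])
print(model.predict_proba(outlier))
```

The point is not that the math is wrong; it is that the model, by design, has no concept of what it was never shown.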
There is no reason why humans and AI applications cannot work together to play to their respective strengths. There is, of course, a legal dimension to this issue, in particular the question of who is liable for the final patient outcome: the company that developed the AI or the physician. Finally, the relationship that doctors have with their patients cannot simply be replaced by a machine; most people would prefer not to learn about their cancer diagnosis from Alexa.
3. AI Algorithms Designed with Minimal or No Clinical Input
When designing AI algorithms for medical use, asking the right clinical questions is crucial. Otherwise, an algorithm might accurately predict something that has no clinical significance. For instance, a “calcified granuloma” may sound frightening to a layperson, but it is a benign finding of no clinical consequence. In addition, training an algorithm appropriately requires in-depth knowledge of the dataset itself. Otherwise, clinically relevant data may not be included as inputs to these models, and they will not live up to their potential. For instance, an algorithm trying to detect hepatocellular carcinoma that does not include multiple phases of contrast on CT or MRI will naturally underperform. Yet I still encounter many data science teams that have almost no clinical input.
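As a toy illustration of why the inputs matter, here is a minimal sketch with purely synthetic data (the “phase” features and effect sizes are invented for illustration, not derived from real imaging): the same simple classifier is trained once with features standing in for all contrast phases and once with a single phase only, and its discrimination suffers when the informative phases are left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Toy labels: 1 = "lesion present" in this synthetic example.
y = rng.integers(0, 2, size=n)

# Synthetic stand-ins for features from three contrast phases. By
# construction, most of the class signal lives in the arterial and delayed
# phases, not in the venous phase alone.
venous = rng.normal(size=(n, 5)) + 0.2 * y[:, None]
arterial = rng.normal(size=(n, 5)) + 1.0 * y[:, None]
delayed = rng.normal(size=(n, 5)) + 0.8 * y[:, None]

X_all = np.hstack([venous, arterial, delayed])  # multi-phase inputs
X_single = venous                               # venous phase only

for name, X in [("all phases", X_all), ("venous phase only", X_single)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.2f}")
```

No amount of algorithmic cleverness can recover information that was never fed into the model in the first place, which is exactly the kind of gap a clinician on the team would catch.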
4. Questionable “Gold Standard” Data for Training These Algorithms
Most machine learning applications currently rely on a process known as “supervised learning.” This requires some “ground truth” or “gold standard” that the algorithm can use for training, and the utility of the algorithm depends critically on the quality of that gold standard data. In practice, however, it varies widely in quality. I have seen companies use the consensus of an expert panel of physicians as their gold standard, and I have also seen them use the interpretation of a single first-year resident. There is a large difference in the quality of the data from these two sources, and the former is substantially more expensive than the latter. The cost of obtaining high-quality gold standard data is thus a major impediment to the development of these algorithms, because physician annotation is expensive.
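A small, hypothetical sketch can show why this matters. Using synthetic data (not any real clinical dataset), the same classifier is trained twice on identical features, once with the original “consensus-quality” labels and once with a deliberately noisier label set standing in for a single, less experienced reader, and both are scored against clean test labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic features and clean labels (a stand-in for expert consensus).
X, y = make_classification(n_samples=4000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Simulate a lower-quality gold standard by flipping 25% of training labels.
rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.25
y_tr_noisy = np.where(flip, 1 - y_tr, y_tr)

for name, labels in [("consensus-quality labels", y_tr),
                     ("noisy labels", y_tr_noisy)]:
    model = RandomForestClassifier(random_state=0).fit(X_tr, labels)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: test accuracy = {acc:.2f}")
```

The features and the algorithm are identical in both runs; only the labels differ, and that alone is enough to change how well the model performs.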
Ultimately, AI could have a profound impact on the practice of medicine, but safeguards must be put in place so that these algorithms are designed with the right clinical questions in mind and with realistic expectations for their success.
Hersh Sagreiya, MD, is a National Cancer Institute Fellow at Stanford University School of Medicine. His research focuses on applying machine learning to medical images. He is a 2017–2018 Doximity Fellow.