There is a somewhat quiet technological revolution underway that may transform the way we practice medicine. Narrow artificial intelligence (AI) tools are increasingly powerful and being developed for a host of medical applications. Now is the time to think carefully about how these tools are developed and deployed.

AI refers to computer software and hardware designed not just to store and crunch data but to analyze it, using one of various processes that can find patterns and learn as they go. Narrow AI includes machine learning, deep learning, and neural networks. It does not include what many people might think of when they hear AI; it is not a sentient or self-aware computer – that is general AI. In the last few decades we have learned that narrow AI is much more capable than we imagined, and can do many of the things we thought it would take a general AI to do. Most famously, AI can now defeat the best human chess masters. It’s not even a contest anymore.

Medicine is perhaps a perfect arena for AI, which is good at teaching itself to sift through vast amounts of data in order to make predictions. Right now human doctors do this through extreme training and intuition born of extensive experience, combined with analytical assessments based upon published data and algorithms. It’s hard, and the amount of data going into even a single medical decision can be massive, requiring doctors to increasingly specialize in order to keep up with a progressively narrower field. About 2 million scientific articles are published each year, and that number is growing. AI might help with the underlying problem as well – the publishing of too many articles of too low quality. But even with significant improvement in the peer-review and publishing process, the volume will remain overwhelming.

This is not about AI taking over from doctors, but rather about AI becoming an indispensable tool for doctors and other health professionals. Using narrow AI tools it’s possible to sift through the scientific literature to not only find all the relevant evidence, but determine how predictive that evidence is. AI can find patterns in the data that humans would not notice, or would not intuitively think are relevant, but that may be highly predictive.

This is both a tremendous opportunity and a huge risk. One risk is unintended consequences. This goes beyond the “garbage in – garbage out” problem, to creating AI algorithms that reinforce unintended and negative outcomes. This is similar to YouTube algorithms creating conspiracy theorists – not necessarily the goal, but it happened. But this can also be a very good thing if done properly, and even reinforce the science-based practice of medicine. The principles of science-based medicine (SBM) need to be baked into the AI algorithms from the beginning. Hopefully, those principles will also emerge spontaneously from the use of AI to examine the literature. This is essentially what SBM is – looking at all the evidence to see which patterns most predict safety and efficacy.

In fact, one of the aspects of SBM is trying to disabuse our colleagues of common fallacies in evaluating the evidence. Practitioners might put too much weight on preliminary evidence, for example, or improperly interpret P-values. We constantly ask the question – what patterns of scientific research results actually predict that an intervention will ultimately work? What is the actual risk vs. benefit of different approaches to treatment? We tend to focus on where this process breaks down most dramatically, and even deliberately, but the principles apply to the entire system. Most doctors do not understand P-values, and a third of researchers admit to activities that amount to p-hacking (likely without realizing it).
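To make the p-hacking concern concrete, here is a minimal sketch of one common form of it: measuring many outcomes and reporting whichever one crosses the significance threshold. It assumes the tests are independent and that the treatment has no real effect at all; the function name and the 0.05 threshold are illustrative choices, not taken from any particular study.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests
    comes out "significant" at level `alpha` when no real effect exists.

    Assumption: the tests are independent, which real study outcomes
    often are not, so treat the result as a rough guide only.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid one with probability (1 - alpha) ** num_tests.
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20):
    print(f"{k:>2} outcomes tested: "
          f"{chance_of_false_positive(k):.1%} chance of at least one p < 0.05")
```

With 20 outcomes the chance of at least one spurious "positive" result is well over half, which is why a single significant P-value pulled from many comparisons predicts so little.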

A properly programmed AI could slice through all of this, giving health care providers the bottom line that they actually want and need – what is the risk vs. benefit in percentages? What is the probability that this treatment is going to help this patient? What does this test result actually mean, in terms of how it affects the probability of different diagnoses? AI is perfect for dealing with predictive value, which is often what we are ultimately trying to get to in clinical practice. But predictive value is not intuitive for humans – we have lots of cognitive biases (like the representativeness heuristic) which get in the way.
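The question of what a test result does to the probability of a diagnosis can be worked through with Bayes’ theorem. The sketch below is only an illustration: the function name is hypothetical, and the sensitivity, specificity, and pre-test probabilities are made-up numbers, not figures from any real test.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of disease after a binary test result, via Bayes' theorem.

    pre_test    -- probability of disease before the test (e.g. prevalence)
    sensitivity -- P(test positive | disease)
    specificity -- P(test negative | no disease)
    positive    -- True for a positive result, False for a negative one
    """
    for value in (pre_test, sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    if positive:
        with_disease = sensitivity * pre_test
        without_disease = (1 - specificity) * (1 - pre_test)
    else:
        with_disease = (1 - sensitivity) * pre_test
        without_disease = specificity * (1 - pre_test)
    total = with_disease + without_disease
    if total == 0:
        raise ValueError("this result is impossible under the given inputs")
    return with_disease / total


# Illustrative numbers only: a test that is 90% sensitive and 90% specific.
for pre_test in (0.01, 0.10, 0.50):
    post = post_test_probability(pre_test, 0.90, 0.90)
    print(f"pre-test {pre_test:.0%} -> post-test {post:.1%} after a positive result")
```

The same positive result means very different things depending on the pre-test probability, which is exactly the part that intuition tends to get wrong: with a 1% pre-test probability, a positive result from this hypothetical test still leaves the diagnosis unlikely.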

Here’s just one example published this week – researchers used AI to examine X-rays to predict COVID-19 outcomes.

Developed by researchers at NYU Grossman School of Medicine, the program used several hundred gigabytes of data gleaned from 5,224 chest X-rays taken from 2,943 seriously ill patients infected with SARS-CoV-2, the virus behind the infections.

Researchers then tested the predictive value of the software tool on 770 chest X-rays from 718 other patients admitted for COVID-19 through the emergency room at NYU Langone hospitals from March 3 to June 28, 2020. The computer program accurately predicted four out of five infected patients who required intensive care and mechanical ventilation and/or died within four days of admission.

It is likely that, within the careers of most physicians practicing today, AI tools like this will become routine, and soon it will become unthinkable to make clinical decisions without this kind of support. But what are the risks of incorporating AI routinely into clinical decision making?

One is that vested interests will learn how to game the system. Algorithms are powerful, but if you know how they work you can hack them. This might lead to studies being designed specifically to influence the AI algorithm, for example, rather than produce the best data.

Another risk is that these algorithms will incorporate existing biases in medicine, and thereby amplify them. In a recent commentary, Dr. Embi, Indiana University School of Medicine Associate Dean for Informatics and Health Services Research, wrote:

“Algorithmic performance changes as it is deployed with different data, different settings and different human-computer interactions. These factors could turn a beneficial tool into one that causes unintended harm, so these algorithms must continually be evaluated to eliminate the inherent and systemic inequities that exist in our healthcare system. Therefore, it’s imperative that we continue to develop tools and capabilities to enable systematic surveillance and vigilance in the development and use of algorithms in healthcare.”

It is highly likely that we are near the beginning of a massive shift in how we practice medicine. Now is the time to make sure this powerful tool improves the science-based character of medicine and serves to reduce rather than reinforce existing biases.

Posted by Steven Novella

Founder and currently Executive Editor of Science-Based Medicine, Steven Novella, MD, is an academic clinical neurologist at the Yale University School of Medicine. He is also the host and producer of the popular weekly science podcast, The Skeptics’ Guide to the Universe, and the author of the NeuroLogicaBlog, a daily blog that covers news and issues in neuroscience, but also general science, scientific skepticism, philosophy of science, critical thinking, and the intersection of science with the media and society. Dr. Novella has also produced two courses with The Great Courses, and published a book on critical thinking, also called The Skeptics’ Guide to the Universe.