
The big general science news story of last year was, in my opinion, the advances in artificial intelligence (AI) applications. These were new applications of existing technology and were incremental in their advance, but they crossed a line in terms of usability for the public, and therefore public awareness. AI applications such as DALL-E 2, Midjourney, and ChatGPT seem disruptive.

New technologies tend to go through certain phases in terms of public awareness. First there is the overhyped phase, where mainstream journalism breathlessly reports how these technologies will soon change the world. Two decades ago stem cells were supposed to have cured countless diseases by now. The hype stage is then followed by the disappointment phase as the extreme claims are not realized. Meanwhile, in the background incremental advances continue to be made until practical but limited applications start appearing.

We are definitely in the hype phase with these AI programs. ChatGPT in particular has many people nervous about the implications. As the name implies, this is a sophisticated chatbot trained on a massive amount of data. It can produce cogent responses to prompts, and even write entire essays on a topic. That last part has the education community scrambling, because each response is reconstructed from the training data, and is therefore unique and would not be picked up by software designed to detect plagiarism.

Already the bubble is bursting a little bit. CNET famously tested their own version of ChatGPT to help editors draft articles. They framed this as an experiment, but initially did not disclose that the resulting articles were AI-generated, and many of them contained egregious errors. They quickly ran into the limitations of the software. But I also think just as quickly we will progress to the post-hype practical application stage (things move fast with this new technology).

I have been using all of the above AI applications to learn what they can and cannot do. They are powerful, but have limitations. They are clearly not at the point where they can replace human content generators, but they can be a powerful tool. Artists, for example, will simply use AI art programs to generate digital art. Writers can use ChatGPT as part of their research; they just can’t rely on it to be accurate, thorough, or stylish.

My question is – how long will it be before a ChatGPT type of AI application becomes a routine part of medical practice? I think the answer should be – as soon as possible. There is tremendous potential here to bring the latest scientific information to practitioners at the point of patient care (a stated goal of the Evidence-Based Medicine initiative). An AI like ChatGPT will not be practicing medicine anytime soon, but can function as an expert system – a system used by experts to help them do their job.

The amount of information available to practitioners is massive, and growing. It is estimated that there are about 30,000 medical journals publishing several million articles per year. About 70% of doctors already use online search engines like Google to find medical information they will use in their practice. I do – Google is a good search engine and sometimes gets me to the results I need faster than dedicated databases. But of course I also use PubMed and other deep medical sources, and even when I start with Google I always trace back to primary sources.

What we need is a way to sift through millions of references and pull out the relevant information. That sounds like a job for AI. The current version of ChatGPT is only trained on data through 2021, so it is by definition not up-to-date. A medical expert version of ChatGPT would need to be trained on data that is as current as possible, and trained on the medical literature itself as well as expert reviews, published practice standards, and perhaps even unpublished registered articles and conference presentations. The application would also need to be tweaked to provide information in a format most useful to researchers and practitioners, such as providing citations and links to the relevant primary sources.
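As a concrete illustration, here is a minimal sketch (in Python) of what the retrieval half of such a system might look like. The PubMed E-utilities endpoints it calls are real and publicly documented; everything else (the function name, the result format, how the citations would be fed to the language model) is my own hypothetical scaffolding, not any existing product.

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def search_pubmed(query: str, max_results: int = 5) -> list[dict]:
    """Return citation metadata for recent PubMed records matching a query."""
    # Step 1: esearch returns the matching PMIDs, newest first.
    ids = requests.get(f"{EUTILS}/esearch.fcgi", params={
        "db": "pubmed", "term": query, "sort": "pub_date",
        "retmax": max_results, "retmode": "json",
    }).json()["esearchresult"]["idlist"]
    if not ids:
        return []
    # Step 2: esummary returns title/journal/date for those PMIDs.
    result = requests.get(f"{EUTILS}/esummary.fcgi", params={
        "db": "pubmed", "id": ",".join(ids), "retmode": "json",
    }).json()["result"]
    return [{
        "title": result[pmid]["title"],
        "journal": result[pmid].get("fulljournalname", ""),
        "date": result[pmid].get("pubdate", ""),
        "url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
    } for pmid in ids]

# The retrieved citations would be handed to the language model as context,
# so every claim in its answer can link back to a real primary source.
citations = search_pubmed("cluster headache prophylaxis verapamil")
```

The key design point is that the freshness problem lives in the retrieval step, not the model: the model can stay fixed while the literature it draws on stays current.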

Answers would also have to be much more detailed, and perhaps even layered. For example, the system could give an executive summary, followed by a much more detailed answer complete with specific references. Such a system would also have to be constantly monitored for its error rate, and tweaked to minimize errors. Practitioners would also ideally be trained on how to use such a system, including how to avoid its weaknesses and pitfalls.
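A layered answer is really just a data structure plus an audit loop. Here is one hypothetical way to represent it; all of the names and fields below are my own illustration, not a description of any existing system.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    title: str
    url: str          # link back to the primary source

@dataclass
class LayeredAnswer:
    executive_summary: str        # two or three sentences, for the bedside
    detailed_answer: str          # dosing, contraindications, evidence grades
    citations: list[Citation] = field(default_factory=list)
    reviewed: bool = False        # has a human expert checked this answer?
    error_found: bool = False     # did that review find a substantive error?

def error_rate(answers: list[LayeredAnswer]) -> float:
    """Running error rate over expert-reviewed answers: the number such a
    system would be constantly monitored on, and tuned to drive down."""
    reviewed = [a for a in answers if a.reviewed]
    return sum(a.error_found for a in reviewed) / len(reviewed) if reviewed else 0.0
```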

Such a system, however, could be an incredible expert medical system, quickly providing diagnostic and therapeutic information to physicians at the point of care. It could warn about interactions and side effects, remind physicians of best practices, and generally improve the care delivered. Physicians are not always up-to-date on best practices, and many patients do not receive the optimal standard of care. Practice recommendations are also constantly changing, and it is challenging, to say the least, to keep up.

To get an idea of where ChatGPT is right now, I gave it the following prompt: “What is the current standard of care for treating cluster headaches?” This is its response:

“The standard of care for treating cluster headaches typically involves a combination of preventative and abortive therapies. Preventative treatments aim to reduce the frequency and severity of attacks, and may include medications such as verapamil, sodium valproate, and corticosteroids. Abortive treatments are used to stop an acute attack once it has started, and may include oxygen therapy, triptans, and local anesthetics. In some cases, invasive procedures such as nerve stimulation or nerve blocks may be recommended. It’s important to note that the best treatment plan will vary depending on the individual patient and the severity of their symptoms, and patients should work closely with their healthcare provider to determine the best course of action.”

That is a fine answer, but it is on the level of information for a patient, not a practitioner. It sounds like something you would read on WebMD (and in fact such websites are likely where it got the information), rather than a practice guideline meant for a physician. This could be a good start, an executive summary, but would need to be followed by much more detailed technical information. Physicians would also need to learn how to focus their questions, or they may need to ask a chain of questions to drill down to the detail they need.
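The “chain of questions” pattern is easy to sketch. In the toy version below, ask() is a hypothetical stand-in for whatever chat API the system exposes; the point is just that each follow-up carries the prior exchanges as context, so the answers narrow from patient-level to practitioner-level detail.

```python
def ask(prompt: str, history: list[str]) -> str:
    # Hypothetical stand-in: a real system would send history + [prompt]
    # to the chat model's API and return the model's reply.
    return f"[model reply to: {prompt}]"

def drill_down(condition: str) -> list[str]:
    """Start broad, then ask progressively narrower follow-ups."""
    history: list[str] = []
    replies: list[str] = []
    for prompt in (
        f"What is the current standard of care for treating {condition}?",
        "Now give practitioner-level detail: specific agents, dosing, titration.",
        "What do the major published guidelines recommend, with citations?",
    ):
        reply = ask(prompt, history)
        history += [prompt, reply]  # context accumulates across the chain
        replies.append(reply)
    return replies

replies = drill_down("cluster headaches")
```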

Some early testing using ChatGPT to generate differential diagnoses or treatment guidelines with citations had interesting results. In many cases the output sounded good and appeared authoritative. But on closer inspection, many of the provided references were fake. The chatbot appears to have constructed plausible-sounding evidence rather than referencing real evidence. Obviously some fundamental aspects of how ChatGPT functions would need to change before it can become a true expert system.
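One partial defense against fabricated references is mechanical: before a citation is ever shown to a clinician, check that it actually resolves. Here is a hedged sketch using PubMed’s real esummary endpoint; the surrounding filtering logic and the example IDs are hypothetical.

```python
import requests

def pmid_exists(pmid: str) -> bool:
    """True only if the PMID resolves to a real PubMed record."""
    record = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi",
        params={"db": "pubmed", "id": pmid, "retmode": "json"},
    ).json().get("result", {}).get(pmid, {})
    # Nonexistent IDs come back with an "error" field instead of metadata.
    return bool(record) and "error" not in record

# Filter a model-generated reference list down to verifiable entries;
# anything that fails the check gets flagged rather than displayed.
generated_pmids = ["36822241", "99999999999"]   # hypothetical model output
verified = [p for p in generated_pmids if pmid_exists(p)]
```

Of course, existence is a low bar: a verified PMID can still be cited for a claim it does not support, so this catches only the grossest failure mode.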

As always, there is also a balance between giving enough information to inform practice and not so much that the critical bits are buried and hard to find. But once the basic kinks are worked out, such a system has the potential to significantly raise the average standard of care that patients receive, minimize errors, and improve the efficiency and cost-effectiveness of medical practice. The potential benefits are large enough to justify a substantial investment. Even if the government, for example, invested a couple billion dollars in developing such a system, it would probably pay for itself in healthcare savings.

One final aspect of such AI systems to consider – these are not thinking machines. They have no understanding. They simply reflect the data on which they were trained, and therefore carry forward all the biases, assumptions, and perspectives in that training data. The system will also have its own priorities – when recommending treatments, will it prioritize cost-effectiveness, efficacy at all costs, convenience, minimizing side effects, or medical justice? At the very least it should be transparent about how it prioritizes competing interests. Perhaps there could even be settings to adjust the various priorities – “list options by cost-effectiveness,” for example.
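Making those priorities explicit could be as simple as exposing the weights the ranking uses. The sketch below is entirely hypothetical, but it shows the design idea: the trade-offs become visible, adjustable numbers rather than something buried inside the model.

```python
from dataclasses import dataclass

@dataclass
class TreatmentOption:
    name: str
    efficacy: float             # 0-1, from trial evidence
    cost_effectiveness: float   # 0-1
    tolerability: float         # fewer / milder side effects scores higher
    convenience: float          # dosing burden, route, monitoring needs

# The weights ARE the system's priorities; exposing them makes the
# trade-offs transparent, and adjustable per patient or per question.
DEFAULT_WEIGHTS = {"efficacy": 0.4, "tolerability": 0.3,
                   "cost_effectiveness": 0.2, "convenience": 0.1}

def rank(options: list[TreatmentOption],
         weights: dict[str, float] = DEFAULT_WEIGHTS) -> list[TreatmentOption]:
    """Order options by weighted score. "List options by cost-effectiveness"
    is just a weight profile with all the weight on that one attribute."""
    def score(o: TreatmentOption) -> float:
        return sum(getattr(o, attr) * w for attr, w in weights.items())
    return sorted(options, key=score, reverse=True)
```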

What is undeniable is that the new AI applications are extremely powerful and impressive. But they are not magic boxes – they are not oracles that dispense truths. They are tools which can be used to great benefit or harm, depending upon how thoughtful and careful we choose to be. A medical expert system based on this technology has tremendous potential to transform the practice of medicine. In 20 years we may wonder how we got by without it. But we have to proceed carefully and realistically.


Posted by Steven Novella

Founder and currently Executive Editor of Science-Based Medicine, Steven Novella, MD is an academic clinical neurologist at the Yale University School of Medicine. He is also the host and producer of the popular weekly science podcast, The Skeptics’ Guide to the Universe, and the author of the NeuroLogica Blog, a daily blog that covers news and issues in neuroscience, but also general science, scientific skepticism, philosophy of science, critical thinking, and the intersection of science with the media and society. Dr. Novella has also produced two courses with The Great Courses, and published a book on critical thinking, also called The Skeptics’ Guide to the Universe.