
One of the primary goals of SBM is to discuss how to improve the scientific process as it relates to health care, and how to optimally translate that science into practice. The question often comes down to quality control at various levels of the process, from medical hypothesis to clinical practice. One key mechanism of quality control (in all of science, not just medicine) is peer review, but peer review has come under much criticism recently. David Gorski recently summed up the situation:

What is becoming clear is that, whatever changes we make in the peer review system, we can’t keep doing what we’re doing any more. Referencing the Churchill quote, at the moment, as flawed as it is, our peer review system is the best system we have for evaluating science. Until someone can come up with an alternative that works at least as well (admittedly not the highest bar in the world), it would be premature to abandon it. That doesn’t mean it can’t be improved. Contrary to Richard Smith’s view, peer review is not a sacred cow, and it doesn’t yet need to be slaughtered.

Peer review is a system used by scientific journals to decide whether or not to publish articles, and to improve articles prior to publication. In terms of quality control it is far better than no peer review at all. But it is also no guarantee of quality, and it is only as good as the peer reviewers and the editors who choose them and ultimately decide what to do with their recommendations. The challenge is that so many things can go wrong with scientific research that the task of weeding it all out is Herculean. Peer reviewers tend to be working (i.e. busy) academics who are not compensated for their time (other than by getting a small amount of academic credit for doing the work). They may be subject experts, but they are not necessarily experts in statistics, fraud, pseudoscience, and scientific skepticism.

You are probably thinking: shouldn’t they be, if they are scientists? The answer is yes; in an ideal world, every scientist would undergo systematic, rigorous training in all these things, but that’s not the world we live in. Also, this is a complex area of study in itself. To quickly review some of the things that can go wrong with scientific research and publication: There is p-hacking, which refers to the many subtle ways of running or analyzing a study that favor a positive outcome. This can be done inadvertently, or researchers may “cut corners” without appreciating the dramatic effect doing so can have on the statistical outcome. There are also numerous complex statistical errors that can creep into papers. There is a host of methodological flaws and limitations. Bias may creep into the process at multiple levels, such as publication bias and even citation bias. There are also many types of fraud, from plagiarism and excluding data to outright fabrication. Finally, for much scientific research, particularly medical research or any research dealing with living things, there are ethical considerations.
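
To make the p-hacking point concrete, here is a minimal simulation of my own (not from any of the work discussed here; the parameters are arbitrary): if researchers measure several outcomes under a true null effect and report whichever one comes out significant, the advertised 5% false positive rate inflates dramatically.

```python
# A minimal simulation (illustration only; all parameters are arbitrary) of
# how p-hacking inflates false positives: measure several outcomes under a
# true null effect and report whichever one happens to reach significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_subjects = 10_000, 5, 30

false_positives = 0
for _ in range(n_studies):
    # Treatment and control are drawn from the same distribution,
    # so any "significant" difference is pure noise.
    treatment = rng.normal(size=(n_outcomes, n_subjects))
    control = rng.normal(size=(n_outcomes, n_subjects))
    p_values = stats.ttest_ind(treatment, control, axis=1).pvalue
    if p_values.min() < 0.05:  # report only the outcome that "worked"
        false_positives += 1

print(f"Observed false positive rate: {false_positives / n_studies:.1%}")
# Roughly 23%, not the nominal 5%: with 5 independent null outcomes,
# 1 - 0.95**5 is about 0.226.
```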

We also have to keep in mind that there is rarely a perfect study, especially in medicine. There simply aren’t the resources and time available for every single study to be robust and rigorous. Further, there are often trade-offs where it is impossible to have “perfect” data – researchers must choose their priorities (amount of data vs quality of data, for example). In an optimized system, we also don’t want every study to be equally robust. We need preliminary (cheap and quick) studies to test the waters, and then get progressively more rigorous as a hypothesis gets more and more support, building to definitive confirmation or refutation.

We also often have to choose between speed and accuracy. We are faced with that choice now when it comes to COVID-19 research. We are in the middle of a deadly pandemic, and scientists want to get their data out quickly. Often they make their data available as a preprint online, prior to peer review. Getting this balance right is also tricky.

In many ways the entire scientific enterprise is getting too complex to optimally manage. No individual can have all the expertise necessary to do everything, and the current systems in place have their own incentives and limitations. There are many concrete ways we can improve the system, but progress is slow – perhaps slower than the rate at which problems are progressing.

In a recent commentary in Nature Biotechnology, Levin et al. propose that artificial intelligence (AI) can be leveraged to address the problem. They focus on some of the recent issues with COVID-19 research, such as the preliminary positive findings for hydroxychloroquine. They write:

Here, we propose a strategy whereby rigorous community and peer review is coupled to the use of artificial intelligence to prioritize research and therapeutic alternatives described in the literature, enabling the community to focus resources on treatments that have undergone appropriate and thorough clinical testing.

But the idea, I think, has merit outside the context of COVID-19. In fact, I would argue that AI is a powerful tool that can be used to improve many of the systems and institutions that run our increasingly complex civilization. AI software is becoming increasingly powerful. Do not think of sentient robots: when we refer to AI we mean software that is able to learn and adapt, not software that is self-aware. This is the technology that allows for self-driving cars, for computers that can best chess masters, and for systems that can churn through massive amounts of data looking for patterns. Think about how these tools could be applied to each of the problems outlined above.

AI systems could, for example, do a first pass on submitted papers, screening for telltale signs of statistical error or even fraud. They can check for plagiarism, fabricated data, and internal inconsistencies. They can screen for the subtle statistical fingerprints of p-hacking. As these systems learn and improve, they can serve as so-called expert systems, helping editors and peer reviewers focus their attention on potential problem areas.
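
As one hedged sketch of what such a first-pass screen might look like (in the spirit of existing tools like statcheck, though greatly simplified), the check below recomputes a reported p-value from a paper’s reported t statistic and degrees of freedom, and flags any mismatch. The “reported” values are invented for illustration.

```python
# A sketch of one possible first-pass screen: recompute a reported two-tailed
# p-value from the reported t statistic and degrees of freedom, and flag any
# mismatch. The "reported" numbers below are invented for illustration.
from scipy import stats

def t_test_consistent(t: float, df: int, reported_p: float,
                      tol: float = 0.005) -> bool:
    """Return True if the reported p-value matches the t statistic and df."""
    recomputed_p = 2 * stats.t.sf(abs(t), df)
    return abs(recomputed_p - reported_p) <= tol

# Hypothetical values extracted from a manuscript: "t(28) = 2.10, p = .02"
print(t_test_consistent(t=2.10, df=28, reported_p=0.02))   # False; p is ~.045
print(t_test_consistent(t=2.10, df=28, reported_p=0.045))  # True
```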

Further, AI systems can do the sort of deep statistical analysis that people find extremely difficult and counterintuitive. Of course, a good statistician can do this too, but the ability to do it quickly and thoroughly for every paper would be a game-changer. What I mean is that deep statistical analysis can give experts a perspective on what the data mean that scientific papers rarely provide. By examining an individual paper it is possible to make statistical statements about how likely it is that the paper’s main hypothesis is actually true. This is something I rarely see. Most scientists rely on the p-value to estimate this, and the p-value is so ill-suited to that task that using it this way is best described as wrong.
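
A back-of-the-envelope calculation shows why. Following the style of argument popularized by John Ioannidis, and using assumed values for the prior, power, and false positive rate (these numbers are mine, chosen only for illustration), the probability that a hypothesis is true given p < 0.05 can sit far below the 95% a naive reading suggests:

```python
# A back-of-the-envelope Bayesian sketch (the prior, power, and alpha are
# assumed values for illustration) of why a p-value is not the probability
# that the hypothesis is true.
prior = 0.10   # assume 1 in 10 hypotheses tested in a field is actually true
power = 0.80   # probability a real effect reaches significance
alpha = 0.05   # false positive rate under the null

true_positives = prior * power          # truly real and "significant"
false_positives = (1 - prior) * alpha   # null but "significant" anyway

ppv = true_positives / (true_positives + false_positives)
print(f"P(hypothesis true | p < 0.05) = {ppv:.0%}")  # 64%, nowhere near 95%
```

Under these assumptions, roughly one in three “significant” findings is false, and the situation gets worse for long-shot hypotheses with lower prior probability.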

I see this in medicine all the time: practitioners are misled by preliminary research because they don’t know all the subtleties of putting the data into proper statistical context. The bottom line is that making bad decisions about what the research actually says is a rampant problem in medicine, leading to the premature adoption of practices that turn out not to be useful, or even to be harmful.

If, however, it were standard practice to run such articles through a deep-learning AI algorithm that can then come up with an analysis that says “overall probability of effectiveness <10%” or something like that, this would be hugely useful.

Also, AI systems can be used to analyze not just individual papers but the entire scientific literature. This is something that experts will often do, in systematic reviews, but it is extremely challenging and takes a lot of time and effort. That, of course, limits how up-to-date systematic reviews are for any particular medical question. If AI systems could do automated real-time systematic reviews on any scientific question you can come up with, that again would be a game-changer.
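
To give a sense of the core arithmetic such an automated pipeline might run, here is a minimal fixed-effect, inverse-variance meta-analysis sketch; the effect sizes and standard errors are made up for illustration, and a production system would need random-effects models, heterogeneity statistics, and bias checks on top of this.

```python
# A minimal sketch of the pooling step in a systematic review: a fixed-effect,
# inverse-variance meta-analysis combining effect sizes across studies.
# The (effect size, standard error) pairs below are made up.
import numpy as np

studies = [(0.30, 0.15), (0.10, 0.08), (0.25, 0.20), (0.05, 0.10)]

effects = np.array([e for e, _ in studies])
weights = np.array([1.0 / se**2 for _, se in studies])  # inverse variance

pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"Pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")
```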

The technology to accomplish all of this and more is already here; we just need to adapt it to these specific applications. This could take years, but it is an effort worth the time and resources. And to be clear, AI systems do not replace humans. They assist experts by giving them access to vast amounts of analyzed data, and to ways of looking at data that would otherwise either not be possible or take too much time and effort to be practical.

There is a clear opportunity for dramatic improvement here, and we should take it.



Posted by Steven Novella

Founder and currently Executive Editor of Science-Based Medicine, Steven Novella, MD is an academic clinical neurologist at the Yale University School of Medicine. He is also the host and producer of the popular weekly science podcast The Skeptics’ Guide to the Universe, and the author of the NeuroLogica Blog, a daily blog that covers news and issues in neuroscience, as well as general science, scientific skepticism, philosophy of science, critical thinking, and the intersection of science with the media and society. Dr. Novella has also produced two courses with The Great Courses, and published a book on critical thinking, also called The Skeptics’ Guide to the Universe.