Science-based medicine is partly an endeavor to marshal all the available evidence about medical evidence itself. Modern medicine is ultimately about applying scientific evidence to practice; therefore, we constantly have to make specific decisions about that evidence. We need to understand what the science says, not just in the abstract but with regard to practice recommendations and specific decisions with individual patients.
Part of this science-based decision making is to think very carefully about the threshold of evidence required before it is reasonable to adopt a specific medical practice. Of course, this is very context-dependent. If someone has an incurable fatal illness, a lower threshold may be appropriate than for a self-limiting symptomatic illness. The guiding principle is to weigh the entire package of risk, expense, and inconvenience against potential benefit. Evidence also has to be put into the context of overall scientific plausibility – the less plausible a potential intervention, the more evidence is needed to conclude that it likely works or even may work.
This is a complex and individual decision, but we want it to be informed by science and evidence as much as possible. And again, that is what SBM is – examining the complex relationship between scientific evidence (from basic science to clinical studies) and clinical practice. Toward that end we need to not only examine the scientific evidence but to also carefully calibrate where the threshold of adoption should be. One way to inform that calibration is to look back at current medical practice and see how often new practices are reversed by later higher-quality evidence (so-called medical reversals).
A number of studies have done just that. A 2013 study by Prasad et al. looked at 1,344 studies in a high-impact medical journal from 2001 to 2010 and found:
A total of 947 studies (70.5%) had positive findings, whereas 397 (29.5%) reached a negative conclusion. A total of 756 articles addressing a medical practice constituted replacement, 165 were back to the drawing board, 146 were medical reversals, 138 were reaffirmations, and 139 were inconclusive. Of the 363 articles testing standard of care, 146 (40.2%) reversed that practice, whereas 138 (38.0%) reaffirmed it.
That last line is the most important – of the studies that were examining a practice that was already adopted as part of the standard of care, 40.2% were reversed (the evidence showed the practice did not work), while only 38% showed the practice did work (the rest were inconclusive). This does not mean that 40% of what physicians do is not evidence-based or doesn’t work. This was not a review of practice, but a review of published studies. The fact that medical scientists felt it was necessary to study a practice indicates that it was in question in the first place.
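As a quick sanity check on the figures quoted above, the short Python sketch below recomputes the percentages from the counts in the abstract. The variable names and the `pct` helper are illustrative only, and the count of inconclusive standard-of-care articles is derived by subtraction, on the assumption that every one of the 363 articles was either a reversal, a reaffirmation, or inconclusive.

```python
# Counts quoted from the 2013 Prasad et al. abstract.
categories = {
    "replacement": 756,
    "back to the drawing board": 165,
    "reversal": 146,
    "reaffirmation": 138,
    "inconclusive": 139,
}
total_articles = sum(categories.values())  # the five categories together

standard_of_care = 363  # articles that tested an existing standard of care
reversed_count = categories["reversal"]
reaffirmed_count = categories["reaffirmation"]
# Assumption: whatever is left over among the 363 was inconclusive.
inconclusive_soc = standard_of_care - reversed_count - reaffirmed_count


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("articles reviewed:", total_articles)
print("reversed (%):", pct(reversed_count, standard_of_care))
print("reaffirmed (%):", pct(reaffirmed_count, standard_of_care))
print("inconclusive (%):", pct(inconclusive_soc, standard_of_care))
```

The category counts sum to the 1,344 articles reviewed, and the reversal and reaffirmation shares come out to the 40.2% and 38.0% reported, which leaves roughly a fifth of the standard-of-care studies inconclusive.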
But still – 40% seems high. It suggests that perhaps we have been setting the threshold for adoption of new practices too low. What would the ideal number be? That’s a good question. It should not be zero, because that would likely mean we are setting the threshold too high and leaving potentially useful interventions on the table. It should not be 100%, because that would imply a serious problem with preliminary evidence and that we are setting the threshold way too low. Something in the single digits for reversals is likely closer to ideal, considering that these are practices that have been incorporated into the standard of care.
As I said, however, there is another variable here – what is the threshold for conducting research to question an established practice? Maybe researchers are really good at singling out those practices which need to be questioned. Perhaps high-impact journals have a selection bias toward publishing such reversals.
Other research on reversals also gives different outcomes, likely based on the above variables, in addition to different specialties perhaps having a different calculus regarding risk vs benefit. A 2019 study by Prasad and others found:
Through an analysis of more than 3000 randomized controlled trials (RCTs) published in three leading medical journals (the Journal of the American Medical Association, the Lancet, and the New England Journal of Medicine), we have identified 396 medical reversals.
A 2021 article looking at primary care practice found:
We evaluated 408 POEMs (Patient Oriented Evidence that Matters) on RCTs. Of those, 35 (9%; 95% confidence interval [6-12]) were identified as reversed, 359 (88%) were identified as not reversed, and 14 (3%) were indeterminate. On average, this represents about 2 evidence reversals per annum for POEMs about RCTs.
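For readers who want to see where an interval like "9% [6-12]" can come from, here is a minimal Python sketch. The quoted passage does not say how the interval was calculated, so the Wilson score method used here is an assumption, and `wilson_interval` is a hypothetical helper written for this illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 for roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


reversed_poems = 35  # POEMs identified as reversed
total_poems = 408    # POEMs on RCTs that were evaluated

rate = reversed_poems / total_poems
low, high = wilson_interval(reversed_poems, total_poems)

print(f"reversal rate: {100 * rate:.1f}%")
print(f"approx. 95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

Under that assumption the bounds round to 6% and 12%, in line with the quoted interval. Other interval methods give slightly different bounds, so this is a plausibility check rather than a reconstruction of the authors' analysis.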
That seems like a more reasonable figure for reversals – 9% – but still, as the authors point out, that leads to a steady number of medical practices that get reversed. In a 2020 article Haslam, Livingston, and Prasad (the same Prasad from the 2013 article above, and several articles on SBM) examined practices that had been reversed. They found:
We recently identified almost 400 medical practices that were used in clinical care before they were tested in well-done randomized controlled trials and subsequently were found to be ineffective or harmful.
Although these practices were implemented because of sound biologic plausibility or encouraging observational data, well done randomized controlled trials have failed to show evidence of effectiveness. These examples raise caution in introducing new clinical interventions into widespread clinical practice without sufficient high-quality evidence demonstrating efficacy.
That supports the conclusion (one which we make frequently here) that a major cause of medical reversals is adopting practice based on preliminary evidence alone. Practitioners often underestimate how profound research bias can be and how unreliable preliminary clinical trials can be. In fact, more often than not, encouraging preliminary evidence is not supported by later high-quality clinical trials.
How long are such practices in use before they are found to be wanting? A 2021 article looking specifically at oncology journals found:
The median number of years that the practice had been in use prior to the reversal study was 9 years (range 1–50 years).
That is a long time, and likely represents many patients. The upper end is most interesting – 50 years of clinical use before being reversed.
The totality of evidence suggests that we can and should do better. There is some low-hanging fruit here – stop basing practices on preliminary evidence that has not been confirmed with a reasonably high-quality clinical trial. We can carve out an exception here for compassionate use of experimental treatments in certain clinical contexts, but the vast majority of uses would not qualify for this exception. There is clear evidence, painstakingly reviewed in the pages of SBM and elsewhere, that most preliminary studies in retrospect will be wrong, and that there is a bias toward creating, publishing, and citing positive studies over negative studies. More recently, we have also learned from the back end that many of these practices will later be reversed if they are adopted too soon.
The short answer therefore is that we need to set the threshold higher.