Applying the scientific method to medicine revolutionized the profession. While this was a long process, there was an inflection point, at least in the US, when the Flexner Report was published in 1910. The report was the basis for aligning US medical education with the standards already more widespread in Europe, and led to a purge of pseudoscience and poor academic quality from American medical education.
But it seems that medical science, while thriving by some measures, is having some serious problems. The pessimistic view is that medical science is broken, but I think the full picture is far more complicated. We have been trying to unravel and make sense of this situation here at SBM. That is the core of our project: what is wrong with science and medicine, and how do we fix it? At the center of the SBM approach is an emphasis on the role of scientific plausibility in evaluating all medical claims, something that evidence-based medicine (EBM) deemphasizes and, in some applications, even eliminates. But why is plausibility so important?
I think this is part of the natural evolution of scientific medicine. At first, when rigorous scientific methods were applied to medical questions, the questions were big: what causes infections, what are the nutritional factors in disease? Big questions had big impacts and large effect sizes. We have essentially picked all the low-hanging scientific fruit. As medical science has progressed, the questions have become more subtle, nuanced, and complex. We are chasing smaller and smaller effect sizes. This requires greater and greater scientific rigor, and we have not always kept up.
Another factor is that all aspects of scientific institutions have become more expert at doing science, but this cuts both ways. This expertise can make certain players better at gaming the system. Gaming the system, in turn, has been driven by specific perverse incentives: academics want funding and promotion, institutions want greater reputations, journals want to maximize their impact factor, and companies want to maximize their profit.
What does “gaming the system” mean in science? It amounts to getting good at creating the impression of a positive scientific result when there is no actual phenomenon. One aspect of this is P-hacking, tweaking protocols to manufacture statistically significant results, but there are many others. How do we know this is a problem in medicine? One sign is the “replication crisis”. While there is still debate about exactly how big a problem this is, it is true that not all positive findings in science can be replicated.
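To see why P-hacking works, consider one common form of it: measuring many outcomes and reporting whichever one reaches significance. The sketch below is a simplified illustration, not a model of any particular study. It assumes the outcomes are independent and that the treatment has no effect on any of them, so each test's p-value is uniformly distributed between 0 and 1; the function names are hypothetical.

```python
import random


def chance_of_false_positive(num_outcomes: int, alpha: float = 0.05) -> float:
    """Exact probability that at least one of `num_outcomes` independent
    tests of a null (nonexistent) effect comes out 'significant' at `alpha`."""
    return 1 - (1 - alpha) ** num_outcomes


def simulate_p_hacked_trials(num_outcomes: int, alpha: float = 0.05,
                             trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity. Under the null hypothesis a
    well-behaved p-value is uniform on [0, 1], so draw one per outcome and
    count trials where the smallest p-value (the one that would get
    reported) falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        smallest_p = min(rng.random() for _ in range(num_outcomes))
        if smallest_p < alpha:
            hits += 1
    return hits / trials


for k in (1, 5, 10, 20):
    print(f"{k:>2} outcomes: exact {chance_of_false_positive(k):.3f}, "
          f"simulated {simulate_p_hacked_trials(k):.3f}")
```

With ten independent outcomes, the chance of at least one spurious “significant” finding is about 40%, even though nothing real is going on. Real P-hacking (flexible stopping rules, subgroup analyses, choice of statistical test) is messier than this, but the mechanism is the same.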
The failure to replicate, however, is a feature of science, not a bug. Replication is partly how we determine if a phenomenon is real, which by definition means that some findings won’t replicate. If all positive findings replicated, then we wouldn’t need replication. But the devil is in the details. The real questions are: is the rate of failure to replicate too high, and what happens to the results in the meantime? The truly troubling statistics are that high impact journals are more likely to publish studies that don’t replicate, and that such studies are more likely to be cited, even after they fail to replicate. The information ecosystem within the scientific medical literature favors new and exciting findings, not those which are actually true. And this does affect how we practice medicine.
An interesting angle to the replication problem is that people are generally good at predicting which studies are likely to replicate. Even lay people, when presented with a description and the evidence, were able to predict later replication with 67% accuracy, while experts averaged about 72%. Lay people were not technically analyzing methodology; they were basing their judgements on plausibility. Results that seem implausible have a lower probability of replicating, because the results likely do not reflect reality. Plausibility matters, and even non-experts can smell the difference.
Another line of evidence that there is a vulnerability within the institutions of science and medicine is the body of research into highly implausible claims. Interpreting this evidence is a key difference between SBM advocates and proponents of so-called alternative medicine (or more generally, scientific skeptics and believers in the paranormal). Essentially there is a conflict between plausibility and outcome. Homeopathy is a relatively clean example. When I look at studies that purport to show that homeopathy works, I conclude that the methodology of those studies must be flawed (likely due to P-hacking) and would predict that those studies would not replicate. Believers, however, characterize that position as “closed minded” and say that the fault is in our scientific knowledge, not the methods of those studies.
Which interpretation is more likely to be true? That depends on plausibility and methodology. The less plausible the claim, the more likely it is that a positive result is erroneous and won’t replicate. The more rigorous the methodology, the more seriously we should take the results. These two factors have to work in tandem. No matter how rigorous the methodology seems, if the plausibility is low enough it still becomes more likely that the results are spurious.
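One standard way to make this tandem relationship concrete is the positive predictive value of a “significant” result: the fraction of positive findings that reflect a real effect. The sketch assumes a simple model in which a hypothesis has a prior probability of being true (its plausibility), a study has a given statistical power, and the significance threshold is alpha. It ignores bias and P-hacking, which would only make matters worse, and the numbers are illustrative rather than estimates for any real claim.

```python
def positive_predictive_value(prior: float, power: float = 0.8,
                              alpha: float = 0.05) -> float:
    """Probability that a statistically significant result reflects a real
    effect, given the prior probability that the hypothesis is true."""
    true_positives = power * prior          # real effects correctly detected
    false_positives = alpha * (1 - prior)   # null effects that pass by chance
    return true_positives / (true_positives + false_positives)


# Same study design (power 0.8, alpha 0.05), very different plausibility.
for prior in (0.5, 0.1, 0.01, 1e-6):
    print(f"prior {prior:g}: P(real | significant) = "
          f"{positive_predictive_value(prior):.6f}")
```

With a coin-flip prior, about 94% of significant results in this model reflect real effects; at a prior of 1 in 100 only about 14% do, and at one in a million essentially none do, even though the study design is identical. More rigorous methods (higher power, stricter alpha) raise these numbers, but cannot rescue a low enough prior.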
This can come down to simple math. Scientific results are often statistical, and the p-value estimates the probability of seeing data at least as extreme as what we observed if the effect being tested were not real. What are the odds, for example, that we would be seeing a positive result from homeopathic treatments if homeopathy does not work for the indication being studied? Even an extremely impressive p-value of 0.001 means such a result would arise by chance only about one time in a thousand (assuming perfect methodology and execution of the study, which is not always a deserved assumption). But we have to compare that to the odds that our current understanding of the laws of physics, biology, physiology, and biochemistry is sufficiently wrong to allow homeopathy to work.
And this is where the disagreements typically exist: how do we estimate that probability? I find that proponents often casually toss aside the prior findings of science and are comfortable assuming profoundly deep levels of ignorance about how the universe works. Or they simply brush aside concerns over plausibility as being “closed minded” or practicing “scientism”. In fact, we are just practicing science. When it comes to homeopathy, I would place the probability that current scientific knowledge is sufficiently wrong to allow for the possibility that homeopathy works at millions or billions to one, orders of magnitude less likely than a p-value of 0.001 being a statistical fluke. (I have summarized the reasons why multiple times, but here is a good one.)
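The comparison can be made explicit with Bayes’ theorem in odds form: posterior odds equal prior odds multiplied by the Bayes factor (how much more likely the data are if the treatment works than if it does not). A p-value is not itself a Bayes factor, so the sketch below swaps in one published calibration, the Sellke, Bayarri and Berger bound, which caps the Bayes factor in favor of a real effect at 1/(-e·p·ln p) for p < 1/e. That cap is a best case for the claim. The prior odds of one in a million and one in a billion are the illustrative figures from the text, not measured quantities, and the function names are hypothetical.

```python
import math


def max_bayes_factor(p: float) -> float:
    """Upper bound on the Bayes factor favoring a real effect that a p-value
    can supply, using the -e * p * ln(p) calibration (valid for 0 < p < 1/e)."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound only applies for 0 < p < 1/e")
    return 1 / (-math.e * p * math.log(p))


def posterior_probability(prior_odds: float, bayes_factor: float) -> float:
    """Bayes' theorem in odds form, converted back to a probability."""
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


bf = max_bayes_factor(0.001)  # best case for the claim, roughly 53
for prior_odds in (1e-6, 1e-9):  # "millions or billions to one"
    print(f"prior odds {prior_odds:g}: "
          f"P(treatment works) <= {posterior_probability(prior_odds, bf):.1e}")
```

Even granting a p-value of 0.001 its maximum possible evidential weight (a Bayes factor of about 53), a one-in-a-million prior leaves the probability that the treatment works at roughly five in a hundred thousand.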
The same is true of claims such as ESP or astrology. Acupuncture is more plausible, because at least something physical is happening, but if you define acupuncture as requiring the existence of acupuncture points, the plausibility again plummets.
But it is important to understand that an SBM or skeptical analysis of these claims does not end with an assessment of plausibility. Despite the low plausibility, we still undertake a technical analysis of the literature. Not surprisingly, in every case (so far, though I am always willing to keep looking) the claims with extremely low plausibility all show the same patterns of methodology and results. There is always some potentially fatal flaw with the methodology, and the findings never convincingly replicate. What we never see is a quality of evidence above the threshold where rejecting the null hypothesis is reasonable. It is likely no coincidence that low-probability claims also tend to have low-quality or flawed evidence.
What we never see are rigorous studies with significant positive effects, reasonable effect sizes, and consistent replication. What we do see is gaming of the system, with a range of sophistication, but generally unconvincing evidence, and cries of being unfair for even considering plausibility.