I just finished reading Richard Harris’ excellent book, Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions. From the title, I was expecting an angry, biased polemic attacking science and scientists. I was very pleasantly surprised. He doesn’t condemn science. He points out problems with the way science is carried out, mostly problems that scientists are already aware of and are trying to correct. And he offers practical solutions.
He explains that the term “rigor mortis” in the title is hyperbole. Rigor in scientific research isn’t dead, but it needs a major jolt of energy. He says scientists have been taking shortcuts around the methods they are supposed to use to avoid fooling themselves. They have often had to choose between maintaining scientific rigor and doing what they perceive as necessary to maintain a career in a hypercompetitive field. That’s a choice no one should have to make. The challenge is not just to make technical fixes in research procedures, but to change the culture.
He cites a study published in Nature in 2012 by C. Glenn Begley on raising the standards for preclinical cancer research. Begley identified 53 potentially groundbreaking studies and tried to reproduce them, working closely with the original researchers. He was only able to reproduce the findings of six (6!) of the studies. The non-reproducible studies had been cited as many as 2,000 times by other researchers as a basis for their own studies. Non-reproducible findings had sent research off in wrong directions. Sometimes huge houses of cards are built on an error, and it can take years for the science to self-correct.
Leonard Freedman, who founded the Global Biological Standards Institute, estimates that:
- 20% of studies have untrustworthy designs.
- 25% use dubious ingredients such as contaminated cells or antibodies that are not as selective as researchers think.
- 8% involve poor lab technique.
- 18% of the time, scientists mishandle their data analysis.
Overall, he found that over half of preclinical research was irreproducible and untrustworthy, representing a waste of $28 billion a year.
Six essential questions
Begley proposes a list of six questions researchers should ask:
- Were experiments performed blinded (that is, without researchers knowing which cells or animals were in the test group)?
- Were basic experiments repeated?
- Were all the results presented?
- Were there positive and negative controls?
- Did scientists make sure they were using valid ingredients?
- Were statistical tests appropriate?
Sources of error
Unconscious bias arises at every step of the way in research design, in factors like deciding how many animals to study, deciding which results to include, and analyzing the results. Malcolm Macleod of the University of Edinburgh estimates that when you compound all the sources of bias and error, the actual percentage of published studies that are correct may be as low as 15%.
In a 2010 study, Ioannidis and Chavalarias catalogued 235 forms of bias: 235 ways scientists can fool themselves.
Some examples of things that can go wrong in research studies:
- Researchers favored male mice to prevent possible confounding by estrous cycles in female mice; this deeply skewed some of their results.
- Muscular dystrophy researchers thought they had a valid test of performance, a timed walking test; but when they offered their young subjects a $50 incentive for improved performance, the effect was bigger than has ever been seen for a drug for that disease.
- Mice died when their bedding was switched to a different material.
- Two labs kept getting different results and finally found the cause: glassware was cleaned with acid in one lab and with detergent in the other.
- Two methods routinely used in labs resulted in different findings; it made a difference whether cells were stirred by a rocking device or a system that involved a spinning bar.
- A disease can be cured in a mouse model, only for researchers to discover that the cure is irrelevant or even deadly for humans. A compound called fialuridine (FIAU) was well tolerated in mice and in a short-term human study; but in a longer study it caused liver failure in half the recipients (7 patients). Five out of the seven died, and the remaining two survived only with liver transplants.
- A drug can fail in mice but work in humans. Statins failed testing in rats, and only ended up being evaluated for humans after a persistent researcher found that they worked in chickens.
- Rats and mice are different: toxicity studies only reach the same conclusion about 60% of the time.
- In mouse studies, the position of the cage or even the sex of the researcher can alter results.
- Many human cancer cell lines used in research have been contaminated; testing shows that they are actually HeLa cells.
- Mycoplasma can infect cells and throw results off.
- Antibodies can be very unreliable; different batches vary, and some bind to more than one unique site.
- In one case, there were subtle differences in how the mass spectrometer worked on different days.
- Reliance on a p value of less than 0.05 to establish statistical significance is an arbitrary standard that misleads many people. P-hacking is common. It is extremely easy to find evidence for something that is not true.
- HARKing (hypothesizing after the results are known) is a common pitfall.
- The line between exploratory and confirmatory research is often blurred.
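To see why it is so easy to find evidence for something that is not true, here is a rough sketch in Python. It is my own illustration, not something from the book. It assumes a study in which nothing real is going on, where the researcher checks several independent outcomes and reports a success if any one of them reaches p < 0.05. The function names and the numbers of outcomes are made up for the example, and the simulation relies on the standard simplification that a p value is uniformly distributed between 0 and 1 when there is no true effect.

```python
import random


def false_positive_rate(num_outcomes, alpha=0.05):
    """Chance that at least one of several independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** num_outcomes


def simulate(num_outcomes, alpha=0.05, num_studies=20000, seed=0):
    """Estimate the same chance by simulating many studies with no real effect."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_studies):
        # Assumption: with no true effect, each p value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(num_outcomes)]
        # The study "finds something" if its best-looking outcome is significant.
        if min(p_values) < alpha:
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(false_positive_rate(k), 3), round(simulate(k), 3))
```

With a single outcome the false positive rate is the expected 5%. With 5 outcomes it is about 23%, and with 20 it is about 64%, even though no effect exists at all. Real outcomes are usually correlated, so the exact figures would differ, but the direction is the same.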
These problems are fixable, and many efforts are already underway:
- The NIH has already tightened the criteria for approving a grant.
- Some drug testing is being farmed out to a private company that specializes in running rigorous mouse studies according to the best new standards.
- New human-based models like induced pluripotent stem cells and artificial organs on chips can be substituted for mouse testing.
- The International Cell Line Authentication Committee maintains a list of corrupted cell lines: 438 so far. Researchers can have their cells authenticated before and after a study by an independent commercial testing lab.
- There are initiatives to increase transparency, to encourage data sharing, and to register trials in advance, declaring exactly what hypothesis is being tested and what endpoints will be used. Advance declarations have dropped reported success rates from 57% to 8%.
- There are efforts to formalize the teaching of scientific thinking.
- The federal Office of Research Integrity identifies cases of scientific misconduct, but it is understaffed and underfunded.
- The blog Retraction Watch found about 40 retractions in 2001; now it finds 500 to 600 a year.
- One journal editor now refuses to accept papers that simply report a correlation between a biomarker and a medical condition.
- When another journal started awarding “openness badges,” the percentage of papers with open data rose from 3% to 38%.
- A whole new field is emerging: meta-research, studying problems in how scientific research is conducted, and identifying solutions.
Conclusion: serious problems, but room for optimism
Scientific medicine has made great strides, but much of today’s research is unreliable. As Harris says, “Biomedicine’s entire culture is in need of serious repair.” He has done a stellar job of identifying the problems, possible solutions, and promising efforts that are already underway. Research jobs and tenure in academia should be awarded based on quality, not quantity of papers published. Studies must be replicated before they are relied on to direct future research. We need better incentives: today, it pays to be first to publish; it doesn’t pay to be right. We can do better. We are trying to do better.
Anyone who does research or reads about research studies will profit from reading this book. It’s well-written and accessible, with short chapters and lots of entertaining vignettes. And it’s a compact 236 pages, much shorter than the last book I reviewed (a 600-plus-page tome on Freud). Highly recommended.