If there’s one thing about quacks, it’s that they are profoundly hostile to science. Actually, they have a seriously mixed-up view of science in that they hate it because it doesn’t support what they believe. Yet at the same time they very much crave the imprimatur that science provides. When science tells them they are wrong, they therefore often try to attack the scientific method itself or claim that they are the true scientists. We see this behavior not just in quackery but any time scientific findings collide with entrenched belief systems, for example in medicine, evolution, anthropogenic global warming, and many other areas. So it was not surprising that a rant I saw a few weeks ago by a well-known supporter of pseudoscience who blogs under the pseudonym of Vox Day caught my interest. Basically, he saw a news report about an article in Nature condemning the quality of current preclinical research. From it, he draws exactly the wrong conclusions about what this article means for medical science:
Fascinating. That’s an 88.6 percent unreliability rate for landmark, gold-standard science. Imagine how bad it is in the stuff that is only peer-reviewed and isn’t even theoretically replicable, like evolutionary biology. Keep that figure in mind the next time some secularist is claiming that we should structure society around scientific technocracy; they are arguing for the foundation of society upon something that has a reliability rate of 11 percent.
Now, I’ve noted previously that atheists often attempt to compare ideal science with real theology and noted that in a fair comparison, ideal theology trumps ideal science. But as we gather more evidence about the true reliability of science, it is becoming increasingly obvious that real theology also trumps real science. The selling point of science is supposed to be its replicability… so what is the value of science that cannot be repeated?
No, a problem with science as it is carried out by scientists in the real world doesn’t mean that religion is true or that a crank like Vox is somehow the “real” intellectual defender of science. Later, Vox doubles down on his misunderstanding by trying to argue that the problem described in this article means that science is not, in fact, “self-correcting.” This is, of course, nonsense in that the very article Vox is touting is an example of science trying to correct itself. Be that as it may, none of this is surprising, given that Vox has demonstrated considerable crank magnetism, being antivaccine, anti-evolution, an anthropogenic global warming denialist, and just in general anti-science, but he’s not alone. Quackery supporters of all stripes are jumping on the bandwagon to imply that this study somehow “proves” that the scientific basis of medicine is invalid. A writer at Mike Adams’ wretched hive of scum and quackery, NaturalNews.com, crows:
Begley says he cannot publish the names of the studies whose findings are false. But since it is now apparent that the vast majority of them are invalid, it only follows that the vast majority of modern approaches to cancer treatment are also invalid.
But does this study show this? I must admit that it was a topic of conversation at the recent AACR meeting, given that the article was published shortly before the meeting. It’s also been a topic of e-mail conversations and debates at my very own institution. But do the findings reported in this article mean that the scientific basis of cancer treatment is so off-base that quackery of the sort championed by Mike Adams is a viable alternative or that science-based medicine is irrevocably broken?
Not so fast there, pardner…
A systemic problem with preclinical research? Maybe. Maybe not.
One of the most difficult things to convey to the general public about science-based medicine (and science in general) is just how messy it is. Scientists know that early reports in the peer-reviewed literature are by their very nature tentative and have a high probability of ultimately being found to be incorrect. Unfortunately, that is not science as it is imbibed by the public. Fed by too-trite tales of simple linear progressions from observation to theory to observation to better theory taught in school, as well as media portrayals of scientists as finding answers fast, most people seem to think that science is staid, predictable, and able to generate results virtually on demand. This sort of impression is fed even by shows that I kind of like for their ability to excite people about science, for instance CSI: Crime Scene Investigation and all of its offspring and many imitators. These shows portray beautiful people wearing beautiful pristine lab coats, backlit in beautiful labs, using perfectly styled multicolored Eppendorf tubes to do various assays and getting answers in minutes that normally take hours, days, or sometimes weeks. Often these assays are all done over a backing soundtrack consisting of classic rock or newer (but still relatively safe) “alternative” rock. And that’s just for applied science, in which no truly new ground is broken and no new discoveries are made.
Real scientists know that cutting edge (or even not-so-cutting edge) scientific and medical research isn’t like that at all. It’s tentative. It might well be wrong. It might even on occasion be spectacularly wrong. But even results that are later found to be wrong are potentially valuable.
Sometimes moviemakers and TV producers get it close to right in showing how difficult science is. For example, the HBO movie Something the Lord Made showed just how difficult it could be to take a scientifically plausible hypothesis and turn it into a treatment. In most movies, TV shows, and popular writings, the retrospectoscope makes it seem as though what we know now flowed obviously from the observations of scientific giants. Meanwhile, the news media pounces on each new press release describing new studies as though each were a breakthrough, even though the vast majority of new studies, even seemingly interesting ones, fade into obscurity, to be replaced by the next new “breakthrough.”
In the real world of science, however, things are, as I said, messy. What surprises me is how two scientists, one of whom I respect, can fall prey to amazement when they point out just how messy science is. I’m referring to a commentary that appeared in Nature three weeks ago by C. Glenn Begley, a consultant for Amgen, and Lee M. Ellis, a cancer surgeon at the University of Texas M.D. Anderson Cancer Center. The article was entitled, unimaginatively enough, Drug development: Raise standards for preclinical cancer research. This article is simultaneously an indictment of preclinical research for cancer and a proposal for working to correct the problems identified. It is also simultaneously disturbing, reassuring, and, unfortunately, more than a little misguided.
Before I get into the article, let me just expound a bit (or pontificate or bloviate, depending on what you think of my opinionated writing) about preclinical research. Preclinical research is, by definition, preclinical. It’s the groundwork, the preliminary research, that needs to be done to determine the plausibility and feasibility of a new treatment before testing it out in humans. As such, preclinical research encompasses basic research and translational research and can include biochemical, cell culture, and animal experiments. Depending on the nature of the problem and proposed treatment, it could also include chemistry, engineering, and surgical research.
Now here’s the pontification and bloviation. These days, everybody touts “translational” research, meaning research that is designed to have its results translated into human treatments. It’s darned near impossible these days to get a pure basic science project funded by the NIH; there has to be a translational angle. Often this leads basic scientists to find rather—shall we say?—creative ways of selling their research as potentially having a rapid clinical application, even though they know and reviewers know that such an application could be a decade away. Indeed, if we are to believe John Ioannidis, the median time from idea to completion of large scale clinical trials needed to approve a new treatment based on that idea is on the order of one to two decades. Moreover, as I’ve said many times before, translational research will grind to a halt if there isn’t a robust pipeline of basic science research to provide hypotheses and new biological understandings to test in more “practical” trials. A robust pipeline is necessary because the vast majority of discoveries that look promising in terms of resulting in a therapy will not pan out. That is the nature of science, after all. Many leads are identified; few end up being a treatment.
Not surprisingly, this winnowing aspect of science seems to be what concerns Begley and Ellis. They begin by pointing out:
Sadly, clinical trials in oncology have the highest failure rate compared with other therapeutic areas. Given the high unmet need in oncology, it is understandable that barriers to clinical development may be lower than for other disease areas, and a larger number of drugs with suboptimal preclinical validation will enter oncology trials. However, this low success rate is not sustainable or acceptable, and investigators must reassess their approach to translating discovery research into greater clinical success and impact.
Of course, some of the reason that clinical trials in oncology have a high failure rate is no doubt the sheer difficulty of the disease (actually many diseases) being tackled. As I’ve pointed out time and time again, cancer is very, very complicated and very, very hard. Given that challenge, as frustrating as it is, it is probably not surprising that only around 5% of agents found to have anticancer activity in preclinical experiments go on to demonstrate sufficient efficacy in phase III clinical trials to earn licensing for sale and use, compared to approximately 20% for cardiovascular disease. Of course, cardiovascular drugs are targeted at cells that are nowhere near as messed up as cancer cells, and another study cited by Begley and Ellis suggests that between 20% and 25% of important preclinical results can’t be reproduced in pharmaceutical company laboratories with sufficient rigor to go forward. Even so, being scientists, we want to improve the process. To improve the process, however, we need to know where the process fails.
To try to do this, Begley and Ellis looked at 53 “landmark” publications in cancer. Begley used to be head of global cancer research at Amgen and knows what it takes to get a drug from idea to market. What it takes first is replication. Basically, Begley’s team would scour the scientific literature for interesting and promising results and then try to replicate them in such a way that the results could serve as a basis for drug development. The idea was to identify new molecular targets for cancer and then figure out ways to make drugs to target them. This is what he reported:
Over the past decade, before pursuing a particular line of research, scientists (including C.G.B.) in the haematology and oncology department at the biotechnology firm Amgen in Thousand Oaks, California, tried to confirm published findings related to that work. Fifty-three papers were deemed ‘landmark’ studies (see ‘Reproducibility of research findings’). It was acknowledged from the outset that some of the data might not hold up, because papers were deliberately selected that described something completely new, such as fresh approaches to targeting cancers or alternative clinical uses for existing therapeutics. Nevertheless, scientific findings were confirmed in only 6 (11%) cases. Even knowing the limitations of preclinical research, this was a shocking result.
Here’s the part that I found to be profoundly misguided. Begley and Ellis basically admit that these are “landmark papers”; i.e., that they were highly novel. Presumably these papers would have been considered at the time of their publication to be “cutting edge” research, very likely published in high impact journals such as Nature, Cell, Science, Cancer Research, and the like. Unfortunately, although I looked, I didn’t see a list of the 53 “landmark” papers—not even in an online supplement. Nor was the method by which these papers were analyzed described in much detail—not even in an online supplement. The irony of a paper that rails against the irreproducibility of preclinical cancer research but does not itself provide the data upon which its authors based their conclusions in sufficient detail for readers to determine whether the conclusions flow from the data is left for SBM readers to assess for themselves. Similarly misguided, as was pointed out in the online comments, were the authors’ stated assumption that “the claims in a preclinical study can be taken at face value — that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time” and their amazement that “this is not always the case.” If the authors’ assumption were true, attempts to replicate scientific results would be far less important than they are.
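Even taking the authors’ headline number at face value, it’s worth remembering how soft a figure based on 53 hand-picked papers is. Here’s a minimal sketch (in Python, my own illustration and not anything from the paper) that computes a 95% Wilson score interval for 6 confirmed findings out of 53 attempts:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Begley and Ellis report 6 of 53 "landmark" findings confirmed (~11%).
low, high = wilson_interval(6, 53)
print(f"point estimate: {6 / 53:.1%}")             # ~11.3%
print(f"95% interval:   {low:.1%} to {high:.1%}")  # roughly 5% to 23%
```

In other words, even before we ask which 53 papers were chosen or how “reproduced” was defined, that 11% figure is only pinned down to somewhere between roughly one in twenty and one in four.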
Be that as it may, what the authors are studying, however they studied it and whatever the 53 studies they examined were, is essentially frontier science. Given that, it strikes me as rather strange that they are so amazed that much of the science at the very frontiers turns out not to be correct when tested further. We in science understand the difference between settled textbook science and the sort of frontier science that makes it into journals like Science. Indeed, we often lament that the very highest tier journals, such as Nature, Science, and Cell, tend to be too enamored of publishing what seems to be “sexy science,” exciting or counterintuitive results that really grab the attention of scientists — in other words “cutting edge” or frontier science. Such journals seem to pride themselves on publishing primarily such work (which is one reason why they are so widely read and cited), while more solid, less “sexy” results seem to end up in second-tier journals.
This leads to a paradox: the science that is published in the highest profile, most prestigious journals is almost by definition the most tentative science. Given that, it is surprising how much of what is published in such journals actually does stand the test of time, but it should not be surprising that much of it does not. However, the very prestige of such journals gives such research seemingly more authority than research published in less prestigious journals. It is often said that one Nature, Science, or Cell paper is worth five or even ten papers in more pedestrian, middle-of-the-road journals when it comes to improving a scientist’s CV (and chance of a good job or promotion). Perhaps that is because publications in such journals are viewed as an indication that the work a scientist is doing is on the cutting edge. That perception, built up over time, is likely the major reason that it is very, very difficult to get a paper accepted and published in Science, Nature, or Cell. The vast majority of submissions are rejected, many without even being sent out for peer review, because an editorial decision is made that they are not “interesting” enough (something that happened to me once). In other words, the editors of such journals are actively looking for science that challenges the existing paradigm. However, scientists understand that papers published in the most cutting edge journals are tentative. They’re interested in the papers because such work is the most likely to advance the frontiers of science.
In fact, they’re interested in such papers for the very same reason that Begley and his group at Amgen were interested in them. Begley was the head of a major research division of a major pharmaceutical company. What does that mean? It means that it was his job to find new molecular targets for cancer and to develop drugs to target them. And it was his job to do all this and beat his competitors to the market with effective new drugs based on these discoveries. No wonder his group scoured high impact journals for cutting edge studies that appeared to have identified promising molecular targets! Then he had a veritable army of scientists, about 100 of them in the Amgen replication team according to this news report, who were ready to pounce on any published study that suggested a molecular target the company deemed promising.
Here’s another aspect of the study that needs to be addressed. As I read the study, a thought kept popping into my fragile eggshell mind. Remember Reynold Spector? He’s the guy whom both Mark Crislip and I jumped on for a particularly bad criticism of science-based medicine and its alleged lack of progress that Spector called Seven Deadly Medical Hypotheses. As both Mark and I pointed out, nearly all of these hypotheses were really not particularly deadly, and, indeed, most of them weren’t even hypotheses. What Dr. Spector has in common with Dr. Begley is a background in pharma, and the similarities in the way they think are obvious. For instance, I castigated Spector for throwing around the term “pseudoscience” to describe studies that in his estimation do not reach the level of evidence necessary for FDA approval of a drug. That is a very specific set of requirements for a very specific problem: developing a drug from first scientific principles and then demonstrating that it is efficacious for the intended indication as well as safe. I got the impression from his articles that Dr. Spector views any study that doesn’t reach FDA-level standards for drug approval as pseudoscience — or, at the very least, crap. I get the same impression from Begley. For example, here’s a passage from his article:
Of course, the validation attempts may have failed because of technical differences or difficulties, despite efforts to ensure that this was not the case. Additional models were also used in the validation, because to drive a drug-development programme it is essential that findings are sufficiently robust and applicable beyond the one narrow experimental model that may have been enough for publication.
Elsewhere in the article, Begley defines “non-reproduced” as a term he assigned “on the basis of findings not being sufficiently robust to drive a drug-development programme.” This attitude is, of course, understandable in someone running an oncology drug development program for a major pharmaceutical company. He is looking for results that he can turn into FDA-approved drugs that he can bring to market before his competitors do. So what he does is more than just try to reproduce the results as described in the publication. His team of 100 scientists tries to reproduce the results and extend them to multiple model systems relevant to drug design. That is, in essence, applied science. Think of it this way: How many basic science discoveries in physics and chemistry ever get turned into a product? How many of these findings are sufficiently robust and reproducible in multiple model systems to justify having a team of engineers spend millions of dollars developing them into products? Do physicists, materials scientists, chemists, and engineers obsess over how few findings in basic science in their fields can successfully be used to make a product?
I know, I know, apples and oranges. In medicine, those of us doing research do it in order to develop an understanding of a disease process sufficient to develop an efficacious new treatment. That goal is very explicit in what we do. However, sometimes we forget just how important it is to have a large, robust pipeline of preclinical results upon which to base translational research programs. Is the reason for the apparently declining percentage of basic science studies that are successfully translated into drugs more a function of the increasing ability of scientists, through large-scale genomic and small molecule screens, to identify more and more potential molecular targets and potential drugs to use against them than of scientists doing something wrong? I also have to wonder if what Begley and Ellis are observing is the decline effect accelerated by 100 scientists prowling the scientific literature looking for experimental results they can turn into drugs. As I pointed out before, the decline effect doesn’t mean science doesn’t work, and, as I will point out here, Begley’s very methods would almost be expected to accelerate the decline effect.
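To illustrate what I mean about the decline effect, here’s a toy Monte Carlo sketch (entirely made-up effect sizes and noise, in Python, not Begley’s data): if an army of scientists chases only the most striking published effects, the independent replications will, on average, come in smaller than the published estimates, even when every lab did honest work.

```python
import random
from statistics import mean

random.seed(42)

N_TARGETS = 10_000       # hypothetical candidate findings in the literature
TRUE_EFFECT_SD = 1.0     # spread of true underlying effect sizes (arbitrary units)
MEASUREMENT_SD = 1.0     # noise in any single published measurement
SELECTION_CUTOFF = 2.5   # only "striking" published effects get chased further

published_selected = []
replicated_selected = []

for _ in range(N_TARGETS):
    true_effect = random.gauss(0.0, TRUE_EFFECT_SD)
    published = true_effect + random.gauss(0.0, MEASUREMENT_SD)        # original paper
    if published > SELECTION_CUTOFF:                                   # pounce on the biggest effects
        replication = true_effect + random.gauss(0.0, MEASUREMENT_SD)  # independent repeat
        published_selected.append(published)
        replicated_selected.append(replication)

print(f"findings selected for follow-up: {len(published_selected)}")
print(f"mean published effect:  {mean(published_selected):.2f}")
print(f"mean replicated effect: {mean(replicated_selected):.2f}")  # reliably smaller
```

That shrinkage is just regression to the mean operating on a selected sample; it requires no fraud and no sloppiness, only the fact that the selection step is “the most exciting results we could find” rather than a random draw from the literature.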
The rest of the story
Don’t get me wrong. Although I find the premise of Begley and Ellis’ article to be misguided, there is important and disturbing information in it. Unfortunately, some of the most important and disturbing information is not in Begley and Ellis’ paper at all. The omission of these critical pieces of information strikes me as a curious decision on the part of the authors and the Nature editors.
For example, in the paper, we learn this:
In studies for which findings could be reproduced, authors had paid close attention to controls, reagents, investigator bias and describing the complete data set. For results that could not be reproduced, however, data were not routinely analysed by investigators blinded to the experimental versus control groups. Investigators frequently presented the results of one experiment, such as a single Western-blot analysis. They sometimes said they presented specific experiments that supported their underlying hypothesis, but that were not reflective of the entire data set. There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.
This is one reason that when I review papers I always ask if assays were performed in a blinded fashion, particularly when the results involve selecting parts of histological slides for any sort of quantification.
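For readers who haven’t had to do this themselves, blinding doesn’t require anything fancy. Here’s a minimal sketch of the sort of thing I mean (hypothetical file names, in Python, not any particular lab’s protocol): a colleague assigns coded labels to the slide images, the person doing the quantification scores only the coded copies, and the key is consulted only after scoring is complete.

```python
import csv
import random
import shutil
from pathlib import Path

# Hypothetical layout: raw slide images named by treatment group, e.g.
#   slides/control_01.tif, slides/treated_01.tif, ...
slides = sorted(Path("slides").glob("*.tif"))
random.shuffle(slides)

coded_dir = Path("slides_blinded")
coded_dir.mkdir(exist_ok=True)

# The key file is set aside and not opened until quantification is finished.
with open("blinding_key.csv", "w", newline="") as key_file:
    writer = csv.writer(key_file)
    writer.writerow(["coded_name", "original_name"])
    for i, slide in enumerate(slides, start=1):
        coded_name = f"sample_{i:03d}.tif"
        shutil.copy(slide, coded_dir / coded_name)  # scorer sees only coded copies
        writer.writerow([coded_name, slide.name])
```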
In an interview, however, we learn a lot more critical information:
When the Amgen replication team of about 100 scientists could not confirm reported results, they contacted the authors. Those who cooperated discussed what might account for the inability of Amgen to confirm the results. Some let Amgen borrow antibodies and other materials used in the original study or even repeat experiments under the original authors’ direction.
Some authors required the Amgen scientists sign a confidentiality agreement barring them from disclosing data at odds with the original findings. “The world will never know” which 47 studies — many of them highly cited — are apparently wrong, Begley said.
I find it very interesting that Begley didn’t mention this rather important tidbit of information in the Nature paper, and I wonder why he and Ellis didn’t see fit to name the studies for which non-disclosure agreements weren’t signed. One wonders if he (and the Nature editors) were concerned about litigation. In any case, the non-disclosure agreements obviously must predate the Nature paper. This tells me that Begley was in essence complicit in not revealing that his team couldn’t reproduce results, apparently not thinking such agreements too high a price at the time for access to reagents and help in the cause of advancing his company’s efforts. He’s willing to admit this in news interviews, apparently, but not in the Nature paper being used as a broadside against current preclinical drug development efforts.
Here’s another highly irritating passage from Begley and Ellis’ paper:
Some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation, but did not actually seek to confirm or falsify its fundamental basis. More troubling, some of the research has triggered a series of clinical studies — suggesting that many patients had subjected themselves to a trial of a regimen or agent that probably wouldn’t work.
Why do I say this is an “irritating” passage? Simple. It would have been very helpful if Begley and Ellis had actually named a couple of these “entire fields,” don’t you think? I suppose they probably couldn’t do that without indirectly revealing which papers Begley’s team couldn’t reproduce. The lack of this information makes this jeremiad against how preclinical research is done today far less useful for actually fixing the problem than it might have been. Assessing the irony of a paper that rails against current preclinical research methods but does not itself reveal its methods in sufficient detail to be evaluated, or even its results except in fairly vague ways, is again left as an exercise for SBM readers.
There are also many explanations for the variability in published research, as other commentators have pointed out. For instance, Nobel laureate Phil Sharp homes in on one problem:
The most common response by the challenged scientists was: “you didn’t do it right.” Indeed, cancer biology is fiendishly complex, noted Phil Sharp, a cancer biologist and Nobel laureate at the Massachusetts Institute of Technology.
Even in the most rigorous studies, the results might be reproducible only in very specific conditions, Sharp explained: “A cancer cell might respond one way in one set of conditions and another way in different conditions. I think a lot of the variability can come from that.”
It’s true, too. I remember back in the late 1990s, when several labs were having difficulty reproducing Judah Folkman’s landmark work on angiogenesis inhibitors, including the lab where I was working at the time. Dr. Folkman provided reagents, protocols, and advice to any who asked, and ultimately we were able to figure out what the problem was, part of which was that the peptide we were using was easily denatured. We also learned that he had done the same thing for several labs, even to the point of dispatching one of his postdocs to help other investigators. Now imagine if Folkman had instead behaved like the scientists in Begley’s account who demanded non-disclosure agreements when Begley’s group had trouble reproducing their studies.
Still, the problems with Begley and Ellis’ article notwithstanding, they do provide useful information and identify what appears to be a serious problem. The problem is not so much that so few basic science discoveries end up as drugs, courtesy of Amgen or one of its big pharma competitors. Rather, it’s the sloppiness that is too common in the scientific literature, coupled with publication bias, investigator biases, and the proliferation of screening experiments done to identify genomic targets and small molecules with biological effects, which has turned into the proverbial fire hose of data, often many terabytes per screen. I also wonder if part of the problem is that all the “easy” molecular targets for therapy have already been identified, leaving the difficult and problematic ones. The result is alluded to but not adequately discussed in the news story I cited above:
As recently as the late 1990s, most potential cancer-drug targets were backed by 100 to 200 publications. Now each may have fewer than half a dozen.
The genomics, proteomics, and metabolomics revolutions that have occurred over the last 10-15 years are largely to blame for this. I would also argue (and perhaps Begley would even agree) that the competitiveness between pharmaceutical companies to be the “firstest with the mostest” for each new target hyped in the medical literature almost certainly contributes to this problem. After having been burned a few times, Begley could, for instance, have decided that his team wouldn’t seize on each of these new papers, that he’d wait until some more papers were published. He didn’t do that. For him, business as usual continued. An admission that he was part of the problem, either in the Nature paper or one of the interviews he gave to the press, would have been nice.
How to improve preclinical research
It’s true that I’ve been critical of Begley and Ellis’ article, but that’s mainly out of frustration. There are many things that need to be improved in terms of how science is practiced. Readers might recall that I’ve written about problems with the peer review system, publication bias, the decline effect, and numerous other problems that interfere with the advancement of science and contribute to doubts about its reliability. Such problems are inevitable because science is done by humans, with all their biases, cognitive quirks, and conflicts of interest, but that doesn’t mean every effort shouldn’t be made to minimize them. Science remains the single best system for determining how nature works, and, no matter how much quacks and cranks might try to cast doubt on it because it doesn’t support their pseudoscience, no one has yet developed a better system.
The question, therefore, is how to minimize the effects these problems have on how the scientific method is practiced, particularly given that the scientific method itself is designed to minimize the effects of human shortcomings on how evidence is gathered and analyzed. No matter how much cranks like Vox Day and Mike Adams’ minions try to portray Begley and Ellis’ article as an indictment of science itself, as slam-dunk evidence that science is not self-correcting, that the scientific basis of cancer therapy is so much in doubt that quackery is a viable alternative, or that religion is a more reliable way of seeking knowledge about the world than science, it is in fact nothing of the sort.
It does, however, tell us that we as scientists need to improve, and, indeed, we at SBM have discussed the shortcomings of medical science and ways to improve upon it on many occasions. In fact, I daresay that much of what we say jibes with the suggestions proposed by Begley and Ellis, including:
- More opportunities to present negative data.
- An agreement that negative data can be as informative as positive data.
- Requiring preclinical investigators to present all findings.
- Links added to articles to other studies that show different or alternate results.
- Transparent opportunities for trainees, technicians and colleagues to discuss and report troubling or unethical behaviours without fearing adverse consequences.
- “Greater dialogue should be encouraged between physicians, scientists, patient advocates and patients. Scientists benefit from learning about clinical reality. Physicians need better knowledge of the challenges and limitations of preclinical studies. Both groups benefit from improved understanding of patients’ concerns.”
- More credit for teaching and mentoring.
- Less emphasis on publication in top-tier journals.
- “Funding organizations must recognize and embrace the need for new cancer research tools and assist in their development, and in providing greater community access to those tools. Examples include support for establishing large cancer cell-line collections with easy investigator access (a simple, universal material-transfer agreement); capabilities for genetic characterization of newly derived tumour cell lines and xenografts; identification of patient selection biomarkers; and generation of more robust, predictive tumour models.”
Many of these are good ideas, although I’m not sure how practical it would be to require that investigators present “all” findings in journal articles, or how such a requirement would be enforced. Defining “all” would be a challenge, and online supplements are already too much of a dumping ground these days. For example, does “all” mean investigators have to present the dozens of attempts it might have taken to optimize assay conditions, or include every experiment that was screwed up because someone used the wrong conditions, added the wrong reagent, or let their tubes sit on the bench too long? Also, one notes how Begley assiduously avoids criticizing pharma for being so eager to leap on the latest cutting edge research before it has percolated through the literature, which, I conclude based on his very own complaint, is surely part of the problem.
So is the very nature of science. Scientists know that what is published the first time is considered tentative. It may or may not be correct. We also know that publication bias can mean that the first publication of a result might well be an anomaly that was published because it was interesting. That is science at the frontier. If other scientists can replicate the results or, even better, replicate the results and use them as a foundation to build upon and make new discoveries, only then does such a result move away from the frontier. And if the results are replicated enough times and by enough people and used as a basis for further discoveries, to the point that they are considered settled, that’s when they can become applied science, such as a drug based on the principle originally discovered. It’s a process that is very messy, with lots of dead ends and blind alleys. While performing a valuable service by identifying problems with a lack of reproducibility in all too many preclinical cancer research studies, Begley and Ellis also unfortunately contribute to the mistaken impression that translational research is a linear process that goes from discovery to drug. It’s not, nor can it churn out major new treatments on demand.