There are good things and bad things about holding the Monday slot on this blog. One of the good things is that I tend to have more time to come up with material. However, one of the drawbacks of holding the Monday slot is that a lot of studies and news stories are published on Monday that I’d normally like to write about but can’t because by the time I see them my post is done. Then I have to wait until the following Monday, by which time something new has usually intervened, a shiny bright object to attract my attention from last Monday’s story, which, by the time I’m ready to write about it, is old news. True, I can write about such stories on my not-so-super-secret other blog (and frequently do), but sometimes I like to take advantage of the greater exposure this blog provides.
So it was last Monday that Jonathan Kimmelman & Carole Federico published a commentary in Nature entitled “Consider drug efficacy before first-in-human trials.” I thought it an important topic considering the speculation over who will be appointed FDA Commissioner by President Trump. As I discussed three weeks ago and a New York Times article published yesterday discusses, Donald Trump’s three most likely picks for FDA Commissioner (that I knew of at the time) all favor loosening drug approval standards and could very well undo decades of safeguards. Two are cronies of technolibertarian Peter Thiel. Of these two, one (Jim O’Neill) believes that the FDA shouldn’t have to require evidence of efficacy, only safety, before approving drugs and devices for the market. (Apparently the free market will sort it all out.) The other (Balaji Srinivasan) believes that a “Yelp for drugs” would do a better job than the FDA in assuring drug safety. (Apparently he’s never actually spent much time reading online ratings systems.) The third candidate (Scott Gottlieb) is the closest thing I’ve ever seen to a bona fide, honest-to-goodness pharma shill, who called the early termination of a multiple sclerosis drug study “an overreaction,” even though three participants had died, defended the off-label marketing of Evista, and, when he was Deputy Commissioner of the FDA, harassed underlings when FDA scientists rejected Pfizer’s osteoporosis drug candidate Oporia, forecast to earn $1 billion a year.
Srinivasan appears to be no longer in the running, thanks to his rookie mistake of not taking down his Twitter account before meeting with President Trump, allowing some of the dumber things he said about using the Uber model or the Yelp model for online reviews of drugs to be brought to light. However, there’s now a new candidate, Joseph Gulfo, executive director of the Lewis Center for Healthcare Innovation and Technology at Fairleigh Dickinson University, the author of Innovation Breakdown: How the FDA and Wall Street Cripple Medical Advances and the former President and CEO of the medical device firm MELA Sciences. More on him later, but you can see from the title of his book that his philosophy is from the same mold as O’Neill’s and Srinivasan’s. Unlike them, he appears a bit more grounded in reality, although he appears to share O’Neill’s and Srinivasan’s unshakeable belief that the FDA is far too strict.
Basically, no matter who is finally chosen by the Trump administration to run the FDA, it is clear that the mandate will be to decrease regulation. Indeed, just last week President Trump told a gathering of pharmaceutical executives, “We’re going to get the approval process much faster.” The only question, thus, will be by how much. If it’s O’Neill or Srinivasan, I’d expect a push towards a radical overhaul of the FDA. If it’s Gulfo or Gottlieb, I’d expect a less radical, but still highly deregulatory, approach. Add to that President Trump’s recent executive order mandating that federal agencies eliminate two regulations for every new regulation they issue, as well as a legislative agenda designed to “decrease regulatory burden,” and under this administration we will almost certainly see a loosening of drug approval standards, all in the name of “fostering innovation.” It’s a process that actually began before Donald Trump with the passage in the waning days of the Obama administration of the misbegotten 21st Century Cures Act, which promised to get the cures flowing from industry by authorizing the FDA Commissioner to develop a framework decreasing the level of evidence needed for drug and device approval. That’s why this Nature commentary is so important now. It argues for a higher standard before even allowing a drug to be tested for the first time in humans, thus going very much against the prevailing sentiment in Washington right now.
A disaster in a “first in human” drug trial
Kimmelman and Federico begin by describing an incident to illustrate the problem. It’s not an example from the US, but it is a good starting point to discuss the dangers of “fast-tracking” new drugs:
On 17 January 2016, a healthy man was declared brain-dead after receiving an experimental drug in a first-in-human trial in France. Four of five other subjects receiving the same dose have serious, ongoing neurological complications. Investigations into the trial described many troubling safety practices, such as steep increases in dose levels delivered to sequential subjects without sufficient delays to check for safety.
The year since has brought intense scrutiny about how the debacle could have been anticipated and prevented. However, another issue is still largely overlooked: the duty to evaluate whether an experimental treatment is promising enough to warrant testing on people.
When thinking about drug approval, most people focus on phase III clinical trials, which are large randomized, double-blinded (when possible) trials designed to test the efficacy of an experimental drug versus placebo and/or existing drugs. However, phase III trials are only the culmination of the process of pre-approval drug development. Long before that there is preclinical research in chemistry, cell culture, and laboratory animals. Only when there is deemed to be sufficient evidence of promise does an experimental drug advance to phase I trials, also known as “first in human” trials. Incidents like the one above are one reason why I view “right-to-try” laws with such alarm, as such laws require nothing more than that a drug has passed phase I tests for a patient to be eligible to use it outside of a clinical trial.
Be that as it may, the French medicines safety agency (ANSM) ordered an investigation and re-examination of the data that the drug company Bial, based in Trofa, Portugal, provided to the French ethics panel (in essence, the French version of what we in the US call the Institutional Review Board, or IRB). This re-examination resulted in a report, with the following findings:
The report notes that the 63-page Investigator Brochure describing the trial included fewer than two pages of evidence that the drug had the desired pharmacological activity. It identified only two studies presented as evidence for efficacy, both problematic. In one, Bial had data for a different marketed drug showing it was more effective than Bial’s drug at relieving pain in animals, but did not include that information in a summary figure. Both preclinical studies showed only “moderate” positive effects. Moreover, Bial’s drug had been tested at a range of doses in mice that made it impossible to estimate the most likely effective dose in humans.
Press coverage following the tragedy quoted independent experts concluding that there was little evidence to support a trial, and that at least five other drugs designed to act in a similar way had been tested in people without success. (Bial maintains that toxicities were not predictable and that it has followed all human-testing norms. We approached the company for more information about the event for the purposes of this Comment but received no response.)
This incident is a good way to start a discussion about just how much preclinical evidence for a compound is enough to justify a first-in-human (FIH) clinical trial.
Efficacy versus safety in clinical trials
Kimmelman and Federico are bioethicists who have studied the ethics of first-in-human (FIH) and early-phase drug research for more than a decade. They wrote this commentary because they are concerned that, when decisions are made about whether to advance a drug candidate to FIH clinical trials, too little emphasis is placed on the efficacy observed in preclinical research. They call for “infrastructure, resources and better methods to rigorously evaluate the clinical promise of new interventions before testing them on humans for the first time,” noting that more than half of all drugs that reach later-stage trials (phase II and III) fail because they do not demonstrate efficacy, going on to write:
Today, the evaluation of preclinical evidence is especially important. Favoured picks for the next commissioner of the US Food and Drug Administration (FDA) are likely to lower the current requirements that a drug must demonstrate efficacy in humans before entering the market. If so, low standards for launching clinical trials in the United States could result in ineffective drugs being approved, while also decreasing incentives.
Regulators in Europe and North America evaluate safety before human trials can proceed, but they do not currently demand evidence for potential efficacy. At a workshop of the US National Academy of Sciences in September, Robert Temple, a veteran at the FDA’s Center for Drug Evaluation and Research, said that the agency largely left it to drug sponsors to evaluate their rationale that an experimental drug was likely to work. “I can’t think of any cases where [FDA has] said you can’t do this [phase I] study because we’re just too sceptical.” The European Medicines Agency (EMA) — Europe’s drug regulator — is similarly silent about the evaluation of clinical promise, even in proposed revisions to guidelines prompted by the Bial affair.
I’d say that Kimmelman and Federico are, if anything, understating the case. Lowering drug approval standards with respect to efficacy would not only be likely to result in more ineffective drugs being approved, but, if coupled with a generalized anti-regulatory push and an attitude to approve more drugs, would also be likely to lead to a decrease in safety standards, as the two are coupled. Think of it this way. We accept high levels of toxicity in chemotherapy drugs because they treat a deadly disease, but would never tolerate toxic side effects as serious to treat, for instance, a headache. Perfect safety is never possible in any drug, but must always be weighed against efficacy and the seriousness of the condition for which the drug is to be marketed. If efficacy is overestimated or de-emphasized, then considerations of the benefit-risk ratio will be skewed.
Kimmelman and Federico go on to argue that commercial interests “cannot be trusted to ensure that human trials are launched only when the case for clinical potential is robust.” While I accept that this is true, I can’t help but note that commercial interests also do not want to spend boatloads of money carrying out clinical trials on experimental drugs unlikely to be shown to be efficacious and safe. When a new drug fails phase III clinical trials, it’s a huge loss to a company, both in terms of development costs and in terms of the loss of projected revenue from sales of the drug, which explains why Scott Gottlieb was so upset when the FDA rejected Oporia. No, I’m not being sucked in to Jim O’Neill’s fantasy that the free market will guarantee drugs that are safe and effective. I’m just noting that the situation is…complicated. There are competing incentives, both regulatory and financial. Nor do I disagree that commercial interests cannot be trusted to ensure that the preclinical evidence is robust and bulletproof before launching a clinical trial. Rather, I’m emphasizing again that lowering the standards would shift the incentives to make it more worthwhile to pharmaceutical companies to see what they can get away with.
What is to be done?
It’s certainly true that very few FIH trials have resulted in disastrous outcomes like the one in France. Indeed, it’s noted that in Europe only 2 out of 3,100 FIH trials overseen by the European Medicines Agency since 2005 have resulted in serious harm like that seen in France, and that the record in the US is similarly good. However, Kimmelman and Federico are correct to point out:
But, even if individual participants are not harmed, trials of ineffective therapies place burdens on society. Drug development is costly, in terms of money and people. Patients, healthy volunteers and experts involved in testing a dud treatment are not available for more promising ones. Expenses wasted on ineffective therapies and uninformative trials result in higher drug prices. Investigators, host institutions and sponsors have a responsibility to consider all this before embarking on new research programmes.
Moreover, researchers have ethical obligations to “assure that the risks to subjects are reasonable in relation to the anticipated benefits”, according to FDA guidance. Such regulators explicitly delegate these appraisals to ethics review committees.
Basically, the problem is that drug developers can, in essence, cherry pick favorable preclinical evidence in order to make their case that a clinical trial should be allowed. The other problem is that regulatory agencies will accept weak evidence of efficacy in animal models. There are three questions that have to be answered in assessing whether preclinical evidence justifies a FIH trial:
- What is the likelihood that the drug will prove clinically useful? (In other words, have other similar drugs worked? What drugs already exist for the disease and how well do they work?)
- Assume the drug works in humans. What is the likelihood of observing the preclinical results? (How robust and reproducible are the preclinical data? How large are the effects observed? Do the animal models used reflect human disease well enough?)
- Assume the drug does not work in humans. What is the likelihood of observing the preclinical results? (How do we know that the preclinical observations aren’t due to random variation and bias?)
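These three questions map onto Bayes’ theorem: the first is a prior probability, and the second and third are the likelihoods of the preclinical results under each hypothesis. Here is a minimal sketch in Python of how they combine. The function name and every number in it are hypothetical, chosen only for illustration; none come from the Nature commentary or from any real drug.

```python
def posterior_probability_drug_works(prior, p_data_if_works, p_data_if_not):
    """Bayes' theorem: P(drug works | preclinical results).

    prior            -- question 1: P(drug proves clinically useful)
    p_data_if_works  -- question 2: P(preclinical results | drug works)
    p_data_if_not    -- question 3: P(preclinical results | drug does not work)
    """
    numerator = prior * p_data_if_works
    evidence = numerator + (1.0 - prior) * p_data_if_not
    return numerator / evidence


# Hypothetical inputs, for illustration only.
# Weak evidence: the results are almost as likely if the drug is a dud
# (e.g., small effects, possible bias, cherry-picked studies).
weak = posterior_probability_drug_works(
    prior=0.10, p_data_if_works=0.80, p_data_if_not=0.60
)

# Robust evidence: the results would be unlikely unless the drug works.
strong = posterior_probability_drug_works(
    prior=0.10, p_data_if_works=0.80, p_data_if_not=0.05
)

print(f"weak preclinical evidence:   {weak:.2f}")
print(f"robust preclinical evidence: {strong:.2f}")
```

With these made-up inputs, the weak evidence barely moves the estimate above the 10% prior, whereas the robust evidence moves it substantially. That is the intuition behind asking the third question at all: results that would show up even for a dud tell you very little.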
To address these questions and existing problems, the authors propose the following:
- Require drug sponsors to include negative results from animal studies in documents submitted to investigators and ethics committees. This is similar to the approach of AllTrials in that greater transparency is recommended. Just as drug companies should have to report the results of negative clinical trials, it only makes sense to apply the same standard to preclinical evidence submitted to support a FIH clinical trial. It’s even suggested that an additional way to discourage data cherry-picking would be to require that drug sponsors sign a statement testifying that the clinical and preclinical evidence presented on clinical promise is complete and unbiased.
- Encourage reviewers to consider a broad base of evidence in assessing the probability that a drug will prove clinically useful: for example, how have other drugs in the same class performed in trials? One important part of any application, be it for a grant or for approval of a clinical trial, is the “Background and Significance” section. That is the part of the application where the applicant summarizes the existing evidence base supporting the application, discusses shortcomings in current knowledge and potential controversies, and makes the case for the significance of what is proposed before presenting the applicant’s own preliminary data supporting the proposal. It’s very easy to cherry pick studies that are positive and leave out studies that are negative. If the reviewer examining the application is not very familiar with the field, he could easily be unaware of the information left out. Similarly, it makes sense not to consider a drug in isolation. If another member of the same class of drugs as the experimental drug being considered for a clinical trial has failed its clinical trials, it’s important to make a case for why this experimental drug is different, why this drug is expected to succeed where the others in its class failed.
- Allow trials to proceed only after careful vetting of the preclinical evidence by independent experts. See above. Basically, in order to detect cherry picking and evaluate the evidence with a hard, cold eye, you need reviewers who are truly experts in the field and who have no vested interest in whether the FIH clinical trial proceeds or not.
Of course, all of this would require more investment. For example, a new centralized FIH system is proposed:
Instead, we suggest the creation of a centralized FIH advisory system that combines ethical and scientific review. Several precedents exist. The Recombinant DNA Advisory Committee (which reviews new gene-transfer protocols) has assessed evidence of both risk and efficacy since it began reviewing human gene-transfer studies in 1989. Further examples of centralized, expert review of clinical trials in the United States include the SMART IRB Reliance Platform at the National Center for Advancing Translational Sciences; the National Cancer Institute’s Central IRB; and the Office for Human Research Protections’ ‘407 review process’ for certain paediatric trials.
The FIH advisory mechanism we envision would consist of subcommittees that specialize in clinical areas (for example, neurodegenerative disease, cancer and cardiovascular disease). Advisory-committee assessments would, like most of the above examples, be included in materials presented to physician–investigators and local ethics committees.
The authors anticipate some criticisms. For example, setting up a system like this would cost money. Their response is that such a review system might actually decrease cost and burden to the pharmaceutical companies by preventing clinical trials unlikely to show a positive result and ensuring a sounder basis for late-stage clinical trials, thus offsetting the cost at least partially.
Another anticipated criticism is exactly the sort of argument several of Trump’s candidates for FDA Commissioner make, namely that such a system could prevent truly promising candidates from being tested. Of course, I can’t help but point out that I’d be hard pressed to consider a candidate to be “truly promising” if the preclinical evidence supporting it is weak and/or the effect size of the compound is small in animal models. The authors note:
However, we are not arguing that the preclinical evidence must be strong, rather that it be examined critically to inform ethical judgement. For diseases in which robust preclinical evidence is impossible — for instance, where animal models are clearly inadequate as in many neurodegenerative disorders — a limited suggestion of clinical promise might be enough to justify trials for a relatively benign drug candidate aimed at a great unmet medical need.
In other words, as I mentioned above, we need to look critically at all the evidence for new drugs and put it in the context of the seriousness of the disease, whether effective therapies already exist, and existing science and evidence.
Joseph Gulfo: Use biomarkers, not overall survival, to show drug efficacy
As I’ve discussed before, much of the philosophy behind the Trump administration’s approach to the FDA seems to be based on a technolibertarian fantasy that, if only the “heavy hand” of government regulation were removed from entrepreneurs and industry, the cures would flow and the free market would sort out which drugs do and don’t work. That is not surprising, given that Peter Thiel, who’s never met a regulation he didn’t hate, appears to be having an outsized influence on whom Trump will nominate for FDA Commissioner, so much so that, ironically, I’m actually hoping for the pharma shill to be chosen rather than the Thiel cronies who want, in essence, to dismantle the FDA. The FDA could survive Scott Gottlieb (indeed, it already has). I don’t know if it could survive Jim O’Neill or Balaji Srinivasan. But what about Joseph Gulfo, whom I haven’t discussed before?
Joseph Gulfo doesn’t (quite) fit the mold of the Thiel cronies and was dismissive of Jim O’Neill, even saying, “If you want safe snake oil, Jim O’Neill’s your man.” He also believes that the FDA should require evidence of efficacy before approving a drug. The problem is, though, that he defines “efficacy” differently from how the FDA does. Basically, he redefines efficacy by using a lower standard of evidence. In a WSJ editorial written in November, “A Trumpian Cure for the FDA’s Chronic Lethargy”, he explains:
The first change involves returning the FDA to its original role under the law. That is to prevent snake oil from getting on the market by ensuring that the only drugs approved for sale have demonstrated biological activity in fighting a disease and can be labeled for safe use. The FDA would no longer require approvals based on long-term health outcomes—a practice that dissuades development and increases the complexity, cost and duration of clinical trials.
This argument sounds at least semi-reasonable on the surface, but has huge holes in it, as you will see. You will also see that there is at least one area where Gulfo does fit in with the Thiel crony contingent:
This revision would pave the way for products proven safe and effective to be made available to determine which are most beneficial for individual patients. Patients and their physicians, not the FDA, need to make private health decisions; the Internet of Things would ensure that they have the best information to make these choices.
What is it with this touching faith in technology to fix all the problems in drug development that so many share?
But let’s get back to what, specifically, Gulfo means by returning the FDA to its original role:
The way to return the FDA to its proper role is to pass legislation that defines effectiveness categories for the agency, such as:
How the drug affects biomarkers, such as the lowering or raising of blood-test parameters associated with disease, e.g., glucose levels, blood coagulation parameters, and cancer proteins. How it affects clinical signs and symptoms, e.g., blood pressure, tumor-shrinkage, pain, fever and infection. How it affects disease modification, e.g., prevents joint damage, relapses of multiple sclerosis or migraines. And how it affects long-term outcomes, e.g., survival, stroke and heart attack.
These categories would form the basis of approval, and the labels would be color-coded so that physicians and patients know precisely the nature of the clinical evidence used to prove that the drugs are effective and what to expect when using them.
Yes, he wants to color code FDA-approved drugs based on the level of evidence—because there’s no way that can go wrong, right? In a more recent interview, Gulfo makes similar arguments.
None of this is a new debate, actually. In oncology, for instance, we’ve been debating for at least two decades whether drugs should have to show a benefit in overall survival (OS) in order to be approved. Cancer therapies are generally evaluated using a number of endpoints. The most commonly used include OS and progression-free survival (PFS). OS is what it sounds like: How long do patients survive their cancer after diagnosis? Period. It’s hard for an endpoint to be more objective than that: Either the patient is alive or he is dead. This number is usually expressed in terms of median survival, which is the period of time after which half of the patients under study are still alive and half have died. This includes all causes, not just cancer. If a patient with cancer under study dies of a heart attack that is not related to his cancer or his cancer treatment, that counts. Traditionally, OS has been the “gold standard” endpoint in measuring the efficacy of a cancer therapy, because the primary goal has been to prolong survival, the ideal case being prolonging survival to the point where it is indistinguishable from life expectancy if the patient never had cancer in the first place. PFS is survival without progression; i.e., how long the patient with cancer survives before his or her tumor starts measurably growing again or metastasizes. While PFS is often measured as well as OS, it’s generally considered less useful because it is entirely possible for a treatment to prolong PFS without prolonging OS. This sort of result can happen when the treatment is effective at shrinking a tumor or slowing its growth but its toxicities can result in death. Thus, PFS can improve with no improvement in OS.
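To make the distinction concrete, here is a toy sketch in Python. All of the numbers are invented for illustration, and it assumes every patient was followed until death, so a plain median suffices; real trials have to handle patients who are still alive at analysis (censoring), typically with Kaplan-Meier estimation. It also uses the common convention that a PFS event is progression or death, whichever comes first.

```python
from statistics import median

# Hypothetical months from enrollment for eight patients per arm.
# Each position in the two lists refers to the same patient.
control = {
    "progression": [3, 4, 5, 6, 7, 8, 9, 10],
    "death":       [8, 9, 10, 12, 13, 14, 16, 18],
}
treated = {
    "progression": [6, 7, 8, 9, 10, 11, 12, 14],
    "death":       [8, 9, 10, 12, 13, 14, 16, 18],
}


def median_pfs(arm):
    # PFS event: progression or death from any cause, whichever is first.
    return median(min(p, d) for p, d in zip(arm["progression"], arm["death"]))


def median_os(arm):
    # OS event: death from any cause.
    return median(arm["death"])


for name, arm in (("control", control), ("treated", treated)):
    print(f"{name}: median PFS = {median_pfs(arm)} months, "
          f"median OS = {median_os(arm)} months")
```

In this made-up data set the treated arm’s median PFS is three months longer than the control arm’s, yet median OS is identical in both arms, which is exactly the pattern described above.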
I mention this distinction because several years ago, the FDA approval for Avastin to treat metastatic breast cancer was revoked because it had been based on studies showing improvements in PFS, but later studies failed to show an improvement in OS. I discussed the case in detail in 2010, if anyone’s interested. The point is that short term surrogate markers often don’t correlate with long term health outcomes. Sometimes they do. For instance, pathologic complete response (pCR) of breast cancer to chemotherapy (in which a tumor melts away completely, so that not even a single cell is detected when the area where the tumor was is resected) does correlate with long term survival, so much so that the FDA proposed allowing the use of pCR as a surrogate endpoint in clinical trials for the accelerated approval of drugs targeting breast cancer, although more recent work has suggested caution and pointed out that only in certain subtypes of breast cancer does the surrogate endpoint of pCR appear to predict OS.
Yes, again, it’s complicated. John LaMattina over at Forbes.com points out the major holes in Gulfo’s plan:
Let’s take Alzheimer’s Disease (AD). Just last week, Lilly reported very disappointing results for its AD drug, solanezumab, which failed in a long-term clinical trial. The drug was essentially safe, but it just didn’t work. Under Dr. Gulfo’s plan, solanezumab would have been approved years ago based on its effects on biomarkers and millions of desperate people would have taken it in the hopes that it would have halted or delayed the consequences of this ravaging disease. Furthermore, the healthcare system would have paid at least hundreds of millions, if not billions, of dollars for what essentially is a placebo. Under the Gulfo Plan, patients would have felt betrayed and cheated had they been taking an ineffective drug for five years or more, regardless of whatever color was on the label.
Didn’t Gulfo criticize O’Neill as someone who’d approve a bunch of “safe snake oil”? Well, that’s basically what Gulfo’s plan would do, and likely to an even greater extent! That’s not all:
Furthermore, the importance of demonstrating real-world effectiveness is not just limited to AD drugs. There are drugs that were believed to be the ultimate answer for treating heart disease–the CETP inhibitors, developed by Pfizer, Roche and Lilly–that simultaneously raised “good cholesterol” (HDL) and lowered “bad cholesterol” (LDL). By totally remodeling one’s lipid profile, these drugs were hoped to be the answer in preventing heart attacks and strokes. Again, the Gulfo proposal would have had these drugs on the market for years before long-term cardiovascular outcome studies showed that not only were these drugs ineffective in reducing cardiovascular disease, but that they could actually be harmful despite the fact that these drugs more than doubled HDL levels and, in combination with statins, lowered LDL levels by 50-60%. Such activity was unprecedented and had heart patients and cardiologists very excited. Had these drugs been approved based solely on lipid modulation, their sales also would have been in the billion-dollar range. Yet, they ultimately offered no benefit to patients.
Yes, as LaMattina notes, not all drugs that shrink tumors prolong survival (Avastin for breast cancer) and not all drugs that lower blood glucose prevent the end complications of diabetes. Just because a drug changes your biomarkers doesn’t necessarily mean it’s improving your health. Yes, shortcuts are tempting. What researcher wouldn’t want to be able to do a two year trial based on surrogate endpoints and biomarkers instead of a ten year trial looking at long term health outcomes if the surrogate endpoints and biomarkers predict the long term health benefits? The problem is that, far more often than people like Gulfo seem to think, they don’t.
The FDA is almost certainly going to go in the wrong direction
Kimmelman and Federico lay out a persuasive case that there is insufficient attention paid to the robustness of preclinical evidence by the FDA and European regulatory agencies and that something should be done to correct this problem. Unfortunately, here in the US under President Trump, we are poised to go in exactly the wrong direction, thanks to a faith-based belief in the almighty power of the free market and an unrelenting hostility to even justified government regulation. No matter whom Trump appoints as FDA Commissioner, we can expect the level of clinical evidence necessary to approve drugs to become less rigorous, at least as much as current law allows, and we might even see legislation passed to make the approval of drugs even easier, in essence building on what the 21st Century Cures Act did. A new FDA Commissioner could undo regulation in several ways without Congressional approval though; e.g., by interpreting existing regulations as loosely as possible, so that requirements for certain clinical trials—especially large-scale ones that can take years and involve thousands of patients—can be weakened or eliminated. Ironically, the most effective way to speed up drug approval would be to add staff, not cut it; yet there’s no sign that the Trump administration has plans to do that.
I concede that a reasonable argument can be made for a drug that impacts surrogate endpoints but not long term health outcomes if it significantly improves patient quality of life. In oncology, for instance, it is sometimes argued that if a drug improves PFS, but not OS, it should be approved if it improves patient quality of life. (Unfortunately, in the case of Avastin, the evidence for improved quality of life was very weak.) However, that is not the primary argument being made by Gulfo, and it’s not the argument being made by any of Trump’s other candidates known to be under consideration for FDA Commissioner. All of them, particularly the Silicon Valley contingent under the thrall of Peter Thiel, seem to think that an overly strict FDA is the key impediment to innovation, when in fact it is primarily the limits of our knowledge. Computer developers control the hardware and software. Medical researchers do not have that advantage, but technolibertarians keep making simplistic analogies to computer development when discussing drug development.
If the preclinical evidence requirements for efficacy are already too low, then lowering the clinical requirements for efficacy (e.g., by approving drugs based only on biomarker changes) will be an even bigger disaster than I thought before, because more ineffective drugs would be approved. Given that no drug is completely safe, the approval of such drugs could then result in harm and expense with no benefit. No one, least of all I, is saying that the FDA is perfect or that there aren’t ways of streamlining the FDA approval process, even though it is already fast compared to European agencies. Moreover, the FDA has already started to adapt to new science by incorporating biomarkers and new trial designs for precision medicine into its approval process. Dramatically lowering the bar that new drugs have to clear to be approved is not the answer to developing treatments and cures for deadly diseases. It’s as though the Trump administration assumes that more and faster drug approvals will translate into better health, when they would be unlikely to benefit patients and more likely to result in more expense and harm.