[Editor’s note: Here is a guest post from a new contributor, Dr. Christopher Labos! Dr. Labos is a cardiologist at the University of McGill in the Great White North, and is a peer of Dr. Joe Schwarcz in McGill’s Office for Science and Society. We’re publishing it today as an extra bonus post the day before to complement Dr. Gorski’s usual Monday post because, unfortunately, the claim that cell phones cause cancer is back in the news again, thanks to the study Chris deconstructs in this post. Dr. Gorski’s post tomorrow will discuss a particularly egregious bit of reporting on cell phones and cancer. Enjoy!]

If anyone has ever tried to convince you that cell phones cause cancer, then they have probably cited the results of the National Toxicology Program (NTP) study. There have, of course, been other studies over the years, most of which aren’t nearly as conclusive as some would have you believe. Despite what IARC says, the evidence for a cell phone cancer link is very flimsy. For example, the Interphone study is often quoted as supporting a link between cell phones and cancer even though it was an overall negative study.

The NTP study though was an attractive argument for proponents of the cell phone – cancer link because it has the veneer of government respectability, was billed as a massive multimillion-dollar trial, and generated a lot of press attention. When they released the partial findings of their study in 2016 and showed an association between cell phones and cancer (specifically gliomas and cardiac schwannomas), a huge wave of frightening news headlines followed. The researchers told Science that they decided to release the data before completing their analysis and write-up because of “high public interest” in the results.

That these were partial or preliminary findings seems not to have bothered or deterred many people from claiming that cell phones cause cancer. Fortunately, the full data is now available, although it will still go through an external peer review and we may yet see further modifications.

The NTP study design

Before looking at the new full results, it is worth looking at the study itself. David Gorski and others have pointed out some of the issues with the study. To briefly summarize, the NTP study was an animal study and it had no human subjects. The rats and mice being studied had their entire bodies exposed to the radio frequency radiation (RFR) from cell phones for nine hours a day, every day, for two years, starting not at birth but in utero. In the press release accompanying the release of the full study results, one of the senior scientists of the NTP, John Bucher, acknowledged:

The levels and duration of exposure to RFR were much greater than what people experience with even the highest level of cell phone use, and exposed the rodents’ whole bodies. So, these findings should not be directly extrapolated to human cell phone usage.

Now to be fair, the NTP’s goal was to test toxicity and not to establish if human use was dangerous, which is essentially the difference between hazard and risk. But even if you accept the NTP study as true (and bear in mind we aren’t done with its limitations yet), its results do not have a direct bearing on how you as a human being use your cell phone. To their credit, the NTP researchers seem to be acknowledging this in the relatively nuanced way they have discussed their findings in the media.

But there were other problems, notwithstanding the lack of applicability to humans. The rate of cancer was very low in the control group and inconsistent with historical controls. There was no consistent dose-response, meaning that higher levels of RFR did not clearly increase the risk of tumor formation. Finally, the rats exposed to cell phones lived longer than the rats in the control group.

This final point got very little press in the popular media. Most likely “cell phones cause cancer” was felt to be a more eye-catching headline than “cell phones make you live longer.” Nevertheless, many people were more than willing to ignore any findings that didn’t fit with their pre-determined narrative. While this may be understandable in terms of human nature and marketing, it is certainly not good science.

Malignant gliomas

Regardless of these limitations, the NTP study has spent the past 2 years being used as an argument in support of the claim that cell phones cause cancer. The full results released earlier this year were surprising, to me at least, because they were different from the initial preliminary results in a few notable ways. Most notably, the full results did not show a clear association between brain tumors and cell phone use. The discrepancy between these results and the 2016 preliminary findings had to do with a difference in their statistical analysis.

The authors explained that their preliminary analysis did not adjust for litter effects. Litter effects occur because pups born in the same litter are more similar to each other than they are to other rats. In effect, two pups in the same litter getting a disease is not the same as two random unrelated pups getting the disease. While this may seem like a very minor and subtle statistical point, it was enough to change the authors’ interpretation of their data.

There was only one very small sliver of statistical significance (with a very notable absence of clinical significance). There was no positive association between cell phones and brain neoplasms for female rats, male mice, or female mice. The link occurred only in male rats and only for one type of signal modulation. The NTP study actually tested two different types of modulation schemes used in cell phone technology: GSM and CDMA. For GSM there was no statistically significant result for malignant gliomas or any other type of brain neoplasm. For CDMA, the rats exposed to 6W/kg had three (3) malignant gliomas, compared to none (0) for those exposed to 3W/kg, 1.5W/kg, or no exposure.

If you said that this finding could have been solely due to chance, then you are to be congratulated on your astute observation. In reality, when you compared the 6W/kg group to the control group, there was no statistical difference. The statistically significant finding only occurred when you compared all four groups in a linear trend test.

Now, the super observant of you will have noticed that the 3W/kg and 1.5W/kg group had no brain tumors in them and you might be somewhat puzzled as to why adding zero brain tumor cases to a statistical analysis would make a result go from non-significant to significant. The answer is somewhat mathematically convoluted but, to put it simply, different statistical tests ask and answer slightly different questions, some of which are more meaningful than others. If your question is “Does the highest cell phone exposure group (6W/kg) have more brain tumors than the control group?” then the answer to your question is no.
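A rough way to see why the zero-tumor groups matter is to ask how likely it is that all three tumors would land in the highest-dose group if they were simply scattered at random among the animals. The sketch below is not the NTP's analysis; it is a simple exact counting argument, and it assumes 90 rats per group, a figure that is not given in this post. With two groups, this is the one-sided exact comparison of the 6W/kg group against controls. With four groups, it is the chance of the most extreme "trend" possible, with every tumor at the top dose.

```python
from math import comb

def prob_all_in_top_group(n_tumors, group_size, n_groups):
    """Chance that every tumor lands in one pre-chosen group (here, the
    highest-dose group) if the tumors were scattered at random among
    n_groups equally sized groups of animals."""
    total_animals = group_size * n_groups
    return comb(group_size, n_tumors) / comb(total_animals, n_tumors)

GROUP_SIZE = 90  # ASSUMPTION: animals per group; not stated in this post

# Highest-dose group versus controls only (two groups).
p_pairwise = prob_all_in_top_group(3, GROUP_SIZE, n_groups=2)

# All four groups together (control, 1.5, 3 and 6 W/kg).
p_four_groups = prob_all_in_top_group(3, GROUP_SIZE, n_groups=4)

print(f"two groups:  {p_pairwise:.3f}")
print(f"four groups: {p_four_groups:.3f}")
```

Under these assumptions the two-group figure comes out at roughly 0.12 and the four-group figure at roughly 0.015. Adding groups with zero tumors gives the three tumors more places they could have landed, so finding them all at the top dose looks less like chance, even though the head-to-head comparison with controls has not changed.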

You would be hard pressed to call the brain cancer results definitive. The authors called the results “equivocal,” which in my mind is somewhat generous. In any case, they did not suggest there was an association between cell phones and brain cancer in the press release accompanying the release of their data, essentially walking back the claim from the 2016 preliminary findings.

Cardiac schwannomas

In the end, the only finding highlighted in the summary statement was the association with cardiac schwannomas. Schwannomas are tumors arising from Schwann cells that produce the myelin sheath around peripheral nerves. There has been concern that cell phones, because we hold them up to our ears when speaking into them, could increase the risk of a similar type of tumor called an acoustic neuroma. Some studies have suggested a link between acoustic neuromas and cell phones. Then again other studies do not suggest such a link and it is worth noting the positive studies seem to come mostly from one research group.

Thus, when the NTP found an association between cell phone exposure and cardiac schwannomas, it could be seen as supportive of the cell phone – acoustic neuroma link, since both types of tumors are histologically similar. Drilling down into the data, though, poses some problems for this association because, like the glioma analysis, the association only occurred in male rats. It did not occur in female rats, male mice, or female mice.

The other odd point about the cardiac schwannoma analysis is that the rats had their entire bodies irradiated. Since there is no immediately obvious reason why these tumors should be localized to the heart, the schwannomas could have occurred, and did occur, in any nerve throughout the rats’ bodies. The researchers found schwannomas in other organs including the pituitary gland, trigeminal nerve, salivary gland, and eye. However, when you look at all schwannomas, not just the ones in the heart, there was no significant difference between the exposed and control rats. Therefore, for this analysis to be positive (and therefore concerning to anyone) you have to ignore the rest of the body and focus only on the tumors occurring in the heart.

The play of chance and miracle dice

There is, however, another more mathematical reason to be skeptical of the NTP results. The NTP study is a vast undertaking and the current reports are over 650 pages long for both rats and mice. But that vastness is its Achilles heel. They present dozens upon dozens of analyses, and if you do enough analyses you will eventually get a positive result simply due to the play of chance.

To understand how pernicious the play of chance can be in medical research, it is worth reviewing the study of Carl Counsel and his miracle dice. Counsel gave his statistics students red, white and green dice as part of a class exercise. He had them roll the dice repeatedly to simulate the results of clinical trials (rolling a 6 meant the patient died from a stroke while rolling a 1 through 5 meant the patient survived). He told his students some dice were loaded and would produce more or fewer sixes. After generating and analyzing their data, his students found that red dice increased the odds of “death” and that white and green dice were protective. Therefore, in their minds, they had discovered which dice were loaded.

They were sadly quite mistaken since Counsel had purposely deceived them and given them perfectly normal, evenly-balanced dice. The positive association occurred because Counsel’s students went looking for a result (any result) and subsequently found one. They were so convinced by their data that when Counsel explained to them that the dice were fair, some of them didn’t believe him:

Some of the participants were convinced that their own dice was really loaded. Trialist A described his reaction to his first trial… he rolled one six, followed by another and then a third. He said that his room felt eerily quiet as he rolled a fourth six: he had never rolled four sixes in a row in his life. By the time he had rolled the fifth, he was certain that the dice was loaded, and the sixth six only confirmed his belief that DICE therapy clearly had an effect.

Counsel explained in his paper that he “undertook this slightly tongue in cheek study to illustrate just how extraordinary the effects of chance can be” and that even high methodological standards are no defense against it. He explains:

Chance does not get the credit it deserves. Most doctors admit that chance influences whether they win the Christmas raffle but underestimate the effect of chance on the results of any clinical trials they read about.
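To get a feel for how much fair dice can vary, here is a minimal simulation sketch in the spirit of Counsel's exercise. The number of trialists and the number of rolls per trial are made-up values for illustration, not the figures from his paper, and the function name is my own.

```python
import random

def simulate_dice_trials(n_trialists=20, rolls_per_trial=60, seed=0):
    """Every trialist rolls a FAIR six-sided die; a six counts as a "death".

    Returns the observed proportion of sixes for each trialist.
    n_trialists and rolls_per_trial are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    rates = []
    for _trialist in range(n_trialists):
        sixes = sum(1 for _roll in range(rolls_per_trial)
                    if rng.randint(1, 6) == 6)
        rates.append(sixes / rolls_per_trial)
    return rates

rates = simulate_dice_trials()
print(f"true rate of sixes:    {1 / 6:.3f}")
print(f"lowest observed rate:  {min(rates):.3f}")
print(f"highest observed rate: {max(rates):.3f}")
```

Every die here is fair, so any gap between the lowest and highest observed rates is produced by chance alone. Changing the seed changes which trialist appears to hold the "loaded" die.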

So could the results of the NTP study have been purely due to chance? Given the multiplicity of analyses, I think we need to concede that this is possible and in fact is pretty likely. The entirety of the NTP study essentially had four different groups (male rats, female rats, male mice, female mice) that were analyzed for many different types of cancer (heart, brain, pituitary, adrenal, liver, prostate, kidney, pancreas, mammary gland, and thymus cancer among others). In reality they looked at many different subtypes of cancers and of course some cancers would only occur in one sex (like prostate cancer or ovarian cancer). But for the sake of mathematical simplicity let’s assume they analyzed seven types of cancer, which could have occurred in all four groups of rodents. Seven types of cancer across four groups of subjects yields 28 different analyses, and it is highly likely that you would get at least one false positive result due purely to chance.

If you assume that the chance of a false positive result is 5% (which is a standard assumption) then if you do one analysis, the chance of a false positive is:

1 - 0.95 = 0.05 or 5%

Do two analyses and the chance of at least one false positive is:

1 - 0.95^2 = 0.0975 or 9.75%

Do 5 analyses and the chance of at least one false positive is:

1 - 0.95^5 = 0.23 or 23%

Do 28 analyses and the chance of at least one false positive is:

1 - 0.95^28 = 0.76 or 76%
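The same arithmetic can be checked with a few lines of code. This is only a sketch, and it assumes (as the calculations above do) that the analyses are independent of one another, which is a simplification. The function name is my own.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests analyses,
    ASSUMING the analyses are independent and each uses threshold alpha."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 2, 5, 28):
    print(f"{n:>2} analyses: {chance_of_false_positive(n):.2%}")
```

Plugging in larger numbers shows how quickly the probability approaches certainty as the number of analyses grows.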

If a researcher does enough statistical tests, then some false positive results are inevitable simply due to the play of chance. The ISIS-2 study demonstrated this fact perfectly. ISIS-2 is historically important in the field of cardiology because it demonstrated that giving aspirin to patients after a heart attack improved outcomes. However, even though the study was positive overall, one subgroup of patients showed no benefit. That subgroup was patients born under the zodiac signs of Gemini and Libra. Fortunately, 1988 was apparently a more reasonable time, and no one suggested that astrology become a complementary or integrated field of cardiology. In fact, the authors of the ISIS-2 study purposely highlighted this rather ludicrous and totally spurious statistical finding to demonstrate that “all these subgroup analyses should be taken less as evidence about who benefits than as evidence that such analyses are potentially misleading.”

Adjusting for random chance

We are not helpless when it comes to the problem of randomness though, and there are many things we can do to avoid falling into the trap. The first option is to avoid multiple testing in the first place and simply test one hypothesis in a well-powered study that will address the question you want to answer. If you must test everything under the sun, then replicating your findings is crucial, though repeating a multi-year study obviously takes considerable time, money and effort.

Since waiting for someone to replicate your findings is not useful in the short-term, one potential solution to the multiple-hypothesis testing problem is to use the Bonferroni correction. The Bonferroni correction is a rather simple statistical adjustment and requires very little extra math. If you do one analysis, you can stick with the standard p<0.05 threshold for statistical significance (I personally do not agree with using p-values as a statistical threshold for multiple reasons but I will bite my tongue for now).

If you choose to do multiple tests, then you simply divide 0.05 by the number of tests you intend to perform to establish your new threshold.

  • So if you perform two tests then your new threshold is 0.05/2 = 0.025.
  • Perform 10 tests and the new threshold should be 0.05/10 = 0.005
  • Perform 20 tests and the new threshold should be 0.05/20 = 0.0025
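As a sketch, the correction is a one-line calculation. The function names below are my own, and the second function assumes the tests are independent, as in the false positive calculations earlier. The 28-test case reuses the simplified count of analyses from above.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests

def overall_false_positive_chance(n_tests, alpha=0.05):
    """Chance of at least one false positive when every test uses the
    Bonferroni threshold (ASSUMES the tests are independent)."""
    return 1 - (1 - bonferroni_threshold(n_tests, alpha)) ** n_tests

for n in (1, 2, 10, 20, 28):
    print(f"{n:>2} tests: threshold {bonferroni_threshold(n):.5f}, "
          f"overall chance {overall_false_positive_chance(n):.3f}")
```

Under these assumptions the overall chance of at least one false positive stays at or just below 5% no matter how many tests are run, which is the whole purpose of the correction.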

In the field of genetics, where researchers can test over a million genes at a time, the Bonferroni correction is regularly used to avoid false positives, and statistical associations may have to reach p-values of p<5×10^-8 in order to be deemed significant. We do not need to go that far, but the point is that there is nothing magical or even intuitive about the p=0.05 threshold, and in a study that plans to perform dozens of different analyses, a lower threshold is certainly justified. Not everyone agrees with the Bonferroni correction, and some say that it is too conservative, meaning that the p-value threshold is too low. More permissive alternatives to the Bonferroni correction do exist, although it has remained popular due to its mathematical simplicity and its intuitive straightforwardness. Whatever strategy you subscribe to, the point is that when testing multiple hypotheses within the same dataset, it is important to adjust the p<0.05 threshold that researchers seem to be obsessed with downwards. If the NTP had implemented a Bonferroni correction, none of the “statistically significant” results would have made the cut, since most barely squeaked by the p<0.05 threshold.
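One example of a more permissive alternative is the Holm step-down procedure, which I am naming here purely as an illustration. The sketch below compares it with the plain Bonferroni correction on a set of hypothetical p-values; these numbers are invented and are not taken from the NTP reports.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_reject(p_values, alpha=0.05):
    """Holm step-down: test the smallest p-value against alpha / m, the next
    against alpha / (m - 1), and so on, stopping at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also left unflagged
    return reject

hypothetical_p = [0.001, 0.015, 0.02, 0.30]  # invented for illustration
print("Bonferroni:", bonferroni_reject(hypothetical_p))
print("Holm:      ", holm_reject(hypothetical_p))
```

With these invented numbers Bonferroni flags only the first result, while Holm flags the first three. Neither approach would rescue a result that barely squeaks under p<0.05 when dozens of tests have been run.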

The NTP study in context

So is it logical to assume that cell phones induce cancer in male rats, but not female rats, or in mice? Is it logical to assume that cell phones increase the risk of cardiac schwannomas but not of schwannomas in the rest of the body? Is it logical to assume that cell phones increase the risk of cancer while also extending survival? I would say no.

Given the sheer number of statistical tests performed, one must be prepared to find some false positives, and this is very likely what happened here. Paradoxically, though many had trouble calling the brain cancer and cardiac schwannoma findings spurious, they were more than willing to attribute the increased survival to the play of chance. Remember that rats exposed to cell phones lived longer than controls, probably because the control group rats developed more kidney disease for some unknown reason. No one truly believes that cell phones will extend survival, but people’s preconceived notions allowed them to dismiss some “statistically significant” results and not others.

It is also worth remembering that the amount of cell phone exposure given to these rodents (whole body irradiation 9 hours a day for 2 years) is unlike anything you will experience as a human being. I do not think that this study should affect your life in any way. Perhaps John Bucher, the NTP researcher I quoted earlier, put it most succinctly in an interview with Reuters:

Asked what the public should take from the study, Bucher said, “I wouldn’t change my behavior based on these studies, and I haven’t.”

If Bucher isn’t going to change his behavior, then you probably shouldn’t either. The only suggestion I would make about cell phones is to stop using them while you drive, since thousands of people die every year due to distracted driving. Like Counsel’s student who rolled all those sixes, we are remarkably unwilling to admit that sometimes our results are the product of random chance. The link between cell phones and cancer in the NTP study is very likely due to random chance. The link between cell phones and traffic accidents is very real.

Posted by Christopher Labos

Dr. Christopher Labos MD CM MSc FRCPC is a physician with a Royal College certification in cardiology. After his clinical training at McGill University he pursued a master’s degree in epidemiology and biostatistics in order to follow a career in academic research. His main research focus is cardiovascular prevention. He realizes that half of his research findings will be disproved in five years: he just doesn’t know which half. He is also an associate with the McGill Office for Science and Society whose mission is to promote critical thinking and present science to the public. He co-hosts a podcast called The Body of Evidence. He is a freelance contributor for the Montreal Gazette, CJAD, and has also appeared on CBC Radio and CBC Television. To date, no one has recognized him on the street.