The most recent issue of The Journal of the American Medical Association (JAMA) published a study which concludes:

Among patients with migraine without aura, true acupuncture may be associated with long-term reduction in migraine recurrence compared with sham acupuncture or assigned to a waiting list.

In the same issue they published an invited commentary by Amy Gelfand which concluded:

However, this study does not convincingly demonstrate acupuncture’s efficacy for migraine prevention.

Which conclusion do you think was more widely published in the media?

How do we know what works?

The meta-question, and the one we are primarily concerned with here at SBM, is how do we know when a treatment works? I think we have built a robust case over the years that we need a fairly high bar of scientific rigor before we can make reliable conclusions about efficacy, especially when dealing with subjective symptoms.

The case for a high bar of evidence includes the fact that most preliminary studies are wrong in the false positive direction. There is a reproducibility problem in the biomedical sciences. P-hacking is prevalent, combined with an over-reliance on p-values and frequentist statistical analysis.
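
To make the false-positive point concrete, here is a rough sketch of the standard positive predictive value calculation: given a prior probability that a hypothesis is true, the study's power, and its false-positive rate, what fraction of "significant" results are real? The specific numbers below are assumptions chosen purely for illustration, not estimates taken from any study, and the higher false-positive rate in the last scenario is just a stand-in for what p-hacking does.

```python
def positive_predictive_value(prior, power, alpha):
    """Chance that a statistically significant result reflects a real effect.

    prior: assumed probability the hypothesis is true before the study
    power: probability the study detects a real effect
    alpha: probability the study is positive when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# All inputs below are hypothetical, for illustration only.
well_powered = positive_predictive_value(prior=0.5, power=0.8, alpha=0.05)
preliminary = positive_predictive_value(prior=0.1, power=0.2, alpha=0.05)
# P-hacking modeled (as an assumption) by inflating the false-positive rate.
p_hacked = positive_predictive_value(prior=0.1, power=0.2, alpha=0.30)

print(f"rigorous study, plausible hypothesis: {well_powered:.2f}")  # 0.94
print(f"small study, implausible hypothesis:  {preliminary:.2f}")   # 0.31
print(f"same, with p-hacking:                 {p_hacked:.2f}")      # 0.07
```

Under these assumed inputs, most positive results from small studies of implausible treatments are false positives, and flexible analysis makes the picture worse.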

We can further take a look at areas where we have a large body of clinical research looking into a treatment that we can be highly confident does not work. The poster-child for this scenario is homeopathy. Homeopathy cannot possibly work, and when you systematically review the totality of clinical research it shows that homeopathy does not work. We can then flip the analysis – now that we can say with the highest degree of scientific confidence that homeopathy does not work for anything, what does the clinical literature look like? That is what the medical literature looks like when you study a treatment that does not work.

What we see are a lot of flawed preliminary studies with mixed results but with a bias toward positive outcomes. The more rigorous the study design, the smaller the effect size and the less likely the study is to be positive (a phenomenon dubbed “the decline effect”). The most rigorous studies are consistently negative. In other words, we only get reliable results when we do clinical studies of high scientific rigor. Preliminary studies are all over the place and mostly wrong.

The next obvious question is – why ever do preliminary studies? The good answer is that preliminary studies help assess safety and plausibility prior to investing in large rigorous studies. Further, they help in the design of rigorous studies by identifying potential flaws and confounding factors. The bad answer is – because they are cheaper, easier, and quicker to conduct. The horrible answer is – because they produce the results you want.

The JAMA Acupuncture Migraine Study

Zhao et al. performed this study of acupuncture for migraine prevention in several locations in China, including acupuncture and TCM hospitals. The study had three groups: true acupuncture, sham acupuncture, and a wait-list group that received no intervention. They found:

The mean (SD) change in frequency of migraine attacks differed significantly among the 3 groups at 16 weeks after randomization (P < .001); the mean (SD) frequency of attacks decreased in the true acupuncture group by 3.2 (2.1), in the sham acupuncture group by 2.1 (2.5), and the waiting-list group by 1.4 (2.5);

Not mentioned in the abstract, but contained in the main article itself, is the fact that acute medication use frequency did not differ among the groups. There are a number of fatal flaws in this study which led to Dr. Gelfand’s correct conclusion that this study does not convincingly show efficacy.

While the p-value was highly significant, the effect size was very small – a difference of one migraine attack over four months (I would not consider that a successful treatment). That combination, significant p-value but with small effect size, is always a red flag. Essentially, the p-value does not make up for the small effect size (that is the frequentist delusion). This is because subtle systematic errors in the study can produce consistent results that are biased in one direction. P-values don’t detect the bias, they just measure the consistency.
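
A small simulation can illustrate how this happens. In the sketch below, both arms have exactly the same true improvement, and the only difference is a small constant shift in what one arm reports, which stands in for a systematic error such as imperfect blinding. The group size, the size of the shift, and the spread are all hypothetical values picked for illustration, and the p-value comes from a simple large-sample z-test rather than the analysis used in the paper.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p-value from a large-sample z-test (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_per_group, bias, sd=2.5, seed=0):
    """Simulate a trial where the treatment has NO real effect.

    Both arms share the same true mean improvement (2.0, hypothetical).
    The 'true' arm's reports are shifted by `bias` only.
    """
    rng = random.Random(seed)
    sham = [rng.gauss(2.0, sd) for _ in range(n_per_group)]
    true_arm = [rng.gauss(2.0 + bias, sd) for _ in range(n_per_group)]
    diff = statistics.mean(true_arm) - statistics.mean(sham)
    return diff, two_sample_p_value(true_arm, sham)


# Hypothetical inputs: a half-attack reporting bias, 5000 subjects per arm.
diff, p = simulate(n_per_group=5000, bias=0.5)
print(f"observed difference: {diff:.2f} attacks, p = {p:.2g}")
```

With enough subjects the p-value becomes tiny even though the treatment in the simulation does nothing. The test only registers that the two arms differ consistently; it has no way to tell a real effect from a consistent bias.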

Not only is the effect size small, but the more objective outcome measure, use of acute migraine medication, showed no difference. Subjects may tell you their headaches are better, but if they are still using the same amount of pain medication to treat them they probably aren’t. This means that, despite all the other fatal flaws in this study, the results themselves are not compelling.

The comparison to the no-intervention group is of no clinical significance. An unblinded no-intervention group is included only for calibration, to show that the study is capable of measuring a difference, and to show baseline placebo effects, such as regression to the mean. The comparison cannot properly be used to support any efficacy claims.

There are much greater flaws, the biggest probably being that the study was only single-blinded. The subjects were blinded, but the acupuncturists were not. We know from prior acupuncture studies that the interaction between the acupuncturist and the patient is the most significant (and perhaps only) variable that influences outcome. In fact, unblinding of subjects is a chronic problem plaguing the acupuncture literature. This is a known issue, and therefore any study that does not properly double-blind everyone involved, and assess that blinding, is of limited use.

This fact alone – poor blinding – easily explains the small effect size in subjective outcome in this study.

This was also not a study of acupuncture, but of “electroacupuncture.” In my opinion, electroacupuncture (using electrical stimulation through the acupuncture needles) is not a real thing. It is just transdermal electrical stimulation through needles that the researchers are calling acupuncture needles. Electrical stimulation is an intervention being researched for migraines. I cannot say it has proven efficacy, but it is far more plausible than acupuncture.

This means that the study mixed variables in a way that cannot later be teased apart. While I do not think this study showed convincing efficacy, any effect observed could have been entirely due to the electrical stimulation. The “acupuncture” was as much a part of the overall effect as a Danish is a part of “this nutritious breakfast”.

Finally, it must be pointed out that this study was conducted in China. This is relevant because systematic reviews have shown that essentially 100% of acupuncture studies conducted in China show a positive outcome in their primary outcome measure. Edzard Ernst has this to say about such results:

The question why all Chinese acupuncture trials are positive has puzzled me since many years, and I have quizzed numerous Chinese colleagues why this might be so. The answer I received was uniformly that it would be very offensive for Chinese researchers to conceive a study that does not confirm the views held by their peers. In other words, acupuncture research in China is conducted to confirm the prior assumption that this treatment is effective. It seems obvious that this is an abuse of science which must cause confusion.

This essentially means that acupuncture trials out of China are worthless. Because they are 100% positive they have no predictive value. Keep in mind, even treatments that work do not produce 100% positive studies, which would be statistically highly improbable, and is therefore a marker for bias.
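
The arithmetic behind that improbability is simple. As a rough sketch, assume each study is independent and has a given probability of coming out positive when the treatment really works (its power); the chance that every one of them is positive is that probability multiplied by itself once per study. The power values and study counts below are hypothetical.

```python
def prob_all_positive(n_studies, power):
    """Chance that every one of n independent studies is positive,
    assuming the treatment truly works and each study has the given power."""
    return power ** n_studies


# Hypothetical numbers, for illustration only.
for n_studies, power in [(20, 0.8), (50, 0.8), (50, 0.95)]:
    chance = prob_all_positive(n_studies, power)
    print(f"{n_studies} studies at {power:.0%} power: {chance:.4f}")
```

Even with an unrealistically generous 95% power per study, an unbroken run of 50 positive results would be expected less than 8% of the time, and with a more typical 80% power it is close to impossible.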

In summary, this study shows very weak results, with the most objective outcome measure being negative, using fatally-flawed methodology, and showing signs of significant bias. The results are unreliable even before you consider that the hypothesis is highly implausible, and that prior research has convincingly shown that acupuncture does not work. The study has apparently served its primary purpose, however – to generate a round of positive press for acupuncture and further the myth that acupuncture is evidence-based.

Acupuncture for migraine

What about other research into acupuncture for migraine prevention? This is where there seems to be a profound effect from acupuncture’s positive PR. Multiple systematic reviews have concluded that the evidence suggests a positive role for acupuncture in the management of migraine and other headache types. When you dive into the details, however, a very different picture emerges.

A 2009 Cochrane systematic review is representative:

Fourteen trials compared a ‘true’ acupuncture intervention with a variety of sham interventions. Pooled analyses did not show a statistically significant superiority for true acupuncture for any outcome in any of the time windows, but the results of single trials varied considerably.

Even taken at face value, the research does not show that true acupuncture is superior to sham acupuncture. Acupuncture is superior to no treatment, which as I stated above tells us nothing.

The results are further questionable, however, when you consider two significant confounding factors. The systematic reviews contain studies that use electroacupuncture, which is a confounding variable that makes the result uninterpretable. Further, the reviews include studies from China. We know from reviews that acupuncture studies from China are guaranteed to be positive, which means there are certainly false-positive studies in the mix. Therefore any systematic review or meta-analysis that incorporates data from China is contaminated and the results are not reliable.

Conclusion: Acupuncture still doesn’t work

The plausibility of acupuncture is extremely low, even for the indications for which it is mostly advocated (subjective symptoms like pain). The overall literature on acupuncture is also negative. What the studies show is that it does not matter where you stick the needles, if you even stick needles through the skin, or if you elicit the de qi sensation which is supposed to be necessary for an effect. The training of the acupuncturist does not matter. The evidence shows that acupuncture points do not exist, and they have no plausible basis in anatomy, physiology, or any aspect of biology.

The clinical research on acupuncture is a mess. Most studies have major design flaws, mix variables, have poor blinding, and report variable outcomes. The overall patterns in the acupuncture literature mirror those of the homeopathy literature, meaning that it is consistent with a treatment that does not work.

Despite this negative evidence and lack of plausibility, acupuncture enjoys perhaps the best public relations among alternative medicine treatments. The flow of poor quality and biased research, reviewed largely by proponents, has managed to convince the unwary that acupuncture works.

While I applaud JAMA for including an invited commentary with this latest acupuncture study, they have, I think, overall done a disservice by publishing yet another crappy acupuncture study that will simply lend support to the pervasive myth that acupuncture works.


Posted by Steven Novella

Founder and currently Executive Editor of Science-Based Medicine, Steven Novella, MD, is an academic clinical neurologist at the Yale University School of Medicine. He is also the host and producer of the popular weekly science podcast, The Skeptics’ Guide to the Universe, and the author of the NeuroLogicaBlog, a daily blog that covers news and issues in neuroscience, but also general science, scientific skepticism, philosophy of science, critical thinking, and the intersection of science with the media and society. Dr. Novella also has produced two courses with The Great Courses, and published a book on critical thinking, also called The Skeptics’ Guide to the Universe.