Before the pandemic, acupuncture was a frequent topic on this blog because it was a perfect example of what we at SBM mean when we distinguish between evidence-based medicine (EBM) and science-based medicine (SBM). In evidence-based medicine, randomized controlled clinical trials (RCTs) and meta-analyses of RCTs are at the very top of the pyramid of evidence. In general, this is a reasonable pyramid of evidence with a caveat: The treatments being tested need to have biological plausibility as well as preclinical (cell culture and animal studies) and early clinical evidence, such as small preliminary clinical trials, to justify RCTs. Where this concept breaks down is for what is now known as “integrative medicine”, having been rebranded from its previous name, “complementary and alternative medicine” (CAM). The reason is that the treatments being “integrated” into “integrative medicine” by and large tend to be treatments that have very low biological plausibility and/or are based on prescientific mysticism and belief, such as homeopathy, reiki and other forms of “energy medicine”, and, yes, acupuncture, particularly for conditions for which the primary endpoints are subjective and therefore very prone to placebo effects, such as, of course, pain. For such conditions, the clinical trial results are nearly always equivocal, even when positive, and tend to reflect more the noise and bias inherent in even well-designed RCTs than any actual positive results. As I like to ask about, for example, homeopathy, what is more likely, that a remedy diluted to nonexistence (in other words, water) has clinically relevant therapeutic effects or that a noise- and bias-laden clinical trial came up with a false positive result? You can ask the same thing about reiki and “energy healing”.
This is why, before the pandemic, we at SBM used to argue routinely for a more Bayesian approach that takes into account prior plausibility and probability in analyzing clinical trial results. Basically, prior plausibility informs how likely it is that a “positive” result of a clinical trial is truly positive: the lower the prior plausibility, the lower the p-value required to be convincing. That is, of course, a simplistic discussion, but it is in essence what we argue, and have argued since the very beginning of this blog.
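To make the role of prior plausibility concrete, here is a minimal sketch in Python. It applies Bayes' theorem to ask how likely a statistically significant trial is to reflect a real effect. The function name, the three priors, the 80% power, and the 0.05 threshold are all illustrative assumptions, not figures taken from any trial discussed in this post.

```python
def prob_true_positive(prior, power=0.8, alpha=0.05):
    """Probability that a 'positive' trial reflects a real effect.

    prior: assumed probability, before the trial, that the treatment works
    power: assumed chance the trial detects an effect that is really there
    alpha: assumed chance of a false positive when there is no effect
    """
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Hypothetical priors, chosen only for illustration.
for label, prior in [("plausible drug", 0.5),
                     ("long shot", 0.1),
                     ("highly implausible", 0.01)]:
    print(f"{label:>18}: prior={prior:.2f} -> "
          f"P(real | positive)={prob_true_positive(prior):.2f}")
```

Under these assumed numbers, a positive trial of a treatment with a 1% prior is still much more likely to be a false positive than a true one. The sketch also assumes an unbiased trial; unblinding or other bias would only make the picture worse.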
Implausible treatments…like acupuncture
Unsurprisingly, acupuncture advocates like to argue that acupuncture is different, that it’s not like the faith healing of reiki or the sympathetic magic of homeopathy because it’s a physical intervention. The acupuncturist is sticking needles into the patient, after all! Unfortunately, though, all the concepts behind acupuncture are based on prescientific Asian mysticism, particularly the meridians through which qi (or life energy) flows and that the acupuncture needles supposedly impact. Moreover, modern acupuncture is nothing like ancient acupuncture, having been reinvented with the development of thin, filiform needles in the 1930s and then having had its entire history retconned by Chairman Mao in the 1950s, along with the rest of traditional Chinese medicine. In brief, the narrative became that this “ancient treatment” demonstrates a near-miraculous efficacy for, well, just about everything. Never mind that the filiform needles associated with modern acupuncture didn’t exist until around 1930, when a Chinese pediatrician named Cheng Dan’an proposed that needling therapy be resurrected because its actions could be explained through neurology and replaced the previously-used coarse lancet-like needles with the fine needles in use today. Of late, during the pandemic, I have pointed out how the lessons of acupuncture trials should have been applied to hydroxychloroquine and ivermectin, repurposed drugs touted as highly effective treatments for COVID-19 based on activity in cell culture against SARS-CoV-2, the virus that causes COVID-19. The problem was that the concentrations required were many times higher than what could be safely achieved in human blood, which presented a serious plausibility problem on the basis of pharmacodynamics and pharmacokinetics alone.
That’s why I was not surprised when RCTs ultimately demonstrated that neither drug works against COVID-19, leading me to call ivermectin the “acupuncture of COVID-19 treatments”, because it was always a highly implausible treatment that nonetheless was subjected to way more RCTs than its preclinical evidence justified.
All of this brings us to an acupuncture study published in JAMA that I discussed four years ago that tested whether acupuncture could alleviate joint pain caused by a class of estrogen-blocking drugs known as aromatase inhibitors. I called it “spinning another essentially negative study”, particularly based on interviews with one of the main authors in which it was spun that way. Over the weekend, a reader sent me a link to another acupuncture study, published on Friday, this time in JAMA Network Open and titled “Comparison of Acupuncture vs Sham Acupuncture or Waiting List Control in the Treatment of Aromatase Inhibitor–Related Joint Pain”. I immediately thought: This looks very, very familiar. So I did a search for acupuncture and aromatase inhibitors on SBM, and I found my post from 2018. Looking at the author list, I saw the same names. Looking at the ClinicalTrials.gov number, I saw that it was the same, NCT01535066. It was the same study! So what was the difference? Basically, this was a long-term follow-up that claimed that acupuncture could result in long-term relief of the joint pain symptoms commonly associated with aromatase inhibitors like anastrozole.
Aromatase inhibitors and arthralgias
Before I move on to the study design, let me first discuss the problem for which acupuncture was being studied. Over two-thirds of breast cancers express the estrogen receptor, which means that they are estrogen-responsive and that blocking estrogen can inhibit their growth. Consequently, there are a lot of patients undergoing treatment with estrogen-blocking drugs. Unfortunately, these drugs can often have a serious deleterious effect on a woman’s quality of life, causing menopausal symptoms in premenopausal women and worsening menopausal symptoms in post-menopausal women. These symptoms, particularly hot flashes, are very hard to treat with anything other than estrogen. Indeed, I still remember the first patient I ever operated on as a new attending and how she simply could not tolerate her tamoxifen, no matter how much her oncologist tried to work with her to make it possible for her to take it. The seriousness of the effects of these drugs on patient quality of life should not be underestimated.
One significant potential complication associated with aromatase inhibitors (almost always the first choice of antiestrogen drug for postmenopausal women) is osteoporosis. However, the most vexing side effect for most women on these drugs is severe musculoskeletal pain, specifically arthralgia (joint pain). The mechanism causing this joint pain is poorly understood, but the side effect can be debilitating enough that the only choice is to stop the medication and try another class of drugs. As a result, it would be highly useful to be able to alleviate the joint pain that aromatase inhibitors can cause, given that aromatase inhibitors are associated with a large decrease in the risk of recurrence after successful initial treatment of breast cancer. So one can understand why acupuncture advocates would want to test acupuncture on women undergoing aromatase inhibitor therapy.
As I like to put it, we have a medical problem whose solution we haven’t been able to identify yet; so let’s try magic. Why not?
As I perused the list of authors, now as in 2018, one name jumped out right away: Heather Greenlee. You might recall that she is a naturopath at the Fred Hutchinson Cancer Center at the University of Washington in Seattle. We’ve met her before, for example, when the guidelines for the “integrative oncology” care of breast cancer patients published by the Society for Integrative Oncology (SIO) were endorsed by the American Society of Clinical Oncology (ASCO). Also, now as in 2018, I was disappointed to see that the University of Michigan, my medical alma mater, was one of the sites at which this study was carried out. Of course, U. of M. has gone all-in with “integrative medicine”, even to the point of embracing anthroposophic medicine, acupuncture, homeopathy, and naturopathy, with its family medicine department including a naturopath. So I guess that I shouldn’t have been surprised.
The study involved eleven academic medical centers and clinical sites, which recruited: Women with early stage ER(+) breast cancer who scored at least 3 on the Brief Pain Inventory Worst Pain (BPI-WP) item (score range, 0-10; higher scores indicate greater pain) and who were either (1) postmenopausal and taking an aromatase inhibitor or (2) pre- or perimenopausal whose ovarian function had been suppressed with a gonadotropin-releasing hormone agonist and were taking an aromatase inhibitor. Exclusion criteria included prior acupuncture treatment for joint symptoms at any time; history of bone fracture or surgery of the affected joints within 6 months prior to enrollment; severe bleeding disorder; or a latex allergy. Also, study participants must not have received opioid analgesics, topical analgesics, oral corticosteroids, intramuscular corticosteroids, or intra-articular steroids, or any other medical therapy, alternative therapy, or physical therapy for the treatment of joint pain or joint stiffness within 28 days prior to registration. The trial ran from March 2012 to February 2017 (final date of follow-up, September 5, 2017).
Patients were randomized in a 2:1:1 ratio to true acupuncture, sham acupuncture, or a wait list. As for the acupuncture used:
Study participants were randomized 2:1:1 to TA vs SA vs WC, with randomization dynamically balanced by study site. Both TA and SA consisted of twelve 30- to 45-minute sessions administered during 6 weeks (2 sessions per week) followed by 1 session per week for 6 more weeks. For TA, stainless steel, single-use, sterile, and disposable needles were used and inserted at traditional depths and angles. The SA protocol consisted of a core standardized prescription of minimally invasive, shallow needle insertion using thin and short needles at nonacupuncture points. The SA protocol also included joint-specific treatments and an auricular sham based on the application of adhesives to nonacupuncture points on the ear. The WC group received no acupuncture during the initial 24 weeks of study participation. At 24 weeks, all patients received vouchers for 10 TA sessions to be used before the 52-week visit.
The outcomes were:
The original protocol-specified primary end point was the BPI-WP score at 6 weeks. The short-form version of the BPI was administered at 6, 12, 16, 20, 24, and 52 weeks. For this long-term analysis, the primary end point was the 52-week assessment of BPI-WP (which was not previously reported), examined using multivariable linear regression. Secondary end points for this long-term evaluation included the BPI average pain, pain interference, pain severity, and worst stiffness scores at 52 weeks. All BPI scores range from no symptoms to worst on a 0- to 10-point scale (with higher scores indicating worse symptoms). In addition, we evaluated pain using the PROMIS Pain Interference–Short Form (PROMIS PI-SF), which has scores ranging from 6 to 30. This instrument was also administered at 6, 12, 24, and 52 weeks and has 5 response levels (low scores [not at all] to high scores [very much]), with higher scores reflecting worse symptoms.
We examined 2 functional measures: grip strength and Timed Get Up and Go test. Grip strength was measured with a digital hand grip strength dynamometer (DHS 88, Detecto) in kilograms. Patients were asked to make 3 maximal voluntary contractions with 1 minute between each. The maximum contraction was used. The Timed Get Up and Go test is a physical function assessment tool of speed that is an estimate of impairments in balance and gait.
It’s useful now to discuss the specific acupuncture interventions and controls. First, there’s nothing wrong with including a waitlist control; it’s a perfectly valid way to assess for placebo effects, but what about the sham intervention chosen? The first thing I noticed the first time around was that there was no blinding of the acupuncturists to intervention. Of course, not blinding the waitlist control is fine. That’s why there is a waitlist control in the first place, to estimate the magnitude of placebo effects. The important control is the sham acupuncture. At the time of the publication of the first study, the authors tried to justify not blinding the acupuncturists to the experimental group:
Still, Hershman knows critics will take issue with the sham treatments, which weren’t double-blinded. Practitioners knew they were giving fake treatments and may have sent subconscious signals to their patients.
Using retractable needles designed to blind practitioners to the sham wouldn’t have helped, Hershman said, because studies suggest that both patients and acupuncturists can tell the difference.
“We looked at a whole slew of studies…of different sham modalities,” Hershman said. “They all had advantages and disadvantages. We felt in looking at the collective literature out there that this was probably the best approach.”
As I said at the time, I did take issue with the sham treatments. It was simply not true that sham needles don’t work. I’ve lost track of the number of studies I’ve read where sham retractable needles were used and double-blinding was fine. My guess at the time was that the acupuncturists recruited to participate in the study didn’t want to use them, and I see nothing to reassess that guess. So right away, we knew all the crowing about the “rigorous design” of this trial is still a load of fetid dingos’ kidneys, especially given that, according to the 2018 study, subjects in the true acupuncture group were more likely to believe that they were receiving true acupuncture (68%) than those in the sham group (36%). I’ll still grant that the rest of the study was pretty well-designed, particularly given that a waitlist control was included, but the inadequate blinding, along with other factors, was enough then to explain the results of the study, and remains so.
To help explain why, I will reproduce the graph comparing adjusted mean group difference in Brief Pain Inventory Worst Pain Scores in true acupuncture (TA) versus sham acupuncture (SA) and TA vs waiting list control (WC) at 6, 12, 16, 20, 24, and 52 weeks:
So what do you think? Can you see what this result shows? Notice how, short term, there is a highly statistically significant difference in the pain score between the TA group and both the SA and WC control groups. There are two explanations for such a result. Either the study was unblinded, or there was an actual difference between TA and SA. As I argued above, I think the lack of blinding explains the result quite well. I also note that the authors had stated that they had hoped to see a difference of 2.0 or greater in the pain score, which further reinforced my conclusion that this result was due to the lack of blinding. Now look at the graph. None of the differences reported are greater than 2.0, and the only one that approaches 2.0 is the difference in pain scores at 12 weeks between TA and WC.
Notice also, how for the timepoints at 16, 20, and 24 weeks the differences between TA and SA were not statistically significant and were at most a half a point. As a commenter said after the last post:
After the fact, Dr. Hershman asserts (without citations) that differences in pain scores as low as 0.7 points are clinically significant. You must be f***ing kidding me. Okay, maybe on a scale of 0-2. The references I’ve seen say 1.5 – 2 points on a 0-10 scale, and this makes sense. Keep in mind, this is the MINIMUM clinically significant difference. That is, patients are willing to say that their pain was lessened a little bit. That’s different from “this treatment provided me with adequate pain relief.”
Exactly. The same applies now in the new study.
Admittedly, the final data point at 52 weeks shows both TA versus SA and TA versus WC having a difference of roughly 1.0 point that is statistically significant. However, two things muddy the analysis: the length of time after the intervention (twelve 30- to 45-minute sessions of TA or SA administered during 6 weeks (2 sessions per week) followed by 1 session per week for 6 more weeks) and the fact that the WC group, although it received no TA or SA treatment during the first 24 weeks, did receive vouchers for acupuncture treatments afterward. Also, again, a mere one-point difference in pain scores has at best questionable clinical significance.
Now here’s the most important issue that I have with the study, having thought about it some more. What is the reason that we want to alleviate the symptoms that can be caused by aromatase inhibitors? True, we don’t want women to suffer unnecessarily, but we also want them to stick with the treatment for at least five years, something they are less likely to do if they are having severe joint pain. That brought up a question that I should have hammered home in my first post about this study: Why wasn’t one of the primary endpoints the discontinuation rate of aromatase inhibitor (AI)? Usually, patients who can’t tolerate AI therapy for their breast cancer will be forced to discontinue the drug within the first several months. In this study, what did the authors report? They reported that the “overall AI discontinuation rate within the 52 weeks of follow-up was 12.1% and did not differ by intervention group.” In other words, TA made zero difference in the only outcome that really matters for purposes of treating the patients’ breast cancer, their continuation of their AI therapy.
The authors then looked at pain medication use among patients who reported taking no pain medication at the start of the study:
To examine incident use of pain medications during the study, we also evaluated patterns among patients who did not report pain medication use at baseline (n = 91 [40.3%]). Among these patients, new use of pain medications during the study occurred in 20 of 44 patients (45.5%) in the TA group, 16 of 23 (69.6%) in the SA group (P = .06), and 16 of 24 (66.7%) in the WC group (P = .09). Taken together, pain medication use was less likely for patients in the TA compared with the SA or WC groups combined (20 of 44 [45.5%] vs 32 of 47 [68.1%], P = .03).
In other words, there was no statistically significant difference in the rate of initiation of pain medications between TA and SA, or between TA and WC; so the authors combined the SA and WC groups to compare to the TA group and found a significant difference. That’s a bit dodgy to me right there.
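For readers who want to check the arithmetic, here is a rough sketch in Python that reruns the three comparisons from the counts quoted above. It assumes a Pearson chi-square test with one degree of freedom and no continuity correction; the quoted passage does not say which test the authors used, so treat that choice, and the helper name chi2_pvalue, as my assumptions.

```python
import math


def chi2_pvalue(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a Pearson chi-square test (1 df, no
    continuity correction) comparing two proportions."""
    a, b = events_a, n_a - events_a
    c, d = events_b, n_b - events_b
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# New pain medication use among patients reporting none at baseline,
# as (patients starting medication, group size), from the quoted passage.
ta, sa, wc = (20, 44), (16, 23), (16, 24)

p_ta_vs_sa = chi2_pvalue(*ta, *sa)
p_ta_vs_wc = chi2_pvalue(*ta, *wc)
# Pool the two control groups, as the authors did.
p_ta_vs_pooled = chi2_pvalue(*ta, sa[0] + wc[0], sa[1] + wc[1])

print(f"TA vs SA:      p = {p_ta_vs_sa:.2f}")
print(f"TA vs WC:      p = {p_ta_vs_wc:.2f}")
print(f"TA vs SA + WC: p = {p_ta_vs_pooled:.2f}")
```

Under this assumed test, neither pairwise comparison gets below 0.05, and only the pooled comparison does. Pooling nearly doubles the size of the comparison group without changing the observed proportions much, which is why it tips the result over the line.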
No differences were observed between groups in assessments of grip strength (eTables 3 and 4 in Supplement 2) or Timed Get Up and Go (eTables 5 and 6 in Supplement 2) for any assessment time compared with baseline.
These are functional measures that are objective, and acupuncture did not affect them at all. My conclusion: Four years later, with follow-up out to 52 weeks, this is still a negative acupuncture study. The authors, however, did not see it that way:
In this randomized clinical trial, we found that among postmenopausal women with early breast cancer who experienced AI-related arthralgias, a 12-week intervention of TA compared with SA or WC resulted in statistically significant sustained reduction in joint pain at 52 weeks. This study highlights the durability of the acupuncture response through 1 year, as well as the importance of having both SA and WC groups to fully evaluate the effect of the acupuncture intervention.
No. Just no. This study, although barely “positive” for some measurements, was, due to lack of blinding and the low effect size observed, functionally, clinically, and practically a negative study. TA didn’t result in any functional improvement. It didn’t decrease the percentage of patients who had to stop their AI therapy. It only barely showed any improvement in a subjective pain score, and even then that effect varied wildly at different time points and was at most time points not statistically significant. (I didn’t even get into the uneven subject allocation to groups, something I discussed in my 2018 post.) Because of the lack of blinding and the very low prior scientific plausibility of acupuncture as a treatment for AI-induced joint pain, the SBM conclusion is that acupuncture didn’t work in this study for the outcomes studied. In contrast, EBM ignores these considerations, so that the analysis of this study left room to conclude that acupuncture does work against AI-induced joint pain or is at least promising.
Science-based medicine vs. evidence-based medicine
Before the pandemic, a not-infrequent criticism of our discussions of acupuncture studies like these tended to be based on methodolatry. This is a term, which I learned in 2009 from an eminent epidemiologist, defined as the profane worship of the RCT as the only valid method of clinical investigation. Let’s just say that there is a lot of methodolatry in EBM. That is how a treatment that is represented as ancient, but is really a reinvention less than a century ago of a significantly different ancient treatment whose history has been heavily retconned, can even reach the stage of being subjected to an RCT despite its extreme biological implausibility and its basis in ancient vitalist beliefs. It would be one thing if any well-designed RCT of acupuncture were ever to show a large, undeniable effect size against a clinical endpoint of interest that can’t be explained by placebo effects. Then one might start to ask whether we were wrong and there actually is a biological basis for acupuncture to work for that condition. Such is not the case, however, and so we continue to do study after useless study testing whether the magic of acupuncture works against condition after condition, finding equivocal results along the way because clinical trials are messy, and it’s not uncommon to find marginally statistically significant results if you do enough of them. Given the extreme biological implausibility of a modality like acupuncture for arthralgias due to AI treatment, in light of findings like this the SBM conclusion is that acupuncture doesn’t work.
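The point about doing enough trials can be put in numbers with a small Python sketch. It makes simplifying assumptions: every trial is independent, the treatment has no real effect, and each trial uses a 0.05 significance threshold. The function name and the trial counts are hypothetical, picked only for illustration.

```python
def prob_any_false_positive(n_trials, alpha=0.05):
    """Chance that at least one of n independent trials of an
    ineffective treatment reaches p < alpha by chance alone."""
    return 1.0 - (1.0 - alpha) ** n_trials


# Hypothetical numbers of trials run on the same ineffective treatment.
for n in (1, 5, 20, 50):
    print(f"{n:>2} trials: {prob_any_false_positive(n):.0%}")
```

Even with no bias at all, the chance of at least one "positive" study rises quickly with the number of studies run, and real trials with imperfect blinding would be expected to do worse than this idealized calculation.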
In the age of the pandemic, the difference between EBM and SBM is even more salient. The perfect example is ivermectin for COVID-19. As I wrote many moons ago, SBM mavens all immediately realized that, even if you accepted the rationale proposed at face value, ivermectin was incredibly implausible as an effective therapy for COVID-19 just based on the in vitro data alone. It was basic pharmacology. A drug (like ivermectin) that only inhibits the target protein at a concentration that is at least nearly 70-fold higher than the highest blood concentration of drug that can be safely achieved using standard dosing is incredibly unlikely to be an effective treatment. It’s also a general principle that most highly effective drugs inhibit their target at nanomolar or ng/ml concentrations, not micromolar or μg/ml concentrations. Candidate drugs that only inhibit their target at such high concentrations, in general, tend to be incredibly unlikely to be useful drugs. Surprise! When higher quality RCTs were completed (compared to a number of low quality RCTs that showed an effect of ivermectin on COVID-19), it turned out that ivermectin is ineffective against the disease.
My hope now, as it was the last time I wrote about ivermectin, is that SBM thinking will be more likely to sway EBM adherents who don’t really take into account prior plausibility in evaluating RCT evidence and therefore take much longer to reach a conclusion that is obvious to SBM about a treatment, in the case of ivermectin that it doesn’t work against COVID-19. Unfortunately, the example of acupuncture tells me that it’s unlikely that this will ever happen, given that decades after it was apparent that acupuncture is nothing more than a theatrical placebo, studies like this one are still being carried out and reported as positive when, for all practical purposes, they are not.