David Tredinnick, MP for Bosworth and staunch advocate of alternative therapies (such as homeopathy) in the House of Commons, is at it again. He has tabled three Early Day Motions proposing that the House welcome the findings of three separate trials of homeopathy that report “positive” results. One of them (a particularly nasty one since it relates to breast cancer, a very serious and life-threatening disease) has already received a proper fisking. Another is so laughably easy to debunk right from the abstract that I’m going to do so here. (I haven’t read the third yet, but I would be surprised if it’s not similarly nonsensical.)
The glaring howler in the paper, “Homeopathic Individualized Q-potencies versus Fluoxetine for Moderate to Severe Depression: Double-blind, Randomized Non-inferiority Trial”, by U.C. Adler and colleagues at the Faculdade de Medicina de Jundiaí, Homeopathy Graduation Programme, Department of Psychobiology [what's that when it's at home?], Universidade Federal de São Paulo, São Paulo, Brazil, is statistical, and leaps out at the reader right from the abstract.
The authors conducted what is known as a non-inferiority trial. In other words, instead of trying to show that treatment X is superior to treatment Y (or placebo, if it’s a placebo-controlled trial), which is the usual course of action, they try to show that treatment X isn’t worse than treatment Y, at least not by a pre-determined margin. These trials are only used when it is ethically difficult to conduct a regular trial, and have many weaknesses, which are detailed here. Funnily enough, this critical appraisal of non-inferiority trials is not cited in Adler et al.’s paper. Whoops.
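To make the “pre-determined margin” concrete, here is a minimal Python sketch of how a non-inferiority verdict is typically read off a confidence interval. The function name, the sign convention (a larger difference means the new treatment did worse) and the margin of 2.0 are all assumptions chosen for illustration; none of them come from Adler et al.’s paper.

```python
def is_non_inferior(ci_upper: float, margin: float) -> bool:
    """Return True if the upper CI limit lies below the pre-set margin.

    Assumes the difference is (new - standard) on a scale where a
    larger value means the new treatment performed worse.
    """
    return ci_upper < margin


# Hypothetical intervals, not taken from any real trial.
print(is_non_inferior(ci_upper=1.5, margin=2.0))  # True
print(is_non_inferior(ci_upper=3.5, margin=2.0))  # False
```

Under this rule, the more generous the margin chosen in advance, the easier the trial is to “pass”, which is why the choice of margin deserves scrutiny.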
So what is the problem? Well, usually, you take a sample of people, randomise them into two groups, give one group the treatment you’re testing, give one group the comparative treatment, get the results, and determine whether the results you got could have occurred by chance or whether the results are extreme enough to conclude that your treatment had a greater effect. In this type of study, a superiority trial, two types of error can be made:
- you conclude that there is a difference when in fact there isn’t (a false positive, or type-1 error)
- you conclude that the two treatments are the same when in fact there is a difference, because your study contained too few subjects to detect it (a false negative, or type-2 error)
However, since this is a non-inferiority trial, rather than a regular superiority trial, everything is reversed, because a “successful” trial is one where no significant difference is detected. So here, a type-1 error will lead to a falsely negative conclusion, and a type-2 error will lead to a falsely positive one.
So what’s happened here? Well, the main problem is that only 91 patients are included in the study. That’s a tiny number (though David Tredinnick appears to think otherwise). If this were a regular superiority trial, we would say that the study is underpowered, i.e. if there is a difference between the two groups, the error margins that you put around the statistics you get from the trial are so wide that it’s highly likely that these error margins (more formally known as confidence intervals, or CIs) will overlap, and you don’t get a significant difference. That same principle holds here in the non-inferiority trial, only this time, the trial is erroneously deemed to be “successful” rather than unsuccessful.
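To see how quickly those error margins shrink as a trial grows, here is a rough Python sketch of the half-width of a 95% CI for a difference between two group means. It assumes equal-sized arms, a normal approximation and a common standard deviation of 10 points; that standard deviation is a made-up figure for illustration, not one reported in the paper.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(n_per_arm: int, sd: float, level: float = 0.95) -> float:
    """Half-width of a normal-approximation CI for a difference in means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error of a difference between two independent means
    # with the same SD and the same number of subjects per arm.
    return z * sd * sqrt(2 / n_per_arm)


ASSUMED_SD = 10.0  # hypothetical spread of score changes, for illustration only

for n in (45, 180, 720):
    print(f"{n:>4} per arm: +/- {ci_half_width(n, ASSUMED_SD):.2f} points")
```

Because the half-width scales with one over the square root of the sample size, you need four times as many subjects to halve the error margin.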
The numbers quoted in the abstract should cause alarm bells to ring. Here are the figures:
Non-inferiority of homeopathy was indicated because the upper limit of the confidence interval (CI) for mean difference in MADRS [the scale used in this study for measuring depression] change was less than the non-inferiority margin: mean differences (homeopathy–fluoxetine) were –3.04 (95% CI –6.95, 0.86) and –2.4 (95% CI –6.05, 0.77) at 4th and 8th week, respectively.
OK, so for there to be no “significant” difference, these confidence intervals should include zero (indicating no difference), which they do. But only just. Statistically speaking, the results are very much borderline, and given the tiny number of people involved in the study, it really is a leap of faith to conclude strongly that homeopathy is not inferior to fluoxetine. In fact, the mean differences appear to be quite large; with no discussion of the plausible range of MADRS scores, or what sort of difference in score constitutes an “improvement”, it’s very hard to tell. But if your error margins are as wide as the expected improvement, then of course you’re going to conclude that there’s no difference, simply because your error margins are too large to detect any.
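As a rough check on the figures quoted from the abstract, the following Python sketch works out the half-width of each reported interval and whether it spans zero. The only inputs are the numbers in the abstract; the “implied standard error” assumes a symmetric, normal-based 95% interval, which is an assumption rather than something the abstract states.

```python
from statistics import NormalDist

# Figures quoted in the abstract: (label, mean difference, CI lower, CI upper)
REPORTED = [
    ("week 4", -3.04, -6.95, 0.86),
    ("week 8", -2.40, -6.05, 0.77),
]


def summarise(lower: float, upper: float, level: float = 0.95) -> dict:
    """Describe a confidence interval given its two limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = (upper - lower) / 2
    return {
        "half_width": half_width,
        # Only meaningful if the interval is symmetric and normal-based.
        "implied_se": half_width / z,
        "contains_zero": lower <= 0 <= upper,
    }


for label, diff, lower, upper in REPORTED:
    s = summarise(lower, upper)
    print(
        f"{label}: difference {diff:+.2f}, "
        f"half-width {s['half_width']:.2f}, "
        f"implied SE {s['implied_se']:.2f}, "
        f"contains zero: {s['contains_zero']}"
    )
```

Both intervals are roughly seven to eight MADRS points wide from end to end, so the uncertainty is larger than either of the mean differences it surrounds.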
In summary then, the article reads as follows: “we took a handful of people, gave one lot the standard drug, gave the other lot magic water, the standard drug seemed to work marginally better but because we took so few people, we can’t tell whether there’s any difference between the drug and the magic water, therefore the magic water is not inferior to the drug.” No shit.