A quick lesson on type-II errors (false negatives)

David Tredinnick, MP for Bosworth and staunch advocate of alternative therapies (such as homeopathy) in the House of Commons, is at it again. He has tabled three Early Day Motions proposing that the House welcome the findings of three separate trials of homeopathy that report “positive” results. One of them (a particularly nasty one since it relates to breast cancer, a very serious and life-threatening disease) has already received a proper fisking. Another is so laughably easy to debunk right from the abstract that I’m going to do so here. (I haven’t read the third yet, but I would be surprised if it’s not similarly nonsensical.)

The glaring howler in the paper, “Homeopathic Individualized Q-potencies versus Fluoxetine for Moderate to Severe Depression: Double-blind, Randomized Non-inferiority Trial”, by U.C. Adler and colleagues at the Faculdade de Medicina de Jundiaí, Homeopathy Graduation Programme, Department of Psychobiology [what's that when it's at home?], Universidade Federal de São Paulo, São Paulo, Brazil, is statistical, and leaps out at the reader right from the abstract.

The authors conducted what is known as a non-inferiority trial. In other words, instead of trying to show that treatment X is superior to treatment Y (or placebo, if it’s a placebo-controlled trial), which is the usual course of action, they try to show that treatment X isn’t worse than treatment Y, at least not by more than a pre-determined margin. These trials are only used when it is ethically difficult to conduct a regular trial, and have many weaknesses, which are detailed here. Funnily enough, this critical appraisal of non-inferiority trials is not cited in Adler et al.’s paper. Whoops.

So what is the problem? Well, usually, you take a sample of people, randomise them into two groups, give one group the treatment you’re testing, give the other group the comparative treatment, get the results, and determine whether the results you got could have occurred by chance or whether they are extreme enough to conclude that your treatment had a greater effect. In this type of study, a superiority trial, two types of error can be made:

  1. you conclude that there is a difference when in fact there isn’t (a false positive, or type-I error)
  2. you conclude that the two treatments are the same when in fact there is a difference, because your study contained too few subjects to detect it (a false negative, or type-II error)
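
To make the two error types concrete, here is a rough simulation sketch of a superiority trial. Everything in it is an assumption for illustration rather than anything taken from the paper: outcomes are drawn from normal distributions with a known standard deviation of 1, each arm has 45 subjects, the hypothetical true difference is 0.3 standard deviations, and the helper names (`z_test_rejects`, `rejection_rate`) are made up.

```python
import random
from statistics import NormalDist, mean


def z_test_rejects(a, b, sigma=1.0, alpha=0.05):
    """Two-sided z-test for a difference in means, assuming a known common SD."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_diff, n_per_group, trials=2000, seed=1):
    """Fraction of simulated trials that report a 'significant' difference."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        treated = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        hits += z_test_rejects(treated, control)
    return hits / trials


# Error 1: no real difference exists, but the trial "finds" one (false positive).
false_positive_rate = rejection_rate(true_diff=0.0, n_per_group=45)

# Error 2: a real (hypothetical) difference exists, but the trial misses it (false negative).
false_negative_rate = 1 - rejection_rate(true_diff=0.3, n_per_group=45)

print(f"false positive rate: {false_positive_rate:.2f}")
print(f"false negative rate: {false_negative_rate:.2f}")
```

Under these assumed numbers the false positive rate sits near the chosen 5% threshold, while the false negative rate is far higher: a small trial misses a modest real difference most of the time.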

However, since this is a non-inferiority trial, rather than a regular superiority trial, everything is reversed, because a “successful” trial is one where no significant difference is detected. So here, a type-I error will lead to a falsely negative conclusion, and a type-II error will lead to a falsely positive one.

So what’s happened here? Well, the main problem is that only 91 patients are included in the study. That’s a tiny number (though David Tredinnick appears to think otherwise). If this were a regular superiority trial, we would say that the study is underpowered: even if there is a real difference between the two groups, the error margins around the statistics you get from the trial (more formally known as confidence intervals, or CIs) are so wide that they are highly likely to overlap, and you don’t get a significant difference. That same principle holds here in the non-inferiority trial, only this time, the trial is erroneously deemed to be “successful” rather than unsuccessful.
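
As a back-of-the-envelope illustration of what “underpowered” means, the sketch below uses the standard normal approximation for comparing two group means. The numbers are assumptions, not figures from the paper: 91 patients are treated as roughly 45 per arm, and the true difference is set to a hypothetical 0.3 standard deviations. The function names are likewise invented for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the true difference in means divided by the common SD.
    """
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return _norm.cdf(shift - z_crit) + _norm.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size, power=0.8, alpha=0.05):
    """Approximate subjects needed per arm to reach the requested power."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    z_power = _norm.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


# Assumed scenario: about 45 patients per arm, hypothetical true effect of 0.3 SD.
power_small_trial = power_two_sample(effect_size=0.3, n_per_group=45)
needed_per_group = n_per_group_for_power(effect_size=0.3, power=0.8)

print(f"power with 45 per arm: {power_small_trial:.2f}")
print(f"needed per arm for 80% power: {needed_per_group}")
```

With these assumed inputs, the chance of detecting the difference comes out at roughly 30%, and the approximation asks for several times as many patients per arm to reach the conventional 80%. In a trial where “no significant difference” counts as success, that missing power works in the treatment’s favour.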

The numbers quoted in the abstract should cause alarm bells to ring. Here are the figures:
