You don’t understand non-inferiority trials (and neither do I)


Over the last few years, I have seen a steady increase in the number of non-inferiority trials being published. This makes some sense, as they generally require fewer participants, and are therefore cheaper and easier to run. However, it presents a problem, as most of us (including myself) don’t really understand the statistics being performed, and often a non-inferiority design is inappropriate for the question being asked. Although it will be a little bit nerdy, I think it is time we all try to understand what these non-inferiority trials really mean, and why the design should probably be used less. 

Non-inferior does not mean equivalent

The fundamental design of a non-inferiority trial requires the selection of a threshold (the non-inferiority margin) defining how much worse the experimental treatment can be before it is considered inferior. To be declared ‘non-inferior’, the 95% confidence interval for the difference between the treatments cannot cross this margin.
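To make that decision rule concrete, here is a minimal sketch in Python, assuming a two-arm trial with a binary success outcome and a simple Wald confidence interval. All of the event counts and the 5% margin are hypothetical numbers, not from any real trial.

```python
import math

def ni_check(success_new, n_new, success_std, n_std, margin):
    """Non-inferiority check for a binary success outcome.

    Computes a Wald 95% CI for the risk difference (new minus standard).
    'Non-inferior' means the entire CI sits above -margin.
    """
    p_new = success_new / n_new          # success rate, new treatment
    p_std = success_std / n_std          # success rate, standard of care
    diff = p_new - p_std                 # risk difference
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    return diff, (lo, hi), lo > -margin  # lower CI bound must clear the margin

# Hypothetical trial: 86% vs 88% success, margin of 5 percentage points
diff, ci, non_inferior = ni_check(1720, 2000, 1760, 2000, margin=0.05)
print(f"difference {diff:+.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f}), "
      f"non-inferior: {non_inferior}")
```

Note that the verdict hinges entirely on the margin: this hypothetical trial is ‘non-inferior’ with a 5% margin, but the exact same data would be ‘inconclusive’ with a 3% margin.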

Despite their name, non-inferiority trials don’t actually attempt to demonstrate that one treatment is non-inferior to another. At least, not in the way that we commonly use those words. The goal of a non-inferiority trial is not to demonstrate that two treatments are equivalent, but rather to show that one treatment is not much worse than another.

The obvious problem with this trial design is that this non-inferiority boundary is chosen arbitrarily by the researcher.

The assumption of benefit

The justification for non-inferiority trials is always that some other untested benefit makes the new treatment worthwhile, which is why we are willing to accept a trial demonstrating it is not much worse than the current standard of care. The benefit might be lower cost, fewer adverse events, or an easier dosing regimen, but the inherent assumption of all non-inferiority trials is that the newer treatment is better (in a way not studied in the trial), and so should be used as long as we show that it is otherwise not much worse than the existing standard of care.

Think about that for a moment. The justification for non-inferiority trials contains a claim of superiority. However, that claim for superiority is never tested; it is just assumed. Why are we willing to accept this unproven assumption, while simultaneously accepting the potential worse outcomes that non-inferiority trials entail?

Sometimes the assumption is pretty easy to prove. Cost is probably the easiest example. (That being said, even cost analyses become somewhat complex if the ‘non-inferior’ treatment results in worse outcomes and more long-term complications.) I don’t think we need to run a superiority RCT to demonstrate a treatment is cheaper. However, I think we need to be very explicit about the conclusions of our non-inferiority trials. They should read: “we are willing to accept up to X% worse outcomes in patients to [save Y dollars] or [make the treatment regimen simpler].” That is what non-inferiority trials are designed to show. I suspect that if we framed the results this way, everyone would be a lot less excited about the outcomes of non-inferiority trials.

Other assumed benefits are more complex, and in my mind almost always warrant standard superiority RCTs. You think your new treatment is superior to the current treatment because it will have less toxicity? Prove it. Run the RCT and show that patient oriented outcomes are better. If you can’t show that, then your drug isn’t any better, and there is no reason to use it over the established therapy. 

Non-inferiority trials make assumptions about current therapies

The design of non-inferiority trials results in many other assumptions that can skew interpretations of research. Because non-inferiority trials require a ‘standard care’ treatment arm against which the novel therapy can be tested, these trials frequently assume that standard therapy works, or is better than placebo. Unfortunately, standard care is often not based on good evidence, and what is considered standard care often varies dramatically by geography or specialty. The trial might be designed to ask “is X non-inferior to Y” when a more appropriate question might be “is Y non-inferior to X”. That design decision fundamentally biases the trial and every decision that follows from it.

One example of this in the real world was the non-inferiority RCT of conservative management for spontaneous pneumothorax. (Brown 2020) In that trial, they asked whether conservative management was non-inferior to chest tube drainage. Based on the primary outcome and non-inferiority margin they selected, they were unable to demonstrate non-inferiority. However, I would ask why chest tubes were considered ‘standard of care’. There was never any evidence of benefit from chest tubes in this population, and there is a lot of observational data suggesting patients do just fine with conservative management. My practice was already to treat these patients conservatively. If the researchers had flipped the question around and asked “are chest tubes superior to, or at least non-inferior to, conservative management”, the results of this trial would likely have been interpreted very differently (with pretty clear harm from chest tubes).

Every non-inferiority trial contains an assumption about the standard of care, and that assumption adds bias and complicates interpretation. This is especially true when you consider the potential scenario in which the intervention chosen as the ‘standard care’ is actually no better than placebo, or when the benefit from ‘standard care’ is so small that the non-inferiority margin is bigger than the benefit and therefore crosses into patient harm.

A conclusion of non-inferiority simply means that the 95% confidence interval did not cross the researchers’ arbitrarily chosen non-inferiority margin. However, this can result in the bizarre scenario in which a treatment is statistically significantly inferior, but the researchers still conclude it is “non-inferior”. (See example 4 in the figure below.) For example, if I decided to set my non-inferiority margin at 10%, I could run a large trial and prove that treatment B is 5% worse than treatment A, but still publish my trial with the conclusion that treatment B is “non-inferior” to treatment A. (This is not just theoretical, but actually occurs. In this edition of the research round up, I covered two different papers that were non-inferior despite showing a statistically significant difference between the groups.) (Aberegg 2018)

Scenario number 7 in the above graph is also pretty ridiculous. The official guidance is to label the result of this non-inferiority trial as “inconclusive”, because the 95% confidence interval crosses the chosen non-inferiority threshold, even though the experimental treatment is clearly inferior.
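The first of these oddities (statistically significantly worse, yet formally ‘non-inferior’) is easy to reproduce with the same confidence-interval logic sketched earlier. The numbers below are hypothetical: a 10-point margin and a large trial in which the new treatment is truly about 5 points worse.

```python
import math

# Hypothetical: 10-point margin; new treatment 80% vs standard 85% success,
# 4000 patients per arm.
p_new, p_std, n, margin = 0.80, 0.85, 4000, 0.10
diff = p_new - p_std
se = math.sqrt(p_new * (1 - p_new) / n + p_std * (1 - p_std) / n)
lo, hi = diff - 1.96 * se, diff + 1.96 * se

print(f"95% CI for risk difference: ({lo:.3f}, {hi:.3f})")
print("Statistically significantly worse:", hi < 0)  # whole CI below zero
print("Formally 'non-inferior':", lo > -margin)      # whole CI above -margin
# Both checks print True: the data prove the treatment is roughly 3-7 points
# worse, yet the trial's official conclusion is 'non-inferiority'.
```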

Things can become even more absurd. It is relatively easy to design a non-inferiority trial which concludes that a treatment is non-inferior to standard care, when in fact that treatment is worse than placebo. Imagine we have a standard treatment that provides a real but very small benefit (picture statins in lower-risk patients). Let’s say there is a 1% benefit from the standard therapy. I could invent a novel therapy and run a non-inferiority trial with a 5% non-inferiority margin, and prove that my new therapy was “non-inferior” to standard care, when in fact the novel therapy is actually hurting patients (it would have been shown to be harmful had I included a placebo group). Although choosing margins that could be worse than placebo is frowned upon by the FDA, the considerable debate about the true benefit of many of our current therapies means that this is a real possibility.
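The arithmetic behind that scenario is worth spelling out, using the hypothetical 1% benefit and 5% margin from the paragraph above:

```python
standard_vs_placebo = 0.01   # standard therapy's true absolute benefit
ni_margin = 0.05             # non-inferiority margin chosen for the trial

# Worst case still compatible with a 'non-inferior' verdict:
worst_new_vs_standard = -ni_margin
worst_new_vs_placebo = standard_vs_placebo + worst_new_vs_standard
print(f"new therapy vs placebo, worst case: {worst_new_vs_placebo:+.0%}")
# -4%: a therapy declared 'non-inferior' could be 4 points worse than
# giving the patient nothing at all.
```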

This becomes more complicated when existing treatments were approved based on composite outcomes. Composite outcomes combine multiple individual outcomes to show larger benefits. If the non-inferiority trial subsequently looks only at an individual component of the composite, with the expected smaller difference, it is possible (and perhaps likely) that the new treatment would have proven inferior on the original, larger composite.

We can make matters even worse if we consider that current therapies don’t have the same effect in all patient groups. An antibiotic might be incredibly effective against x-ray proven pneumonia, but have no effect at all in clinically diagnosed pneumonia (which is mostly viral). If we ran a non-inferiority trial using a placebo (which we know is worse), we would find that placebo is inferior to antibiotics in radiologically proven pneumonia. However, if instead we ran this trial in patients with clinically diagnosed pneumonia (a group of patients with primarily viral respiratory illnesses), we would actually conclude that placebo is non-inferior to the antibiotics. (We don’t run non-inferiority trials with placebo. The example is just to demonstrate how easy it is to manipulate the outcomes of non-inferiority trials.) You can imagine how this information might be gamed by companies with huge stakes in the results of these trials. (It is far easier to cheat in non-inferiority trials, because it is far easier to bias trials towards the null hypothesis than away from it. You should be exceptionally wary of non-inferiority trials run by companies or individuals with financial conflicts of interest.)
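The pneumonia example comes down to simple effect dilution, which a couple of lines make explicit (all numbers hypothetical):

```python
benefit_bacterial = 0.20   # antibiotic's absolute benefit in x-ray proven pneumonia
benefit_viral = 0.00       # no benefit in viral illness
fraction_bacterial = 0.20  # share of truly bacterial cases among clinical diagnoses
ni_margin = 0.05

# Average treatment effect in a clinically diagnosed (mostly viral) cohort:
diluted = (fraction_bacterial * benefit_bacterial
           + (1 - fraction_bacterial) * benefit_viral)
print(f"diluted effect: {diluted:.2f}, within margin: {diluted <= ni_margin}")
# 0.04, True: the antibiotic's real effect is diluted below the margin, so
# 'placebo is non-inferior to antibiotics' in this diluted population.
```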

This issue will also get worse with time. If drug A is the existing standard, and drug B is shown to be ‘non-inferior’ to drug A, it is reasonable to think that over the next 5 years, drug B may become standard therapy. Then, when drug C comes to market, it is likely to be compared to drug B as the standard. Over a series of years, we could see a succession of ‘non-inferior’ drugs, each of which is marginally worse than the previous generation, resulting in therapy 20 years from now being significantly worse than the current standard. 
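This sort of ‘biocreep’ compounds in a way that is easy to underestimate. A toy calculation (with entirely hypothetical numbers) makes the point:

```python
# Hypothetical: each new drug is truly 2 points worse than its comparator,
# but slips under a 5-point non-inferiority margin every time.
benefit = 0.30   # drug A's absolute benefit over placebo
for drug in ["B", "C", "D", "E", "F"]:
    benefit -= 0.02   # each 'non-inferior' successor is slightly worse
    print(f"drug {drug}: {benefit:.2f} absolute benefit over placebo")
# After five generations the 'standard of care' has quietly lost a third of
# its original benefit, without any single trial looking alarming.
```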

A lot of this is just a long and complicated way of saying that without a placebo group, there is just no way to know whether the novel therapy is truly better than placebo. 

Empirical evidence of problems with non-inferiority trials

One review of 162 non-inferiority and equivalence trials found significant deviations from accepted good research practice, such as 80% of trials not providing a justification for the non-inferiority margin being used, and 28% not accounting for the non-inferiority margin in the sample size calculation. (Le Henanff 2006) Another review of 182 non-inferiority trials in top-rated journals found numerous problems, including the fact that about 12% of the time the experimental therapy was statistically worse than the active control, but the CONSORT recommended conclusion for the trial was “noninferior”. (Aberegg 2018) The same study found that an astonishing 77% of published non-inferiority trials make the claim of non-inferiority or superiority, as compared to only 2% that conclude that the novel therapy is inferior. If non-inferiority trials essentially never conclude that a treatment is inferior, that sounds a lot like there is significant bias, or a fundamental flaw in this trial design. (Prasad 2018) Unfortunately, the numbers get even worse when you focus on industry-sponsored trials, with 97% of such trials reaching favorable conclusions for their chosen treatment. (Flacco 2015)

Problems with critically appraising non-inferiority trials

“Trials to show superiority generally penalize the sloppy investigator… by contrast, non-inferiority trials tend to reward the careless. The less rigorously conducted the trial, the easier it can be to show non-inferiority.” (Schumi 2011)

In essence, a lot of the biases we learned for superiority trials work backwards for non-inferiority trials. In superiority trials, we are supposed to look at the intention to treat analysis, because a per-protocol analysis might artificially magnify differences between the two groups. However, an intention to treat analysis might artificially minimize differences between two groups (in the extreme, patients in both groups could end up taking the exact same therapy, and an intention to treat analysis would still compare them as if they had received different treatments), and therefore you are supposed to focus on the per-protocol analysis for non-inferiority trials.
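A quick simulation shows why. Assume (hypothetically) that the new drug truly cures 70% of patients versus 80% for standard care, but 30% of patients in each arm end up taking the other treatment:

```python
import random

random.seed(1)

def simulate(n=50000, crossover=0.30):
    cure = {"std": 0.80, "new": 0.70}     # true cure rates (hypothetical)
    itt = {"std": [], "new": []}          # analysed as randomized
    pp = {"std": [], "new": []}           # analysed as actually treated
    for arm in ("std", "new"):
        for _ in range(n):
            received = arm
            if random.random() < crossover:               # protocol violation
                received = "new" if arm == "std" else "std"
            cured = random.random() < cure[received]
            itt[arm].append(cured)
            if received == arm:
                pp[arm].append(cured)
    for label, data in (("intention to treat", itt), ("per protocol", pp)):
        rd = (sum(data["new"]) / len(data["new"])
              - sum(data["std"]) / len(data["std"]))
        print(f"{label}: risk difference {rd:+.3f}")

simulate()
# The true difference is -0.10. Per protocol recovers roughly that, but
# intention to treat shrinks it to about -0.04, dragging the result toward
# 'no difference' and making non-inferiority easier to claim.
```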

Bias is harder to detect in non-inferiority trials, and the results are more likely to be biased. A single non-inferiority trial should be given far less weight than a single superiority trial when assessing medical interventions.

A final, important problem faced when critically appraising non-inferiority trials is that, unlike superiority trials, which contain all of the information required to appraise the trial, proper appraisal of a non-inferiority trial requires access to information outside of the trial in order to arrive at valid conclusions. (Al Deeb 2015) You need to know whether there is a true justification for the non-inferiority design (some other proven benefit). You need to know if the non-inferiority margin was appropriate, which generally means knowing the prior research looking at the current standard of care, and considering the 95% confidence intervals in those studies. You need to know if the effect of the standard therapy group was preserved, as compared to previous research. (If a drug demonstrated a 10% benefit in prior studies, but demonstrates only a 2% benefit in the non-inferiority trial, a conclusion of non-inferiority is probably not valid.) The requirement for multiple sources of information outside of the published manuscript makes critical appraisal difficult (and also hints that we should have different requirements for the information contained in non-inferiority manuscripts).

Problems with the justifications for non-inferiority trials

One common justification for non-inferiority trials is that if patients don’t respond to current therapies, treatments approved through a non-inferiority pathway would provide them with an alternative. This logic is flawed. If a patient fails to respond to standard therapy, how do we know they will respond to the non-inferior (potentially worse) option? Why are we testing non-inferiority in the general population, and not in this specific sub-population? This is not a valid justification for non-inferiority trials. Instead, we should be running superiority RCTs to prove that the new therapy actually provides patient oriented value in the sub-population of interest (such as non-responders to current therapy).

Another common justification for non-inferiority trials is that new treatments are better tolerated or easier to use. But non-inferiority trials don’t test that hypothesis. And there is an important balance between this hypothetical benefit and the potential harm implied by non-inferiority margins. Why accept a non-inferiority trial design, when we can directly test the hypothesis? Run a superiority trial to clearly demonstrate that there are patient oriented benefits that outweigh the potential harms. 

It is argued that non-inferiority trials can be smaller and easier to run, but the size of the trial is set by the margin of error you are willing to accept. We often accept comically large non-inferiority margins, which does technically allow the trials to be smaller, but also means that we are exposing our patients to significant potential harm. To get non-inferiority margins down to a size that most of us would consider clinically acceptable, the trials will have to be just as big as (or perhaps bigger than) standard RCTs. (Even if the trials could be smaller, are we really willing to save a few research dollars at the expense of using new treatments that might be significantly worse than our current options?)
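The relationship between margin and trial size is quadratic, which is easy to see with the standard normal-approximation sample size formula for a binary outcome. This is a sketch with hypothetical inputs; real trials would use more careful calculations.

```python
from statistics import NormalDist

def ni_sample_size(p, margin, alpha=0.025, power=0.90):
    """Approximate per-group sample size for a non-inferiority trial with a
    binary outcome, assuming both arms truly share event rate p (one-sided
    alpha, normal approximation)."""
    z = NormalDist().inv_cdf
    return 2 * p * (1 - p) * (z(1 - alpha) + z(power)) ** 2 / margin ** 2

# Hypothetical 90% success rate: halving the margin quadruples the trial.
for margin in (0.10, 0.05, 0.025):
    print(f"margin {margin:.3f}: {ni_sample_size(0.90, margin):,.0f} per arm")
```

A generous 10-point margin needs only a couple of hundred patients per arm, but a margin most clinicians would actually accept already demands a trial as large as many superiority RCTs.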

Non-inferiority as a secondary outcome

If we think a new therapy might be safer than the current standard, we need to prove that. We need to demonstrate, in a superiority RCT, that the newer therapy reduces adverse events. However, we are obviously equally interested in the new therapy maintaining its efficacy. Although I think non-inferiority is a bad approach to the primary outcome of a trial, it might be a reasonable secondary outcome. 

Equivalence trials

Non-inferiority trials are often interpreted as if they were equivalence trials. They are not. However, it is worth noting that many of the problems with non-inferiority trials also exist for equivalence trials. Generally, it is not possible to really prove equivalence. (You would need an infinitely large sample size to truly prove equivalence.) You have to have error bars, which means the new treatment could always be worse than the current standard. These error bars translate into real outcomes for patients. If a treatment might be as much as 5% worse than the current standard, that could translate into many thousands of excess deaths over time. 
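Formally, equivalence is usually tested by checking that the entire confidence interval sits inside ±margin (the ‘two one-sided tests’ logic). A sketch with hypothetical numbers shows how the error bars described above make this hard to achieve:

```python
def equivalent(diff, se, margin):
    """Equivalence requires the whole 95% CI for the difference to fall
    strictly inside (-margin, +margin)."""
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    return -margin < lo and hi < margin

# Same observed 2-point difference, with progressively larger (hypothetical)
# trials shrinking the standard error:
for se in (0.020, 0.010, 0.005):
    print(f"se {se:.3f}: equivalent within 3 points? {equivalent(-0.02, se, 0.03)}")
# Only the largest trial 'proves' equivalence, and a margin of zero would
# require an infinite sample size -- true equivalence is unprovable.
```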

In most cases, this will be a theoretical harm, and the error bars will also include the possibility that the new treatment is better. However, there is a very important distinction that needs to be considered when looking at equivalence trials. We accept a degree of uncertainty in superiority trials, because with a novel treatment there are no other options. When running an equivalence or non-inferiority trial, we already – by definition – have a treatment that works (or at least that we assume works). If we have a treatment that works, what is the rush to approve another one? Why do we need one that is equivalent or potentially worse? Why would we stop using the current standard therapy for a newer, less proven, more expensive option, that is at best equivalent? 

If we are going to adopt new treatments, they should be better than our current options. They need to be superior, not equivalent. The benefit doesn’t always need to be for the same outcome, but it certainly needs to be proven. Thus, I see very little role for equivalence trials. We need trials that demonstrate superiority in patient oriented outcomes. 

The ethics of non-inferiority

Some have argued that non-inferiority trials are unethical, and I might be inclined to agree in many if not most cases. (Garattini 2007) They argue that trials designed to allow us to accept therapies that may be worse than current care can’t be ethical. These trials are usually designed for commercial rather than clinical purposes. “We believe that non-inferiority studies have no ethical justification, since they do not offer any possible advantage to present and future patients, and they disregard patients’ interests in favour of commercial ones.” These authors believe that few patients would participate in such research if they really understood what they were being exposed to.

Conclusion

Despite spending a tremendous amount of time on this topic over the last few years, I still don’t really understand all the implications of non-inferiority trials. I think these trials are much more likely to be misleading, and I will be much more skeptical of their results than I am of superiority trials. Ideally, we should be seeing fewer of these trials, rather than more (but that will probably require changing the crazy system we currently have, where we allow drug companies to test their own products and sell us on the results).

Practical approach

At a systems level, it seems like we are massively overusing non-inferiority trials, and that needs to stop. That information does very little to help the practicing clinician.

If you are trying to appraise a non-inferiority trial, I think there are a few key questions to consider:

First, is the novel therapy clearly beneficial in some other way (cheaper, less invasive, easier to use, less toxic) that justifies the non-inferiority design? If not, a non-inferiority design is inappropriate, because you don’t want a new therapy that is just “not much worse” than the one you are currently using, and so you should ignore the trial completely.

If there is a clear advantage to the novel therapy, you next have to ask yourself exactly how much you are willing to lose in order to gain that benefit. This will depend a lot on the context and the outcome being studied, but a non-inferiority trial will only tell you a novel therapy is not much worse than the current standard. If the drug is easier to take, are you willing to accept 10% worse mortality? 5%? 1%? You need to think carefully about this threshold, and not just accept the number chosen by the researcher. 

Finally, I would suggest standardizing the language we use when discussing non-inferiority trials to better reflect what they are really demonstrating. The standard conclusion of a non-inferiority trial should be: “we demonstrated that intervention A is no more than X% worse than intervention B.”

Other FOAMed

There is a good Twitter thread on non-inferiority trials by Andrew Althouse that can be found here

Evidence based medicine is easy

The EBM bibliography

Evidence based medicine resources

EBM deep dives

References

Aberegg SK, Hersh AM, Samore MH. Empirical Consequences of Current Recommendations for the Design and Interpretation of Noninferiority Trials. J Gen Intern Med. 2018 Jan;33(1):88-96. doi: 10.1007/s11606-017-4161-4. Epub 2017 Sep 5. PMID: 28875400

Al Deeb M, Azad A, Barbic D. Critically appraising noninferiority randomized controlled trials: a primer for emergency physicians. CJEM. 2015 May;17(3):231-6. doi: 10.2310/8000.2014.141405. PMID: 26034906

Brown SGA, Ball EL, Perrin K, et al. Conservative versus Interventional Treatment for Spontaneous Pneumothorax. N Engl J Med. 2020;382(5):405-415. PMID: 31995686

Flacco ME, Manzoli L, Boccia S, Capasso L, Aleksovska K, Rosso A, Scaioli G, De Vito C, Siliquini R, Villari P, Ioannidis JP. Head-to-head randomized trials are mostly industry sponsored and almost always favor the industry sponsor. J Clin Epidemiol. 2015 Jul;68(7):811-20. doi: 10.1016/j.jclinepi.2014.12.016. Epub 2015 Feb 7. PMID: 25748073

Garattini S, Bertele’ V. Non-inferiority trials are unethical because they disregard patients’ interests. Lancet. 2007 Dec 1;370(9602):1875-7. doi: 10.1016/S0140-6736(07)61604-3. PMID: 17959239

Le Henanff A, Giraudeau B, Baron G, Ravaud P. Quality of reporting of noninferiority and equivalence randomized trials. JAMA. 2006 Mar 8;295(10):1147-51. doi: 10.1001/jama.295.10.1147. PMID: 16522835

Prasad V. Non-Inferiority Trials in Medicine: Practice Changing or a Self-Fulfilling Prophecy? J Gen Intern Med. 2018 Jan;33(1):3-5. doi: 10.1007/s11606-017-4191-y. PMID: 28980180

Schumi J, Wittes JT. Through the looking glass: understanding non-inferiority. Trials. 2011 May 3;12:106. doi: 10.1186/1745-6215-12-106. PMID: 21539749

Walker E, Nowacki AS. Understanding equivalence and noninferiority testing. J Gen Intern Med. 2011 Feb;26(2):192-6. doi: 10.1007/s11606-010-1513-8. Epub 2010 Sep 21. PMID: 20857339
