I know many people are fed up with the debate about thrombolytics for acute ischemic stroke. (To be honest, I am as well. I wish we could just get the replication studies required to settle the issue, but opinions are so entrenched that the normal scientific process has been stifled.) However, even if you aren’t going to change your clinical practice at this point, discussion of the stroke literature can be tremendously valuable in understanding the principles of evidence based medicine. That is why, when I was invited to write an update on the evidence for the NNT website, I embraced the opportunity despite the controversy that I knew it would provoke.
This post is a response to two recent blog posts found on the excellent EMOttawa website that were written as a critique of our NNT article.
Neither of the posts actually mentions the NNT article, but Dr. Drew told me on Twitter that he was writing these posts in response. My discussion is based on that assumption; if it is incorrect, some of my comments about his articles may miss the mark, but that doesn’t change the general science being discussed.
Assuming that Dr. Drew’s articles were a response to the NNT article, I don’t think that our position was well represented. The arguments being rebutted by Dr. Drew appear to be much weaker versions of the arguments we put forth suggesting uncertainty in this literature. In other words, they seem to represent a strawman. That might be hard for any author to judge, as we are all emotionally tied to our work, but I will let people read the EM Ottawa posts and our NNT article and judge for themselves. (Admittedly, the NNT article is a bare bones summary because of space limitations. Therefore, I may have failed to adequately describe the various weaknesses in this literature in that piece, and simply assumed I had because I have described them in much more detail elsewhere, i.e. here or here.)
Edit: Dr. Drew has clarified that his articles were not a response to our NNT article, so that assumption was incorrect. Thus, the arguments he presents are not a misrepresentation of our discussion. However, they are still weaker versions of superficially similar arguments commonly made when discussing this evidence, and therefore still function as strawman arguments.
I want to discuss the problems I see with Dr. Drew’s articles, but I think it is essential to be clear about the intent. I have never met Dr. Drew, but my online interactions with him lead me to believe that he is an excellent physician, and that we share the common goal of providing emergency patients with the best possible care. The goal of a scientific debate like this is not to “win”, but to elevate the arguments on both sides to their highest possible form, and ultimately to ensure that we are using the best possible information to guide patient care.
I don’t actually have a specific stance on thrombolytics. I am not trying to convince people how to practice. I have my bets, but I honestly don’t know what a replication of NINDS would show. Because I don’t think the science supports a solid conclusion in either direction, my goal in discussing this literature is not to sway anyone in their decision to prescribe thrombolytics. My only goal is to promote better critical appraisal skills and a better understanding of evidence based medicine (including for myself).
So let’s get to Dr. Drew’s articles:
I really like part 1 of Dr. Drew’s article. When there are questions about science, it is almost always best to go back and read the core literature for yourself. However, there are a couple of minor issues, and one major issue, with this article that I think we should address.
Bias is inevitable in critical appraisal. My personal biases are evident in every article that I write on First10EM. (I try to identify those biases as much as possible, and an eagerness to change in the face of new evidence helps mitigate bias to some extent, but it is always there.) Thus, when I say that I think there is some bias in Dr. Drew’s article, it is not meant as a personal attack. However, it doesn’t seem fair to refer to ATLANTIS B as “the FIRST negative tPA trial” immediately after discussing ECASS 1, ECASS 2, and NINDS 1, all of which were negative. Talking about trends towards benefit in those trials, with p values of 0.38 and 0.27, is just not a fair assessment. Although we shouldn’t dichotomize trials as positive or negative, in common EBM parlance, these trials were clearly negative.
However, the bigger problem with this article is that it doesn’t provide a comprehensive review of the stroke literature. The discussion is biased, because only a subset of trials is discussed.
As we point out in our NNT article, there is no scientific reason to focus only on alteplase and exclude everything else. As a general rule in medicine, we assess drugs as a class. It is actually the rare exception when one member of a class turns out to be superior to another. To make that distinction, we need a prospective direct comparison, which doesn’t exist in the stroke literature.
The argument that alteplase is superior to other thrombolytics is based entirely on unplanned, retrospective analysis. It is common to hear physiologic arguments as to why alteplase might be superior, but if those arguments were so obvious, the brilliant researchers designing these trials would never have used alternatives such as streptokinase. The other stroke trials were run precisely because, at the time of their design, the different thrombolytic agents were not expected to differ. It is only in retrospect, based on the positive findings in two alteplase trials, that the theory of alteplase superiority was developed.
These retrospective comparisons are easier to understand when they occur in single studies. We know that subgroup analysis has a high probability of being wrong even when it is prospectively planned. (Wallach 2017) When researchers start adding unplanned, retrospective subgroup analyses after the data have been collected, we refer to this as “data-dredging” or “p-hacking”. The apparent superiority of alteplase is an association only, and it is an association only identified retrospectively.
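To make the multiple-comparisons problem behind “data-dredging” concrete, here is a minimal sketch (my own illustration, not an analysis from any stroke trial). It assumes a treatment with no true effect and a set of independent subgroup tests, each at the conventional 0.05 threshold, and shows how quickly at least one spuriously “significant” subgroup becomes likely:

```python
# Why unplanned subgroup analyses inflate false positives:
# with k independent tests of a truly null treatment, each at
# alpha = 0.05, the chance of at least one "significant" result
# by luck alone is 1 - (1 - alpha)^k.

def family_wise_error(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null tests
    comes back 'statistically significant' by chance."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroup tests -> P(>=1 false positive) = {family_wise_error(k):.2f}")
```

Real subgroups are correlated rather than independent, so the exact numbers differ in practice, but the direction of the problem is the same: the more retrospective slices of the data you examine, the more likely a chance finding is to emerge.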
Ignoring trials just because they are negative is not valid. It will bias your results. One may posit the hypothesis that alteplase is superior based on these retrospective observations, but that hypothesis still requires prospective validation. In the case of alteplase, because the hypothesis was generated retrospectively, it has a high probability of being incorrect. Furthermore, empirically speaking, this hypothesis was looked at by both the Cochrane review (Wardlaw 2014) and a second meta-analysis (Donaldson 2016), and neither analysis found statistical evidence of a difference in treatment effect between the different thrombolytic agents.
Despite these issues, I think the summary figure for part 1 of this article provides a reasonable illustration of the stroke literature. We have a mix of positive and negative trials. Each individual trial has its own limitations. It might be reasonable to interpret these trials as demonstrating a potential benefit, but there is clearly no certainty in this data.
The second part of this article is broken down into 4 arguments that are supposedly made against thrombolytics, and provides rebuttals to those arguments. In my mind, this is where the strawman seems to arise, as these are not arguments I have ever made, nor are they made in the NNT article that provoked this discussion. Let’s go through the 4 arguments:
Argument 1: “The Hoffman re-analysis invalidates the findings of the NINDS paper, as adjustment for baseline stroke severity leads to no difference as compared with placebo”
I don’t think anyone has based their critique of NINDS part 2 on the Hoffman re-analysis. (Hoffman 2009) The limitations of NINDS have been discussed since the day the trial was published, long before the Hoffman publication. The Hoffman paper is an interesting analysis, but is tangential to the many concerns about NINDS that arise from basic critical appraisal.
No study is perfect, and NINDS has more than its fair share of limitations. (NINDS 1995) The division of the trial into two parts is unusual, and without a prepublished protocol, it is impossible to know when that decision was made. There are multiple primary outcomes, again without a registered protocol, so researcher degrees of freedom are high. There is the unfortunate baseline imbalance between the two groups, and the change from a relative primary outcome in part 1 (that would not have been influenced by that imbalance) to an absolute outcome in part 2 (that is potentially heavily influenced by that baseline imbalance) is problematic. Furthermore, the results of NINDS can’t be easily extrapolated, because of the requirement that half of the participants be enrolled within 90 minutes. (If you believe that “time is brain”, this clearly biases the results towards more benefit than we would see in real-life patients.) There is a question of biologic plausibility: how can thrombolytics have no effect at 24 hours but a large benefit 3 months later? There is also the statistical fragility of the trial (which is especially problematic with increased researcher freedom due to the lack of trial registration at the time).
Trials always have some limitations, and those limitations do not necessarily negate the results. However, it is absolutely essential to consider the various sources of bias in any trial, and temper one’s certainty based on that critical appraisal. There are three possible explanations for any statistically significant finding: it could be chance, it could be from bias, or it could be a real result. When it comes to tPA in NINDS, all three options are clearly possible. A chance finding is always possible, and with the fragility of these findings and the various ways that this trial increased researcher degrees of freedom, it is certainly a possibility here. There are many sources of bias, such as the baseline imbalance between the groups, that could explain the findings. But it is also possible that this is a real benefit. You can’t distinguish between those alternatives by simply reading the NINDS manuscript. The only way to determine the cause of the statistically significant result is to replicate the study. That is why replication is the core of the scientific process.
So bottom line: no one thinks the Hoffman re-analysis invalidates the findings of NINDS. However, as we routinely do in all areas of medicine, we recognize the many limitations of the NINDS trial that mean that more science is needed to know whether these results represent a true benefit that outweighs the known harms of thrombolytics. (This is so essential: there is absolutely nothing special about the thrombolytic literature. Replication is required in every area of medical research. This is a consistent mantra of evidence based medicine, not a unique demand when it comes to tPA.)
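Since “statistical fragility” comes up repeatedly in this debate, a concrete sketch may help. The fragility index asks how many individual patients’ outcomes would need to flip for a “significant” result to lose statistical significance. The code below is my own minimal illustration with hypothetical counts (not data from NINDS or any other stroke trial), using an exact Fisher test built from first principles so it needs nothing beyond the standard library:

```python
from math import comb

def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    n = a + b + c + d
    r1, c1 = a + b, a + c
    def prob(x: int) -> float:
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)
    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def fragility_index(good_tx: int, n_tx: int, good_ctrl: int, n_ctrl: int,
                    alpha: float = 0.05) -> int:
    """Flip good outcomes to bad ones in the treatment arm, one patient
    at a time, until the comparison is no longer significant."""
    flips, e = 0, good_tx
    while e > 0 and fisher_two_sided(e, n_tx - e,
                                     good_ctrl, n_ctrl - good_ctrl) < alpha:
        e -= 1
        flips += 1
    return flips

# Hypothetical trial (NOT real stroke data): 60/100 vs 40/100 good outcomes.
print("fragility index:", fragility_index(60, 100, 40, 100))
```

A small fragility index means that reclassifying only a handful of patients would erase statistical significance — exactly the situation that makes unregistered, flexible analyses so worrying.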
Argument 2: “Following adjustment for baseline characteristics in the NINDS trial the notion of the “TIME IS BRAIN” paradigm disappears”
Again, this is not actually the argument being made. The discussion about “time is brain” has nothing to do with any statistical adjustment. It is about understanding the strength of the data upon which “time is brain” is based, and the bias introduced when “time is brain” is used to hand pick only positive studies while ignoring the bulk of the thrombolytics literature.
Much like the focus on alteplase at the exclusion of all other thrombolytics, the “time is brain” hypothesis was generated retrospectively. We know that is true because the researchers of the 1990s prospectively designed RCTs enrolling patients up to 6 hours. They expected it to work. If the “time is brain” hypothesis was as obvious as many make it out to be now, these trials would never have been run. It was only after these trials were run, and subgroups were analyzed, that the hypothesis of a 3 hour window was generated. There is no prospective randomized data confirming this hypothesis. “Time is brain” is a reasonable hypothesis, but it is essential to recognize that it is a hypothesis and not a fact.
It might be reasonable to base clinical practice on hypotheses generated from subgroup analyses while we wait for confirmatory studies. However, we must be careful, because there is data suggesting that these subgroup findings are very rarely confirmed. (Wallach 2017) And we must remain aware that our practice is based only on a provisional hypothesis. Subgroups do not define scientific truth. Follow-up research is needed.
Even the empirical evidence for “time is brain” is pretty weak. Dr. Drew does present one meta-analysis from 2010 that concludes there is an association between time to treatment and benefit. (Lees 2010) However, the 2014 Cochrane review looked at the same question and concluded that the current data does not support a significant difference in outcomes between the 0-3 and 3-6 hour groups. (Wardlaw 2014) Thus, even as a hypothesis, there is uncertainty about whether “time is brain” is true.
We could do a full critical appraisal of all these papers, and discuss the fact that the Lees 2010 paper was written by employees of Boehringer Ingelheim, but it actually doesn’t matter. It doesn’t matter if there is a statistically significant association between time to treatment and benefit, because it is still an association based on observational data. Even if true, it gives us a hypothesis, not a fact, and it is not appropriate to simply ignore a body of available literature based on an unproven hypothesis.
Bottom line: It is important to recognize that “time is brain” is only an association, not a proven fact. Using time to retrospectively exclude negative studies will only bias your interpretation. (It also isn’t necessary. Even when meta-analyses include all available studies, they find a statistical benefit, so “time is brain” is actually somewhat irrelevant to the discussion). (Donaldson 2016; Wardlaw 2014)
Argument 3: “The harms are too high, and outweigh any benefit”
I will admit that I have heard people say this, but it is certainly not an argument I have ever made, nor one made in our NNT article. The argument is not about the magnitude of harms and benefits. In fact, I think people generally agree about the general numbers presented in these studies. The concern is about the certainty of these numbers.
When it comes to harm, certainty is high. There is some uncertainty about the exact magnitude of the harm. There is even some uncertainty about the exact nature of the harms, as not every analysis concludes that mortality is increased. But we are certain that the use of thrombolytics results in harm.
On the other hand, the many limitations in the data result in significant uncertainty about the benefit. A pooled estimate from a meta-analysis does not define truth. One has to consider the various factors contributing to the effect being described, and bias is always one of the contributors. As discussed in our article, there are many sources of bias in this literature that result in significant uncertainty about the magnitude of a benefit, and probably even the existence of a benefit.
As we did in the NNT article, I will correct one common meme about thrombolytics: it is inappropriate to compare the rate of intracranial hemorrhage with the rate of neurologic improvement. Long term functional outcomes include those patients who had head bleeds, so if there is a true benefit, it is a benefit that outweighs the harm from head bleeds, and those bleeds actually become somewhat irrelevant.
Bottom line: If the results of these studies were certain, the magnitude of harm versus benefit favours thrombolytics. (Although the increase in mortality still needs to be addressed). But this is not a discussion about magnitudes of harm and benefit. It is a discussion about bias, scientific limitations, and certainty.
Argument 4: “The Alper paper highlights vulnerabilities in the ECASS-III study, without which we have no basis for a 3-4.5h time window”
I won’t rehash the same discussion I had about argument 2. This is another straw man. The Alper paper is interesting, but completely separate from the critical appraisal of ECASS III. (Alper 2020) Instead of discussing flaws in the Alper paper, the important thing is to perform a critical appraisal of the ECASS III trial. Is it a perfect trial that provides absolute certainty about the role of thrombolytics in the 3 to 4.5 hour window? Of course not. No trial can provide absolute certainty, and much like NINDS, ECASS III has a number of important sources of bias to consider. (Hacke 2008) (ECASS III is discussed further here, among many other places.)
As a side note, although critical appraisals had pointed out the many issues with ECASS III long before the Alper paper, there is one detail that I had never seen discussed anywhere prior to Alper: the authors of ECASS III did not follow their prepublished protocol for statistical analysis. Based on their own protocol, it is a statistically negative trial. This is called cheating, or p-hacking, and significantly undermines the credibility of the findings. (Alper 2020)
In all other areas of medicine, we acknowledge that individual RCTs have problems that limit their certainty. The many sources of bias in the ECASS III trial don’t necessarily negate the results, but they need to be considered. They absolutely should decrease your overall certainty.
In place of ECASS III, Dr. Drew suggests we look at the Emberson meta-analysis. It is true that the Emberson paper suggests a 5.2% absolute benefit in those patients treated between 3 and 4.5 hours (although this result is biased by the exclusive focus on alteplase). However, pooling data in a meta-analysis does not resolve the bias in the underlying trials. (In fact, pooling data here probably increases bias, because the negative trials were stopped early and are therefore weighted less heavily.) As I said in the section above, this is not a question about the magnitude of harms and benefits, but about their certainty.
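For readers less familiar with how a figure like that 5.2% relates to the “number needed to treat” that gives the NNT site its name, the arithmetic itself is simple. This is a generic sketch of the conversion (my illustration, using the common convention of rounding up; it says nothing about the certainty of the underlying estimate, which is the real issue here):

```python
from math import ceil

def nnt(arr: float) -> int:
    """Number needed to treat = 1 / absolute risk reduction (ARR),
    conventionally rounded up to the next whole patient."""
    if arr <= 0:
        raise ValueError("No absolute benefit: NNT is undefined")
    return ceil(1 / arr)

# Emberson's reported 5.2% absolute benefit in the 3-4.5 hour window:
print(nnt(0.052))  # -> 20: treat 20 patients for one additional good outcome
```

The calculation is only as trustworthy as the pooled estimate feeding it, which is precisely why the sources of bias in the underlying trials matter so much.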
There are clearly varying interpretations of this literature. Considering the uncertainty in the data and the limitations of the individual trials, that makes sense to me. I think it is very reasonable that Dr. Drew suggests thrombolytics to his patients with acute ischemic stroke. Within the realm of uncertainty, a neurologic benefit is certainly possible. However, I personally think that appropriate critical appraisal of these articles suggests that the statistical benefit seen is probably primarily the result of a combination of bias and chance findings. I think that if we ever get the replication of NINDS that we so desperately need, it is more likely to be negative than positive. Of course, that is only an educated guess.
I hope that I properly characterized Dr. Drew’s thoughts throughout this rebuttal. If not, he has an open invitation to publish clarifications and rebuttals on this site. Again, the goal is not to shape clinical practice. This is not an issue that can be decided by rereading the same papers. We need replication studies to settle the debate. But while we wait for the appropriate science to be performed, our goal should be to ensure that both sides’ arguments are being made as strongly as possible, and to continue our discussion with the assumption that everyone involved wants the same thing: the best care for our patients.
Ultimately, I don’t actually care that much about the specific question of thrombolytics in acute ischemic stroke. I am not currently working in a stroke center, and even when I was, most patients weren’t eligible. On top of that, stroke makes up a tiny portion of all the emergency patients I see. Furthermore, I think the harms and benefits are likely very close based on the data we have seen so far, so whether you advise for or against thrombolytics, the difference is probably minimal. The reason I have spent so much time on this topic has nothing to do with the clinical management of stroke. The only reason I keep talking about this topic is that it provides us with insights into the workings of evidence based medicine, and the importance of reading papers and thinking critically rather than simply following guidelines.
Edit: Dr. Drew wrote a follow up article with his rebuttals. It can be found here. People should read all aspects of this literature when drawing their own conclusions. There are obviously many opinions on the topic, but for me it remains clear that there is significant uncertainty about the benefit, and that the conclusion of our NNT article will remain correct until we get the appropriate replication studies that are an essential part of all science.
Alper BS, Foster G, Thabane L, Rae-Grant A, Malone-Moses M, Manheimer E. Thrombolysis with alteplase 3-4.5 hours after acute ischaemic stroke: trial reanalysis adjusted for baseline imbalances. BMJ Evid Based Med. 2020 Oct;25(5):168-171. doi: 10.1136/bmjebm-2020-111386. Epub 2020 May 19. PMID: 32430395
Donaldson L, Fitzgerald E, Flower O, Delaney A. Review article: Why is there still a debate regarding the safety and efficacy of intravenous thrombolysis in the management of presumed acute ischaemic stroke? A systematic review and meta-analysis. Emerg Med Australas. 2016;28(5):496-510.
Emberson J, Lees KR, Lyden P, et al. Effect of treatment delay, age, and stroke severity on the effects of intravenous thrombolysis with alteplase for acute ischaemic stroke: a meta-analysis of individual patient data from randomised trials. Lancet. 2014 Nov 29;384(9958):1929-35. doi: 10.1016/S0140-6736(14)60584-5. PMID: 25106063
Hacke W, Kaste M, Bluhmki E, et al. Thrombolysis with alteplase 3 to 4.5 hours after acute ischemic stroke. N Engl J Med. 2008 Sep 25;359(13):1317-29. doi: 10.1056/NEJMoa0804656. PMID: 18815396
Hoffman JR, Schriger DL. A graphic reanalysis of the NINDS Trial. Ann Emerg Med. 2009 Sep;54(3):329-36, 336.e1-35. doi: 10.1016/j.annemergmed.2009.03.019 PMID: 19464756
Lees KR, Bluhmki E, von Kummer R et al. Time to treatment with intravenous alteplase and outcome in stroke: an updated pooled analysis of ECASS, ATLANTIS, NINDS, and EPITHET trials. Lancet. 2010 May 15;375(9727):1695-703. doi: 10.1016/S0140-6736(10)60491-6. PMID: 20472172
National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995 Dec 14;333(24):1581-7. doi: 10.1056/NEJM199512143332401. PMID: 7477192
Wallach JD, Sullivan PG, Trepanowski JF, Sainani KL, Steyerberg EW, Ioannidis JP. Evaluation of Evidence of Statistical Support and Corroboration of Subgroup Claims in Randomized Clinical Trials. JAMA Intern Med. 2017 Apr 1;177(4):554-560. doi: 10.1001/jamainternmed.2016.9125. PMID: 28192563
Wardlaw JM, Murray V, Berge E, del Zoppo GJ. Thrombolysis for acute ischaemic stroke. Cochrane Database Syst Rev. 2014 Jul 29;2014(7):CD000213. doi: 10.1002/14651858.CD000213.pub3. PMID: 25072528