Morgenstern, J. Gestalt is better than decision tools for identifying sepsis, First10EM, July 29, 2024. Available at:
https://doi.org/10.51684/FIRS.137103
Gestalt for sepsis? This paper hits two of my biggest pet peeves in medicine. 1) The endless emphasis on rushing to hit treatment targets in sepsis or otherwise and 2) the assumption that decision instruments must be better than basic clinical judgment. I am aware, therefore, that my interpretation is more likely to be biased, which is why I always suggest reading these papers for yourself. (You will be able to hear more about this paper when I discuss it with Ken Milne on the SGEM, and of course with Casey Parker on the BroomeDocs podcast.)
The paper
Knack SKS, Scott N, Driver BE, Prekker ME, Black LP, Hopson C, Maruggi E, Kaus O, Tordsen W, Puskarich MA. Early Physician Gestalt Versus Usual Screening Tools for the Prediction of Sepsis in Critically Ill Emergency Patients. Ann Emerg Med. 2024 Mar 25:S0196-0644(24)00099-4. doi: 10.1016/j.annemergmed.2024.02.009. Epub ahead of print. PMID: 38530675
The Methods
This is a single-center, prospective, observational study.
Patients
Critically ill, adult (18 and older), undifferentiated medical patients presenting to a specialized 4-bed resuscitation area of an academic emergency department in the United States.
They excluded patients with trauma and patients with obvious causes of illness, defined as cardiac arrest, STEMI, suspected stroke, and active labour. They also excluded patients transferred from outside facilities.
Intervention
Faculty emergency physicians were asked “what is the likelihood that this patient has sepsis?” and rated that likelihood on a scale from 0 to 100, at both 15 and 60 minutes after the patient’s presentation. For their statistical analysis, anything above 50% was treated as consistent with the diagnosis of sepsis, but I am not sure that is a good assumption, which I will discuss below.
Comparison
Screening tools for sepsis, including SIRS, qSOFA, SOFA, and MEWS, were calculated retrospectively from data found on the chart. They also constructed a machine learning model – LASSO, or Least Absolute Shrinkage and Selection Operator – which was trained on 80% of the available data and tested only on the remaining 20%.
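For readers unfamiliar with LASSO, it is essentially a regression model with a penalty that shrinks uninformative predictors toward zero. Here is a minimal sketch of what this kind of pipeline might look like, assuming an L1-penalized logistic regression; the features, data, and settings below are all invented stand-ins, not the authors’ actual model.

```python
# Minimal sketch of a LASSO-style sepsis model with an 80/20 split.
# The features and data here are random stand-ins, so the AUC will
# hover around 0.5; the point is the pipeline, not the result.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2484                                 # cohort size from the paper
X = rng.normal(size=(n, 6))              # stand-in for vitals/labs
y = (rng.random(n) < 0.11).astype(int)   # ~11% sepsis prevalence

# Train on 80% of patients, hold out 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# penalty="l1" is what makes this LASSO rather than ridge regression
model = LogisticRegression(penalty="l1", solver="liblinear")
model.fit(X_train, y_train)

# Performance is reported only on the held-out 20%
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")
```

Testing only on held-out data is the right call, because a model evaluated on the same patients it was trained on will always look better than it really is.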
Outcome
Their primary outcome was the accuracy of sepsis prediction as compared with the final diagnosis of sepsis, based on ICD-10 codes at discharge.
The Results
They included 2484 patients; the median age was 53, and 60% were male. 257 (11%) were ultimately diagnosed with sepsis.
94% of the physician judgments were completed by staff physicians, and only 6% by residents. A lot of data was missing for the other screening tools. At 15 minutes, although 100% of patients had enough information to calculate a qSOFA, a MEWS could be calculated for only 59% of patients, a SIRS score for 7%, and a full SOFA score for 2%. The numbers remained similarly low at 1 hour.
The median visual analog scale (VAS) score in patients with sepsis was 81, as compared to 8 in those without sepsis.
Physician judgment was better than all the decision tools, at both 15 and 60 minutes. Physician judgment had an area under the curve (AUC) of 0.90 (95% CI 0.88 to 0.92), beating LASSO (0.84; 95% CI 0.82 to 0.87), qSOFA (0.67; 95% CI 0.64 to 0.71), SIRS (0.67; 95% CI 0.64 to 0.70), SOFA (0.67; 95% CI 0.63 to 0.70), and MEWS (0.66; 95% CI 0.64 to 0.69).
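If AUC numbers feel abstract, there is a simple interpretation: an AUC of 0.90 means that if you grab one septic and one non-septic patient at random, the septic patient will have received the higher score about 90% of the time. A toy illustration, with completely invented scores:

```python
# What an AUC of ~0.90 means: the chance that a randomly chosen
# septic patient scored higher than a randomly chosen non-septic
# patient. All scores here are invented for illustration.
septic_scores = [81, 90, 62, 75, 35, 30]
non_septic_scores = [8, 12, 4, 20, 55, 45]

pairs = [(s, n) for s in septic_scores for n in non_septic_scores]
wins = sum(1.0 if s > n else 0.5 if s == n else 0.0 for s, n in pairs)
print(f"AUC: {wins / len(pairs):.2f}")  # prints AUC: 0.89
```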
My thoughts
This is an important study. We need studies that compare decision instruments to clinical judgment. (Really, those studies should take place before we even consider using decision tools, but unfortunately we haven’t learned that lesson yet in medicine.) This study is probably especially important in places that have silly rules mandating specific actions be taken within specific timeframes for patients diagnosed with sepsis. All in all, I believe these results, and I think you should quote them to stop the use of stupid decision tools in medicine. That being said, this study has some significant limitations that should substantially decrease your certainty in the results.
Perhaps the biggest issue is the primary outcome. What is the true definition of sepsis? Do we have a gold standard? In this study, the (fool’s) gold standard was the chart containing an ICD-10 code of sepsis at the time of discharge. But how many of these patients truly had sepsis? Discharge diagnosis is a poor gold standard, because patients could have developed sepsis later in their hospital stay. Imagine a patient with intestinal ischemia as the cause of their initial presentation, but without any signs of infection on day one. We might expect that patient to develop an infection 2-3 days into their hospital course, but that doesn’t mean they had sepsis on presentation. In fact, simply labeling them as sepsis could result in a catastrophic mistake if it meant we missed the intestinal ischemia. Furthermore, not all sepsis is created equal. I might care a lot about identifying septic shock or severe sepsis, but if these patients fell outside of those more severe categories, do I even care? I don’t think these definitional problems are the researchers’ fault, nor are there obvious easy solutions, but they should significantly limit any conclusions we draw from this data.
My guess is that gestalt is even better than the numbers they present. In table 3, they provide some details about the 10 patients whom clinicians rated as low likelihood of sepsis but who were ultimately labeled as having sepsis on discharge. We don’t have enough details to say for sure, but it sounds to me like the majority of these patients might have developed sepsis later in their hospital stay. A patient with afebrile respiratory distress who is intubated and admitted to the ICU, and who later gets a diagnosis of sepsis, didn’t necessarily have sepsis in the ED. Nosocomial infections happen. Aspiration pneumonia during intubation happens. The problem is the gold standard used.
It is also worth noting that every single “miss” actually received antibiotics in the emergency department, which really makes me question the definition of a miss (or perhaps our threshold for empiric antibiotics).
That leads to another key question about the way this trial was designed: how does a visual analogue scale translate to clinical care? They asked physicians to rate the chances of sepsis from 0 to 100. That is a reasonable question for research purposes, but it is entirely unclear what those numbers mean for clinical care. If a patient has a 60% chance of sepsis, do you empirically treat it as sepsis, or wait for more information? 40%? 20%? 10%? It is likely that different clinicians will act at different thresholds. For their stats, they decided that anything above 50% meant the patient had sepsis, but they didn’t ask the clinicians for their interpretation. Indeed, we know that the clinicians were acting at a much lower threshold, considering that every single “miss” was given antibiotics in the ED. Therefore, although this study asks a theoretically interesting question, a much more important question is how gestalt compares to decision rules in terms of clinical actions. In other words, the question we really wanted answered is “based on your gestalt, are you going to empirically treat this patient as if they have sepsis?”, regardless of the specific VAS score. The toy example below shows why the choice of cutoff matters so much.
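Here is a quick sketch, again with completely invented scores and outcomes, of how the same set of gestalt ratings produces very different test characteristics depending on where you draw the line:

```python
# Illustration only: the same VAS ratings dichotomized at different
# cutoffs. All scores and outcomes are invented.
vas = [81, 90, 8, 35, 62, 12, 75, 4, 55, 20, 30, 45]
sepsis = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0]

for cutoff in (10, 20, 50):
    pred = [v > cutoff for v in vas]
    tp = sum(p and s for p, s in zip(pred, sepsis))
    fn = sum(not p and s for p, s in zip(pred, sepsis))
    tn = sum(not p and not s for p, s in zip(pred, sepsis))
    fp = sum(p and not s for p, s in zip(pred, sepsis))
    print(f"cutoff {cutoff:>2}: sensitivity {tp/(tp+fn):.2f}, "
          f"specificity {tn/(tn+fp):.2f}")
```

In this made-up data, dropping the cutoff from 50 to 20 catches every case of sepsis at the cost of more false positives – which is presumably closer to how the clinicians in this study were actually behaving when they gave antibiotics to every single “miss”.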
I think those are the most important issues to grapple with if you want to understand this study, but there are a number of other critical appraisal points that could influence the results of this study. You need to consider the possibility of the Hawthorne effect. These clinicians were specifically being asked about sepsis. The simple act of asking might influence their estimates. For example, you might have left the room of a patient in shock, with a working diagnosis of PE, but when asked about sepsis, realize that it should be on the differential, upgrade your judgment when answering the question, and then subsequently add antibiotics. If the research assistants weren’t present, it is possible that clinicians could have missed more cases. (The decision tools would not suffer from this same bias.)
Although a number of decision tools have been proposed for sepsis screening, none really diagnose or define sepsis. The authors mention this in their limitations, but the comparison to SIRS or SOFA is somewhat nonsensical, because those scores don’t actually diagnose sepsis. A diagnosis of sepsis requires a positive SIRS or SOFA score plus a clinical diagnosis of infection. In other words, the definition of sepsis always relies on clinical judgment (no matter what bean counters looking at data retrospectively want to say).
There are also questions of generalizability. This study only looked at critically ill patients. As I think this study demonstrates, we are very good at identifying and treating sepsis in patients who look like they need the ICU when they arrive in the ED. The more difficult patients are those who present atypically but get much sicker in the 24 hours after their initial assessment. It is possible, but obviously completely unproven, that objective tools or AI could help identify risk factors clinicians are overlooking in these harder-to-diagnose patients.
This study also focused specifically on staff clinicians at a highly functional academic emergency department, with rapid availability of laboratory results, and early (presumably expert) ultrasound use. The results may vary in other clinical settings.
I rarely spend much time reading the introduction section of a paper, but there are times when the introduction might be as important as the methods. All studies must make assumptions. These assumptions can be minor, and they are often obvious, but sometimes the assumptions are hidden and incredibly consequential. For example, a well designed trial of acupuncture contains the hidden assumption that there exists such a thing as meridians that can be balanced through specific acupuncture points. It doesn’t matter how well the trial is designed, nor what the results are, when you base your trial on faulty assumptions.
This trial seems to be based on the assumption that early identification of sepsis is inherently good, because early treatment will result in better outcomes. That is an assumption that has mostly been shown to be false. Early treatment might matter in septic shock, although even there we are relying on shaky associations, but I think the best data we have shows no association between time to antibiotics and outcomes in patients without septic shock. If that is the case, one must question the premise of this study.
In this study, they measure clinician gestalt at both 15 and 60 minutes. Those artificial cut-offs, presumably used because of silly American administrative rules, are clinically meaningless. Most patients with severe sepsis are obvious on the initial assessment, and get immediate treatment. For the less obvious cases, does it really matter what my guess is at 60 minutes, rather than at 120 minutes when I have the results of lab work and imaging? What probably matters is the diagnosis at the time of disposition, rather than at any specific (and artificial) time cut-off. That being said, this study is taking place in a magic emergency department, where labs are being reported to emergency physicians within 15 minutes of patient arrival, so they might have more information in the first 15 minutes than I have at 3 hours.
It is always important to understand how new research fits with previous findings. These results don’t surprise me at all. Almost none of our decision rules have been shown to be better than clinical judgment, and the test characteristics for all of these ‘sepsis decision instruments’ are worse than most other tools we use. For that reason, despite the many limitations, I believe the results of this trial.
Bottom line
These decision instruments for sepsis have never been shown to improve upon clinical judgment or improve patient care. Until we have evidence of benefit, they shouldn’t be used, and we should rely on clinical judgment, which this trial demonstrates is quite good.
Other FOAMed
Clinical decision rules are ruining medicine
Evidence based medicine is easy
Evidence based medicine resources
References
Knack SKS, Scott N, Driver BE, Prekker ME, Black LP, Hopson C, Maruggi E, Kaus O, Tordsen W, Puskarich MA. Early Physician Gestalt Versus Usual Screening Tools for the Prediction of Sepsis in Critically Ill Emergency Patients. Ann Emerg Med. 2024 Mar 25:S0196-0644(24)00099-4. doi: 10.1016/j.annemergmed.2024.02.009. Epub ahead of print. PMID: 38530675

