Emergency medicine loves decision rules. I can understand why, considering the apparent certainty they provide in a job that is anything but certain. However, decision tools are tests like any other, and can cause harm if they lead patients down inappropriate pathways. The ideal test of a decision tool, although rarely performed, is an RCT of its implementation, with a focus on changing patient-important outcomes. However, before even considering an RCT, it is worth asking: is this decision rule more accurate than current practice (aka clinical judgement or gestalt)?
Babl FE, Oakley E, Dalziel SR, et al. Accuracy of Clinician Practice Compared With Three Head Injury Decision Rules in Children: A Prospective Cohort Study. Annals of emergency medicine. 2018; 71(6):703-710. PMID: 29452747
This is a planned secondary analysis of a prospective observational cohort.
Children younger than 18 years with a head injury presenting to one of 10 pediatric emergency departments in Australia and New Zealand.
- Excluded patients with a GCS <13, or who presented more than 24 hours after the injury.
Clinician judgement (based on whether or not a CT was actually ordered during the first ED visit).
3 decision tools – PECARN, CATCH, and CHALICE
Clinically important traumatic brain injury (death from traumatic brain injury, need for neurosurgery, intubation >24 hours for traumatic brain injury, or hospital admission >2 nights in association with traumatic brain injury on CT).
They included 18,913 patients (out of 29,433 screened). 1579 (8.3%) had a CT scan on the initial ED visit (and another 112 had a CT scan at some point during the study period).
24 patients (0.1%) underwent neurosurgery. 160 (0.9%) met their criteria for a clinically important traumatic brain injury.
Clinicians ordered a CT in 158 of the 160 children who had a clinically important traumatic brain injury (clinical judgement sensitivity 98.8%, 95% CI 95.6-99.9%). Neither of the 2 missed patients required neurosurgery or had any other bad outcome. Clinician specificity was 92.4% (95% CI 92-92.8%). Most importantly, if you include the patients whom the clinicians decided to observe for 4 hours, sensitivity was 100%.
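As a sanity check, those accuracy figures can be reproduced from the raw counts. One caveat: the false positive count below (the 1,579 first-visit CTs minus the 158 ordered in ciTBI patients) is my inference from the reported numbers, not a figure stated directly in the paper.

```python
# Reconstructing clinician sensitivity and specificity from the
# study's counts. The false-positive count is inferred, not reported.
total = 18_913            # enrolled patients
ci_tbi = 160              # clinically important TBIs
tp = 158                  # ciTBI patients in whom a CT was ordered
fn = ci_tbi - tp          # the 2 misses
fp = 1_579 - tp           # first-visit CTs in patients without ciTBI (inferred)
tn = (total - ci_tbi) - fp

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
# → sensitivity 98.8%, specificity 92.4%
```

The fact that this arithmetic lands exactly on the published 92.4% specificity suggests the inferred 2x2 table is right.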
PECARN had an equal sensitivity, but much worse specificity.
CATCH and CHALICE were clearly worse (although the low number of events means that the 95% confidence intervals overlap). CATCH missed 1 patient requiring neurosurgery and CHALICE missed 2.
Decision tools rarely beat clinical judgement. That doesn’t mean that they are useless, but we have to be very careful with how they are used.
This is a good study with a believable result, but it is a secondary analysis, even if planned. There are a few key shortcomings that should be noted.
The primary outcome is a composite outcome that clearly combines things of unequal value. Death is nothing like a 2 day admission. Furthermore, an admission for 2 days is subjective, and could be influenced by a large number of factors that are completely unrelated to the severity of the injury. We are not given a very good breakdown of the actual outcomes in this manuscript.
I really dislike combining GCS 13 and 15 patients into the same group. A lot of head injury studies do this, but a GCS 13 patient is very different from the majority of patients we see, who are GCS 15 on arrival. The GCS 13 and 14 patients also make up such a small proportion of the patients in this study (less than 4%) that it is very difficult to draw any conclusions in this subgroup. This manuscript does not indicate what percentage of the major injuries were in the GCS 13-14 group. That doesn't mean that GCS 13 patients should all be scanned, but I do think they deserve to be studied separately.
They defined clinician judgement based on whether a CT was ordered on the first visit. This may not be a pure measure of clinicians' judgement, as various forces can pressure clinicians into ordering tests they don't actually think are necessary. However, it probably represents a reasonable real world assessment of clinical practice.
The Hawthorne effect is possible here, as physicians knew they were being studied, so accuracy may be lower in real life. Furthermore, unlike usual clinical practice, the physicians here filled out a study form before deciding whether to order a CT. Although that form did not include the specific decision tools, it did contain all the information that those tools require. Therefore, these clinicians might have had a heightened awareness of high risk features. That being said, the high risk features used in any of these rules are fairly simple and can usually be recited to me by any medical student, even those unfamiliar with the tools in question, so I doubt this had a huge impact on clinical care.
It is also possible that the physicians were actually using the decision tools, as they were well known by the time the study was conducted. However, the differences in accuracy suggest that if they were using the rules, they were also incorporating clinical judgement. (These rules are one-directional, meaning they should be used to support the decision not to CT scan, but should not push doctors to order a CT they believe to be unnecessary. If clinicians were using the rules this way, it would explain their higher specificity.)
Most importantly, this is an incredibly low risk group to start with. (Although, I think it is probably representative of pediatric head injury seen at most emergency departments). Given that only 0.1% of patients had neurosurgery (the only outcome I am really looking for with a CT), if you had a decision tool that told you not to scan a single patient, you would have ended up with a 100% specificity and a negative predictive value of 99.9%. This explains why I have only ordered 2 pediatric head CTs (for trauma) in 8 years at a busy community hospital with a lot of pediatrics. (My specificity is only 50%, but as far as I know, my sensitivity is 100%). There were only 160 clinically important injuries in 3.5 years at 10 pediatric EDs, 7 of which were trauma centers. Although we don’t want to miss a single one of these injuries, the base rate means we are much more likely to cause harm through overtesting than because of a miss.
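To make that base rate argument concrete, here is a quick sketch using the study's own counts: a deliberately absurd "rule" that tells you never to scan anyone still posts spectacular-looking numbers.

```python
# The "never scan" rule: every patient is test-negative.
# Counts from the study: 24 neurosurgery cases among 18,913 patients.
total = 18_913
cases = 24                     # patients who underwent neurosurgery

tn = total - cases             # correctly "cleared" patients
fn = cases                     # every real case is missed

specificity = tn / (tn + 0)    # no false positives when nobody is scanned
npv = tn / (tn + fn)
print(f"specificity {specificity:.1%}, NPV {npv:.1%}")
# → specificity 100.0%, NPV 99.9%
```

Which is exactly why a high negative predictive value, on its own, tells you almost nothing about a rule applied to a very low risk population.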
Of course, this study was run in Australia and New Zealand, so when trying to extrapolate the results, it is important to consider the possibility that doctors are just smarter in the southern hemisphere.
Clinical judgement is probably better than clinical decision tools when deciding on imaging in pediatric head trauma (among physicians who know the decision tools). I make my decisions based on clinical judgement, and not one particular rule. However, like the physicians in this trial, I know all the risk factors in the decision tools. These tools, and the studies they are developed from, are important resources to help develop clinical judgement. And I still occasionally formally use the PECARN tool, but only in patients I am concerned about, and only as a one-way tool to convince me not to scan a patient I would otherwise send to CT.
Justin Morgenstern. Clinical judgement in pediatric head injury (Babl 2018), First10EM, 2019. Available at: