Brief Review — CheXED: Comparison of a Deep Learning Model to a Clinical Decision Support System for Pneumonia in the Emergency Department

CheXED: Is a Deep Learning Model Helpful for Emergency Departments?

Sik-Ho Tsang
May 16, 2023

CheXED: Comparison of a Deep Learning Model to a Clinical Decision Support System for Pneumonia in the Emergency Department,
CheXED, by Stanford University, Intermountain Healthcare, University of Utah, and Intermountain Medical Center,
2022 J. Thorac Imaging (Sik-Ho Tsang @ Medium)

  • The purpose of this study is to investigate whether a deep learning model for detecting radiographic pneumonia and pleural effusions can improve the functionality of a clinical decision support system (CDSS) for pneumonia management (ePNa) operating in 20 Emergency Departments (EDs).


  1. CheXED
  2. Results

1. CheXED

1.1. Dataset

Dataset Summary
  • The chest radiographic studies used in this study were originally collected between December 2009 and September 2015 from 7 EDs as part of development and validation of ePNa.
  • This cohort includes adult (at least 18 years old) patients who were either suspected of having pneumonia or given a diagnosis of pneumonia.

The combined dataset contained 7434 studies with frontal-view and lateral-view chest images from 6551 adult patients.

1.2. Model Architecture

  • Goal: The network was trained to classify a chest radiographic study as (1) negative, uncertain, or positive for radiographic pneumonia; (2) unilobar or multilobar for the possible pneumonia studies; and (3) negative or positive for pleural effusion.
  • Model: The network used a 121-layer Densely Connected Convolutional Network (DenseNet) architecture.
  • Pretraining: It was first pretrained to classify the absence or presence of 14 observations (including pneumonia and pleural effusion) on the CheXpert dataset, containing >200,000 radiographs from Stanford Medical Center patients.
  • Fine-Tuning: The learned weights were then used to initialize the network, which was fine-tuned to detect the three radiographic findings on the training set. To generate a prediction for a new study, CheXED was run on all available views (frontal and lateral) in the study, and the maximum probability for each finding was taken as the predicted output for the whole study.
  • Visualization: Once trained, CheXED predictions were interpreted using Class Activation Maps (CAMs), which produce a heat map highlighting the image regions that contributed most to a prediction.
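
The study-level prediction rule described in the Fine-Tuning bullet (take the maximum probability over all views, per finding) can be sketched in a few lines. This is only an illustration, not the authors' code: the finding names, the `aggregate_study` helper, and the example probabilities are all hypothetical.

```python
from typing import Dict, List

# Hypothetical names for the three findings; the paper does not specify keys.
FINDINGS = ("pneumonia", "multilobar", "pleural_effusion")


def aggregate_study(view_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-view probabilities into one study-level prediction.

    For each finding, the study-level score is the maximum probability
    over all available views (frontal and/or lateral).
    """
    if not view_probs:
        raise ValueError("a study needs at least one view")
    return {f: max(view[f] for view in view_probs) for f in FINDINGS}


# Made-up per-view outputs, for illustration only.
frontal = {"pneumonia": 0.62, "multilobar": 0.30, "pleural_effusion": 0.10}
lateral = {"pneumonia": 0.48, "multilobar": 0.35, "pleural_effusion": 0.55}

study = aggregate_study([frontal, lateral])
print(study)
```

With max aggregation, a finding that is visible in only one view (here, the effusion on the lateral image) still drives the study-level score.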

2. Results

ROC Curves

With pretraining on CheXpert, on the test set, CheXED achieved an AUC of 0.939 (95% CI: 0.911, 0.962) on pleural effusion, an AUC of 0.833 (95% CI: 0.795, 0.868) on radiographic pneumonia, and an AUC of 0.847 (95% CI: 0.800, 0.890) on discerning between unilobar and multilobar pneumonia.

  • Without pretraining on CheXpert, the test set AUC scores of the model were 0.769 (95% CI: 0.724, 0.811) on detecting radiographic pneumonia, 0.926 (95% CI: 0.896, 0.952) on detecting pleural effusion, and 0.778 (95% CI: 0.724, 0.830) on discerning between unilobar and multilobar pneumonia, all lower than the corresponding scores with pretraining.
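
For readers who want to see how an AUC with a 95% CI can be obtained, here is a minimal sketch. It assumes a percentile bootstrap over test cases, which is a common choice; this review does not state how the paper computed its intervals. The function names, labels, and scores below are made up for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one
    (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain only one class
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data, for illustration only.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5]
point = auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print(point, ci_low, ci_high)
```

On a real test set the same two functions would be called once per finding, with and without pretraining, to produce the kind of comparison reported above.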

Agreement Between CheXED, ePNa, and Physician Labeling

On pleural effusion, CheXED achieved significantly higher agreement (0.66; 95% CI: 0.59, 0.74) with the adjudicated radiologist reference standard on the test set than did the physician labeling of the report (0.53; 95% CI: 0.43, 0.63) and ePNa (0.51; 95% CI: 0.41, 0.61).

On radiographic pneumonia, model agreement with the reference standard (0.41; 95% CI: 0.35, 0.48) was significantly higher than both the physician labeling of the report (0.34; 95% CI: 0.27, 0.41) and ePNa (0.19; 95% CI: 0.12, 0.26).

On differentiating between unilobar and multilobar pneumonia, the model agreement with the reference standard (0.38; 95% CI: 0.32, 0.45) was significantly lower than the physician labeling (0.49; 95% CI: 0.43, 0.56) but significantly higher than ePNa.
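
The agreement scores above are reported without naming the statistic. Assuming a chance-corrected measure such as Cohen's kappa (an assumption on my part, not confirmed by this review), agreement between a rater and the reference standard could be computed as in the sketch below. The label lists are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both raters always use the same single label
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels, for illustration only.
reference = ["pos", "neg", "neg", "pos", "neg", "pos"]
model = ["pos", "neg", "pos", "pos", "neg", "neg"]
kappa = cohen_kappa(reference, model)
print(round(kappa, 3))
```

A chance-corrected statistic is useful here because raw percent agreement can look high simply when one class dominates the test set.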

Visualizations Using CAMs

Representative examples of disagreements between the radiologists, along with model classifications on the test set, are shown above.


