Brief Review — CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings

CheXternal Investigates Clinically Relevant Distribution Shifts

Sik-Ho Tsang
3 min read · Aug 22, 2022

CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings
CheXternal, by Stanford University
2021 CHIL (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Classification, Image Classification

  • Poor generalization due to data distribution shifts in clinical settings is a key barrier to implementation.
  • The diagnostic performance of 8 different chest X-ray models is studied when they are applied to (1) smartphone photos of chest X-rays and (2) external datasets, without any finetuning.
  • This is a paper from the research group of Andrew Ng.

Outline

  1. CheXternal Setup
  2. Results

1. CheXternal Setup

CheXternal Setup
  • The diagnostic performance is studied for 8 different chest X-ray models when applied to (1) smartphone photos of chest X-rays and (2) an external dataset (NIH), without any finetuning.

2. Results

2.1. Smartphone Photo of Chest X-Rays

Matthews Correlation Coefficient (MCC) differences of 8 chest X-ray models on different pathologies between photos of the X-rays and the original X-rays, with 95% confidence intervals.
AUC and MCC performance of models and radiologists on the standard X-rays and the photos of chest X-rays, with 95% confidence intervals
  • On photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance.
MCC performance of models on the photos of chest X-rays, radiologist performance, and their difference, with 95% confidence intervals
  • However, only 3 performed significantly worse than radiologists on average.
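
To make the "MCC difference with 95% confidence intervals" in the captions above more concrete, here is a minimal sketch of how such a number could be computed for one model and one pathology. It assumes binarized predictions and a paired percentile bootstrap over studies; the exact resampling procedure used in the paper is not described in this review, so the function names (mcc, bootstrap_mcc_difference) and the parameters (n_boot, seed) are illustrative assumptions only.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is taken as 0 when a row or column of the
    # confusion matrix is empty and the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def bootstrap_mcc_difference(y_true, pred_photo, pred_original,
                             n_boot=1000, seed=0):
    """Point estimate and 95% percentile interval for
    MCC(photo) - MCC(original), resampling studies with replacement.

    Assumption: the same resampled indices are used for both sets of
    predictions (a paired bootstrap). This is a sketch, not the
    paper's confirmed procedure.
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        a = [pred_photo[i] for i in idx]
        b = [pred_original[i] for i in idx]
        diffs.append(mcc(t, a) - mcc(t, b))
    diffs.sort()
    low = diffs[int(0.025 * n_boot)]
    high = diffs[min(n_boot - 1, int(0.975 * n_boot))]
    point = mcc(y_true, pred_photo) - mcc(y_true, pred_original)
    return point, low, high
```

Under this reading, a negative difference whose whole interval lies below 0 would correspond to a statistically significant drop on photos.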

2.2. External Set (CheXpert Hidden Test Set & NIH)

Left: MCC differences in performance of models on the CheXpert test set, with 95% confidence intervals (higher than 0 is in favor of the models being better). Right: MCC differences in performance of the same models compared to another set of radiologists across the same pathologies on an external institution’s (NIH) data.
Overall change in performance of models (blue) and radiologists (orange) across CheXpert and the external institution dataset (NIH)
MCC performance of models and radiologists on the CheXpert and NIH sets of chest X-rays, and their difference, with 95% confidence intervals
  • On the external set, none of the models performed statistically significantly worse than radiologists, and five models performed statistically significantly better than radiologists.
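
The counts of models that are "significantly better" or "significantly worse" than radiologists can be read off the confidence intervals of the model-minus-radiologist MCC difference. The sketch below assumes the usual reading that a difference is significant when its 95% interval excludes 0, with values above 0 favoring the model as stated in the caption. The function names and the interval format are hypothetical, and no numbers from the paper are used.

```python
def compare_to_radiologists(ci_low, ci_high):
    """Classify a model from the 95% CI of (model MCC - radiologist MCC).

    Assumption: "significant" means the interval excludes 0.
    """
    if ci_low > 0:
        return "better"
    if ci_high < 0:
        return "worse"
    return "not significantly different"


def tally(intervals):
    """Count models per category.

    intervals: dict mapping a model name to a (ci_low, ci_high) tuple.
    """
    counts = {"better": 0, "worse": 0, "not significantly different": 0}
    for ci_low, ci_high in intervals.values():
        counts[compare_to_radiologists(ci_low, ci_high)] += 1
    return counts
```

Calling tally on the eight per-model intervals for the external set would, under this assumption, give the "five better, none worse" summary in the bullet above.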

The results demonstrate that some chest X-ray models, under clinically relevant distribution shifts, were comparable to radiologists while other models were not.

One of the topics that Prof. Andrew Ng focuses on is data-centric AI. Here, in collaboration with radiologists, the data-centric issue is studied in the field of medical X-ray imaging.

