Brief Review — CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings

CheXternal Investigates the Issue of Clinically Relevant Distribution Shifts

Aug 22, 2022


CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings
CheXternal, by Stanford University
2021 CHIL (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Classification, Image Classification

Poor generalization due to data distribution shifts in clinical settings is a key barrier to the implementation of deep learning models.
This paper measures the diagnostic performance of 8 different chest X-ray models when applied to (1) smartphone photos of chest X-rays and (2) external datasets, without any finetuning.
This is a paper from the research group of Andrew Ng.

Outline

1. CheXternal Setup
2. Results

1. CheXternal Setup

The diagnostic performance of 8 different chest X-ray models is studied when they are applied to (1) smartphone photos of chest X-rays and (2) an external dataset (NIH), without any finetuning.

2. Results

2.1. Smartphone Photos of Chest X-Rays

**Matthews Correlation Coefficient (MCC) differences of 8 chest X-ray models on different pathologies between photos of the X-rays and the original X-rays, with 95% confidence intervals**
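
MCC is the headline metric in these figures. As a reminder of what is being compared, the sketch below computes MCC for a single pathology from binary labels and predictions, using MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from −1 to 1, with 0 meaning no better than chance. The labels, probabilities, and the 0.5 threshold are made-up illustration values; the review does not say how the models' outputs were binarized.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary (0/1) labels and predictions."""
    # Count the four cells of the 2x2 confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical model probabilities for one pathology, binarized at an assumed 0.5 threshold.
labels = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
preds = [1 if p >= 0.5 else 0 for p in probs]
print(round(mcc(labels, preds), 3))  # → 0.333
```

Because MCC uses all four confusion-matrix cells, it stays informative when a pathology is rare, which plain accuracy does not.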

**AUC and MCC performance of models and radiologists on the standard X-rays and the photos of chest X-rays, with 95% confidence intervals**
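
The table reports AUC next to MCC, and the two answer different questions: AUC is computed from the raw scores and is threshold-free, while MCC needs binarized predictions. A minimal pairwise (Mann-Whitney style) sketch of AUC, on made-up scores, is shown here; it is O(positives × negatives) and meant for illustration only.

```python
def auc(y_true, scores):
    """ROC AUC: probability that a random positive case scores above a random negative case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Compare every positive/negative pair; ties count as half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores for one pathology.
labels = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
print(round(auc(labels, probs), 3))  # → 0.778
```

Here 7 of the 9 positive/negative pairs are ranked correctly, so the AUC is 7/9 regardless of any threshold.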

On photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance.
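
The review reports these photo-versus-original differences with 95% confidence intervals but does not describe how the intervals were obtained. One standard way, assumed here, is a paired nonparametric bootstrap: resample the studies with replacement, compute MCC on both the photo and the original version of the same resampled studies, and take percentiles of the difference. Under this reading, a "statistically significant drop" means the whole interval lies below 0. The function name, the 1000 resamples, and the toy data are hypothetical.

```python
import math
import random

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary (0/1) labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def paired_bootstrap_diff_ci(y_true, preds_photo, preds_orig, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for MCC(photo) - MCC(original), resampling studies jointly (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        # The same resampled study indices are used for both conditions (paired design).
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        diffs.append(mcc(t, [preds_photo[i] for i in idx]) - mcc(t, [preds_orig[i] for i in idx]))
    diffs.sort()
    k = int(round(alpha / 2 * n_boot))
    return diffs[k], diffs[n_boot - 1 - k]

# Toy data: the original X-ray predictions are perfect; the photo predictions
# turn every fourth study (all negatives here) into a false positive.
y = [i % 2 for i in range(200)]
orig = list(y)
photo = [1 - t if i % 4 == 0 else t for i, t in enumerate(y)]
lo, hi = paired_bootstrap_diff_ci(y, photo, orig)
print(f"95% CI of MCC difference: [{lo:.3f}, {hi:.3f}]")
print("significant drop:", hi < 0)
```

Resampling the studies jointly, rather than bootstrapping each condition separately, keeps the pairing between a photo and its original X-ray, which usually gives a tighter interval for the difference.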

**MCC performance of models on the photos of chest X-rays, radiologist performance, and their difference, with 95% confidence intervals**

However, only 3 of these models performed significantly worse than radiologists on average.
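
"Worse than radiologists on average" compares models and radiologists across several pathologies at once. A minimal sketch, assuming that the per-pathology MCC differences (model minus radiologist, so positive favors the model, matching the figures' convention) are averaged and that one bootstrap resample of cases is shared by all pathologies, is shown below. The review does not spell out this procedure; the averaging, the dict layout, the function names, and the toy data are all hypothetical.

```python
import math
import random

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary (0/1) labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mean_mcc_diff_ci(labels, model_preds, rad_preds, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the pathology-averaged MCC difference (model minus radiologist).

    Each argument maps a pathology name to a list of 0/1 values aligned to the
    same cases, so one resample of case indices is shared by every pathology.
    """
    rng = random.Random(seed)
    pathologies = sorted(labels)
    n = len(labels[pathologies[0]])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs = []
        for p in pathologies:
            t = [labels[p][i] for i in idx]
            m = [model_preds[p][i] for i in idx]
            r = [rad_preds[p][i] for i in idx]
            diffs.append(mcc(t, m) - mcc(t, r))
        stats.append(sum(diffs) / len(diffs))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))
    return stats[k], stats[n_boot - 1 - k]

def verdict(lo, hi):
    """Classify the comparison by whether the CI excludes 0 (assumed significance criterion)."""
    if hi < 0:
        return "significantly worse than radiologists"
    if lo > 0:
        return "significantly better than radiologists"
    return "no significant difference"

# Toy data for two pathologies: the model is perfect, the radiologist errs on every fourth case.
labels = {
    "Edema": [i % 2 for i in range(120)],
    "Pleural Effusion": [1 if i % 3 == 0 else 0 for i in range(120)],
}
model = {p: list(v) for p, v in labels.items()}
rad = {p: [1 - t if i % 4 == 0 else t for i, t in enumerate(v)] for p, v in labels.items()}
print(verdict(*mean_mcc_diff_ci(labels, model, rad)))
```

Sharing one resample across pathologies keeps the correlation between labels of the same patient, which an independent per-pathology bootstrap would ignore.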

2.2. External Set (CheXpert Hidden Test Set & NIH)

**Left: MCC differences in performance between models and radiologists on the CheXpert test set, with 95% confidence intervals (higher than 0 is in favor of the models being better). Right: MCC differences in performance between the same models and another set of radiologists, across the same pathologies, on an external institution’s (NIH) data**

**Overall change in performance of models (blue) and radiologists (orange) across CheXpert and the external institution dataset (NIH)**

**MCC performance of models and radiologists on the CheXpert and NIH sets of chest X-rays, and their difference, with 95% confidence intervals**

On the external set, none of the models performed statistically significantly worse than radiologists, and five models performed statistically significantly better than radiologists.

The results demonstrate that some chest X-ray models, under clinically relevant distribution shifts, were comparable to radiologists while other models were not.

One of the topics that Prof. Andrew Ng focuses on is data-centric AI. Here, in collaboration with radiologists, this data-centric issue is studied in the field of medical X-ray imaging.

Reference

[2021 CHIL] [CheXternal]
CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings

1.8. Biomedical Image Classification

2017 [ChestX-ray8] 2019 [CheXpert] [Rubik’s Cube] 2020 [VGGNet for COVID-19] [Dermatology] [ConVIRT] [Rubik’s Cube+] 2021 [MICLe] [MoCo-CXR] [CheXternal]
