Brief Review — CheXtransfer: Performance and Parameter Efficiency of ImageNet Models for Chest X-Ray Interpretation

CheXtransfer, Data-Centric Analysis on Chest X-Ray

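  • Deep learning methods for chest X-ray interpretation typically rely on pretrained models developed for ImageNet. This practice assumes that better ImageNet architectures also perform better on chest X-ray tasks, and that ImageNet-pretrained weights give a boost over random initialization.
  • In this work, the authors compare the transfer performance and parameter efficiency of 16 popular convolutional architectures on a large chest X-ray dataset (CheXpert) to test these assumptions.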
  • This is a paper from Andrew Ng’s research group.


  1. CheXtransfer Results
  2. Truncated Models Results

1. CheXtransfer Results

1.1. Summary

Visual summary of this paper’s contributions (Error bars show one standard deviation)
  • Leftmost: Scatterplot and best-fit line for 16 pretrained models showing no relationship between ImageNet and CheXpert performance.
  • Second Left: CheXpert performance varies much more across architecture families than within them.
  • Second Right: Average CheXpert performance improves with pretraining.
  • Rightmost: Models can maintain performance and improve parameter efficiency through truncation of final blocks.

1.2. Details

Average CheXpert AUC vs. ImageNet Top-1 Accuracy
Average CheXpert AUC vs. Model Size
  • Without pretraining, the logarithm of the model size has a near-linear relationship with CheXpert performance (Spearman 𝜌 = 0.79).
  • With pretraining, however, the monotonic relationship is weaker (Spearman 𝜌 = 0.56).
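As a rough illustration of how such a rank correlation is computed, the sketch below implements Spearman's 𝜌 in plain Python and applies it to log model size versus AUC. The parameter counts and AUC values are made-up placeholders, not numbers from the paper, and the helper names (`rank`, `pearson`, `spearman`) are just for this example.

```python
import math

def rank(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

# Hypothetical values for illustration only; NOT results from the paper.
params_millions = [3.5, 5.3, 8.0, 11.7, 25.6, 44.5]
auc = [0.845, 0.851, 0.849, 0.856, 0.858, 0.861]

log_size = [math.log10(p) for p in params_millions]
rho = spearman(log_size, auc)
print(f"Spearman rho (log size vs. AUC) = {rho:.2f}")
```

Because Spearman's 𝜌 only looks at ranks, taking the logarithm of the model size does not change its value; the log scale matters for how linear the scatterplot looks, not for the rank correlation itself.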
Pretraining Boost vs. Model Size

2. Truncated Models Results

Efficiency Trade-Off of Truncated Models. Pretrained models can be truncated without significant decrease in CheXpert AUC
  • Networks are composed of repeated blocks, and each block is built from convolutional layers.
  • Performance is evaluated by truncating the final blocks and appending a classification layer (global average pooling followed by a fully connected layer) at the end.
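The truncation procedure can be sketched without any deep learning framework by treating a network as a list of blocks. This is a minimal sketch with toy numbers: the `Block` class, the channel and parameter counts, and `NUM_CLASSES` are all hypothetical and do not correspond to any architecture or setting in the paper.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    out_channels: int  # channels of the feature map this block outputs
    params: int        # number of weights inside the block

def head_params(in_channels, num_classes):
    """Global average pooling has no weights; the FC layer has weights + biases."""
    return in_channels * num_classes + num_classes

def truncate(blocks, keep, num_classes):
    """Keep the first `keep` blocks and attach a fresh GAP + FC head."""
    if not 1 <= keep <= len(blocks):
        raise ValueError("keep must be between 1 and the number of blocks")
    kept = blocks[:keep]
    total = sum(b.params for b in kept)
    total += head_params(kept[-1].out_channels, num_classes)
    return kept, total

# Toy network; all numbers are invented for illustration.
toy = [
    Block("block1", 64, 100_000),
    Block("block2", 128, 400_000),
    Block("block3", 256, 1_600_000),
    Block("block4", 512, 6_400_000),
]
NUM_CLASSES = 5  # placeholder number of output labels

for keep in range(len(toy), 0, -1):
    kept, total = truncate(toy, keep, NUM_CLASSES)
    print(f"keep {keep} block(s), last = {kept[-1].name}: {total:,} parameters")
```

In this toy setup most parameters sit in the final blocks, so dropping them shrinks the model sharply; whether AUC holds up after truncation is an empirical question that the paper answers by fine-tuning and evaluating each truncated model.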
Comparison of Class Activation Maps Among Truncated Model Family
  • As an additional benefit, truncated architectures that drop pooling layers also produce higher-resolution class activation maps with Grad-CAM.
  • The higher-resolution class activation maps (CAMs) may more effectively localize pathologies with little to no decrease in classification performance. In clinical settings, improved explainability through better CAMs may be useful for validating predictions and diagnosing mispredictions.
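To see why truncation raises CAM resolution: a Grad-CAM map has the spatial size of the last convolutional feature map, so every downsampling step that is removed enlarges it. The sketch below assumes a 224×224 input and a hypothetical network whose first stage downsamples by 4 and whose later stages each downsample by 2; neither value is taken from the paper.

```python
def cam_resolution(input_size, downsample_factors):
    """Side length of the final feature map after the given downsampling steps."""
    size = input_size
    for factor in downsample_factors:
        size //= factor
    return size

INPUT_SIZE = 224       # assumed input side length in pixels
STAGES = [4, 2, 2, 2]  # hypothetical per-stage downsampling factors

for removed in range(3):
    kept = STAGES[: len(STAGES) - removed]
    side = cam_resolution(INPUT_SIZE, kept)
    print(f"{removed} stage(s) truncated: CAM is {side}x{side}")
```

Under these assumptions, each truncated stage doubles the side length of the map, which is the effect the comparison of CAMs across a truncated model family illustrates.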

One of the topics that Prof. Andrew Ng focuses on is the data-centric issue in AI. Here, in collaboration with radiologists, that issue is studied in the field of medical X-ray imaging.


