Review — ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases

NIH Chest X-ray Dataset

Sik-Ho Tsang
5 min read · Jul 27, 2022

ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, ChestX-ray8, by National Institutes of Health
2017 CVPR, Over 2400 Citations (Sik-Ho Tsang @ Medium)
Medical Image Classification, Image Classification, Dataset, NIH

  • A new chest X-ray database, namely “ChestX-ray8”, which comprises 108,948 frontal-view X-ray images of 32,717 unique patients.
  • The commonly occurring thoracic diseases can be detected and even spatially-located via a unified weakly-supervised multi-label image classification and disease localization framework.


  1. ChestX-ray8 Dataset
  2. Unified DCNN Framework
  3. Experimental Results

1. ChestX-ray8 Dataset

1.1. Images

Eight common thoracic diseases observed in chest X-rays that validate a challenging task of fully-automated diagnosis
  • The “ChestX-ray8” dataset comprises 108,948 frontal-view X-ray images of 32,717 unique patients, collected from 1992 to 2015, with eight common disease labels text-mined from the associated radiological reports via NLP techniques.
  • Typical X-ray image dimensions are 3000×2000. The X-ray images are resized to 1024×1024 bitmap images without significant loss of detail.

1.2. Classification Labels

The circular diagram shows the proportions of images with multi-labels in each of 8 pathology classes and the labels’ co-occurrence statistics
  • 8 Labels: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, and Pneumothorax.
  • All-zero vector [0, 0, 0, 0, 0, 0, 0, 0] represents the status of “Normal”.
  • ‘1’ indicates the presence of that corresponding label.
  • Natural Language Processing (NLP) techniques are adopted to detect the pathology keywords and to remove negation and uncertainty. Each radiological report is either linked with one or more keywords or marked as “Normal”, the background category.
  • The above figure reveals some connections between different pathologies, which agree with radiologists’ domain knowledge, e.g., Infiltration is often associated with Atelectasis and Effusion.
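The label encoding described above can be sketched in a few lines of Python. The class order in PATHOLOGIES and the helper names encode_labels and decode_labels are assumptions made for this illustration; they are not defined by the dataset itself.

```python
# Class order is an assumption for illustration; it follows the list in this section.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
    "Mass", "Nodule", "Pneumonia", "Pneumothorax",
]


def encode_labels(findings):
    """Turn a list of finding names into an 8-dimensional multi-hot vector."""
    unknown = set(findings) - set(PATHOLOGIES) - {"Normal"}
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # "Normal" (or an empty list) maps to the all-zero vector.
    return [1 if name in findings else 0 for name in PATHOLOGIES]


def decode_labels(vector):
    """Turn a multi-hot vector back into names; all zeros means 'Normal'."""
    names = [name for name, flag in zip(PATHOLOGIES, vector) if flag == 1]
    return names or ["Normal"]
```

For instance, a report mentioning both Effusion and Infiltration would set two of the eight positions to ‘1’, while a report with no keyword hit stays all-zero.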

1.3. Localization Labels

  • A small number of images with pathology are provided with hand labeled bounding boxes (B-Boxes), which can be used as the ground truth to evaluate the disease localization performance.

2. Unified DCNN Framework

The overall flow-chart of our unified DCNN framework and disease localization process

2.1. Network Architecture

  • The goal is to first detect whether one or more pathologies are present in each X-ray image, and then to locate them using the activations and weights extracted from the network.
  • ImageNet-pretrained networks are used, e.g., AlexNet, GoogLeNet, VGGNet-16 and ResNet-50, with the fully-connected layers and the final classification layers left out.
  • Instead, a transition layer, a global pooling layer, a prediction layer and a loss layer are inserted in the end (after the last convolutional layer).
  • The transition layer has a uniform dimension of output, S×S×D, S∈{8, 16, 32}. D=1024 for GoogLeNet and D=2048 for ResNet.
  • Combining the deep activations from the transition layer (a set of spatial image features) with the weights of the prediction inner-product layer (trained feature weighting) makes it possible to find plausible spatial locations of diseases.
  • By performing a global pooling after the transition layer, the weights learned in the prediction layer can function as the weights of spatial maps from the transition layer. Therefore, weighted spatial activation maps can be produced for each disease class.
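  • Besides the conventional max pooling and average pooling, the Log-Sum-Exp (LSE) pooling in [31] is utilized:
    $$x_p = \frac{1}{r} \cdot \log\left[ \frac{1}{S} \cdot \sum_{(i,j) \in \mathbf{S}} \exp(r \cdot x_{ij}) \right]$$
  • where $x_{ij}$ is the activation value at (i, j), $\mathbf{S}$ is the pooling region containing $S$ locations, and r is the hyperparameter: the pooled value moves from the average (r → 0) towards the maximum (r → ∞).
  • Since the LSE function suffers from overflow/underflow problems, a modified but equivalent LSE function is proposed:
    $$x_p = x^* + \frac{1}{r} \cdot \log\left[ \frac{1}{S} \cdot \sum_{(i,j) \in \mathbf{S}} \exp(r \cdot (x_{ij} - x^*)) \right]$$
    where $x^* = \max\{|x_{ij}|, (i,j) \in \mathbf{S}\}$.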
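To make the pooling and the heatmap idea above more concrete, here is a minimal pure-Python sketch. It assumes LSE pooling of the form (1/r)·log(mean(exp(r·x))) over one S×S feature map, and it stabilises the computation by subtracting the largest activation before exponentiating. That is the usual log-sum-exp trick and may differ from the paper's exact choice of shift constant, although any constant shift gives the same pooled value. The function names and the nested-list data layout are assumptions for illustration only.

```python
import math


def lse_pool_naive(activations, r=10.0):
    """LSE pooling of one feature map; may overflow for large r * x."""
    values = [v for row in activations for v in row]
    return math.log(sum(math.exp(r * v) for v in values) / len(values)) / r


def lse_pool(activations, r=10.0):
    """Same pooled value, but shifted by the maximum so exp() never overflows."""
    values = [v for row in activations for v in row]
    x_star = max(values)  # assumption: signed maximum used as the shift
    mean_exp = sum(math.exp(r * (v - x_star)) for v in values) / len(values)
    return x_star + math.log(mean_exp) / r


def class_heatmap(feature_maps, class_weights):
    """Weighted sum of D spatial maps (each S x S) using one class's D weights."""
    size = len(feature_maps[0])
    return [
        [
            sum(w * fmap[i][j] for w, fmap in zip(class_weights, feature_maps))
            for j in range(size)
        ]
        for i in range(size)
    ]
```

With a small r the pooled value sits close to the average of the map, and with a large r it approaches the maximum, which is why r acts as a knob between the two conventional pooling modes.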

2.2. Loss

  • Multi-label Classification Loss Layer: Instead of the softmax loss used in traditional multi-class classification models, 3 standard loss functions are tried, i.e., Hinge Loss (HL), Euclidean Loss (EL) and Cross Entropy Loss (CEL).
  • The image labels are rather sparse, meaning there are far more ‘0’s than ‘1’s.
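  • The weighted CEL (W-CEL) is proposed:
    $$L_{W\text{-}CEL}(f(x), y) = \beta_P \sum_{y_c = 1} -\ln(f(x_c)) + \beta_N \sum_{y_c = 0} -\ln(1 - f(x_c))$$
  • where $\beta_P = \frac{|P| + |N|}{|P|}$ and $\beta_N = \frac{|P| + |N|}{|N|}$.
  • |P| and |N| are the total number of ‘1’s and ‘0’s in a batch of image labels.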
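The weighting idea can be sketched as follows, assuming the balancing factors β_P = (|P|+|N|)/|P| for positive labels and β_N = (|P|+|N|)/|N| for negative labels, with |P| and |N| counted over the batch. Summing the loss over the whole batch, the clamping constant eps, and the function name are assumptions of this sketch rather than details taken from the paper.

```python
import math


def weighted_cross_entropy(probs, labels, eps=1e-12):
    """Batch W-CEL sketch.

    probs  -- per-image lists of predicted probabilities, one per class
    labels -- per-image lists of 0/1 ground-truth flags, same shape as probs
    """
    pairs = [(p, y) for ps, ys in zip(probs, labels) for p, y in zip(ps, ys)]
    num_pos = sum(1 for _, y in pairs if y == 1)  # |P|
    num_neg = len(pairs) - num_pos                # |N|
    total = num_pos + num_neg
    # Guard against batches that contain only one kind of label.
    beta_p = total / num_pos if num_pos else 0.0
    beta_n = total / num_neg if num_neg else 0.0

    loss = 0.0
    for p, y in pairs:
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            loss += -beta_p * math.log(p)
        else:
            loss += -beta_n * math.log(1.0 - p)
    return loss
```

Because the rare ‘1’s receive the larger weight, the positive and negative labels contribute comparably to the loss even when the labels are very sparse.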

3. Experimental Results

  • In total, 108,948 frontal-view X-ray images are in the database, of which 24,636 images contain one or more pathologies. The remaining 84,312 images are normal cases.
  • Training (70%), validation (10%) and testing (20%).
  • 983 images come with 1,600 annotated pathology B-Boxes.
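A 70/10/20 split like the one above could be sketched as below. The random shuffle, the seed, and the helper name split_indices are assumptions for illustration; this review does not describe how the split was drawn, and the sketch splits by image index rather than by patient.

```python
import random


def split_indices(num_images, seed=0, train_pct=70, val_pct=10):
    """Shuffle image indices and cut them into train / validation / test lists."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)  # local RNG, so the split is repeatable
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = num_images * train_pct // 100
    n_val = num_images * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, roughly 20%
    return train, val, test
```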
A comparison of multi-label classification performance with different model initializations
AUCs of ROC curves for multi-label classification in different DCNN model setting
  • The model based on ResNet-50 achieves the best results.
  • The “Cardiomegaly” (AUC=0.8141) and “Pneumothorax” (AUC=0.7891) classes are consistently well-recognized compared to other groups while the detection ratios can be relatively lower for pathologies which contain small objects, e.g., “Mass” (AUC=0.5609) and “Nodule” classes.
Pathology localization accuracy and average false positive number for 8 disease classes
  • The Intersection over the detected B-Box area ratio (IoBB) is evaluated.
  • Due to the relatively low spatial resolution of the heatmaps (32×32) compared with the original image dimensions (1024×1024), the computed B-Boxes are often larger than the corresponding GT B-Boxes.
  • Therefore, a correct localization is defined by requiring IoBB > T(IoBB).
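As a rough illustration of the IoBB criterion, the sketch below computes the intersection area divided by the detected box area and compares it with a threshold. The (x_min, y_min, x_max, y_max) box format and the function names are assumptions made for this example; the threshold T(IoBB) is left as a parameter.

```python
def iobb(detected, ground_truth):
    """Intersection area divided by the area of the detected box."""
    dx1, dy1, dx2, dy2 = detected
    gx1, gy1, gx2, gy2 = ground_truth
    # Width and height of the overlap, clipped at zero for disjoint boxes.
    inter_w = max(0.0, min(dx2, gx2) - max(dx1, gx1))
    inter_h = max(0.0, min(dy2, gy2) - max(dy1, gy1))
    detected_area = (dx2 - dx1) * (dy2 - dy1)
    if detected_area <= 0:
        return 0.0  # degenerate detected box
    return inter_w * inter_h / detected_area


def is_correct_localization(detected, ground_truth, threshold):
    """A detection counts as correct when IoBB strictly exceeds the threshold."""
    return iobb(detected, ground_truth) > threshold
```

Unlike IoU, the denominator ignores the ground-truth area, so the score reflects how much of the detected box actually lies on the annotated pathology.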

This dataset is widely used, e.g., MICLe also uses it as an out-of-domain (OOD) dataset for testing.


