Review — Learning to Segment Images with Classification Labels

Use Image-Level Labels to Help Biomedical Image Segmentation

  • An architecture is proposed that alleviates the requirement for segmentation-level ground truth by making use of image-level labels, reducing the amount of time spent on data curation.


  1. Architecture
  2. Training Strategy
  3. Datasets
  4. Experimental Results

1. Architecture

Proposed architecture based on ResNet-18
  • Blue arrows indicate the residual connections of ResNet-18, the red squares are max pooling operations, and the blue square is the unpooling (spatial up-sampling) operation.
  • After each convolutional layer (block) of the ResNet architecture, ReLU activation and batch normalization are applied.
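The post-convolution ordering described above (ReLU activation, then batch normalization) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the random feature map stands in for a convolution output, and the `eps` value and the omission of learned scale/shift parameters are simplifying assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(x, eps=1e-5):
    # Normalize per channel over batch and spatial dims (NCHW layout).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(4, 8, 16, 16))   # batch, channels, H, W
features = batch_norm(relu(conv_out))        # ReLU first, then batch norm
```

The output keeps the input shape, with each channel normalized to approximately zero mean and unit variance.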

2. Training Strategy

Overview of the training procedure for each batch
  • Two alternating steps are performed for each batch, using pixel-level images (images with segmentation masks) and image-level images (images with only classification labels):
  • Step 1: Pixel level images are used to train the network with input images and segmentation masks with the standard backpropagation algorithm on the segmentation network (encoder+decoder) without passing through the classification layer.
  • Step 2: The images with only image-level labels are passed through the segmentation network to obtain a segmentation mask output, which is then transformed by the classification layer into a classification output vector in R^C, where C is the number of classes.
  • This vector is used for backpropagation with cross entropy loss as an error signal to update segmentation network weights to correct the segmentation mask for the given image.
  • The network is simultaneously optimized with the classification loss Lcls and the segmentation loss Lseg.
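Step 2 can be sketched as follows. How the classification layer aggregates the segmentation output is an assumption here (global average pooling over the C-channel segmentation logits), and the weighting between the two losses is illustrative, not the paper's value.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classification_head(seg_logits):
    # seg_logits: (C, H, W) per-pixel class scores from the segmentation net.
    # Pool spatially to a single classification vector in R^C.
    return seg_logits.mean(axis=(1, 2))

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

rng = np.random.default_rng(0)
seg_logits = rng.normal(size=(4, 32, 32))     # C = 4 classes, 32x32 image
cls_vector = classification_head(seg_logits)  # vector in R^4
L_cls = cross_entropy(cls_vector, label=2)    # error signal for backprop
```

Pixel-level images instead contribute a segmentation loss, and the network is trained on both terms together, e.g. a weighted sum of Lseg and Lcls.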

3. Datasets & Metrics

Sample images from each dataset

3.1. ICIAR BACH 2018

  • The BreAst Cancer Histology images (BACH) dataset of breast cancer histology.
  • The challenge is split into two parts, A and B.
  • For part A, the aim is to classify each microscopy image into four classes: normal tissue, benign tissue, ductal carcinoma in situ (DCIS), and invasive carcinoma.
  • For part B, the task is to predict the pixel-wise labeling of whole-slide images (WSIs) into the same four classes, i.e., the segmentation of the WSI.
  • The dataset consists of 400 training and 100 test microscopy images, where each image has a single label, and 20 labeled WSIs with segmentation masks (split into 10 training and 10 testing images).

3.2. Gleason2019

  • Grading of prostate cancer by Gleason score, ranging from 1 (healthy) to 5 (abnormal).
  • Gleason2019 challenge consists of 244 tissue micro-array (TMA) images and their corresponding pixel-level annotations detailing the Gleason grade of each region on an image.

3.3. DigestPath2019

  • The dataset consists of 660 image patches with binary pixel-level masks (benign and malignant) from 324 WSIs scanned at 20× resolution. The average size of each image patch is 5000×5000 pixels; patches are resized to 1024×1024 for the experiments in this paper.

3.4. Metrics

  • Two variants of F1 scores, called the macro and micro F1, are used.
  • Both metrics are calculated class-wise: the macro F1 weighs each class-wise score equally, whereas the micro F1 accounts for class imbalance by weighting the scores by the ground-truth ratios on the WSI.
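The two variants can be sketched as follows, using the definitions above: class-wise F1 scores averaged equally (macro) or weighted by each class's ground-truth pixel ratio (micro). The tiny masks below are illustrative toy data, not from the datasets.

```python
import numpy as np

def f1_per_class(pred, gt, num_classes):
    """Class-wise F1 scores plus each class's ground-truth pixel ratio."""
    scores, weights = [], []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
        weights.append(np.sum(gt == c) / gt.size)
    return np.array(scores), np.array(weights)

gt   = np.array([[0, 0, 1, 1], [0, 0, 1, 2]])
pred = np.array([[0, 0, 1, 1], [0, 1, 1, 2]])
scores, weights = f1_per_class(pred, gt, num_classes=3)
macro_f1 = scores.mean()             # each class weighed equally
micro_f1 = np.sum(scores * weights)  # weighed by ground-truth ratios
```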

3.5. Settings

  • For ICIAR BACH 2018, 3 WSIs from part B and the whole of part A are used for training.
  • For Gleason2019 and DigestPath2019, 50% of the dataset is used as training set, 25% as validation, and the remaining 25% as the test set.
Tile extraction for DigestPath2019 and Gleason2019
  • Each segmentation mask is split into tiles of size 128×128 pixels.
  • If a dominant class is covering ≥90% of the tile, then it is considered as a classification patch (dashed green boxes).
  • A tile that only contains two classes is considered as a segmentation patch (solid green boxes).
  • Any other tile is ignored.
  • For s=0%, only classification patches are used, hence for the S setting, the network predicts random outputs. For S+C and S+C∗ settings, s=0% reduces to a classifier which predicts one value per patch.
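The tile-labeling rules above can be sketched as follows. The 90% dominance threshold and the two-class rule follow the text; checking dominance before the two-class condition, and the toy tiles themselves, are assumptions for illustration.

```python
import numpy as np

def label_tile(tile, dominance=0.90):
    """Assign a 128x128 mask tile to classification, segmentation, or ignored."""
    counts = np.bincount(tile.ravel())
    if counts.max() / tile.size >= dominance:
        return "classification"          # one dominant class (dashed green box)
    if np.count_nonzero(counts) == 2:
        return "segmentation"            # exactly two classes (solid green box)
    return "ignored"                     # anything else

uniform = np.zeros((128, 128), dtype=int)   # one class everywhere
two_cls = np.zeros((128, 128), dtype=int)
two_cls[:64] = 1                            # 50/50 split of two classes
three_cls = two_cls.copy()
three_cls[:32, :32] = 2                     # three classes, none dominant
```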

4. Experimental Results

4.1. Classification

  • Though classification accuracy is not the focus, classification experiments are also performed.
Classification results using the classification head on 50% of the classification patches.
Accuracy results for the classification task for the three datasets (These results are normalized to c=50%)
  • S+C (blue): 100−2c% of segmentation patches and c% of classification patches are used.
  • S+C∗ (purple): 100−2c% of segmentation patches and 50% of classification patches are used.
  • S∗+C (black): 100% of segmentation patches are used, and the number of classification patches is varied from 0 to 50%.
  • With 50% of classification patches, adding any number of segmentation patches decreases the classification performance.

4.2. Segmentation

Comparison of training performance between using only segmentation (S) patches, both segmentation and classification (S+C) images, and varying the amount of segmentation patches while using the complete set of classification patches (S+C∗) (These results are normalized to s=100%)
  • When ≤10% of segmentation patches is used, there is a significant performance gap (≥15% for both F1 metrics) between the proposed method (S+C or S+C∗) and the S setting.

4.3. SOTA Comparison

  • For SOTA approaches, Ciga et al. (2019) achieve a challenge-specific score of 68% on ICIAR BACH 2018, whereas the proposed method obtains a score of 54%.
  • Li et al. (2020) achieve a 67.9% Dice (F1) score with a U-Net on DigestPath2019, whereas the proposed method only achieves 42%.
  • Finally, Zhang et al. (2020) obtain a 75% Dice score, versus 41% for the proposed method.


[2021 JMIA] [Ciga JMEDIA’21]
Learning to Segment Images with Classification Labels

1.9. Biomedical Image Classification

2017 … 2021 [MICLe] [MoCo-CXR] [CheXternal] [CheXtransfer] [Ciga JMEDIA’21]

1.10. Biomedical Image Segmentation

2015 … 2020 [MultiResUNet] [UNet 3+] [Dense-Gated U-Net (DGNet)] [Rubik’s Cube+] 2021 [Ciga JMEDIA’21]
