Review — Big Self-Supervised Models Advance Medical Image Classification

MICLe, Using Multiple Images From Same Patient for Self-Supervised Learning

Sik-Ho Tsang
Jul 19, 2022

Big Self-Supervised Models Advance Medical Image Classification
MICLe, by Google Research and Health
2021 ICCV, Over 90 Citations (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Image Classification, Medical Image Classification

  • Self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images, significantly improves the accuracy of medical image classifiers.
  • A novel Multi-Instance Contrastive Learning (MICLe) method is proposed to use multiple images of the underlying pathology per patient case to construct more informative positive pairs for self-supervised learning.


  1. Motivations
  2. Proposed Pretraining Procedure
  3. Multi-Instance Contrastive Learning (MICLe)
  4. Experimental Results

1. Motivations

  • Learning from limited labeled data is a fundamental problem in machine learning, which is crucial for medical image analysis.
  • Two common pretraining approaches to learning from limited labeled data include: (1) supervised pretraining on a large labeled dataset such as ImageNet, (2) self-supervised pretraining using contrastive learning on unlabeled data.
  • Two distinct medical image classification tasks are studied: (1) dermatology skin condition classification from digital camera images, (2) multi-label chest X-ray classification among five pathologies based on the CheXpert dataset.
Self-supervised learning utilizes unlabeled domain-specific medical images and significantly outperforms supervised ImageNet pretraining (Image from Google AI Blog)
  • It is observed that self-supervised pretraining outperforms supervised pretraining. This is attributed to the domain shift and the discrepancy between the nature of recognition tasks in ImageNet and in medical image classification.

Self-supervised approaches bridge this domain gap by leveraging in-domain medical data for pretraining.

2. Proposed Pretraining Procedure

Proposed Pretraining Approach

2.1. Overall Procedure

  • The proposed approach comprises three steps:
  1. Self-supervised pretraining on unlabeled ImageNet using SimCLR.
  2. Additional self-supervised pretraining using unlabeled medical images. If multiple images of each medical condition are available, a novel Multi-Instance Contrastive Learning (MICLe) is used to construct more informative positive pairs based on different images.
  3. Supervised fine-tuning on labeled medical images.
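
The three steps above can be summarized as a small data structure. This is only an illustrative sketch: the `Stage` class, the `build_pipeline` function and the stage names are hypothetical and not taken from the paper's code. It simply records which objective applies at each step, and switches the second step between SimCLR and MICLe depending on whether multiple images per case are available.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One step of the pretraining procedure (hypothetical helper type)."""
    name: str
    data: str
    uses_labels: bool
    objective: str


def build_pipeline(multiple_images_per_case: bool) -> List[Stage]:
    """Return the three stages in order; step 2 uses MICLe only when possible."""
    medical_objective = "MICLe" if multiple_images_per_case else "SimCLR"
    return [
        Stage("pretrain-imagenet", "unlabeled ImageNet", False, "SimCLR"),
        Stage("pretrain-medical", "unlabeled medical images", False, medical_objective),
        Stage("fine-tune", "labeled medical images", True, "supervised"),
    ]


if __name__ == "__main__":
    for stage in build_pipeline(multiple_images_per_case=True):
        print(stage.name, "->", stage.objective)
```

Only the last stage consumes labels, which is why the first two stages can use much larger unlabeled datasets.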

2.2. SimCLR

SimCLR for CheXpert and Dermatology Datasets
  • SimCLR learns representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in a hidden representation of the neural network.
  • When only a single image of a medical condition is available, standard data augmentation is used to generate two augmented views of the same image.
  • (Please feel free to read SimCLR if interested.)
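
To make "maximizing agreement via a contrastive loss" more concrete, below is a minimal pure-Python sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss used by SimCLR. It is a readability-first illustration, not the authors' implementation: the function names and the default temperature of 0.1 are assumptions, and embeddings are plain lists of floats rather than tensors.

```python
import math


def _l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(z_a, z_b, temperature=0.1):
    """Average contrastive loss over 2N views; z_a[i] and z_b[i] are a positive pair.

    The temperature default is an assumption chosen for illustration.
    """
    n = len(z_a)
    z = [_l2_normalize(v) for v in list(z_a) + list(z_b)]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same example
        # Scaled similarities to every other view; the view itself is excluded.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denominator - logits[pos]
    return total / (2 * n)
```

The loss is small when the two views of each example are close to each other and far from the views of all other examples in the batch, which is why a larger batch supplies more negatives.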

3. Multi-Instance Contrastive Learning (MICLe)

Multi-Instance Contrastive Learning (MICLe)
  • In medical image analysis, it is common to utilize multiple images per patient to improve classification accuracy and robustness.
  • When multiple images are available, two distinct images are used to directly create a positive pair of examples.

Such images may be taken from different viewpoints or under different lighting conditions, providing complementary information for medical diagnosis.

  • The learnt representations are invariant not only to different augmentations of the same image, but also to different images of the same medical pathology.
MICLe (Image from Google AI Blog)
  • MICLe does not require class label information and only relies on different images of an underlying pathology, the type of which may be unknown.
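
The pair construction described in this section can be sketched as follows. This is a hedged illustration, not the paper's code: `make_positive_pair`, `case_images` and `augment` are hypothetical names, and applying the augmentation to both selected images is an assumption of this sketch. With two or more images of the same case, two distinct images form the positive pair; with a single image, the sketch falls back to the SimCLR-style pair of two augmented views.

```python
import random


def make_positive_pair(case_images, augment, rng=random):
    """Build one positive pair from the images of a single patient case.

    case_images: non-empty sequence of images belonging to the same case.
    augment: callable applied independently to each selected image.
    rng: source of randomness exposing sample(), e.g. random.Random(seed).
    """
    if not case_images:
        raise ValueError("a case needs at least one image")
    if len(case_images) >= 2:
        # MICLe-style: two distinct images of the same underlying pathology.
        first, second = rng.sample(list(case_images), 2)
    else:
        # SimCLR-style fallback: two views of the only available image.
        first = second = case_images[0]
    return augment(first), augment(second)


if __name__ == "__main__":
    identity = lambda image: image  # stand-in for a real augmentation
    print(make_positive_pair(["img_0", "img_1", "img_2"], identity, random.Random(0)))
    print(make_positive_pair(["img_0"], identity))
```

No class label is consulted anywhere in the function; the only supervision signal is that the images belong to the same case.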

4. Experimental Results

4.1. Pretraining Approach Without MICLe

Performance of Dermatology skin condition and Chest X-ray classification model measured by top-1 accuracy (%) and area under the curve (AUC) across different architectures
  • Three possible scenarios are considered for self-supervised pretraining in the medical context:
  1. using ImageNet dataset only,
  2. using the task specific unlabeled medical dataset (i.e. Derm and CheXpert), and
  3. initializing the pretraining from ImageNet self-supervised model but using task specific unlabeled dataset for pretraining, here indicated as ImageNet→Derm and ImageNet→CheXpert.

The best performance is achieved when both ImageNet and task-specific unlabeled data are used.

  • Larger models benefit much more from self-supervised pretraining.

4.2. SOTA Comparison on Derm

Evaluation of multi instance contrastive learning (MICLe) on Dermatology condition classification

MICLe consistently improves the accuracy of skin condition classification over SimCLR on different datasets and architectures.

Comparison of best self-supervised models vs. supervised pretraining baselines on Dermatology classification
  • The performance is further improved by training longer (1000 epochs) with a larger batch size of 1024, which provides more negative examples.

The proposed self-supervised model significantly outperforms the supervised baseline when ImageNet pretraining is used.

4.3. SOTA Comparison on CheXpert

Comparison of best self-supervised models vs. supervised pretraining baselines on chest X-ray classification
  • There are no multiple images per patient in this dataset, so MICLe cannot be used.

With additional in-domain unlabeled data (using only the CheXpert dataset for pretraining), self-supervised pretraining surpasses the BiT baseline by a larger margin.

4.4. Domain Shift

Evaluation of models on distribution-shifted datasets
  • For the Dermatology dataset, an additional out-of-domain (OOD) dataset is used, which is primarily focused on skin cancers and whose ground truth labels are obtained from biopsies.
  • For CheXpert, the NIH (National Institutes of Health) chest X-ray dataset, i.e. ChestX-ray8, is used as the OOD dataset. It consists of 112,120 de-identified X-rays from 30,805 unique patients.
  • The model obtained after pretraining and end-to-end fine-tuning on the original dataset (i.e. CheXpert or Derm) is used to make predictions on the shifted dataset without any further fine-tuning (zero-shot transfer learning).
  • When only using ImageNet for self-supervised pretraining, the model performs worse in this setting compared to using in-domain data for pretraining.

The results generally suggest that self-supervised pretrained models can generalize better to distribution shifts.

  • The performance improvement in the distribution-shifted dataset due to self-supervised pretraining (both using ImageNet and CheXpert data) is more pronounced than the original improvement on the CheXpert dataset.

This is a very valuable finding, as generalisation under distribution shift is of paramount importance to clinical applications.

4.5. Label Efficiency

Top-1 accuracy for Dermatology condition classification for MICLe, SimCLR, and supervised models under different unlabeled pretraining dataset and varied sizes of label fractions
  • The label fractions range from 10% to 90% for both the Derm and CheXpert training datasets.
  • Pretraining using self-supervised models can significantly help with label efficiency for medical image classification.

MICLe yields proportionally larger gains when fine-tuning with fewer labeled examples.

  • MICLe is able to match baselines using only 20% of the training data for ResNet-50 (4×) and 30% of the training data for ResNet-152 (2×).
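
A label-efficiency experiment of this kind needs a way to keep only a fraction of the labeled training set before fine-tuning. The sketch below shows one simple way to do that with uniform random sampling. The function name, the fixed seed, and the choice of uniform rather than class-stratified sampling are assumptions made for illustration; the review does not state how the paper draws its subsets.

```python
import random


def subsample_labeled(examples, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of the examples."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    examples = list(examples)
    keep = max(1, round(len(examples) * fraction))
    return random.Random(seed).sample(examples, keep)


if __name__ == "__main__":
    labeled = [(f"image_{i}", i % 5) for i in range(100)]  # toy (image, label) pairs
    for fraction in (0.1, 0.2, 0.3, 0.9):
        print(fraction, len(subsample_labeled(labeled, fraction)))
```

Fixing the seed keeps the subset identical across pretraining strategies, so differences in accuracy come from the pretraining rather than from which examples were kept.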


