Brief Review — How Transferable are Self-supervised Features in Medical Image Classification Tasks?

DVME, Aggregating SimCLR, SwAV, and DINO Features

  • Dynamic Visual Meta-Embedding (DVME) is proposed. It boosts performance by aggregating the feature vectors of SimCLR, SwAV, and DINO.

Outline
  1. Dynamic Visual Meta-Embedding (DVME)
  2. Results

1. Dynamic Visual Meta-Embedding (DVME)

Dynamic Visual Meta-Embedding (DVME)
  • For SimCLR and SwAV, the embedding is taken from the last fully connected unit and has a dimension of 2048.
  • For DINO, the embedding is constructed by concatenating the class tokens of the last four blocks, which gives a dimension of 1536.
  • Each embedding is then projected to a fixed size of 512. The resulting embeddings are concatenated and fed into a self-attention module. This module is the same as the one in ViT, except that attention is learned across the components of the meta-embedding rather than across image patches.
  • The outputs of the attention module are concatenated and projected to a fixed dimension of 512.
  • The importance of each embedding component is learnt for a specific downstream task.
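The steps above can be sketched in NumPy as a single forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation. The weights are random rather than trained, and the choice of 8 attention heads is hypothetical. The 384-dimensional DINO class token is inferred from 1536 / 4, which matches a ViT-S backbone, although the text does not name the backbone. Only a bare multi-head self-attention layer is shown: the LayerNorm, residual connection, and MLP of a full ViT block are omitted. The names `DVMESketch`, `dino_embedding`, and `self_attention` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(d_in, d_out):
    # Random weights for illustration only; a real model would learn these.
    w = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out))
    return w, np.zeros(d_out)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, wq, wk, wv, n_heads):
    """Bare multi-head self-attention over the meta-embedding components.

    tokens: (batch, n_components, d_model). Attention runs across the
    components (SimCLR, SwAV, DINO), not across image patches.
    """
    b, n, d = tokens.shape
    dh = d // n_heads

    def split(x):  # (b, n, d) -> (b, heads, n, dh)
        return x.reshape(b, n, n_heads, dh).transpose(0, 2, 1, 3)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh))  # (b, heads, n, n)
    out = attn @ v                                             # (b, heads, n, dh)
    return out.transpose(0, 2, 1, 3).reshape(b, n, d), attn


def dino_embedding(cls_tokens):
    # Concatenate the class tokens of the last four blocks: 4 x 384 = 1536.
    return np.concatenate(cls_tokens, axis=-1)


class DVMESketch:
    """Hypothetical DVME forward pass: project -> self-attend -> concat -> project."""

    def __init__(self, in_dims=(2048, 2048, 1536), d_model=512, n_heads=8):
        # n_heads=8 is an assumption; the text does not state it.
        self.proj = [init_linear(d, d_model) for d in in_dims]
        self.wq, self.wk, self.wv = (init_linear(d_model, d_model)[0] for _ in range(3))
        self.out = init_linear(len(in_dims) * d_model, d_model)
        self.n_heads = n_heads

    def __call__(self, embeddings):
        # 1) Project each backbone embedding to 512 and stack them as tokens.
        tokens = np.stack([e @ w + b for e, (w, b) in zip(embeddings, self.proj)], axis=1)
        # 2) Self-attention across the three components.
        attended, attn = self_attention(tokens, self.wq, self.wk, self.wv, self.n_heads)
        # 3) Concatenate the attended components and project to 512.
        flat = attended.reshape(attended.shape[0], -1)
        w, b = self.out
        return flat @ w + b, attn


batch = 2
simclr = rng.normal(size=(batch, 2048))  # random stand-in for SimCLR features
swav = rng.normal(size=(batch, 2048))    # random stand-in for SwAV features
dino = dino_embedding([rng.normal(size=(batch, 384)) for _ in range(4)])
z, attn = DVMESketch()([simclr, swav, dino])
print(dino.shape, z.shape, attn.shape)  # (2, 1536) (2, 512) (2, 8, 3, 3)
```

In practice, the projection, attention, and output weights would be trained together with the downstream classifier. That joint training is how the per-task importance of each component would be learned. In this sketch, each sample's (heads, 3, 3) attention matrix mixes the SimCLR, SwAV, and DINO tokens.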

2. Results

2.1. Datasets

Number of samples for different subtasks
  • Four datasets are used. For each one, small (S) and medium (M) subsets are generated alongside the full dataset, so that performance can be evaluated in the low-data regime (S, M).

2.2. Individual Self-Supervised Learning Approach

Linear evaluation (Left) and fine-tuning (Right) performance of different self-supervised initializations
  • SwAV- and SimCLR-pretrained features show no consistent pattern across the downstream tasks.
  • DINO initialization consistently outperforms all other initializations across all tasks by a significant margin.
  • In the low-data regimes, all self-supervised pretrained initializations achieve higher performance than the supervised pretrained and randomly initialized baselines.
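Linear evaluation freezes the pretrained backbone and trains only a linear classifier on its features. Fine-tuning also updates the backbone weights. The following is a minimal NumPy sketch of the linear-evaluation step, assuming the frozen features have already been extracted. The optimizer (full-batch gradient descent), learning rate, epoch count, and weight decay are hypothetical choices, not the paper's settings. The toy Gaussian features only stand in for real encoder outputs, and `train_linear_probe` and `predict` are hypothetical helper names.

```python
import numpy as np


def train_linear_probe(features, labels, n_classes, lr=0.1, epochs=200, weight_decay=1e-4):
    """Multinomial logistic regression on frozen features (full-batch gradient descent)."""
    n, d = features.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        # Only the linear head is updated; the "backbone" features stay fixed.
        w -= lr * (features.T @ grad + weight_decay * w)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(features, w, b):
    return np.argmax(features @ w + b, axis=1)


# Toy stand-in for frozen encoder outputs: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-2.0, 1.0, size=(100, 16)),
                   rng.normal(2.0, 1.0, size=(100, 16))])
labels = np.array([0] * 100 + [1] * 100)
w, b = train_linear_probe(feats, labels, n_classes=2)
accuracy = (predict(feats, w, b) == labels).mean()
print(f"linear-probe accuracy: {accuracy:.2f}")
```

Because the backbone never changes, linear evaluation measures how linearly separable the pretrained features already are. That is why it is a common way to compare initializations such as SimCLR, SwAV, and DINO.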

2.3. Proposed DVME

Linear evaluation performance of Dynamic Visual Meta-Embedding (DVME)
  • The improvement of DVME over the benchmark is particularly pronounced on the APTOS and NIH Chest X-ray tasks.
  • For example, DVME gains roughly 6% in Kappa score over the best individual baseline on the S and M subtasks of the APTOS dataset.
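The text reports Kappa scores for APTOS without saying which variant is used. The sketch below assumes the quadratic-weighted Cohen's kappa used in the APTOS 2019 Kaggle challenge, which has five diabetic-retinopathy grades (0 to 4). The `cohen_kappa` helper is hypothetical. Passing `weights=None` gives the plain unweighted kappa for comparison.

```python
import numpy as np


def cohen_kappa(y_true, y_pred, n_classes, weights=None):
    """Cohen's kappa; weights is None (unweighted), "linear", or "quadratic"."""
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    observed /= observed.sum()  # joint distribution of (true, predicted) grades
    # Expected agreement under independence: outer product of the marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_classes, n_classes))
    if weights is None:
        w = (i != j).astype(float)  # every disagreement costs 1
    elif weights == "linear":
        w = np.abs(i - j) / (n_classes - 1)
    elif weights == "quadratic":
        w = (i - j) ** 2 / (n_classes - 1) ** 2
    else:
        raise ValueError(f"unknown weights: {weights!r}")
    return 1.0 - (w * observed).sum() / (w * expected).sum()


# Hypothetical grades on the 0-4 scale, for illustration only.
y_true = [0, 1, 2, 3, 4, 2, 1, 0]
y_pred = [0, 1, 2, 4, 4, 1, 1, 0]
print(cohen_kappa(y_true, y_pred, n_classes=5, weights="quadratic"))
print(cohen_kappa(y_true, y_pred, n_classes=5))
```

Quadratic weighting penalizes a prediction two grades away four times as heavily as one that is one grade away. This suits ordinal severity labels such as diabetic-retinopathy grades.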

2.4. t-SNE Visualization

t-SNE visualization of embeddings obtained using different pretrained feature extractors
  • DINO offers clearer class separation than its supervised counterpart.

Reference
[2021 ML4H] [DVME]
How Transferable are Self-supervised Features in Medical Image Classification Tasks?

1.2. Unsupervised/Self-Supervised Learning

1993 … 2021 [MoCo v3] [SimSiam] [DINO] [Exemplar-v1, Exemplar-v2] [MICLe] [Barlow Twins] [MoCo-CXR] [W-MSE] [SimSiam+AL] [BYOL+LP] [DVME] 2022 [BEiT] [BEiT V2]

1.9. Biomedical Image Classification

2017 … 2021 [MICLe] [MoCo-CXR] [CheXternal] [CheXtransfer] [Ciga JMEDIA’21] [DVME]

My Other Previous Paper Readings