Brief Review — How Transferable are Self-supervised Features in Medical Image Classification Tasks?
How Transferable are Self-supervised Features in Medical Image Classification Tasks?,
DVME, by Bayer AG, Germany
2021 ML4H (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Image Classification, Medical Image Analysis
- Dynamic Visual Meta-Embedding (DVME)
1. Dynamic Visual Meta-Embedding (DVME)
- The embedding is extracted from the last fully connected layer of SimCLR and SwAV, each with dimension 2048.
- For DINO, the embedding is constructed by concatenating the class tokens of the last four blocks, resulting in a dimension of 1536.
- Each embedding is then projected to a fixed size of 512, and the concatenation of the resulting embeddings is fed into a self-attention module. This module is the same as the one in ViT, except that attention is learned across the different components of the meta-embedding instead of across image patches.
- The outputs of the attention module are concatenated and projected to a fixed dimension of 512.
- The importance of each embedding component is learned for the specific downstream task.
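The fusion described above can be sketched in NumPy. The projection and attention weights below are random matrices used purely for shape-checking; in DVME they are learned end-to-end with the downstream task.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dvme_fuse(embeddings, d=512):
    """Sketch of DVME fusion: project each pretrained embedding to d dims,
    treat the projections as "tokens", run single-head self-attention across
    the components (not image patches), then concatenate and project to d.
    Weights here are random stand-ins for the learned parameters."""
    tokens = np.stack([
        e @ (rng.standard_normal((e.shape[-1], d)) / np.sqrt(e.shape[-1]))
        for e in embeddings
    ])                                                # (n_components, d)
    # Single-head self-attention across the component tokens.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(d))              # (n, n) attention map
    out = attn @ v                                    # (n, d)
    # Concatenate the attended components and project back to d dims.
    w_out = rng.standard_normal((out.size, d)) / np.sqrt(out.size)
    return out.reshape(-1) @ w_out                    # (d,)

# SimCLR/SwAV yield 2048-d embeddings, DINO 1536-d (4 class tokens).
fused = dvme_fuse([rng.standard_normal(2048),   # SimCLR
                   rng.standard_normal(2048),   # SwAV
                   rng.standard_normal(1536)])  # DINO
print(fused.shape)  # (512,)
```

The attention map here lets the model weight each pretrained embedding differently per input, which is what "the importance of each embedding component is learned" refers to.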
2.1. Datasets
- There are 4 datasets. From each, small (S), medium (M), and full (F) subsets are generated, so that the low data regimes (S, M) can also be evaluated.
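One common way to build such low-data subsets is stratified subsampling, keeping the class balance of the full set. A minimal sketch (the paper's exact subset sizes and sampling procedure are not specified here):

```python
import numpy as np

def subsample(labels, fraction, seed=0):
    """Stratified subsample: keep `fraction` of each class.
    Illustrative only; not the paper's exact protocol."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)       # indices of class c
        n = max(1, int(round(fraction * idx.size)))
        keep.extend(rng.choice(idx, size=n, replace=False))
    return np.sort(keep)

labels = np.array([0] * 6 + [1] * 4)
idx = subsample(labels, 0.5)
print(len(idx))  # 5 (3 of class 0, 2 of class 1)
```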
2.2. Individual Self-Supervised Learning Approach
- SwAV and SimCLR pretrained features yield inconsistent patterns across all downstream tasks.
- DINO initialization consistently outperforms all the other initializations across all tasks by a significant margin.
- All self-supervised pretrained initializations achieve higher performance than the supervised pretrained and randomly initialized baselines in the low data regimes.
This suggests that the representations generated by self-supervised methods are of higher quality, leading to better performance on the test set and reducing the performance variability between folds in low data regimes.
2.3. Proposed DVME
- DVME outperforms this benchmark in 4/4 of the S subtasks, 3/4 of the M subtasks, and 2/4 of the F subtasks.
- The improvement of DVME over the benchmark is particularly pronounced for the APTOS and NIH Chest X-ray tasks. For example,
- DVME gains roughly 6% in Kappa score over the best individual baseline for the S and M subtasks of the APTOS dataset.
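The Kappa score reported for APTOS is quadratic weighted Cohen's kappa, which penalizes disagreements by the squared distance between the predicted and true grade. A minimal NumPy sketch of the metric:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted Cohen's kappa (sketch)."""
    # Confusion matrix of observed agreement.
    conf = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        conf[t, p] += 1
    # Quadratic penalty: squared distance between grades, normalized.
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)]) / (n_classes - 1) ** 2
    # Expected agreement under independent marginals.
    expected = np.outer(conf.sum(1), conf.sum(0)) / conf.sum()
    return 1 - (w * conf).sum() / (w * expected).sum()

print(quadratic_weighted_kappa([0, 0, 1, 2, 2], [0, 0, 1, 2, 2], 3))  # 1.0
```

Perfect agreement gives 1.0; random agreement gives roughly 0, which is why a ~6% kappa gain is a meaningful improvement on an ordinal grading task like APTOS.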
2.4. t-SNE Visualization
- DINO offers clearer class separation compared with its supervised counterpart.
- The DVME clusters are better separated, particularly in the case of multiclass classification.
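Such 2-D maps are produced by running t-SNE on the learned embeddings. A minimal sketch using scikit-learn (assuming it is available), with random toy features standing in for the pretrained or DVME embeddings:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Toy stand-ins for 512-d embeddings of two well-separated classes.
feats = np.vstack([rng.normal(0.0, 1.0, (30, 512)),
                   rng.normal(4.0, 1.0, (30, 512))])

# Project to 2-D for visualization; perplexity must be < n_samples.
emb = TSNE(n_components=2, perplexity=10, init="pca",
           random_state=0).fit_transform(feats)
print(emb.shape)  # (60, 2)
```

Plotting `emb` colored by class label yields the kind of cluster-separation figure discussed above.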