Review — Big Self-Supervised Models Advance Medical Image Classification
MICLe, Using Multiple Images From Same Patient for Self-Supervised Learning
Big Self-Supervised Models Advance Medical Image Classification
MICLe, by Google Research and Health
2021 ICCV, Over 90 Citations (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Image Classification, Medical Image Classification
- Self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images, significantly improves the accuracy of medical image classifiers.
- A novel Multi-Instance Contrastive Learning (MICLe) method is proposed to use multiple images of the underlying pathology per patient case to construct more informative positive pairs for self-supervised learning.
- Proposed Pretraining Procedure
- Multi-Instance Contrastive Learning (MICLe)
- Experimental Results
- Learning from limited labeled data is a fundamental problem in machine learning, which is crucial for medical image analysis.
- Two common pretraining approaches to learning from limited labeled data include: (1) supervised pretraining on a large labeled dataset such as ImageNet, (2) self-supervised pretraining using contrastive learning on unlabeled data.
- Two distinct medical image classification tasks are studied: (1) dermatology skin condition classification from digital camera images, (2) multi-label chest X-ray classification among five pathologies based on the CheXpert dataset.
- It is observed that self-supervised pretraining outperforms supervised pretraining. This is attributed to the domain shift and the discrepancy between the nature of the recognition tasks in ImageNet and in medical image classification.
Self-supervised approaches bridge this domain gap by leveraging in-domain medical data for pretraining.
2. Proposed Pretraining Procedure
2.1. Overall Procedure
- The proposed approach comprises three steps:
- Self-supervised pretraining on unlabeled ImageNet using SimCLR.
- Additional self-supervised pretraining using unlabeled medical images. If multiple images of each medical condition are available, a novel Multi-Instance Contrastive Learning (MICLe) is used to construct more informative positive pairs based on different images.
- Supervised fine-tuning on labeled medical images.
- SimCLR learns representations by maximizing agreement between differently augmented views of the same data example, via a contrastive loss applied to a hidden representation of the neural network.
- When only a single image of a medical condition is available, standard data augmentation is used to generate two augmented views of that same image, as in SimCLR.
- (Please feel free to read SimCLR if interested.)
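To make the contrastive objective concrete, here is a minimal NumPy sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss that SimCLR uses. It is an illustration only, not the authors' implementation: the function name `nt_xent_loss`, the default `temperature=0.1`, and the assumption that `z1[i]` and `z2[i]` are the projected embeddings of two views of the same example are all choices made for this sketch.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]).

    z1, z2: arrays of shape (N, d) holding the embeddings of two views.
    temperature: softmax temperature (0.1 is an assumed default here).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0).astype(np.float64)  # (2N, d)
    # Unit-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature                            # (2N, 2N)
    # An example must never be contrasted against itself.
    np.fill_diagonal(sim, -np.inf)
    # Row i's positive sits at i + N (first half) or i - N (second half).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average negative log-likelihood of picking the positive.
    return -log_prob[np.arange(2 * n), pos].mean()
```

All other examples in the batch act as negatives, which is why larger batches provide more negatives per positive pair.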
3. Multi-Instance Contrastive Learning (MICLe)
- In medical image analysis, it is common to utilize multiple images per patient to improve classification accuracy and robustness.
- When multiple images are available, two distinct images are used to directly create a positive pair of examples.
Such images may be taken from different viewpoints or under different lighting conditions, providing complementary information for medical diagnosis.
- The learnt representations are invariant not only to different augmentations of the same image, but also to different images of the same medical pathology.
- MICLe does not require class label information and only relies on different images of an underlying pathology, the type of which may be unknown.
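The pair construction described above can be sketched in a few lines of Python. This is a hedged illustration rather than the paper's code: `make_positive_pair`, `case_images`, and `augment` are hypothetical names, and the fallback to two augmentations of one image when a case has only a single image follows the SimCLR behaviour described in Section 2.

```python
import random


def make_positive_pair(case_images, augment, rng=random):
    """Build one positive pair from the images of a single patient case.

    case_images: list of images belonging to the same case (no labels needed).
    augment: callable applying a random augmentation to one image.
    rng: object with a `sample` method (the `random` module by default).
    """
    if not case_images:
        raise ValueError("a patient case needs at least one image")
    if len(case_images) >= 2:
        # MICLe case: two distinct images of the same underlying pathology.
        first, second = rng.sample(case_images, 2)
    else:
        # Single-image case: fall back to two views of the same image.
        first = second = case_images[0]
    return augment(first), augment(second)
```

The resulting pairs can be fed to the same contrastive loss as in SimCLR; only the way positives are chosen changes.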
4. Experimental Results
4.1. Pretraining Approach Without MICLe
- Three possible scenarios are considered for self-supervised pretraining in the medical context:
- using ImageNet dataset only,
- using the task-specific unlabeled medical dataset (i.e. Derm and CheXpert), and
- initializing the pretraining from the ImageNet self-supervised model but using the task-specific unlabeled dataset for pretraining, here indicated as ImageNet→Derm and ImageNet→CheXpert.
The best performance is achieved when both ImageNet and task-specific unlabeled data are used.
- Larger models are able to benefit much more.
4.2. SOTA Comparison on Derm
MICLe consistently improves the accuracy of skin condition classification over SimCLR on different datasets and architectures.
- Performance improves further when more negative examples are provided, by training longer (1000 epochs) with a larger batch size (1024).
The proposed self-supervised model significantly outperforms the supervised baseline when ImageNet pretraining is used.
4.3. SOTA Comparison on CheXpert
- There are no multiple images per patient in this dataset, so MICLe cannot be used.
With additional in-domain unlabeled data (using only the CheXpert dataset for pretraining), self-supervised pretraining can surpass the BiT baseline by a larger margin.
4.4. Domain Shift
- For the Dermatology dataset, an additional out-of-domain (OOD) dataset is used, which primarily focuses on skin cancers and whose ground-truth labels are obtained from biopsies.
- For CheXpert, the NIH (National Institutes of Health) chest X-ray dataset, i.e. ChestX-ray8, is used as the OOD dataset. It consists of 112,120 de-identified X-rays from 30,805 unique patients.
- The models obtained after pretraining and end-to-end fine-tuning (i.e. on CheXpert and Derm) are used to make predictions on the additional shifted dataset without any further fine-tuning (zero-shot transfer learning).
- When only using ImageNet for self-supervised pretraining, the model performs worse in this setting compared to using in-domain data for pretraining.
The results generally suggest that self-supervised pretrained models can generalize better to distribution shifts.
- The performance improvement in the distribution-shifted dataset due to self-supervised pretraining (both using ImageNet and CheXpert data) is more pronounced than the original improvement on the CheXpert dataset.
This is a very valuable finding, as generalisation under distribution shift is of paramount importance to clinical applications.
4.5. Label Efficiency
- Label fractions ranging from 10% to 90% are used for both the Derm and CheXpert training datasets.
- Pretraining using self-supervised models can significantly help with label efficiency for medical image classification.
MICLe yields proportionally larger gains when fine-tuning with fewer labeled examples.
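A label-efficiency experiment of this kind needs a reproducible way to keep only a fraction of the labeled training set before fine-tuning. The sketch below shows one simple way to do that. It is an assumption-laden illustration: `subsample_labeled` is a hypothetical helper, and uniform random sampling with a fixed seed is a choice made here, not necessarily the sampling scheme used in the paper.

```python
import random


def subsample_labeled(examples, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of the examples.

    examples: sequence of labeled training examples.
    fraction: value in (0, 1], e.g. 0.1 for the 10% setting.
    seed: fixed seed so every model sees the same subset (assumed convention).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    pool = list(examples)
    # Keep at least one example so fine-tuning is always possible.
    k = max(1, round(fraction * len(pool)))
    return random.Random(seed).sample(pool, k)
```

Fixing the seed keeps the comparison between pretraining strategies fair, since each one is fine-tuned on exactly the same labeled subset at a given fraction.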
[2021 ICCV] [MICLe]
Big Self-Supervised Models Advance Medical Image Classification
[Google AI Blog]
1993 … 2020 [CMC] [MoCo] [CPCv2] [PIRL] [SimCLR] [MoCo v2] [iGPT] [BoWNet] [BYOL] [SimCLRv2] [BYOL+GN+WS] 2021 [MoCo v3] [SimSiam] [DINO] [Exemplar-v1, Exemplar-v2] [MICLe]
Biomedical Image Classification
2019 [CheXpert] 2020 [VGGNet for COVID-19] [Dermatology] 2021 [MICLe]