Brief Review — FocalMix: Semi-Supervised Learning for 3D Medical Image Detection
FocalMix: Semi-Supervised Learning for 3D Medical Image Detection
FocalMix, by Peking University, and Yizhun Medical AI Co., Ltd,
2020 CVPR, Over 100 Citations (Sik-Ho Tsang @ Medium)
Biomedical Image Semi-Supervised Learning
2019 [UA+MT] 2020 [SASSNet]
- FocalMix is proposed as the first method to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection.
1.1. Example of Medical Image Detector
1.2. Focal Loss for Imbalanced Datasets
- (Please feel free to read about RetinaNet if interested.)
1.3. MixMatch for Semi-Supervised Learning
- MixMatch consists of two major components: target prediction for unlabeled data and mixup augmentation. For target prediction, MixMatch guesses a label for each unlabeled example by averaging the predictions of the current model, parameterized by θ, over K augmented instances of that example.
- These guessed labels are then transformed by a sharpening operator before being used as training targets.
- The sharpening operation implicitly encourages the model to output low-entropy predictions on unlabeled data.
- (Please feel free to read about MixMatch if interested.)
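The two target-prediction steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the authors' code: the function names `guess_label` and `sharpen`, the example probabilities, and the temperature T = 0.5 are assumptions made for the sketch. The sharpening rule used here raises each probability to the power 1/T and renormalizes, as in MixMatch.

```python
def guess_label(predictions):
    """Average K class-probability vectors, one per augmented view."""
    k = len(predictions)
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / k for c in range(n_classes)]


def sharpen(probs, temperature=0.5):
    """Raise each probability to 1/T and renormalize; T < 1 lowers entropy."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical predictions of one model on K = 4 augmented views of one input
views = [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5], [0.8, 0.2]]
guessed = guess_label(views)    # averaged guess over the 4 views
target = sharpen(guessed, 0.5)  # more peaked than `guessed`, still sums to 1
```

With T below 1 the larger class probability grows and the smaller one shrinks, which is what pushes the training target toward low entropy.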
1.4. mixup for Augmentation
- mixup augments a training example (x, y) by stochastically interpolating it, linearly, with another training example (x′, y′), either labeled or unlabeled.
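As a rough sketch of this interpolation, assuming inputs and labels are plain lists of floats: the function name `mixup`, the Beta parameter 0.75, and the toy vectors are illustrative assumptions. The `max(lam, 1 - lam)` step is the MixMatch variant, which keeps the mixed example closer to the first one.

```python
import random


def mixup(x, y, x_other, y_other, alpha=0.75):
    """Linearly interpolate two examples with a Beta-distributed weight."""
    lam = random.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # MixMatch variant: stay closer to (x, y)
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x, x_other)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y, y_other)]
    return mixed_x, mixed_y, lam


random.seed(0)  # fixed seed so the sketch is repeatable
# Toy 2-value "image" with a positive label, mixed with a negative example
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0], [0.0, 1.0], [0.0])
```

Both the input and the label are mixed with the same weight, so the label of the mixed example is a soft value rather than 0 or 1.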
- Following the recommendations of Oliver NeurIPS’18, the exact same model, a 3D variant of FPN, is used as both the fully-supervised baseline and the base model for FocalMix.
- Two essential components in the MixMatch framework are tailored specifically for lesion detection tasks: target prediction and mixup augmentation.
2.1. Soft-Target Focal Loss
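- With MixMatch, the training targets become soft labels (sharpened guesses and mixup interpolations), so using focal loss means handling a skewed distribution of soft labels rather than hard 0/1 labels.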
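- The proposed soft-target focal loss (SFL) for SSL is designed as: SFL(p) = [α0 + y(α1 − α0)] · |y − p|^γ · CE(y, p), where y ∈ [0, 1] is the soft target and p is the predicted probability.
- where the CE loss is: CE(y, p) = −y log(p) − (1 − y) log(1 − p).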
- As we can see, focal loss is a special case of the proposed soft-target focal loss.
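The special-case claim can be checked numerically. The sketch below assumes the soft-target focal loss has the form SFL(p) = [α0 + y(α1 − α0)] · |y − p|^γ · CE(y, p) with binary cross-entropy CE; the function names and the values α = 0.25, γ = 2 (the usual RetinaNet defaults) are assumptions for illustration, not settings confirmed for FocalMix.

```python
import math


def cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy between a soft target y and a probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)


def soft_target_focal_loss(y, p, alpha0=0.75, alpha1=0.25, gamma=2.0):
    """Assumed SFL form: class weight * |y - p|^gamma * CE(y, p)."""
    weight = alpha0 + y * (alpha1 - alpha0)  # alpha1 at y = 1, alpha0 at y = 0
    modulating = abs(y - p) ** gamma         # small when p is close to y
    return weight * modulating * cross_entropy(y, p)


def focal_loss(y, p, alpha=0.25, gamma=2.0):
    """Standard focal loss for a hard label y in {0, 1}."""
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


# With hard labels and alpha1 = alpha, alpha0 = 1 - alpha, both losses agree
gaps = [abs(soft_target_focal_loss(y, p) - focal_loss(y, p))
        for y in (0, 1) for p in (0.1, 0.3, 0.9)]
```

For hard labels, |y − p| becomes (1 − p) when y = 1 and p when y = 0, which recovers the usual focal modulating factor. For a soft target, the loss vanishes when the prediction matches the target exactly.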
2.2. mixup Augmentation for Detection
- mixup cannot be applied directly to bounding boxes.
- Image-level mixup is used instead, so that the mixup training signals are defined at the anchor level. Anchor-to-anchor mixup requires the model to detect lesions that are mixed with stronger background noise than usual, analogous to the idea of “altitude training”.
- Object-level mixup is also applied to generate extra object instances by mixing up different lesion patterns within each training batch.
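One way to picture anchor-level training signals is shown below, assuming both images share the same anchor grid and each anchor carries a classification target in [0, 1]. The function name `mix_anchor_targets`, the 4-anchor grid, and the fixed weight 0.7 are hypothetical; the sketch covers only the classification targets, not box regression or object-level mixup.

```python
def mix_anchor_targets(targets_a, targets_b, lam):
    """Mix per-anchor classification targets of two images with weight lam."""
    if len(targets_a) != len(targets_b):
        raise ValueError("both images must share the same anchor grid")
    return [lam * a + (1.0 - lam) * b for a, b in zip(targets_a, targets_b)]


# Hypothetical 4-anchor grid: 1.0 = lesion anchor, 0.0 = background anchor
image_a = [1.0, 0.0, 0.0, 1.0]
image_b = [0.0, 0.0, 1.0, 1.0]
mixed = mix_anchor_targets(image_a, image_b, lam=0.7)
```

An anchor that is a lesion in one image and background in the other receives a soft target (about 0.7 or 0.3 here), while anchors that agree in both images keep their original label. This is the situation in which a lesion has to be detected against a blended-in background.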
When 25 labeled images are used, the fully-supervised model obtains a CPM score of only 66.6%, whereas FocalMix boosts it to 78.1%, a relative improvement of 17.3% ((78.1 − 66.6) / 66.6 ≈ 0.173).
The CPM score grows consistently as the amount of unlabeled data increases, which supports the effectiveness of using unlabeled data in FocalMix.
3.2. Ablation Study
In the ablations, the soft-target focal loss (SFL) is the best-performing loss function, K=4 gives the best result, and using both image-level and object-level mixup yields the highest CPM.
Intuitively, the goal of image-level mixup is to encourage models to behave linearly between foreground and background, while object-level mixup encourages models to detect lesions with richer patterns.
- All the models are trained for 400 epochs. When all 533 annotated CT scans are used, the proposed mixup strategies alone (i.e., image-level mixup, which acts at the anchor level, and object-level mixup) improve the CPM score of the fully-supervised approach from 89.2% to 90.0%.
FocalMix further improves this result to 90.7% by leveraging around 3,000 unannotated images.