Brief Review — FocalMix: Semi-Supervised Learning for 3D Medical Image Detection


Sik-Ho Tsang
4 min read · Jul 19, 2023

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection, by Peking University and Yizhun Medical AI Co., Ltd.
2020 CVPR, Over 100 Citations (Sik-Ho Tsang @ Medium)

Biomedical Image Semi-Supervised Learning
2019 [UA+MT] 2020 [SASSNet]

  • FocalMix is proposed as the first method to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection.


  1. Preliminaries
  2. FocalMix
  3. Results

1. Preliminaries

1.1. Example of Medical Image Detector

Example of Medical Image Detection

1.2. Focal Loss for Imbalanced Datasets

  • (Please feel free to read about RetinaNet if interested.)
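
For reference, the focal loss introduced with RetinaNet can be written for a binary target y and predicted foreground probability p as below. The symbols p_t and α_t are the usual shorthand from that paper, not something defined in this review; γ ≥ 0 down-weights easy examples and α_t balances foreground against background.

```latex
\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma}\, \log(p_t),
\qquad
p_t =
\begin{cases}
p, & y = 1 \\
1 - p, & y = 0
\end{cases}
\qquad
\alpha_t =
\begin{cases}
\alpha, & y = 1 \\
1 - \alpha, & y = 0
\end{cases}
```

With γ = 0 and α_t constant, this falls back to a weighted cross-entropy loss.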

1.3. MixMatch for Semi-Supervised Learning

  • MixMatch consists of two major components: target prediction for unlabeled data and mixup augmentation. For target prediction, MixMatch averages the predictions of the current model, parameterized by θ, over K augmented instances x̂₁, …, x̂_K of each unlabeled example: ȳ = (1/K) Σₖ P(y | x̂ₖ; θ).
  • These guessed labels are then transformed by a sharpening operator with temperature T before being used as training targets: Sharpen(ȳ, T)ᵢ = ȳᵢ^(1/T) / Σⱼ ȳⱼ^(1/T).
  • The sharpening operation implicitly encourages the model to output low-entropy predictions on unlabeled data.
  • (Please feel free to read about MixMatch if interested.)
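
The target prediction and sharpening steps described above can be sketched in a few lines of NumPy. This is only a minimal illustration: the function names, the temperature T = 0.5, and the toy probabilities are assumptions made for the example, not values taken from the paper.

```python
import numpy as np

def guess_labels(probs_per_view):
    """Average class probabilities over K augmented views.

    probs_per_view: array of shape (K, num_classes), one row per view.
    """
    return np.mean(probs_per_view, axis=0)

def sharpen(p, T=0.5):
    """Temperature sharpening: raise to the power 1/T, then renormalize."""
    p = np.asarray(p, dtype=float)
    powered = p ** (1.0 / T)
    return powered / powered.sum(axis=-1, keepdims=True)

# Hypothetical predictions for K = 3 augmented views of one unlabeled example.
preds = np.array([[0.6, 0.4],
                  [0.7, 0.3],
                  [0.8, 0.2]])

guessed = guess_labels(preds)     # average over the K views
target = sharpen(guessed, T=0.5)  # lower-entropy training target
```

Here the averaged guess is [0.7, 0.3] and sharpening with T = 0.5 pushes it to roughly [0.845, 0.155], which is the low-entropy behaviour mentioned above.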

1.4. mixup for Augmentation

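  • mixup augmentation produces a stochastic linear interpolation of a training example (x, y) with another training example (x′, y′), either labeled or unlabeled: x̂ = λx + (1 − λ)x′ and ŷ = λy + (1 − λ)y′, where the mixing weight λ ∈ [0, 1] is sampled from a Beta distribution.
  • That is, two images x and x′ are mixed into x̂, and the corresponding labels y and y′ are mixed into ŷ with the same weight. (Please read mixup if interested.)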

2. FocalMix

FocalMix Overview
  • Following the recommendations in Oliver NeurIPS’18, the exact same model, a 3D variant of FPN, is used as both the fully-supervised baseline and the base model for FocalMix.
  • Two essential components in the MixMatch framework are tailored specifically for lesion detection tasks: target prediction and mixup augmentation.

2.1. Soft-Target Focal Loss

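  • Target prediction and mixup in MixMatch produce soft labels, whereas the original focal loss is defined for hard (0/1) labels, so using focal loss here amounts to handling a skewed distribution of soft labels.
  • The proposed soft-target focal loss (SFL) for SSL is: SFL(p) = [α₀ + y(α₁ − α₀)] · |y − p|^γ · CE(y, p),
  • where y ∈ [0, 1] is the soft target, p is the predicted probability, and the CE loss is: CE(y, p) = −y log(p) − (1 − y) log(1 − p).
  • For hard targets y ∈ {0, 1}, SFL reduces to the original focal loss, so focal loss is a special case of the proposed soft-target focal loss.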
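
The special-case claim can be checked numerically. The sketch below assumes a per-anchor binary form of the soft-target focal loss in which a cross-entropy term against the soft target y is scaled by a class weight interpolated between α₀ (background) and α₁ (foreground) and by |y − p|^γ. The function names and the default values (α₀ = 0.75, α₁ = 0.25, γ = 2) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def cross_entropy(y, p, eps=1e-7):
    """Binary cross-entropy that also accepts soft targets y in [0, 1]."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def soft_target_focal_loss(y, p, alpha0=0.75, alpha1=0.25, gamma=2.0):
    """Assumed form: [a0 + y (a1 - a0)] * |y - p|^gamma * CE(y, p)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    weight = alpha0 + y * (alpha1 - alpha0)
    return weight * np.abs(y - p) ** gamma * cross_entropy(y, p)

def focal_loss(y, p, alpha=0.25, gamma=2.0, eps=1e-7):
    """Standard hard-target focal loss, used here only for comparison."""
    p = np.clip(p, eps, 1.0 - eps)
    pos = -alpha * (1.0 - p) ** gamma * np.log(p)
    neg = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)
    return np.where(np.asarray(y) == 1, pos, neg)

# With hard 0/1 targets the two losses should coincide.
hard_targets = np.array([1.0, 0.0, 1.0, 0.0])
probs = np.array([0.9, 0.2, 0.3, 0.6])
sfl_hard = soft_target_focal_loss(hard_targets, probs)
fl_hard = focal_loss(hard_targets, probs)

# With a soft target, the loss vanishes when the prediction matches it.
sfl_matched = soft_target_focal_loss(0.7, 0.7)
```

Under these assumptions, `sfl_hard` and `fl_hard` agree element-wise, and a prediction that exactly matches a soft target incurs zero loss because the |y − p|^γ factor is zero.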

2.2. mixup Augmentation for Detection

  • mixup cannot be applied directly to bounding-box annotations.
  • Image-level mixup is used instead, so that the mixup training signals are at the anchor level. Anchor-to-anchor mixup requires the model to detect lesions that are mixed with stronger background noise than usual, analogous to the idea of “altitude training”.
  • Object-level mixup is also applied to generate extra object instances by mixing up different lesion patterns within each training batch.
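
To make the image-level variant concrete, here is a minimal sketch of mixing two 3D patches together with their per-anchor classification targets. The patch size, the anchor grid shape, the Beta parameter, and the function name are all hypothetical; the paper's actual anchor layout and hyperparameters are not reproduced here, and object-level mixup is not covered.

```python
import numpy as np

def image_level_mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=None):
    """Mix two 3D patches and their anchor-level targets with one weight.

    x_a, x_b: image patches of identical shape.
    y_a, y_b: per-anchor classification targets of identical shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)           # mixing weight in [0, 1]
    x_mix = lam * x_a + (1.0 - lam) * x_b  # voxel-wise interpolation
    y_mix = lam * y_a + (1.0 - lam) * y_b  # anchor-wise soft targets
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
x_a = rng.normal(size=(8, 8, 8))  # hypothetical patch containing a lesion
x_b = rng.normal(size=(8, 8, 8))  # hypothetical background-only patch

y_a = np.zeros((2, 2, 2))         # toy anchor grid
y_a[0, 0, 0] = 1.0                # one positive (lesion) anchor
y_b = np.zeros((2, 2, 2))         # no lesion anywhere

x_mix, y_mix, lam = image_level_mixup(x_a, y_a, x_b, y_b, rng=rng)
```

The lesion anchor in `y_a` lines up with a background anchor in `y_b`, so its mixed target becomes the soft value `lam` rather than 1. Soft targets like this are what the soft-target focal loss in Section 2.1 is designed to handle.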

3. Results

3.1. LUNA16


When 25 labeled images are used, the fully-supervised model obtains a CPM score of only 66.6%, whereas FocalMix boosts it to 78.1%, a 17.3% relative improvement ((78.1 − 66.6) / 66.6).


The CPM score grows consistently as the amount of unlabeled data increases, which supports the effectiveness of using unlabeled data in FocalMix.

3.2. Ablation Study

Ablation Study

The soft-target focal loss (SFL) is the best loss function, K=4 augmented instances for target prediction performs best, and using both image-level and object-level mixup gives the highest CPM.

Examples of mixup

Intuitively, the goal of image-level mixup is to encourage models to behave linearly between foreground and background, while object-level mixup encourages models to detect lesions with richer patterns.

  • All the models are trained for 400 epochs. When using all the 533 annotated CT scans, the proposed mixup strategies (i.e., anchor-level and object-level mixup) alone can improve the CPM score of the fully-supervised learning approach from 89.2% to 90.0%.

FocalMix further improves this result to 90.7% by leveraging around 3,000 images without annotation.


