Review — Uncertainty-Aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation

UA+MT, Semi-Supervised Segmentation Using Teacher-Student Paradigm

Sik-Ho Tsang
6 min read · Dec 1, 2022


Uncertainty-Aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation, UA+MT, by The Chinese University of Hong Kong,
2019 MICCAI, Over 300 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Semi-Supervised Learning, Image Segmentation, V-Net

  • A novel uncertainty-aware semi-supervised framework is proposed for left atrium segmentation from 3D MR images.
  • The framework consists of a student model and a teacher model, and the student model learns from the teacher model by minimizing a segmentation loss and a consistency loss with respect to the targets of the teacher model.


  1. Semi-Supervised Segmentation
  2. Uncertainty-Aware Mean Teacher Framework (UA-MT)
  3. Experimental Results

1. Semi-Supervised Segmentation

Uncertainty-aware self-ensembling mean teacher framework (UA-MT) for semi-supervised LA segmentation.

1.1. Definitions

  • Suppose we have 3D training data consisting of N labeled scans and M unlabeled scans, denoted DL = {(xi, yi)}, i = 1, …, N, and DU = {xi}, i = N+1, …, N+M, respectively,
  • where xi of size H×W×D is the input volume and yi ∈ {0, 1} of size H×W×D is the ground-truth annotation.

1.2. Loss Functions

  • The goal of the semi-supervised segmentation framework is to minimize the following combined objective function:
    minθ Σi=1..N Ls(f(xi; θ), yi) + λ Σi=1..N+M Lc(f(xi; θ′, ξ′), f(xi; θ, ξ))
  • where Ls denotes the supervised loss (e.g., cross-entropy loss) evaluating the quality of the network output on labeled inputs, and
  • Lc represents the unsupervised consistency loss measuring the consistency between the predictions of the teacher model and the student model for the same input xi under different perturbations.
  • Here, f(·) denotes the segmentation neural network; (θ′, ξ′) and (θ, ξ) represent the weights and the different perturbation operations (e.g., adding noise to the input and network Dropout) of the teacher and student models, respectively.
  • λ is a ramp-up weighting coefficient that controls the trade-off between the supervised and unsupervised losses, following the schedule λ(t) = 0.1 · exp(−5(1 − t/tmax)²):
  • At the beginning, when the model is not yet well trained, λ is small, so the loss function mainly depends on the supervised loss.
  • As training continues (where t is the current training step and tmax is the maximum step), λ becomes larger, so the loss function is a combination of the supervised loss and the consistency loss.
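The Gaussian ramp-up described above can be sketched as follows; the function name `consistency_weight` and the clamping of t beyond tmax are my additions, assuming the commonly used schedule λ(t) = λmax · exp(−5(1 − t/tmax)²) with λmax = 0.1:

```python
import math

def consistency_weight(t, t_max, lambda_max=0.1):
    """Gaussian ramp-up for the consistency weight lambda(t).

    lambda(t) = lambda_max * exp(-5 * (1 - t/t_max)^2), so the weight starts
    near zero and rises smoothly to lambda_max by step t_max.
    """
    t = min(t, t_max)  # clamp so lambda stays at lambda_max after t_max
    return lambda_max * math.exp(-5.0 * (1.0 - t / t_max) ** 2)
```

At t = 0 the weight is lambda_max · exp(−5) ≈ 0.0007 · 0.1/0.1, i.e., nearly zero, which keeps early training dominated by the supervised loss.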
  • The teacher’s weights θ′ are updated as an exponential moving average (EMA) of the student’s weights θ, ensembling the information from different training steps: θ′t = α θ′t−1 + (1 − α) θt,
  • where α is the EMA decay rate.
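The EMA update can be sketched in a framework-agnostic way; here plain Python lists stand in for parameter tensors (in PyTorch the same loop would run over `model.parameters()` of the two networks):

```python
def ema_update(teacher_params, student_params, alpha=0.99):
    """In-place EMA update: theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t.

    teacher_params / student_params: flat lists of parameter values.
    A larger alpha makes the teacher change more slowly.
    """
    for i, (tp, sp) in enumerate(zip(teacher_params, student_params)):
        teacher_params[i] = alpha * tp + (1.0 - alpha) * sp
    return teacher_params
```

Because the teacher is never trained by gradient descent, this update (plus the stochastic forward passes below) is the only way information flows into it.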

2. Uncertainty-Aware Mean Teacher Framework (UA-MT)

2.1. Uncertainty Estimation

  • T stochastic forward passes are performed on the teacher model under random Dropout and input Gaussian noise for each input volume. Therefore, for each voxel of the input, we obtain a set of T softmax probability vectors.
  • The predictive entropy is used as the uncertainty metric: with μc = (1/T) Σt ptc, the voxel uncertainty is u = −Σc μc log μc,
  • where ptc is the probability of the c-th class in the t-th prediction.
  • The uncertainty is estimated at the voxel level; the voxel-wise uncertainties together form the uncertainty map U of the whole volume.
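The T-pass entropy estimate can be sketched with NumPy; the shapes and the toy arrays below are my own illustration, assuming teacher softmax outputs stacked as (T, C, H, W, D):

```python
import numpy as np

def predictive_entropy(probs):
    """Per-voxel predictive entropy from T stochastic forward passes.

    probs: array of shape (T, C, H, W, D), softmax outputs of the teacher
    under random Dropout / input noise. Returns an uncertainty map (H, W, D):
        mu_c = (1/T) * sum_t p_t^c,   u = -sum_c mu_c * log(mu_c)
    """
    mu = probs.mean(axis=0)                        # (C, H, W, D): mean softmax
    return -(mu * np.log(mu + 1e-12)).sum(axis=0)  # (H, W, D): entropy per voxel

# toy single-voxel volumes: T = 8 passes, C = 2 classes
T, C = 8, 2
confident = np.tile(np.array([0.99, 0.01]).reshape(1, C, 1, 1, 1), (T, 1, 1, 1, 1))
ambiguous = np.full((T, C, 1, 1, 1), 0.5)
```

A confidently predicted voxel yields low entropy, while a voxel with disagreeing passes approaches the maximum log C, which is exactly what the threshold H filters on below.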

2.2. Uncertainty-Aware Consistency Loss

  • The uncertainty-aware consistency loss Lc is defined as the voxel-level mean squared error (MSE) between the teacher and student predictions, computed only over the most certain predictions:
    Lc = Σv I(uv < H) · (f′v − fv)² / Σv I(uv < H)
  • where I(·) is the indicator function; f′v and fv are the predictions of the teacher model and the student model at the v-th voxel, respectively;
  • uv is the estimated uncertainty U at the v-th voxel; and H is a threshold used to select only the most certain targets.
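The masked MSE above can be sketched as follows; the function name and the small epsilon guarding against an all-masked volume are my additions (in the paper the threshold H is itself ramped up over training):

```python
import numpy as np

def uncertainty_aware_mse(teacher_pred, student_pred, uncertainty, threshold):
    """Voxel-level MSE between teacher and student, restricted to certain voxels.

    L_c = sum_v I(u_v < H) * (f'_v - f_v)^2 / sum_v I(u_v < H)
    All inputs are arrays of the same shape (one value per voxel).
    """
    mask = (uncertainty < threshold).astype(float)  # 1 where the teacher is certain
    sq_err = (teacher_pred - student_pred) ** 2
    return (mask * sq_err).sum() / (mask.sum() + 1e-12)
```

Only voxels whose uncertainty falls below H contribute, so noisy teacher targets near ambiguous boundaries do not pull the student toward unreliable predictions.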

With the proposed uncertainty-aware consistency loss in the training procedure, both the student and teacher can learn more reliable knowledge, which can then reduce the overall uncertainty of the model.

2.3. Model Architecture

  • V-Net is used as the network backbone, with the short residual connection in each convolution block removed; a joint cross-entropy and Dice loss is used as the supervised loss.
  • To adapt the V-Net as a Bayesian network to estimate the uncertainty, two Dropout layers with Dropout rate 0.5 are added after the L-Stage 5 layer and R-Stage 1 layer of the V-Net.

3. Experimental Results

3.1. Dataset

  • The Atrial Segmentation Challenge dataset is used. It provides 100 3D gadolinium-enhanced MR imaging scans (GE-MRIs) with LA segmentation masks for training and validation.
  • These scans have an isotropic resolution of 0.625×0.625×0.625mm³. The 100 scans are split into 80 scans for training and 20 scans for evaluation. All the scans were cropped centering at the heart region for better comparison of the segmentation performance of different methods.

3.2. SOTA Comparisons

Comparison between the proposed method and various methods
  • The above table shows the segmentation performance of V-Net trained with only the labeled data (the first two rows) and the proposed semi-supervised method (UA-MT) on the testing dataset.
  • The fully supervised V-Net with all 80 labeled scans is evaluated as upper bound (3rd to 4th rows).
  • Compared with the Vanilla V-Net, adding Dropout (Bayesian V-Net) improves the segmentation performance, and achieves an average Dice of 86.03% and Jaccard of 76.06% with only the labeled training data.

By utilizing the unlabeled data, the semi-supervised framework further improves the segmentation by 4.15% Jaccard and 2.85% Dice.

  • Compared with the self-training method, the DAN and ASDNet improve by 0.60% and 0.98% Dice, respectively, showing the effect of adversarial learning in semi-supervised learning. The ASDNet is better than DAN, since it selects the trustworthy region of unlabeled data for training the segmentation network.
  • The self-ensembling-based method TCSE achieves slightly better performance than ASDNet, demonstrating that a perturbation-based consistency loss is helpful for the semi-supervised segmentation problem.

Notably, the proposed method (UA-MT) achieves the best performance among the state-of-the-art semi-supervised methods, except that its ASD (average surface distance) is only comparable with that of ASDNet.

3.3. Analyses

Quantitative analysis of the proposed method.

The proposed uncertainty-aware method outperforms both the MT model and MT-Dice model.

Visualization of the segmentations by different methods and the uncertainty.
  • Compared with the supervised method, the proposed results have a higher overlap ratio with the ground truth (the second row) and produce fewer false positives (the first row).
  • As shown in (d), the network estimates high uncertainty near the boundary and ambiguous regions of great vessels.

