# Brief Review — A Probabilistic U-Net for Segmentation of Ambiguous Images

## Probabilistic U-Net, Using Conditional Variational Autoencoder (CVAE)

• A generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder (CVAE) that is capable of efficiently producing an unlimited number of plausible hypotheses.

# Outline

1. Probabilistic U-Net
2. Results

# 1. Probabilistic U-Net

• The proposed network architecture is a combination of a conditional variational autoencoder (CVAE) with a U-Net.

## 1.1. (a) Sampling

• The central component of the architecture is a low-dimensional latent space of size N (e.g. N=6 performs best). Each position in this space encodes a segmentation variant.
• The ‘prior net’, parametrized by weights ω, estimates the probability of these variants for a given input image X. This prior probability distribution (called P in the following) is modelled as an axis-aligned Gaussian with mean μprior(X; ω) and variance σprior(X; ω), both of size N.
• To predict a set of m segmentations, the network is applied m times to the same input image (only a small part of the network needs to be re-evaluated in each iteration). In each iteration i (from 1 to m), a random sample zi is drawn from P: zi ~ P(·|X) = N(μprior(X; ω), diag(σprior(X; ω))).
• Then, zi is broadcast to an N-channel feature map with the same spatial shape as the segmentation map, and this feature map is concatenated to the last activation map of a U-Net (the U-Net is parameterized by weights θ). A function fcomb., composed of three subsequent 1×1 convolutions (ψ being the set of their weights), combines the information and maps it to the desired number of classes.
• The output, Si, is the segmentation map corresponding to point zi in the latent space: Si = fcomb.(fU-Net(X; θ), zi; ψ).
• When drawing m samples for the same input image, the output of the prior net and the feature activations of the U-Net are reused; only the function fcomb. needs to be re-evaluated m times (a minimal code sketch of this sampling path follows this list).
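A minimal sketch of this sampling path in PyTorch is given below. It assumes a `prior_net` returning the mean and log standard deviation of P, a `unet` returning its last activation map, and an `FComb` module for the three 1×1 convolutions; all names, shapes and hyper-parameters are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class FComb(nn.Module):
    """Three 1x1 convolutions that fuse the U-Net's last feature map with the broadcast latent z."""
    def __init__(self, feat_channels, latent_dim, num_classes):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(feat_channels + latent_dim, feat_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, num_classes, kernel_size=1),
        )

    def forward(self, unet_feats, z):
        # Broadcast the N-dim latent vector to an N-channel map with the feature map's spatial size.
        b, _, h, w = unet_feats.shape
        z_map = z.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.layers(torch.cat([unet_feats, z_map], dim=1))

def sample_segmentations(prior_net, unet, f_comb, x, m=16):
    """Draw m plausible segmentation maps for the same input batch x."""
    mu, log_sigma = prior_net(x)       # (B, N) each; evaluated once per image
    unet_feats = unet(x)               # last U-Net activation map; evaluated once per image
    segmentations = []
    for _ in range(m):                 # only f_comb is re-evaluated per sample
        z = mu + torch.exp(log_sigma) * torch.randn_like(mu)   # z_i ~ P(.|X)
        segmentations.append(f_comb(unet_feats, z))
    return segmentations
```

Note how `prior_net` and `unet` run once per image while only `f_comb` runs per sample, which is what makes drawing many hypotheses cheap.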

## 1.2. (b) Training

• A ‘posterior net’ is introduced, parametrized by weights ν, to learn to recognize a segmentation variant (given the raw image X and the ground truth segmentation Y) and to map it to a position μpost(X, Y; ν) with some uncertainty σpost(X, Y; ν) in the latent space. Its output is the posterior distribution Q.
• A sample z from this distribution, z ~ Q(·|X, Y) = N(μpost(X, Y; ν), diag(σpost(X, Y; ν))), combined with the activation map of the U-Net, must result in a predicted segmentation identical to the ground truth segmentation Y.
• The networks are trained with the standard training procedure for conditional VAEs, by minimizing the variational lower bound: L = Ez~Q(·|X,Y)[−log Pc(Y | S(X, z))] + β · DKL(Q(z|X, Y) || P(z|X)), i.e. a cross-entropy term between the prediction S(X, z) and the ground truth Y, plus a β-weighted KL term that pulls the posterior Q and the prior P towards each other (a sketch of this loss follows below).
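A hedged sketch of this objective in PyTorch: the reconstruction term is a cross-entropy between the prediction obtained with a posterior sample z ~ Q and the ground truth, and the regularization term is the closed-form KL divergence between the two axis-aligned Gaussians Q and P. The (mu, log_sigma) parameterization and the β weight are assumptions of this sketch, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def kl_diag_gaussians(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for axis-aligned Gaussians."""
    var_q = torch.exp(2.0 * log_sigma_q)
    var_p = torch.exp(2.0 * log_sigma_p)
    kl = log_sigma_p - log_sigma_q + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p) - 0.5
    return kl.sum(dim=1)   # sum over the N latent dimensions

def elbo_loss(logits, target, mu_q, log_sigma_q, mu_p, log_sigma_p, beta=1.0):
    """Cross-entropy reconstruction term plus beta-weighted KL(Q || P).

    `logits` is the output of f_comb for a sample z drawn from the posterior Q.
    """
    ce = F.cross_entropy(logits, target)
    kl = kl_diag_gaussians(mu_q, log_sigma_q, mu_p, log_sigma_p).mean()
    return ce + beta * kl
```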

# 2. Results

## 2.1. Metric

• The generalized energy distance, which leverages distances between observations, is used: D²GED(Pgt, Pout) = 2 E[d(S, Y)] − E[d(S, S′)] − E[d(Y, Y′)],
• where d is a distance measure, Y and Y′ are independent samples from the ground truth distribution Pgt, and similarly, S and S′ are independent samples from the predicted distribution Pout.
• d(x, y) = 1 − IoU(x, y) is used as the distance measure (a small sketch of this metric follows below).
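The small NumPy sketch below estimates the squared generalized energy distance from finite sets of predicted and ground-truth binary masks, with d = 1 − IoU. Treating two empty masks as distance 0 is an assumption of this sketch.

```python
import numpy as np

def one_minus_iou(a, b):
    """Distance between two binary masks: 1 - IoU; defined as 0 when both masks are empty."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return 0.0 if union == 0 else 1.0 - inter / union

def generalized_energy_distance(samples, gts):
    """D^2_GED = 2*E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')], estimated over all pairs."""
    d_sy = np.mean([one_minus_iou(s, y) for s in samples for y in gts])
    d_ss = np.mean([one_minus_iou(s, s2) for s in samples for s2 in samples])
    d_yy = np.mean([one_minus_iou(y, y2) for y in gts for y2 in gts])
    return 2 * d_sy - d_ss - d_yy
```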

## 2.2. Baseline

• (a) Dropout U-Net: Dropout with probability p=0.5 is applied to the incoming layers of the three inner-most encoder and decoder blocks, and samples are drawn by keeping dropout active at test time (see the sketch after this list).
• (b) U-Net Ensemble: An ensemble of separately trained U-Nets; each member yields one segmentation hypothesis.
• (c) M-Heads: M segmentation heads are branched off after the last layer of a shared deep net, each producing one hypothesis.
• (d) Image2Image VAE: Employs a prior that is not conditioned on the input image (a fixed normal distribution) and a posterior net that is not conditioned on the input either.
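For baseline (a), sampling diversity comes from re-sampling the dropout masks at inference time (Monte Carlo dropout). The sketch below illustrates this in PyTorch, assuming the U-Net contains standard `nn.Dropout`/`nn.Dropout2d` layers; the helper name and sample count are hypothetical.

```python
import torch
import torch.nn as nn

def mc_dropout_samples(unet, x, m=16):
    """Draw m segmentation hypotheses from a Dropout U-Net by re-sampling dropout masks."""
    unet.eval()                              # keep batch-norm statistics fixed
    for module in unet.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d)):
            module.train()                   # keep dropout stochastic while sampling
    with torch.no_grad():
        return [unet(x) for _ in range(m)]
```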

## 2.4. Quantitative Results

• Left: On the lung abnormalities test set (1992 images), the energy distance decreases for all models as more samples are drawn.
• The Probabilistic U-Net outperforms all baselines when sampling 4, 8 and 16 times; its performance at 16 samples is significantly better than that of the baselines.
• Right: The Probabilistic U-Net on the Cityscapes task outperforms the baseline methods when sampling 4, 8 and 16 times in terms of the energy distance.

## Reference

[2018 NeurIPS] [Probabilistic U-Net]
A Probabilistic U-Net for Segmentation of Ambiguous Images

