# Brief Review — A Probabilistic U-Net for Segmentation of Ambiguous Images

## Probabilistic U-Net, Using **Conditional Variational Autoencoder (CVAE)**


A Probabilistic U-Net for Segmentation of Ambiguous Images, Probabilistic U-Net, by DeepMind and German Cancer Research Center, 2018 NeurIPS, Over 300 Citations (Sik-Ho Tsang @ Medium)

Semantic Segmentation, Image Segmentation, Medical Image Analysis, Medical Imaging

- A generative segmentation model based on **a combination of a U-Net with a conditional variational autoencoder (CVAE)** that is capable of efficiently producing an unlimited number of plausible hypotheses.

# Outline

1. **Probabilistic U-Net**
2. **Results**

# 1. Probabilistic U-Net

- The proposed network architecture is **a combination of a conditional variational autoencoder (CVAE) with a U-Net**.

## 1.1. (a) Sampling

- The central component of the architecture is a **low-dimensional latent space of size *N*** (e.g., *N*=6, which performed best). **Each position in this space encodes a segmentation variant**.
- The **‘prior net’**, parametrized by **weights *ω***, **estimates the probability of these variants for a given input image *X***. This **prior probability distribution** (called *P* in the following) is **modelled** as an axis-aligned Gaussian with **mean** *μprior*(*X*;*ω*) of size *N*, and **variance** *σprior*(*X*;*ω*) of size *N*.
- To predict a set of *m* segmentations, the network is applied *m* times to the **same input image** (only a small part of the network needs to be re-evaluated in each iteration). **In each iteration *i*** (from 1 to *m*), a random sample *zi* is drawn from *P*:

$$z_i \sim P(\cdot \mid X) = \mathcal{N}\big(\mu_{\text{prior}}(X;\omega),\ \operatorname{diag}(\sigma_{\text{prior}}(X;\omega))\big)$$
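The sampling step can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `sample_latents` and the numbers in `mu_prior` and `var_prior` are made up, standing in for what a trained prior net would output for one image.

```python
import math
import random

def sample_latents(mu, var, m, seed=0):
    """Draw m samples z_i ~ N(mu, diag(var)) from an axis-aligned Gaussian."""
    rng = random.Random(seed)
    return [
        # Each latent dimension is sampled independently (diagonal covariance).
        [rng.gauss(mu_k, math.sqrt(var_k)) for mu_k, var_k in zip(mu, var)]
        for _ in range(m)
    ]

# Hypothetical prior-net output for one image, with N = 6 latent dimensions.
mu_prior = [0.0, 1.0, -1.0, 0.5, 0.0, 2.0]
var_prior = [1.0, 0.25, 0.25, 1.0, 4.0, 0.01]

z_samples = sample_latents(mu_prior, var_prior, m=4)
```

Because the Gaussian is axis-aligned, no covariance matrix is needed: each of the *N* coordinates only requires its own mean and variance.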

- Then, *zi* is **broadcast** to an *N*-channel feature map with the same shape as the segmentation map, and this feature map is **concatenated to the last activation map of a U-Net** (parametrized by weights *θ*). A function *fcomb.* composed of **three subsequent 1×1 convolutions** (*ψ* being the set of their weights) **combines** the information and **maps** it to the **desired number of classes**.
- **The output, *Si***, is the **segmentation map corresponding to point *zi*** in the latent space:

$$S_i = f_{\text{comb.}}\big(f_{\text{U-Net}}(X;\theta),\ z_i;\ \psi\big)$$

- When drawing *m* samples for the same input image, the output of the prior net and the feature activations of the U-Net are reused. Only the function *fcomb.* needs to be re-evaluated *m* times.
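A minimal NumPy sketch of the broadcast, concatenate and combine steps is given below. It is not the authors' implementation: the random `unet_features`, the hidden width of 16, the ReLU activations between the 1×1 convolutions, and all weight values are assumptions chosen only to make the example runnable.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: the same linear map applied at every pixel.
    x: (C_in, H, W), w: (C_out, C_in), b: (C_out,)"""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def f_comb(unet_features, z, params):
    """Broadcast z over the spatial grid, concatenate, apply three 1x1 convs."""
    _, h, w = unet_features.shape
    z_map = np.broadcast_to(z[:, None, None], (z.shape[0], h, w))
    x = np.concatenate([unet_features, z_map], axis=0)
    (w1, b1), (w2, b2), (w3, b3) = params
    x = np.maximum(conv1x1(x, w1, b1), 0.0)  # ReLU (assumed activation)
    x = np.maximum(conv1x1(x, w2, b2), 0.0)
    return conv1x1(x, w3, b3)                # per-class logits

rng = np.random.default_rng(0)
C, H, W, N, num_classes, hidden = 32, 8, 8, 6, 2, 16

# Stand-ins for quantities computed once per image and then reused.
unet_features = rng.normal(size=(C, H, W))
mu, sigma = np.zeros(N), np.ones(N)

params = [
    (rng.normal(size=(hidden, C + N)) * 0.1, np.zeros(hidden)),
    (rng.normal(size=(hidden, hidden)) * 0.1, np.zeros(hidden)),
    (rng.normal(size=(num_classes, hidden)) * 0.1, np.zeros(num_classes)),
]

m = 4
segmentations = []
for _ in range(m):
    z_i = mu + sigma * rng.normal(size=N)        # sample from the prior
    logits = f_comb(unet_features, z_i, params)  # only this part is re-run
    segmentations.append(logits.argmax(axis=0))
```

The loop makes the cost argument concrete: `unet_features`, `mu` and `sigma` are computed outside the loop, so each extra hypothesis costs only three per-pixel linear maps.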

## 1.2. (b) Training

- A **‘posterior net’** is introduced, parametrized by **weights *ν***, to learn to recognize a segmentation variant (given the raw image *X* and the ground truth segmentation *Y*) and to **map this to a position** *μpost*(*X*,*Y*;*ν*) with some uncertainty *σpost*(*X*,*Y*;*ν*) in the latent space. The output is denoted as **posterior distribution *Q***.
- **A sample *z*** from this distribution:

$$z \sim Q(\cdot \mid Y, X) = \mathcal{N}\big(\mu_{\text{post}}(X,Y;\nu),\ \operatorname{diag}(\sigma_{\text{post}}(X,Y;\nu))\big)$$

- The networks are trained with the standard training procedure for conditional VAEs, by **minimizing the negative variational lower bound**:

$$\mathcal{L}(Y, X) = \mathbb{E}_{z \sim Q(\cdot \mid Y, X)}\big[-\log P_c(Y \mid S(X, z))\big] + \beta \cdot D_{\mathrm{KL}}\big(Q(z \mid Y, X)\,\|\,P(z \mid X)\big)$$

- The first term is a **cross-entropy loss** that penalizes differences between the predicted segmentation *S* and the ground truth segmentation *Y*.
- The second term is a **Kullback-Leibler divergence** which **penalizes differences between the posterior distribution *Q* and the prior distribution *P***, weighted by *β*. During training, this KL loss “pulls” the posterior distribution (which encodes a segmentation variant) and the prior distribution towards each other.
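Since both distributions are axis-aligned Gaussians, the KL term has a closed form, and the loss for one training example can be sketched as below. This is a simplified illustration rather than the authors' code: the function names are hypothetical, the cross-entropy is summed over pixels (an assumption), the expectation over *z* is approximated by a single sample that is assumed to have already produced `probs`, and `beta` is a weighting factor on the KL term.

```python
import math

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) for Q = N(mu_q, diag(var_q)) and P = N(mu_p, diag(var_p))."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )

def cross_entropy(probs, labels):
    """Sum over pixels of -log p(true class); probs[i] is a per-class list."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))

def cvae_loss(probs, labels, mu_q, var_q, mu_p, var_p, beta=1.0):
    """Reconstruction term plus beta-weighted KL between posterior and prior."""
    return cross_entropy(probs, labels) + beta * kl_diag_gaussians(
        mu_q, var_q, mu_p, var_p
    )

# Toy example: 3 pixels, 2 classes, N = 2 latent dimensions (made-up numbers).
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 0]
loss = cvae_loss(probs, labels,
                 mu_q=[0.5, -0.5], var_q=[0.5, 0.5],
                 mu_p=[0.0, 0.0], var_p=[1.0, 1.0])
```

The KL term is zero exactly when posterior and prior coincide, which is the sense in which it pulls the two distributions together.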

# 2. Results

## 2.1. Metric

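- The **generalized energy distance**, which leverages distances between observations, is used:

$$D^2_{\text{GED}}(P_{\text{gt}}, P_{\text{out}}) = 2\,\mathbb{E}\big[d(S, Y)\big] - \mathbb{E}\big[d(S, S')\big] - \mathbb{E}\big[d(Y, Y')\big]$$

- where *d* is a distance measure, *Y* and *Y*’ are independent samples from the **ground truth distribution *Pgt***, and similarly, *S* and *S*’ are independent samples from the **predicted distribution *Pout***. *d*(*x*,*y*)=1-IoU(*x*,*y*) is used.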
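For binary masks, the metric can be estimated from finite sets of samples as sketched below. This is an illustrative estimator, not the paper's evaluation code: it averages over all pairs including self-pairs, and it treats two empty masks as having distance 0, both of which are assumptions.

```python
def iou_distance(a, b):
    """d(x, y) = 1 - IoU(x, y) for binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, q in zip(a, b) if p and q)
    union = sum(1 for p, q in zip(a, b) if p or q)
    # Two empty masks are treated as identical (assumed convention).
    return 0.0 if union == 0 else 1.0 - inter / union

def mean_distance(xs, ys):
    """Average pairwise distance between two sets of masks."""
    return sum(iou_distance(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def generalized_energy_distance(samples, ground_truths):
    """2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')], estimated from samples."""
    return (2.0 * mean_distance(samples, ground_truths)
            - mean_distance(samples, samples)
            - mean_distance(ground_truths, ground_truths))

# Two graders disagree completely on a 4-pixel image.
ground_truths = [[1, 1, 0, 0], [0, 0, 1, 1]]

matching = [[1, 1, 0, 0], [0, 0, 1, 1]]   # model covers both variants
collapsed = [[1, 1, 0, 0], [1, 1, 0, 0]]  # model always predicts one variant

ged_matching = generalized_energy_distance(matching, ground_truths)    # 0.0
ged_collapsed = generalized_energy_distance(collapsed, ground_truths)  # 0.5
```

The toy case shows why the metric suits ambiguous images: a model that reproduces only one of the two valid variants is penalized even though every individual prediction matches some grader exactly.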

## 2.2. Baseline

- **(a) Dropout U-Net**: Dropout with a probability of *p*=0.5 is applied to the **incoming layers** of the three inner-most encoder and decoder blocks.
- **(b) U-Net Ensemble**: **Model ensemble** using U-Net.
- **(c) M-Heads**: *M* heads are **branched off after the last layer** of a deep net.
- **(d) Image2Image VAE**: employs a prior that is not conditioned on the input image (a fixed normal distribution) and a posterior net that is not conditioned on the input either.

## 2.3. Qualitative Results

## 2.4. Quantitative Results

- **Left:** On the lung abnormalities test set of 1992 images, the **energy distance** (lower is better) **decreases for all models as more samples are drawn**. **The Probabilistic U-Net outperforms all baselines when sampling 4, 8 and 16 times.** Its performance at 16 samples is significantly better than that of the baselines.
- **Right:** On the Cityscapes task, the Probabilistic U-Net **outperforms the baseline methods when sampling 4, 8 and 16 times in terms of the energy distance.**

## Reference

[2018 NeurIPS] [Probabilistic U-Net]

A Probabilistic U-Net for Segmentation of Ambiguous Images
