Review: UNet++ — A Nested U-Net Architecture (Biomedical Image Segmentation)

Outperforms U-Net and Wide U-Net

Sik-Ho Tsang
5 min read · Oct 1, 2019

In this story, UNet++, by Arizona State University, is reviewed. UNet++ borrows the dense block idea from DenseNet to improve U-Net. UNet++ differs from the original U-Net in three ways:

  • 1) having convolution layers on skip pathways, which bridge the semantic gap between encoder and decoder feature maps;
  • 2) having dense skip connections on skip pathways, which improve gradient flow;
  • 3) having deep supervision, which enables model pruning and improves performance or, in the worst case, achieves performance comparable to using only one loss layer.

This is a 2018 DLMIA paper with more than 40 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. UNet++ Architecture
  2. Re-designed Skip Pathways
  3. Deep Supervision
  4. Experimental Results

1. UNet++ Architecture

UNet++ Architecture
  • UNet++ starts with an encoder sub-network, or backbone, followed by a decoder sub-network.
  • Re-designed skip pathways (green and blue) connect the two sub-networks, and deep supervision (red) is used. A rough code sketch of this wiring follows.
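
To make the wiring concrete, below is a minimal, hypothetical sketch of the UNet++ node grid in Python (the names unetpp_forward, encode, node, and heads are mine, not from the paper's code; the per-node operation H(·) is detailed in Section 2):

```python
def unetpp_forward(encode, node, heads, x_in, depth=4):
    """Hypothetical UNet++ forward pass over the node grid x^{i,j}.

    encode[i]  : encoder block producing x^{i,0} (with down-sampling)
    node[i, j] : dense block computing x^{i,j} for j > 0 (see Section 2)
    heads[j]   : 1x1 convolution head on x^{0,j} for deep supervision
    """
    x = {}
    # Encoder sub-network (backbone): column j = 0.
    for i in range(depth + 1):
        x[i, 0] = encode[i](x_in if i == 0 else x[i - 1, 0])
    # Nested decoder: each node fuses all same-level predecessors
    # with the up-sampled output of the node one level below.
    for j in range(1, depth + 1):
        for i in range(depth - j + 1):
            x[i, j] = node[i, j]([x[i, k] for k in range(j)], x[i + 1, j - 1])
    # Deep supervision: one full-resolution output per column j >= 1.
    return [heads[j](x[0, j]) for j in range(1, depth + 1)]
```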

2. Re-designed Skip Pathways

Re-designed Skip Pathways
  • The above figure shows an example of how the feature maps travel through the top skip pathway of UNet++.
  • As another example, consider the skip pathway between nodes X^{0,0} and X^{1,3}, as shown in the first figure. This skip pathway consists of a dense convolution block with three convolution layers.
  • Each convolution layer is preceded by a concatenation layer that fuses the output of the previous convolution layer of the same dense block with the corresponding up-sampled output of the lower dense block.
  • Formally, the skip pathway is formulated as:

x^{i,j} = H(x^{i-1,j}), if j = 0
x^{i,j} = H([ x^{i,0}, …, x^{i,j-1}, U(x^{i+1,j-1}) ]), if j > 0

  • where H(·) is a convolution operation followed by an activation function, U(·) denotes an up-sampling layer, [ ] denotes the concatenation layer, i indexes the down-sampling layer along the encoder, and j indexes the convolution layer of the dense block along the skip pathway.
  • This is the idea from DenseNet.

The main idea behind this is to bridge the semantic gap between the feature maps of the encoder and decoder prior to fusion. A minimal code sketch of the node computation follows.
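
Here is a minimal PyTorch sketch of this per-node computation (the ConvBlock helper, channel handling, and the bilinear up-sampling choice are my assumptions, not details confirmed by the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """H(.): a convolution operation followed by an activation function."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

def compute_node(h, same_level, below):
    """x^{i,j} for j > 0: fuse the outputs x^{i,0..j-1} of the same skip
    pathway with the up-sampled output U(x^{i+1,j-1}) of the dense block
    below, then apply H(.)."""
    up = F.interpolate(below, scale_factor=2, mode='bilinear',
                       align_corners=False)         # U(.)
    fused = torch.cat(same_level + [up], dim=1)     # [.] concatenation
    return h(fused)                                 # H(.)
```

As in DenseNet, each convolution layer therefore sees the feature maps of every earlier layer on its pathway.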

3. Deep Supervision

Deep Supervision
  • With deep supervision, UNet++ can operate in two modes:
  • Accurate mode, wherein the outputs from all segmentation branches are averaged.
  • Fast mode, wherein the final segmentation map is selected from only one of the segmentation branches, the choice of which determines the extent of model pruning and speed gain.

  • Owing to the nested skip pathways, UNet++ generates full-resolution feature maps at multiple semantic levels. Thus, the loss is estimated at four semantic levels.
  • Also, a combination of binary cross-entropy and the Dice coefficient is used as the loss function:

L(Y, Ŷ) = -(1/N) Σ_{b=1..N} [ (1/2) · Y_b · log Ŷ_b + (2 · Y_b · Ŷ_b) / (Y_b + Ŷ_b) ]

  • where Ŷ_b and Y_b denote the flattened predicted probabilities and the flattened ground truths of the b-th image, and N is the batch size. (A code sketch of this loss follows.)
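
A minimal PyTorch sketch of this combined loss (the per-pixel averaging, the eps stability term, and the branch handling in the final comment are my interpretation, not code from the paper):

```python
import torch

def bce_dice_loss(y_pred, y_true, eps=1e-7):
    """-(1/N) sum_b [ 0.5 * Y_b * log(Yhat_b) + 2*Y_b*Yhat_b / (Y_b + Yhat_b) ],
    where Yhat_b / Y_b are the flattened probabilities / ground truths of image b."""
    n = y_true.shape[0]                        # N: batch size
    y_pred = y_pred.reshape(n, -1)             # flatten predicted probabilities
    y_true = y_true.reshape(n, -1)             # flatten binary ground truths
    bce = (0.5 * y_true * torch.log(y_pred + eps)).mean(dim=1)
    dice = (2 * (y_true * y_pred).sum(dim=1)
            / (y_true.sum(dim=1) + y_pred.sum(dim=1) + eps))
    return -(bce + dice).mean()

# With deep supervision, one such loss is attached to each of the four
# branches, e.g.: total = sum(bce_dice_loss(p, y) for p in branch_preds)
```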

4. Experimental Results

4.1. Datasets

Datasets
  • Four medical imaging datasets are used for model evaluation, covering lesions/organs from different medical imaging modalities.

4.2. Baseline Models

Number of Convolutional Kernels
  • The original U-Net and Wide U-Net are compared as baselines.
  • Wide U-Net is a modified U-Net with more kernels, such that it has a similar number of parameters to UNet++.

4.3. Results

IoU (%), DS: Deep Supervision
  • UNet++ without deep supervision achieves a significant performance gain over both U-Net and Wide U-Net, yielding an average improvement of 2.8 and 3.3 IoU points, respectively.
  • UNet++ with deep supervision exhibits an average improvement of 0.6 points over UNet++ without deep supervision.

4.4. Model Pruning

mIoU vs Inference Time for Model Pruning
  • UNet++ L3 achieves on average a 32.2% reduction in inference time while degrading IoU by only 0.6 points.
  • More aggressive pruning further reduces the inference time, but at the cost of significant accuracy degradation. (Pruning is sketched in code below.)
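
In code terms, fast-mode pruning amounts to stopping the forward pass at the chosen depth, since none of the dropped nodes feeds the retained branch. A sketch reusing the hypothetical unetpp_forward from Section 1:

```python
# UNet++ L3 inference: nodes with i + j > 3, including the deepest
# encoder stage x^{4,0}, are never evaluated.
seg_map = unetpp_forward(encode, node, heads, image, depth=3)[-1]
```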

4.5. Qualitative Results

Qualitative Results

Around 2017 to 2018, after DenseNet was published, several papers borrowed the DenseNet idea to improve segmentation accuracy in biomedical image segmentation, including this paper and DenseVoxNet.

Reference

[2018 DLMIA] [UNet++]
UNet++: A Nested U-Net Architecture for Medical Image Segmentation

