Brief Review — DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation

Uses 2 U-Nets, ASPP as in DeepLabv3, and SE Block as in SENet

4 min read · Apr 7, 2023

DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation,
DoubleU-Net, by SimulaMet, UiT The Arctic University of Norway, and Oslo Metropolitan University
2020 CBMS, Over 300 Citations (Sik-Ho Tsang @ Medium)
Biomedical Image Segmentation
2015 … 2022 [UNETR] [Half-UNet] [BUSIS] [RCA-IUNet] [Swin-Unet] 2023 [DCSAU-Net]

DoubleU-Net is proposed, which is a combination of two U-Net architectures stacked on top of each other.
The first U-Net uses a pre-trained VGG-19 as the encoder, which has already learned features from ImageNet and can be transferred to another task easily. To capture more semantic information efficiently, another U-Net is added at the bottom.
Atrous Spatial Pyramid Pooling (ASPP), as in DeepLabv3, and SE block, as in SENet, are adopted to capture contextual information within the network.
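
To make the ASPP idea concrete, here is a minimal NumPy sketch: the same feature map is filtered by parallel 3×3 atrous (dilated) convolutions at several rates, and the branch outputs are concatenated along the channel axis. The rates `(6, 12, 18)`, the channel counts, the random weights, and the helper names `atrous_conv3x3` and `aspp` are assumptions for illustration only. The 1×1 branch, image-level pooling, and batch normalization of the full DeepLabv3 module are left out, and the exact configuration used in DoubleU-Net may differ.

```python
import numpy as np

def atrous_conv3x3(x, w, rate):
    """3x3 dilated convolution with 'same' padding.
    x: (C_in, H, W); w: (C_out, C_in, 3, 3); rate: dilation factor."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (rate, rate), (rate, rate)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Kernel tap (i, j) reads the input shifted by (i-1)*rate, (j-1)*rate.
            patch = xp[:, i * rate:i * rate + h, j * rate:j * rate + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def aspp(x, weights, rates):
    """Run one atrous branch per rate (with ReLU) and concatenate channel-wise."""
    branches = [np.maximum(atrous_conv3x3(x, w, r), 0.0)
                for w, r in zip(weights, rates)]
    return np.concatenate(branches, axis=0)

rng = np.random.default_rng(0)
rates = (6, 12, 18)                       # assumed rates, for illustration only
x = rng.normal(size=(4, 32, 32))          # hypothetical bottleneck features
weights = [rng.normal(size=(8, 4, 3, 3)) for _ in rates]
y = aspp(x, weights, rates)
print(y.shape)  # (24, 32, 32)
```

Larger rates widen the receptive field without adding parameters or reducing resolution, which is why the branches together see context at multiple scales.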

Outline

DoubleU-Net
Results

1. DoubleU-Net

1.1. Overall

DoubleU-Net starts with a VGG-19 (yellow) as the encoder sub-network.
The decoder blocks are marked in light green.
ASPP (blue), as in DeepLabv3, is used.
The squeeze-and-excite block, as in SENet, is used in the encoder of NETWORK 1 and in the decoder blocks of NETWORK 1 and NETWORK 2.
(The paper itself does not go into much detail about DeepLabv3 and SENet. If interested, please feel free to read about their stories.)
In NETWORK 1, the input image is fed to the modified U-Net, which generates a predicted mask (Output1).
An element-wise multiplication is then performed between the output of NETWORK 1 (Output1) and the input of the same network, and the product serves as the input to NETWORK 2.
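
The hand-off between the two networks can be sketched in a few lines of NumPy. This is only an illustration of the element-wise multiplication step: `double_unet_forward` and `toy_network` are hypothetical names, and the toy thresholding function merely stands in for the real U-Nets. Passing the product on to NETWORK 2 follows the wiring in the paper's architecture figure.

```python
import numpy as np

def double_unet_forward(image, network1, network2):
    """image: (H, W, 3). network1 / network2: callables that map an
    (H, W, 3) array to an (H, W, 1) mask with values in [0, 1]."""
    output1 = network1(image)
    # Element-wise product; the single-channel mask broadcasts over RGB,
    # so pixels that NETWORK 1 rejects are suppressed in the input.
    gated = image * output1
    output2 = network2(gated)
    return output1, output2, gated

def toy_network(x):
    """Hypothetical stand-in for a U-Net: thresholds the mean intensity."""
    return (x.mean(axis=-1, keepdims=True) > 0.5).astype(x.dtype)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
output1, output2, gated = double_unet_forward(image, toy_network, toy_network)
print(gated.shape)  # (4, 4, 3)
```

With a soft mask (values strictly between 0 and 1), the same product attenuates rather than zeroes out the background, so NETWORK 2 sees an input that is weighted toward the regions NETWORK 1 considered relevant.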

1.2. Encoder

Each encoder block in encoder2 performs two 3×3 convolution operations, each followed by batch normalization, with ReLU as the activation function. After that, max-pooling with a 2×2 window and stride 2 reduces the spatial dimensions of the feature maps.
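
A minimal NumPy sketch of such an encoder block is given below. It assumes that ReLU is applied after each batch normalization, uses batch statistics without learnable scale and shift, and leaves out any squeeze-and-excite step; the helper names, channel counts, and random weights are hypothetical.

```python
import numpy as np

def conv3x3(x, w):
    """3x3 convolution with 'same' padding.
    x: (N, C_in, H, W); w: (C_out, C_in, 3, 3)."""
    n, _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros((n, w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,nchw->nohw", w[:, :, i, j],
                             xp[:, :, i:i + h, j:j + wd])
    return out

def batch_norm(x, eps=1e-5):
    """Normalize each channel over the batch and spatial axes."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool2x2(x):
    """2x2 max-pooling with stride 2; H and W must be even."""
    n, c, h, w = x.shape
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

def encoder_block(x, w1, w2):
    """Two conv -> BN -> ReLU stages, then pooling.
    Returns the pre-pooling features (for the skip connection) and the pooled map."""
    x = np.maximum(batch_norm(conv3x3(x, w1)), 0.0)
    x = np.maximum(batch_norm(conv3x3(x, w2)), 0.0)
    return x, max_pool2x2(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 8, 8))            # hypothetical batch of 2 RGB patches
w1 = rng.normal(size=(4, 3, 3, 3)) * 0.1
w2 = rng.normal(size=(4, 4, 3, 3)) * 0.1
skip, pooled = encoder_block(x, w1, w2)
print(skip.shape, pooled.shape)              # (2, 4, 8, 8) (2, 4, 4, 4)
```

The pre-pooling tensor is returned alongside the pooled one because the decoder later concatenates it through a skip connection.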

1.3. Decoder

Each decoder block first performs 2×2 bilinear up-sampling on its input, which doubles the spatial dimensions of the feature maps. Skip connections then concatenate the matching feature maps from the encoder with the up-sampled feature maps.
In the second decoder, skip connections from both encoders are used.
Two 3×3 convolution operations follow, each followed by batch normalization and then a ReLU activation function. After that, a squeeze-and-excitation block is applied.
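
The decoder steps above can be sketched in NumPy as follows, here for the second decoder, which receives skip features from both encoders. The bilinear interpolation uses half-pixel centers, batch normalization uses batch statistics without scale and shift, and the squeeze-and-excite block omits biases; these choices, the reduction ratio of 4, the channel counts, and all function names are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

def conv_bn_relu(x, w, eps=1e-5):
    """3x3 'same' convolution -> batch norm -> ReLU. x: (N, C_in, H, W)."""
    n, _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    y = np.zeros((n, w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            y += np.einsum("oc,nchw->nohw", w[:, :, i, j],
                           xp[:, :, i:i + h, j:j + wd])
    mean = y.mean(axis=(0, 2, 3), keepdims=True)
    var = y.var(axis=(0, 2, 3), keepdims=True)
    return np.maximum((y - mean) / np.sqrt(var + eps), 0.0)

def upsample_bilinear_2x(x):
    """Bilinear up-sampling: (N, C, H, W) -> (N, C, 2H, 2W)."""
    def taps(n):
        # Source coordinate of each output pixel, clamped to the valid range.
        src = np.clip((np.arange(2 * n) + 0.5) / 2 - 0.5, 0, n - 1)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, n - 1)
        return lo, hi, src - lo
    r0, r1, fr = taps(x.shape[2])
    c0, c1, fc = taps(x.shape[3])
    fr = fr[:, None]
    rows = x[:, :, r0, :] * (1 - fr) + x[:, :, r1, :] * fr
    return rows[..., c0] * (1 - fc) + rows[..., c1] * fc

def squeeze_excite(x, w_down, w_up):
    """Channel gating. w_down: (C // r, C); w_up: (C, C // r)."""
    s = x.mean(axis=(2, 3))                       # squeeze: (N, C)
    z = np.maximum(s @ w_down.T, 0.0)             # bottleneck + ReLU
    g = 1.0 / (1.0 + np.exp(-(z @ w_up.T)))       # per-channel gates in (0, 1)
    return x * g[:, :, None, None]

def decoder_block(x, skips, w1, w2, w_down, w_up):
    x = upsample_bilinear_2x(x)
    x = np.concatenate([x, *skips], axis=1)       # skip connections by concatenation
    x = conv_bn_relu(conv_bn_relu(x, w1), w2)
    return squeeze_excite(x, w_down, w_up)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 4, 4))                 # hypothetical decoder input
skip_enc1 = rng.normal(size=(2, 4, 8, 8))         # from the first encoder
skip_enc2 = rng.normal(size=(2, 4, 8, 8))         # from the second encoder
w1 = rng.normal(size=(8, 16, 3, 3)) * 0.1         # 8 + 4 + 4 = 16 input channels
w2 = rng.normal(size=(8, 8, 3, 3)) * 0.1
w_down, w_up = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))
out = decoder_block(x, [skip_enc1, skip_enc2], w1, w2, w_down, w_up)
print(out.shape)  # (2, 8, 8, 8)
```

For the first decoder, the same function would be called with a single entry in `skips`, which only changes the input channel count of `w1`.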

1.4. Output

Finally, a convolution layer with a sigmoid activation is applied to generate the mask for the corresponding modified U-Net.
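
As a rough illustration of this output step, the sketch below applies a 1×1 convolution followed by a sigmoid to turn a feature map into per-pixel probabilities. The 1×1 kernel size, the single output channel, the 0.5 threshold, and the name `output_layer` are assumptions; the text above only states that a convolution with a sigmoid is used.

```python
import numpy as np

def output_layer(features, w, b=0.0):
    """features: (C, H, W); w: (C,) weights of a 1x1 convolution.
    Returns an (H, W) map of foreground probabilities."""
    # A 1x1 convolution is a weighted sum over channels at each pixel.
    logits = np.tensordot(w, features, axes=([0], [0])) + b
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))   # hypothetical decoder output
w = rng.normal(size=8)
prob = output_layer(features, w)
mask = prob > 0.5                         # assumed threshold for a binary mask
print(prob.shape, mask.dtype)             # (16, 16) bool
```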

2. Results

2.1. Datasets

2.2. SOTA Comparisons

**2015 MICCAI Sub-Challenge on Automatic Polyp Detection Dataset**

DoubleU-Net achieves a DSC of 0.7649 and an mIoU of 0.6255, outperforming the baseline (Mask R-CNN with ResNet-101) by 6.07% in DSC and 1.31% in mIoU.

On CVC-ClinicDB, DoubleU-Net achieves a DSC of 0.9239, which is 3.91% higher than the Conditional GAN in [34], and an mIoU of 0.8611, which is 1.14% higher than MultiResUNet.

On the lesion boundary segmentation challenge dataset from ISIC-2018, DoubleU-Net achieves a DSC of 0.8962 and an mIoU of 0.8212, outperforming U-Net by an approximate margin of 5.7% and MultiResUNet by an approximate margin of 1.83% in terms of mIoU.

**2018 Data Science Bowl Challenge Dataset**

DoubleU-Net produced a DSC of 0.9133, which is 1.59% higher than UNet++, and an mIoU comparable to that of the U-Net and UNet++ models that use ResNet-101 as the backbone.
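
For reference, the two metrics quoted throughout this section can be computed from binary masks as sketched below. The function names and the small `eps` smoothing term are assumptions, and how the paper averages IoU over images to obtain mIoU is not specified here, so the snippet only shows the per-mask quantities.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """DSC = 2 |A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

# Tiny hand-made example: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
dsc = dice_coefficient(pred, target)
jac = iou(pred, target)
print(round(dsc, 3), round(jac, 3))  # 0.667 0.5
```

For a single mask pair the two are linked by DSC = 2·IoU / (1 + IoU), so DSC is never lower than IoU, which matches the pattern in the numbers above.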