Brief Review — DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation

Uses Two U-Nets, ASPP as in DeepLabv3, and SE Block as in SENet

Sik-Ho Tsang
4 min read · Apr 7


DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation,
DoubleU-Net, by SimulaMet, UiT The Arctic University of Norway, and Oslo Metropolitan University
2020 CBMS, Over 300 Citations (Sik-Ho Tsang @ Medium)

Biomedical Image Segmentation
2015 … 2022
[UNETR] [Half-UNet] [BUSIS] [RCA-IUNet] [Swin-Unet] 2023 [DCSAU-Net]

  • DoubleU-Net is proposed, which is a combination of two U-Net architectures stacked on top of each other.
  • The first U-Net uses a pre-trained VGG-19 as the encoder, which has already learned features from ImageNet and can be transferred to another task easily. To capture more semantic information efficiently, another U-Net is added at the bottom.
  • Atrous Spatial Pyramid Pooling (ASPP), as in DeepLabv3, and SE block, as in SENet, are adopted to capture contextual information within the network.


  1. DoubleU-Net
  2. Results

1. DoubleU-Net


1.1. Overall

  • DoubleU-Net starts with a VGG-19 (Yellow) as the encoder sub-network.
  • The decoder blocks are marked in light green.
  • ASPP (Blue), as in DeepLabv3, is used.
  • The squeeze-and-excite block, as in SENet, is used in the encoder of NETWORK 1 and decoder blocks of NETWORK 1 and NETWORK 2.
  • (The paper itself does not say much about DeepLabv3 and SENet. If interested, please feel free to read the stories about them.)
  • In NETWORK 1, the input image is fed to the modified U-Net, which generates a predicted mask (Output1).
  • An element-wise multiplication is performed between the output of NETWORK 1 (Output1) and the input of the same network. The product serves as the input to NETWORK 2, which produces a second mask (Output2).
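
The wiring described above can be sketched in a few lines of NumPy. This is only an illustration of the data flow, not the authors' implementation: `network1` and `network2` are hypothetical stand-in functions, and the shapes are assumptions. Per the paper, the gated image feeds NETWORK 2 and the two masks are concatenated at the end.

```python
import numpy as np

def network1(image):
    # Hypothetical stand-in for NETWORK 1: returns a mask in (0, 1), shape (H, W, 1).
    gray = image.mean(axis=-1, keepdims=True)
    return 1.0 / (1.0 + np.exp(-(gray - 0.5) * 10.0))

def network2(gated):
    # Hypothetical stand-in for NETWORK 2, same output contract as network1.
    gray = gated.mean(axis=-1, keepdims=True)
    return 1.0 / (1.0 + np.exp(-(gray - 0.25) * 10.0))

def double_unet_forward(image):
    output1 = network1(image)      # first mask, (H, W, 1)
    gated = image * output1        # element-wise product, broadcast over channels
    output2 = network2(gated)      # second mask, (H, W, 1)
    # Both masks are kept side by side so they can be compared.
    return np.concatenate([output1, output2], axis=-1)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))      # toy RGB image with values in [0, 1)
masks = double_unet_forward(image)
print(masks.shape)                 # (8, 8, 2)
```

The multiplication acts as a soft gate: pixels where Output1 is close to 0 are suppressed before NETWORK 2 sees them.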

1.2. Encoder

  • Each encoder block in encoder2 performs two 3×3 convolution operations, each followed by batch normalization and a ReLU activation. After that, max-pooling is performed with a 2×2 window and stride 2 to halve the spatial dimensions of the feature maps.
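
As a rough sketch of one such encoder block, the NumPy code below chains conv, batch normalization and ReLU twice and then applies 2×2 max-pooling. The weights are random and the batch normalization is simplified (per-sample statistics, no learned scale or shift), so all names and values here are assumptions for illustration only.

```python
import numpy as np

def conv3x3(x, w):
    # x: (H, W, Cin), w: (3, 3, Cin, Cout); zero padding keeps H and W unchanged.
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return out

def batch_norm(x, eps=1e-5):
    # Simplified: normalizes each channel over the spatial dims of one sample,
    # without the learned scale/shift or running statistics of a real BN layer.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool_2x2(x):
    # 2x2 window, stride 2; assumes even H and W.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def encoder_block(x, w1, w2):
    x = np.maximum(batch_norm(conv3x3(x, w1)), 0.0)   # conv -> BN -> ReLU
    x = np.maximum(batch_norm(conv3x3(x, w2)), 0.0)   # conv -> BN -> ReLU
    return x, max_pool_2x2(x)                         # (skip features, pooled features)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 3))
w1 = rng.standard_normal((3, 3, 3, 8)) * 0.1
w2 = rng.standard_normal((3, 3, 8, 8)) * 0.1
skip, pooled = encoder_block(x, w1, w2)
print(skip.shape, pooled.shape)   # (16, 16, 8) (8, 8, 8)
```

The un-pooled features are returned as well, since they are what the skip connections later pass to the decoder.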

1.3. Decoder

  • Each decoder block performs a 2×2 bilinear up-sampling on the input feature, which doubles the spatial dimensions of the input feature maps. The appropriate skip connections (ResNet) then concatenate feature maps from the encoder to the up-sampled feature maps.
  • In the second decoder, skip connections (ResNet) from both encoders are used.
  • After concatenation, two 3×3 convolution operations are applied, each followed by batch normalization and then by a ReLU activation function. After that, a squeeze-and-excitation block is used.
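
The decoder steps above can be sketched as follows, assuming NumPy and random weights. The sketch covers the 2× bilinear up-sampling, the concatenation with skip features from both encoders, and a squeeze-and-excitation block; the two 3×3 convolutions that sit between concatenation and the SE block are left out for brevity. The half-pixel sampling convention and the reduction ratio `r = 4` are assumptions, not values taken from the paper.

```python
import numpy as np

def upsample_bilinear_2x(x):
    # x: (H, W, C) -> (2H, 2W, C), half-pixel-center sampling with edge clamping.
    h, w, _ = x.shape
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def decoder_merge(x, skips):
    # Up-sample, then concatenate the skip features along the channel axis.
    return np.concatenate([upsample_bilinear_2x(x)] + list(skips), axis=-1)

def squeeze_excite(x, w1, w2):
    # x: (H, W, C); w1: (C, C // r); w2: (C // r, C)
    z = x.mean(axis=(0, 1))                  # squeeze: global average pool -> (C,)
    s = np.maximum(z @ w1, 0.0)              # excitation: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))      # FC + sigmoid -> per-channel gates
    return x * s                             # rescale each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))           # low-resolution decoder input
skip_enc1 = rng.standard_normal((8, 8, 4))   # skip from the first encoder
skip_enc2 = rng.standard_normal((8, 8, 4))   # skip from the second encoder
merged = decoder_merge(x, [skip_enc1, skip_enc2])
r = 4                                        # assumed reduction ratio
w1 = rng.standard_normal((16, 16 // r)) * 0.1
w2 = rng.standard_normal((16 // r, 16)) * 0.1
out = squeeze_excite(merged, w1, w2)
print(merged.shape, out.shape)               # (8, 8, 16) (8, 8, 16)
```

For a decoder block in NETWORK 1, the list passed to `decoder_merge` would hold a single skip tensor instead of two.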

1.4. Output

  • Finally, a convolution layer with a sigmoid activation is applied to generate the mask for the corresponding modified U-Net.
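
A minimal sketch of such an output layer is shown below. It assumes a 1×1 convolution down to a single channel and a 0.5 threshold for the binary mask; neither the kernel size nor the threshold is stated in this review, and the weights are random placeholders.

```python
import numpy as np

def output_layer(features, weight, bias=0.0):
    # features: (H, W, C); weight: (C,). A 1x1 convolution to one channel is
    # just a per-pixel dot product over the channel axis.
    logits = features @ weight + bias
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid -> probabilities in (0, 1)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))   # toy decoder output
weight = rng.standard_normal(16) * 0.1       # placeholder weights
mask = output_layer(features, weight)
binary = mask > 0.5                          # assumed threshold for a hard mask
print(mask.shape, binary.dtype)              # (8, 8) bool
```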

2. Results

2.1. Datasets


2.2. SOTA Comparisons

2015 MICCAI Sub-Challenge on Automatic Polyp Detection Dataset

DoubleU-Net achieved a DSC of 0.7649 and an mIoU of 0.6255, outperforming the baseline (Mask R-CNN with ResNet-101) by 6.07% in terms of DSC and 1.31% in terms of mIoU.


DoubleU-Net achieves a DSC of 0.9239, which is 3.91% higher than the Conditional GAN in [34], and an mIoU of 0.8611, which is 1.14% higher than MultiResUNet.


DoubleU-Net achieves a DSC of 0.8962 and an mIoU of 0.8212 on the lesion boundary segmentation challenge dataset from ISIC-2018, outperforming U-Net by an approximate margin of 5.7% and MultiResUNet by an approximate margin of 1.83% in terms of mIoU.

2018 Data Science Bowl Challenge Dataset

DoubleU-Net produced a DSC of 0.9133, which is 1.59% higher than UNet++, and an mIoU comparable with that of U-Net and UNet++ using ResNet-101 as the backbone model.
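
For readers less familiar with the two metrics quoted in this section, here is a small sketch of the standard definitions on binary masks: DSC = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. The `eps` smoothing term and the toy masks are assumptions, and mIoU is taken here as the mean IoU over a set of masks; the paper's exact averaging is not restated in this review.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    # Dice similarity coefficient (DSC) between two binary masks.
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    # Intersection over union between two binary masks.
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

def mean_iou(preds, targets):
    # Mean IoU over a collection of (prediction, ground truth) pairs.
    return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(round(dice(pred, target), 4))   # 0.6667
print(round(iou(pred, target), 4))    # 0.5
```

On a single pair of masks the two are related by DSC = 2·IoU / (1 + IoU), which is why DSC is never lower than IoU.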

2.3. Comparisons with U-Net

Relative Improvements Over U-Net

DoubleU-Net performs reasonably well compared to U-Net on all the presented datasets.

2.4. Visualizations

2015 MICCAI Sub-Challenge on Automatic Polyp Detection Dataset

The segmentation mask produced by Output2 is better than that of Output1.


