Brief Review — DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation
DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation,
DoubleU-Net, by SimulaMet, UiT The Arctic University of Norway, and Oslo Metropolitan University
2020 CBMS, Over 300 Citations (Sik-Ho Tsang @ Medium)
Biomedical Image Segmentation
2015 … 2022 [UNETR] [Half-UNet] [BUSIS] [RCA-IUNet] [Swin-Unet] 2023 [DCSAU-Net]
- DoubleU-Net is proposed, which is a combination of two U-Net architectures stacked on top of each other.
- The first U-Net uses a VGG-19 pre-trained on ImageNet as its encoder, so the features it has already learned can be transferred to another task easily. To capture more semantic information efficiently, a second U-Net is added at the bottom.
- Atrous Spatial Pyramid Pooling (ASPP), as in DeepLabv3, and the squeeze-and-excitation (SE) block, as in SENet, are adopted to capture contextual information within the network.
Outline
- DoubleU-Net
- Results
1. DoubleU-Net
1.1. Overall
- DoubleU-Net starts with a VGG-19 (Yellow) as the encoder sub-network.
- The decoder blocks are marked in light green.
- ASPP (Blue), as in DeepLabv3, is used.
- The squeeze-and-excitation block, as in SENet, is used in the encoder of NETWORK 1 and in the decoder blocks of NETWORK 1 and NETWORK 2.
- (The authors do not go into much detail about DeepLabv3 and SENet in the paper. If interested, please feel free to read about their stories.)
- In NETWORK 1, the input image is fed to the modified U-Net, which generates a predicted mask (Output1).
- Output1 is then multiplied element-wise with the input of NETWORK 1. In the paper, this product serves as the input to NETWORK 2, which produces a second mask (Output2).
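The data flow between the two networks can be sketched in a few lines of plain Python. This is only an illustration of the wiring under my reading of the paper (the product of the input and Output1 is what NETWORK 2 receives). The names `double_u_net_forward`, `toy_network1` and `toy_network2` are hypothetical, and the two "networks" are simple thresholds standing in for the real modified U-Nets.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # single-channel H x W image with values in [0, 1]


def multiply(a: Image, b: Image) -> Image:
    """Element-wise product of two equally sized 2-D arrays."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def double_u_net_forward(
    image: Image,
    network1: Callable[[Image], Image],
    network2: Callable[[Image], Image],
) -> Tuple[Image, Image]:
    """Wire two mask-predicting networks together as described above."""
    output1 = network1(image)         # first predicted mask (Output1)
    gated = multiply(image, output1)  # input of NETWORK 1 x Output1
    output2 = network2(gated)         # second predicted mask (Output2)
    return output1, output2


# Hypothetical stand-ins for the two modified U-Nets: plain thresholds.
def toy_network1(img: Image) -> Image:
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in img]


def toy_network2(img: Image) -> Image:
    return [[1.0 if v > 0.7 else 0.0 for v in row] for row in img]


image = [[0.2, 0.6], [0.8, 0.9]]
out1, out2 = double_u_net_forward(image, toy_network1, toy_network2)
print(out1)
print(out2)
```

Because the input is multiplied by Output1, pixels that NETWORK 1 rejects are zeroed out before the second network sees them, so NETWORK 2 effectively refines the first mask.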
1.2. Encoder
- Each encoder block in encoder2 (the encoder of NETWORK 2) performs two 3×3 convolutions, each followed by batch normalization, with ReLU as the activation. After that, max-pooling with a 2×2 window and stride 2 halves the spatial dimensions of the feature maps.
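To make the down-sampling step concrete, here is a minimal pure-Python sketch of 2×2 max-pooling with stride 2 on a single feature map. It only illustrates the pooling arithmetic; the function name `max_pool_2x2` is hypothetical, and the map is assumed to have even height and width.

```python
from typing import List

FeatureMap = List[List[float]]


def max_pool_2x2(feature_map: FeatureMap) -> FeatureMap:
    """2x2 max-pooling with stride 2 (assumes even height and width)."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[i][j],
                feature_map[i][j + 1],
                feature_map[i + 1][j],
                feature_map[i + 1][j + 1],
            )
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


fm = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
pooled = max_pool_2x2(fm)
print(pooled)  # [[4, 8], [12, 16]]
```

A 4×4 map becomes 2×2, so each encoder block halves both the height and the width.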
1.3. Decoder
- Each decoder block performs a 2×2 bi-linear up-sampling on the input feature, which doubles the spatial dimensions of the input feature maps. The appropriate skip connections (ResNet) concatenate feature maps from the encoder to the output feature maps.
- In the second decoder, skip connections (ResNet) from both encoders are used.
- Two 3×3 convolutions follow, each followed by batch normalization and then a ReLU activation function. After that, a squeeze-and-excitation block is used.
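The squeeze-and-excitation step at the end of each decoder block can be sketched as follows. This follows the general SENet recipe (global average pooling, a bottleneck fully connected layer with ReLU, a second fully connected layer with sigmoid, then channel-wise rescaling), not the authors' exact code. Biases are omitted, and the weight matrices `w1` and `w2` are made-up values for illustration.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # C x H x W
Matrix = List[List[float]]


def squeeze_excite(x: Tensor3D, w1: Matrix, w2: Matrix) -> Tensor3D:
    """Rescale each channel of x by a learned gate in (0, 1)."""
    # Squeeze: global average pooling gives one value per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in x
    ]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: fully connected layer with sigmoid, one gate per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in channel] for channel, g in zip(x, gates)]


# Two channels of size 1 x 2; hypothetical weights with a bottleneck of size 1.
x = [[[1.0, 3.0]], [[2.0, 2.0]]]
w1 = [[0.5, -0.5]]     # shape (1, 2): channels -> bottleneck
w2 = [[1.0], [-1.0]]   # shape (2, 1): bottleneck -> channels
scaled = squeeze_excite(x, w1, w2)
print(scaled)
```

With these toy weights the bottleneck activation is zero, so both gates equal sigmoid(0) = 0.5 and every value is halved. In a trained network the gates differ per channel, which lets the block emphasize informative channels and suppress others.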
1.4. Output
- Finally, a convolution layer with a sigmoid activation is applied to generate the mask of the corresponding modified U-Net.
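A rough sketch of such an output head is given below. It assumes a 1×1 convolution with a single output channel, which is a common choice but is my assumption here (the kernel size is not stated above). The name `sigmoid_head` and the weights are hypothetical.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # C x H x W


def sigmoid_head(features: Tensor3D, weights: List[float], bias: float = 0.0) -> List[List[float]]:
    """1x1 convolution to one channel (assumed kernel size), then sigmoid."""
    height, width = len(features[0]), len(features[0][0])
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            # Weighted sum over channels at this pixel = 1x1 convolution.
            logit = bias + sum(w * features[c][i][j] for c, w in enumerate(weights))
            row.append(1.0 / (1.0 + math.exp(-logit)))
        mask.append(row)
    return mask


features = [[[0.0, 2.0]], [[0.0, -2.0]]]  # 2 channels, 1 x 2 pixels
mask = sigmoid_head(features, weights=[1.0, -1.0])
print(mask)
```

The sigmoid maps every pixel to a value in (0, 1), which can be read as the probability that the pixel belongs to the foreground.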
2. Results
2.1. Datasets
2.2. SOTA Comparisons
DoubleU-Net achieves a DSC of 0.7649 and an mIoU of 0.6255, outperforming the baseline (Mask R-CNN with ResNet-101) by 6.07% in terms of DSC and 1.31% in mIoU.
DoubleU-Net achieves a DSC of 0.9239, which is 3.91% higher than the Conditional GAN in [34], and an mIoU of 0.8611, which is 1.14% higher than MultiResUNet.
On the Lesion Boundary Segmentation challenge dataset from ISIC-2018, DoubleU-Net achieves a DSC of 0.8962 and an mIoU of 0.8212, outperforming U-Net by a margin of approximately 5.7% and MultiResUNet by approximately 1.83% in terms of mIoU.
DoubleU-Net produces a DSC of 0.9133, which is 1.59% higher than UNet++, and an mIoU comparable with U-Net and UNet++ that use ResNet-101 as the backbone model.
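For reference, the two metrics quoted above can be computed from binary masks as in the following sketch. DSC is 2|A∩B| / (|A| + |B|) and IoU is |A∩B| / |A∪B|; mIoU is then the mean IoU, and I assume here that the mean is taken over the test images. The function name `dice_and_iou` and the convention of returning 1.0 when both masks are empty are my own choices, not taken from the paper.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary H x W mask with values 0 or 1


def dice_and_iou(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Dice coefficient (DSC) and intersection over union for two binary masks."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    target_area = sum(1 for b in t if b)
    union = pred_area + target_area - intersection
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * intersection / (pred_area + target_area) if union else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
dice, iou = dice_and_iou(pred, target)
print(dice, iou)  # DSC = 0.5, IoU = 1/3

# mIoU as the mean IoU over a (toy) set of image pairs.
pairs = [(pred, target), (target, target)]
miou = sum(dice_and_iou(p, t)[1] for p, t in pairs) / len(pairs)
print(miou)
```

For a single pair of masks DSC is never smaller than IoU, which is consistent with each reported DSC being higher than the corresponding mIoU.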
2.3. Comparisons with U-Net
DoubleU-Net performs reasonably well compared to U-Net on all the presented datasets.
2.4. Visualizations
The segmentation mask produced by Output2 is better than that of Output1.