Review — Half-UNet: A Simplified U-Net Architecture for Medical Image Segmentation
Half-UNet, With the Use of the Ghost Module From GhostNet
Half-UNet: A Simplified U-Net Architecture for Medical Image Segmentation,
Half-UNet, by South-Central University for Nationalities, and Hubei Provincial Engineering Research Center for Intelligent Management of Manufacturing Enterprises,
2022 Front. Neuroinform., Over 5 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net, Biomedical Image Segmentation
2015 … 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] [Up-Net] 2022 [UNETR]
==== My Other Paper Readings Are Also Over Here ====
Outline
- Motivations
- Half-UNet
- Results
1. Motivations
- Both U-Net’s encoder and decoder are treated as encoders. The features from C1 to C16 are then aggregated by a single decoder, whose structure is the same as the full-scale feature aggregation in UNet 3+.
- Encoder (A) achieves performance comparable to encoder (C), while performance drops noticeably with encoder (B).
This suggests that the U-Net decoder can be simplified to reduce complexity.
2. Half-UNet
2.1. Unify the Channel Numbers
- In each downsampling step of U-Net and UNet 3+, the number of feature channels is doubled, which enhances the diversity of feature expression. However, this increases the complexity of the model, especially in UNet 3+.
In Half-UNet, on the other hand, the channel numbers of all feature maps are unified, which reduces the number of filters in the convolution operation.
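As a rough, back-of-the-envelope illustration (my own, not from the paper), the sketch below compares 3×3 convolution parameter counts when channel numbers double per stage versus staying unified; the specific widths (64 up to 1024, and a constant 64) are assumptions for the sake of the comparison.

```python
# Hypothetical illustration: parameter count of one 3x3 conv per encoder stage,
# comparing channel doubling (U-Net-style) with unified channels (Half-UNet-style).
def conv3x3_params(c_in, c_out):
    return c_in * c_out * 3 * 3 + c_out  # weights + bias

doubling = [64, 128, 256, 512, 1024]  # assumed U-Net-style widths
unified = [64] * 5                    # assumed Half-UNet-style widths

p_doubling = sum(conv3x3_params(c, c) for c in doubling)
p_unified = sum(conv3x3_params(c, c) for c in unified)
print(f"doubling: {p_doubling:,} params, unified: {p_unified:,} params")
```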
2.2. Full-Scale Feature Fusion
- Both U-Net and UNet 3+ use concatenate operations for feature fusion, which require more memory overhead and computation.
- The addition operation, by contrast, requires no additional parameters and only negligible extra computation.
Feature maps from different scales are first upsampled to the size of the original image, and then feature fusion is performed through the addition operation.
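A minimal PyTorch sketch of this fusion step, as I read it: every scale (all with the same channel count) is bilinearly upsampled to the original resolution and fused by element-wise addition. The function name, the choice of five scales, and the bilinear mode are my assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def full_scale_add_fusion(features, out_size):
    """Upsample each feature map to `out_size` and fuse by element-wise addition.

    `features`: list of tensors shaped (N, C, H_i, W_i) with identical C.
    Addition needs no extra parameters, unlike concatenation, which would
    multiply the channel count seen by the following convolution.
    """
    fused = 0
    for f in features:
        fused = fused + F.interpolate(f, size=out_size, mode="bilinear",
                                      align_corners=False)
    return fused

# Toy usage: five scales, all with 64 channels, fused at 256x256.
feats = [torch.randn(1, 64, 256 // 2**i, 256 // 2**i) for i in range(5)]
out = full_scale_add_fusion(feats, (256, 256))
print(out.shape)  # torch.Size([1, 64, 256, 256])
```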
2.3. Ghost Module
- Ghost module, as in GhostNet, is used to generate more feature maps while using cheap operations.
- s=2 is used where s represents the reciprocal of the proportion of intrinsic feature maps.
- Half of the feature maps are generated by a standard convolution, and the other half are generated from these by cheap depthwise convolutions.
- Finally, the two halves of the feature map are concatenated to form the output.
The Ghost module is used in Half-UNet to reduce the required parameters and FLOPs compared to standard convolution.
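Below is a minimal Ghost-module sketch with s = 2, following the public GhostNet recipe: half the output channels are intrinsic feature maps from a standard convolution, and the other half are "ghost" maps produced from them by a cheap depthwise convolution. The kernel sizes and BatchNorm/ReLU placement are my simplifying assumptions, not taken from the Half-UNet paper.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost module sketch with s=2: half the channels from a standard conv,
    the other half from a cheap depthwise conv applied to that half."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        intrinsic = out_ch // 2  # s = 2 -> half the outputs are intrinsic maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, intrinsic, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(intrinsic),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(intrinsic, intrinsic, 3, padding=1, groups=intrinsic),
            nn.BatchNorm2d(intrinsic),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y1 = self.primary(x)   # intrinsic feature maps
        y2 = self.cheap(y1)    # ghost feature maps from the cheap operation
        return torch.cat([y1, y2], dim=1)

# Toy usage: 64 -> 64 channels.
m = GhostModule(64, 64)
print(m(torch.randn(1, 64, 128, 128)).shape)  # torch.Size([1, 64, 128, 128])
```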
3. Results
3.1. Datasets
- Three datasets are used for experiments: mammography images, lung nodule images, and left ventricular MRI images.
3.2. Quantitative Results
- Half-UNet†: Half-UNet with the Ghost modules removed.
- Half-UNet† outperforms U-Net and its variants on mammography images, and comes closer to them than Half-UNet does on lung nodule images. However, Half-UNet† performs worse than Half-UNet on left ventricular MRI images.
Half-UNet (with and without Ghost modules) achieves segmentation accuracy similar to U-Net and its variants, while the parameters and FLOPs are reduced by 98.6% and 81.8%, respectively.
- The channel numbers of Half-UNet∗†_u and Half-UNet∗†_d are doubled after downsampling.
- There are two strategies for feature fusion in the decoder: (1) Upsampling2D + 3×3 convolution, which is what Half-UNet∗†_u and UNet 3+ do; (2) Deconvolution, which is what Half-UNet∗†_d and U-Net do.
Half-UNet∗†_u and Half-UNet∗†_d increase the required FLOPs and parameters, respectively, compared with Half-UNet†.
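The two fusion strategies can be sketched in PyTorch as follows (an illustrative comparison, not the authors' code; the unified channel count of 64 is assumed): Upsampling2D + 3×3 convolution adds few parameters but convolves at the upsampled resolution, so FLOPs grow, while a stride-2 deconvolution carries its own weight tensor and so adds parameters.

```python
import torch.nn as nn

c = 64  # unified channel count, assumed for illustration

# Strategy (1): parameter-light bilinear upsampling, but the 3x3 conv then
# runs at the larger resolution, so FLOPs grow with spatial size.
upsample_then_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(c, c, kernel_size=3, padding=1),
)

# Strategy (2): a learned transposed convolution (deconvolution) that
# upsamples and mixes channels in one step, adding its own parameters.
deconv = nn.ConvTranspose2d(c, c, kernel_size=2, stride=2)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(upsample_then_conv), n_params(deconv))
```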
3.3. Qualitative Results
Half-UNet can segment endocardial and epicardial boundaries more completely.
3.4. Further Study
- In the left part of the Half-UNet sub-network, since bilinear upsampling and addition are both linear operations, almost no parameters or computation are introduced.
- In the right part of the Half-UNet sub-network, due to the lower number of input channels (only 64) and the use of the Ghost module, the cost of convolution is significantly smaller than in other structures.
Half-UNet avoids the problems of the above three networks, significantly reducing the required parameters and FLOPs.
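A tiny check (my own illustration) that the left, linear part of the decoder contributes no learnable parameters:

```python
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
print(sum(p.numel() for p in up.parameters()))  # 0: bilinear upsampling is parameter-free
# Element-wise addition likewise has no weights, so the decoder's cost is
# concentrated in the convolutions after fusion, which see only 64 input channels.
```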