Brief Review — An Improved U-Net Method for Sequence Images Segmentation
Improved U-Net, Using Multi-Scale Convolution
An Improved U-Net Method for Sequence Images Segmentation, Improved U-Net, by Guilin University of Electronic Technology, 2019 ICACI (Sik-Ho Tsang @ Medium)
Image Segmentation, Semantic Segmentation, U-Net
- An Improved U-Net is proposed, where multi-scale convolution modules are added to the U-Net structure to increase the network depth and improve feature extraction capability. Batch normalization (BN) layers are added to accelerate network convergence.
- A heat-map channel is added in the input data to prevent errors of classification in similar areas.
Outline
- Improved U-Net
- Results
1. Improved U-Net
1.1. Overall Architecture
- A U-Net-like model is used, with Multiscale Convolution Modules (the blocks with a cross pattern in the architecture figure).
1.2. Multiscale Convolution Module
- The convolution kernel is fixed in size in the original U-Net.
- If it is too small, the global information is lost.
- If it is too large, the receptive field is too large, so the extracted features cannot capture effective local information.
- The Multiscale Convolution Module is composed of a 3×3 convolution kernel, a 5×5 convolution kernel, and a max-pooling layer, applied as parallel multi-scale branches and inspired by the Inception module in GoogLeNet.
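To make the idea of parallel branches concrete, here is a minimal NumPy sketch of an Inception-style multi-scale block. It is only an illustration under stated assumptions: the helper names (`conv2d_same`, `maxpool2d_same`, `multiscale_module`), the ReLU activations, the 3×3 pooling window, the channel counts, and the concatenation of branch outputs are all assumptions, since this review does not give the exact wiring used in the paper.

```python
import numpy as np

def conv2d_same(x, weight):
    """Stride-1 convolution with zero padding so H and W are preserved.

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k.
    """
    x = np.asarray(x, dtype=float)
    c_out, c_in, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    # Accumulate the contribution of each kernel offset.
    for dy in range(k):
        for dx in range(k):
            window = xp[:, dy:dy + h, dx:dx + w]  # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], window)
    return out

def maxpool2d_same(x, k=3):
    """Stride-1 max pooling that preserves H and W (padding with -inf)."""
    x = np.asarray(x, dtype=float)
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    _, h, w = x.shape
    out = np.full(x.shape, -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, xp[:, dy:dy + h, dx:dx + w])
    return out

def multiscale_module(x, w3, w5):
    """Hypothetical multi-scale block: 3x3 conv, 5x5 conv and max-pool branches."""
    b3 = np.maximum(conv2d_same(x, w3), 0.0)  # 3x3 branch + ReLU (assumed)
    b5 = np.maximum(conv2d_same(x, w5), 0.0)  # 5x5 branch + ReLU (assumed)
    bp = maxpool2d_same(x, k=3)               # pooling branch
    return np.concatenate([b3, b5, bp], axis=0)  # stack along channels

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))           # 4 input channels
w3 = rng.standard_normal((8, 4, 3, 3)) * 0.1   # random, untrained weights
w5 = rng.standard_normal((8, 4, 5, 5)) * 0.1
print(multiscale_module(x, w3, w5).shape)      # (20, 16, 16)
```

The 3×3 branch sees local detail and the 5×5 branch sees a wider context. Because every branch keeps the spatial size, their outputs can be stacked along the channel axis and passed to the next layer.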
1.3. Loss Function
- Binary cross-entropy is used as the loss function.
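For reference, the standard binary cross-entropy over N pixels is L = −(1/N) Σ [y·log(ŷ) + (1 − y)·log(1 − ŷ)], where y is the ground-truth label and ŷ is the predicted foreground probability. The sketch below implements that textbook definition in NumPy. The function name and the `eps` clipping are assumptions added for numerical safety, not details taken from the paper.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels in {0, 1} and probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    loss = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(-np.mean(loss))

# A maximally uncertain prediction (0.5 everywhere) costs log(2) per pixel.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```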
1.4. Heat-Map Channel
- Image saliency detection method is applied to generate heat map channel as input.
- In particular, the CA (Context-Aware) algorithm provides a context-aware saliency measure that gives color-dense regions a high saliency value, while regions with low color density receive a low saliency value.
- First, the distance between the patches is calculated by the following formula: d(pi, pk) = dcolor(pi, pk) / (1 + c · dposition(pi, pk)),
- where i and k are two pixels, pi and pk are the pixel blocks (patches) of scale r around pixels i and k, c is a constant set to 3, dcolor(pi, pk) is the color distance between the patches, and dposition(pi, pk) is the spatial distance between the patches.
- The saliency value at a single scale is then calculated from the patch distances.
- The saliency values are computed at multiple scales, and their mean is taken as the final saliency value.
- 4 scales, R = {1, 0.8, 0.5, 0.3}, are used.
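The saliency steps above can be sketched in NumPy as follows. This is a simplified illustration, not the paper's implementation. It assumes the formulation of the original context-aware saliency method, in which the patch distance is dcolor / (1 + c · dposition) and the single-scale saliency of a patch is 1 − exp(−mean distance to its K most similar patches). The function names, the default value of `K`, and the choice to take precomputed patch descriptors per scale as input are all assumptions.

```python
import numpy as np

def patch_distance(color_i, color_k, pos_i, pos_k, c=3.0):
    """d(pi, pk) = dcolor / (1 + c * dposition), using Euclidean distances."""
    d_color = np.linalg.norm(np.asarray(color_i, float) - np.asarray(color_k, float))
    d_position = np.linalg.norm(np.asarray(pos_i, float) - np.asarray(pos_k, float))
    return d_color / (1.0 + c * d_position)

def single_scale_saliency(colors, positions, K=4, c=3.0):
    """Saliency per patch at one scale.

    colors: (N, D) patch color descriptors; positions: (N, 2) patch coordinates.
    """
    colors = np.asarray(colors, dtype=float)
    positions = np.asarray(positions, dtype=float)
    n = colors.shape[0]
    if n < 2:
        raise ValueError("need at least two patches")
    d_color = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    d_pos = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    d = d_color / (1.0 + c * d_pos)
    np.fill_diagonal(d, np.inf)                # a patch is not compared with itself
    k = min(K, n - 1)
    nearest = np.sort(d, axis=1)[:, :k]        # distances to the k most similar patches
    return 1.0 - np.exp(-nearest.mean(axis=1))

def multi_scale_saliency(colors_per_scale, positions, K=4, c=3.0):
    """Mean of the single-scale saliency values over all supplied scales."""
    maps = [single_scale_saliency(cs, positions, K, c) for cs in colors_per_scale]
    return np.mean(maps, axis=0)

# Four identical patches and one outlier: only the outlier is salient.
colors = [[0.0], [0.0], [0.0], [0.0], [1.0]]
positions = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]]
print(single_scale_saliency(colors, positions, K=2))
```

In practice, the descriptors passed in `colors_per_scale` would be extracted once per scale in R, for example from patches resized by each factor.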
Adding the heat-map channel effectively constrains the segmentation area of the network.
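One plausible way to feed the heat map to the network is to stack it onto the image as an extra input channel. The sketch below shows this with NumPy. The function name, the channels-last layout, and the min-max normalization of the heat map to [0, 1] are assumptions, as the review does not describe the exact preprocessing.

```python
import numpy as np

def add_heatmap_channel(image, heatmap):
    """Append a saliency heat map to an (H, W, C) image as channel C + 1."""
    image = np.asarray(image, dtype=np.float32)
    heatmap = np.asarray(heatmap, dtype=np.float32)
    if image.shape[:2] != heatmap.shape:
        raise ValueError("image and heatmap must share the same height and width")
    lo, hi = heatmap.min(), heatmap.max()
    # Rescale to [0, 1]; a constant map becomes all zeros (assumed convention).
    if hi > lo:
        heatmap = (heatmap - lo) / (hi - lo)
    else:
        heatmap = np.zeros_like(heatmap)
    return np.concatenate([image, heatmap[..., None]], axis=-1)

rgb = np.zeros((8, 8, 3), dtype=np.float32)
heat = np.arange(64, dtype=np.float32).reshape(8, 8)
print(add_heatmap_channel(rgb, heat).shape)  # (8, 8, 4)
```

The first convolution of the network then needs to accept one more input channel than it would for the image alone.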
2. Results
The improved U-Net method segments the edges of the target much better, producing smoother edges and showing stronger generalization ability.
Reference
[2019 ICACI] [Improved U-Net]
An Improved U-Net Method for Sequence Images Segmentation