Brief Review — An Improved U-Net Method for Sequence Images Segmentation

Improved U-Net, Using Multi-Scale Convolution

Jan 7, 2023

An Improved U-Net Method for Sequence Images Segmentation,
Improved U-Net, by Guilin University of Electronic Technology,
2019 ICACI (Sik-Ho Tsang @ Medium)
Image Segmentation, Semantic Segmentation, U-Net

An Improved U-Net is proposed, in which multi-scale convolution modules are added to the U-Net structure to increase network depth and improve feature extraction. A batch normalization (BN) layer is added to accelerate network convergence.
A heat-map channel is added to the input data to prevent misclassification in similar-looking areas.

Outline

1. Improved U-Net
2. Results

1. Improved U-Net

1.1. Overall Architecture

A U-Net-like model is used, to which Multiscale Convolution Modules are added (the blocks drawn with a cross pattern in the architecture figure).

1.2. Multiscale Convolution Module

In the original U-Net, the convolution kernel has a fixed size.
If the kernel is too small, global information is lost.
If it is too large, the receptive field becomes too wide and the extracted features miss effective local information.
The Multiscale Convolution Module is composed of a 3×3 convolution kernel, a 5×5 convolution kernel, and a maximum pooling layer, which together form a multi-scale convolution layer inspired by the Inception module in GoogLeNet.
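
To make the parallel-branch idea concrete, here is a minimal pure-Python sketch on a single-channel image. It is only an illustration, not the paper's implementation: the averaging kernels from the hypothetical helper `box_kernel` stand in for learned weights, the 3×3 pooling window and stride 1 are assumptions, and the function names are made up for this example.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D convolution with zero padding and stride 1."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def max_pool_same(image, size=3):
    """Max pooling with stride 1; the window is clipped at the image border."""
    h, w = len(image), len(image[0])
    pad = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                image[yy][xx]
                for yy in range(max(0, y - pad), min(h, y + pad + 1))
                for xx in range(max(0, x - pad), min(w, x + pad + 1))
            )
    return out


def box_kernel(size):
    """Hypothetical stand-in for learned weights: a size x size averaging kernel."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multiscale_module(image):
    """Run the three branches in parallel and stack the results as channels."""
    return [
        conv2d_same(image, box_kernel(3)),  # small receptive field
        conv2d_same(image, box_kernel(5)),  # larger receptive field
        max_pool_same(image, 3),            # pooling branch (window size assumed)
    ]


image = [[float(x + y) for x in range(6)] for y in range(6)]
features = multiscale_module(image)
print(len(features), len(features[0]), len(features[0][0]))
```

Each branch keeps the spatial size of the input, so the outputs can be stacked along the channel axis, which is how Inception-style modules combine features from different receptive fields.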

1.3. Loss Function

The binary cross-entropy is used as the loss function:
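
For reference, the standard form of binary cross-entropy over N pixels is L = -(1/N) Σ [y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i)], where y_i is the ground-truth label and ŷ_i the predicted foreground probability. The symbols are my own notation, and the paper's exact expression may differ. A minimal Python sketch of this standard form, with a clipping constant `eps` added as an assumption for numerical safety:

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a flat list of pixels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)


labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly correct predictions
mistaken = [0.4, 0.6, 0.3, 0.7]    # mostly wrong predictions
print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, mistaken))
```

The loss is lower for the confident, mostly correct predictions than for the mistaken ones.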

1.4. Heat-Map Channel

An image saliency detection method is applied to generate the heat-map channel, which is used as an additional input.
Specifically, the CA (Context-Aware) algorithm uses a context-aware saliency measure under which color-dense regions receive a high saliency value, while regions with low color density receive a low one.
First, the distance between the patches is calculated by the following formula:

where i and k are two pixels, pi and pk are the pixel blocks (patches) of scale r around i and k, c is a constant set to 3, dcolor(pi, pk) is the color distance between the patches, and dposition(pi, pk) is the spatial distance between the patches.
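
Assuming the usual context-aware saliency definition, in which the color distance is discounted by the spatial distance, the patch distance takes the form d(pi, pk) = dcolor(pi, pk) / (1 + c · dposition(pi, pk)). Below is a minimal Python sketch under that assumption. The function name and arguments are hypothetical, and both the color vectors and the positions are assumed to be already normalized.

```python
import math


def patch_distance(color_i, color_k, pos_i, pos_k, c=3.0):
    """Color distance between two patches, discounted by their spatial distance.

    color_i, color_k: flattened patch color vectors (assumed normalized).
    pos_i, pos_k: patch center coordinates (assumed normalized).
    """
    d_color = math.dist(color_i, color_k)
    d_position = math.dist(pos_i, pos_k)
    return d_color / (1.0 + c * d_position)


# Same color difference, but the second pair of patches is farther apart.
near = patch_distance((0.2, 0.2, 0.2), (0.8, 0.2, 0.2), (0.0, 0.0), (0.0, 0.0))
far = patch_distance((0.2, 0.2, 0.2), (0.8, 0.2, 0.2), (0.0, 0.0), (1.0, 0.0))
print(near, far)
```

With this form, two patches that differ in color count as more dissimilar when they are close together than when they are far apart.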
The saliency value at a single scale is then calculated from the patch distance:

The saliency values are computed at multiple scales and averaged to obtain the final saliency value:

where 4 scales of R = {1, 0.8, 0.5, 0.3} are used.
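
As a rough illustration of these two steps, the sketch below assumes the usual context-aware saliency formulation: at each scale r, the saliency of pixel i is 1 - exp(-(1/K) Σ d(pi, qk)), summed over the K most similar patches qk, and the final value is the mean over the scales in R. This form and the choice of K are assumptions, not taken from the reviewed paper, and the function names are hypothetical.

```python
import math


def single_scale_saliency(distances, k):
    """Saliency at one scale from the distances to all other patches.

    Only the k smallest distances (the most similar patches) are used.
    """
    nearest = sorted(distances)[:k]
    if not nearest:
        raise ValueError("at least one patch distance is required")
    return 1.0 - math.exp(-sum(nearest) / len(nearest))


def multi_scale_saliency(distances_per_scale, k):
    """Mean of the single-scale saliency values over all scales."""
    values = [single_scale_saliency(d, k) for d in distances_per_scale.values()]
    return sum(values) / len(values)


# Toy patch distances for one pixel at the four scales R = {1, 0.8, 0.5, 0.3}.
distances_per_scale = {
    1.0: [0.9, 0.7, 0.8, 0.1],
    0.8: [0.6, 0.5, 0.7, 0.2],
    0.5: [0.4, 0.3, 0.5, 0.1],
    0.3: [0.2, 0.1, 0.3, 0.1],
}
saliency = multi_scale_saliency(distances_per_scale, k=3)
print(saliency)
```

A pixel whose patch differs from even its most similar patches gets a saliency value close to 1, while a pixel with many near-identical patches gets a value close to 0.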

Adding the heat-map channel effectively constrains the segmentation area of the network.
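
One simple way to picture the extra channel is to stack the saliency map onto the image channels before feeding the network. The sketch below assumes exactly that. The function name and the channels-first list layout are hypothetical, and the paper may prepare its input differently.

```python
def add_heatmap_channel(image_channels, heat_map):
    """Append a heat map as an extra channel to a channels-first image.

    image_channels: list of H x W channels (lists of rows).
    heat_map: a single H x W saliency map.
    """
    h, w = len(heat_map), len(heat_map[0])
    for channel in image_channels:
        if len(channel) != h or any(len(row) != w for row in channel):
            raise ValueError("heat map and image must have the same height and width")
    return image_channels + [heat_map]


gray_image = [[[0.1, 0.2], [0.3, 0.4]]]   # one channel, 2 x 2
heat_map = [[0.0, 1.0], [1.0, 0.0]]        # toy saliency map, 2 x 2
network_input = add_heatmap_channel(gray_image, heat_map)
print(len(network_input))
```

The network then receives one more input channel than the raw image has, so its first convolution layer has to accept that extra channel.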

2. Results

The improved U-Net method gives much better results at the edges of the target, producing smoother edges and showing stronger generalization ability.

The improved U-Net is slightly slower than U-Net, but it still meets practical demands, taking far less time than manual segmentation.

Reference

[2019 ICACI] [Improved U-Net]
An Improved U-Net Method for Sequence Images Segmentation

1.5. Semantic Segmentation / Scene Parsing

2015 … 2019 … [Improved U-Net] … 2020 [DRRN Zhang JNCA’20] [Trans10K, TransLab] [CCNet] 2021 [PVT, PVTv1] [SETR] [Trans10K-v2, Trans2Seg] 2022 [PVTv2]
