Brief Review — An Improved U-Net Method for Sequence Images Segmentation

Improved U-Net, Using Multi-Scale Convolution

Sik-Ho Tsang
Jan 7, 2023

An Improved U-Net Method for Sequence Images Segmentation,
Improved U-Net, by Guilin University of Electronic Technology,
2019 ICACI (Sik-Ho Tsang @ Medium)
Image Segmentation, Semantic Segmentation, U-Net

  • An Improved U-Net is proposed, in which multi-scale convolution modules are added to the U-Net structure to increase network depth and improve feature extraction capability. Batch normalization (BN) layers are added to speed up network convergence.
  • A heat-map channel is added to the input data to prevent misclassification of similar-looking areas.


  1. Improved U-Net
  2. Results

1. Improved U-Net

1.1. Overall Architecture

Improved U-Net Structure
  • A U-Net-like model is used, as shown above, with Multiscale Convolution Modules (the blocks drawn with a cross pattern).

1.2. Multiscale Convolution Module

Multiscale Convolution Module
  • The convolution kernel is fixed in size in the original U-Net.
  • If it is too small, the global information is lost.
  • If it is too large, the receptive field is too wide, so the extracted features fail to capture effective local information.
  • The Multiscale Convolution Module is composed of a 3×3 convolution kernel, a 5×5 convolution kernel, and a max pooling layer, which together form a multi-scale convolution layer inspired by the Inception module in GoogLeNet.
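
The idea behind the module can be illustrated with a minimal NumPy sketch. It assumes three parallel branches (a 3×3 convolution, a 5×5 convolution, and a 3×3 max pooling with stride 1) whose outputs are concatenated along the channel axis, as in an Inception-style block. The function names, the pooling window, the ReLU activations, the random weights, and the concatenation are all assumptions made for illustration; the exact layer configuration of the paper is not given in this review.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same' convolution (cross-correlation, as in deep learning).

    x: (C_in, H, W) input, w: (C_out, C_in, k, k) weights with odd k.
    """
    x = np.asarray(x, dtype=float)
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            # Sum over input channels for this kernel offset.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def maxpool_same(x, k=3):
    """k x k max pooling with stride 1 that keeps the spatial size."""
    x = np.asarray(x, dtype=float)
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    _, h, wd = x.shape
    out = np.full(x.shape, -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, xp[:, dy:dy + h, dx:dx + wd])
    return out

def multiscale_module(x, w3, w5):
    """Hypothetical multi-scale block: 3x3 conv, 5x5 conv, and max pooling."""
    b3 = np.maximum(conv2d_same(x, w3), 0.0)  # ReLU is an assumption
    b5 = np.maximum(conv2d_same(x, w5), 0.0)
    bp = maxpool_same(x)
    # Concatenate the branches along the channel axis.
    return np.concatenate([b3, b5, bp], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))          # 4 input channels
w3 = rng.standard_normal((8, 4, 3, 3)) * 0.1  # 8 output channels, 3x3
w5 = rng.standard_normal((8, 4, 5, 5)) * 0.1  # 8 output channels, 5x5
y = multiscale_module(x, w3, w5)
print(y.shape)  # (20, 16, 16): 8 + 8 + 4 channels
```

Because every branch preserves the spatial size, the small and large kernels see the same location at different receptive fields, which is the property the bullets above motivate.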

1.3. Loss Function

  • The binary cross-entropy is used as the loss function:
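
In its standard form, the binary cross-entropy over N pixels is L = −(1/N) Σ_i [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ], where y_i ∈ {0, 1} is the ground-truth label of pixel i and p_i is the predicted foreground probability. This notation is assumed here and may differ from the paper's. A minimal Python sketch, with a hypothetical function name and clipping constant:

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0); eps is an illustrative choice.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)

# Predictions that agree with the labels give a lower loss than ones that do not.
loss_confident = binary_cross_entropy([0.9, 0.1, 0.8], [1, 0, 1])
loss_wrong = binary_cross_entropy([0.1, 0.9, 0.2], [1, 0, 1])
print(loss_confident < loss_wrong)  # True
```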

1.4. Heat-Map Channel

Heat-map channel
  • Image saliency detection method is applied to generate heat map channel as input.
  • In particular, the CA (Context-Aware) algorithm is used. It is a context-aware saliency measure under which color-dense regions receive high saliency values, while regions with low color density receive low saliency values.
  • First, the distance between the patches is calculated by the following formula:
  • where i and k are two pixels, p_i and p_k are the pixel blocks (patches) at scale r around pixels i and k, c is a constant set to 3, d_color(p_i, p_k) is the color distance between the patches, and d_position(p_i, p_k) is the spatial distance between the patches.
  • The single-scale saliency value is then calculated from the patch distances:
  • The saliency values are computed at multiple scales and averaged to obtain the final saliency value:
  • where 4 scales of R = {1, 0.8, 0.5, 0.3} are used.
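
The three formulas referred to in the list above follow the usual context-aware saliency formulation. A reconstruction under that assumption is sketched below; the symbols d, S_i^r, and \bar{S}_i, as well as K (the number of most similar patches compared against) and M (the number of scales), are chosen here for illustration and may differ from the paper's notation.

```latex
% Patch distance, with c = 3
d(p_i, p_k) = \frac{d_{color}(p_i, p_k)}{1 + c \cdot d_{position}(p_i, p_k)}

% Single-scale saliency of pixel i at scale r, over the K most similar patches
S_i^{r} = 1 - \exp\left\{ -\frac{1}{K} \sum_{k=1}^{K} d\left(p_i^{r}, p_k^{r}\right) \right\}

% Multi-scale mean over R = \{1, 0.8, 0.5, 0.3\}, with M = |R| = 4
\bar{S}_i = \frac{1}{M} \sum_{r \in R} S_i^{r}
```

Under this form, a patch is salient when even its most similar patches are far from it in color, and dividing by the spatial term reduces the distance to patches that are far away in the image.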

Adding the heat-map channel effectively constrains the area that the network segments.
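
As a rough illustration of how such a channel can be fed to the network, the sketch below stacks a saliency heat map onto a grayscale frame as a second input channel. It assumes NumPy arrays in channel-first layout and min-max normalization of the heat map; the function name, the layout, the grayscale input, and the normalization are assumptions, not details taken from the paper.

```python
import numpy as np

def add_heatmap_channel(image, heatmap):
    """Append a normalized heat map to an image as an extra input channel.

    image: (H, W) or (C, H, W) array, heatmap: (H, W) array.
    Returns an array of shape (C + 1, H, W).
    """
    image = np.asarray(image, dtype=np.float32)
    heatmap = np.asarray(heatmap, dtype=np.float32)
    if image.ndim == 2:
        image = image[np.newaxis]  # promote grayscale to (1, H, W)
    if heatmap.shape != image.shape[1:]:
        raise ValueError("heat map must match the image's spatial size")
    span = heatmap.max() - heatmap.min()
    if span > 0:
        heatmap = (heatmap - heatmap.min()) / span  # scale to [0, 1]
    else:
        heatmap = np.zeros_like(heatmap)  # a constant map carries no information
    return np.concatenate([image, heatmap[np.newaxis]], axis=0)

frame = np.random.default_rng(0).random((64, 64))  # stand-in for one frame
saliency = np.zeros((64, 64))
saliency[16:48, 16:48] = 5.0                       # stand-in salient region
net_input = add_heatmap_channel(frame, saliency)
print(net_input.shape)  # (2, 64, 64)
```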

2. Results

Qualitative Results
SOTA Comparison

The improved U-Net method segments the edges of the target much better, producing smoother edges and showing stronger generalization ability.

  • The improved U-Net is slightly slower than U-Net, but it still meets practical demands, taking far less time than manual segmentation.


[2019 ICACI] [Improved U-Net]
An Improved U-Net Method for Sequence Images Segmentation

1.5. Semantic Segmentation / Scene Parsing

2015–2019 … [Improved U-Net] … 2020 [DRRN Zhang JNCA’20] [Trans10K, TransLab] [CCNet] 2021 [PVT, PVTv1] [SETR] [Trans10K-v2, Trans2Seg] 2022 [PVTv2]
