Review — Up-Net: Towards Better Semantic Consistency of 2D Medical Image Segmentation

Up-Net, Adding SE Module Concept, Originated in SENet, to U-Net

Mar 19, 2023

Towards Better Semantic Consistency of 2D Medical Image Segmentation,
Up-Net, by University of Electronic Science and Technology of China, and King’s College London
2021 Elsevier J. VCIR (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net
4.2. Biomedical Image Segmentation
2015 … 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] 2022 [UNETR]

Up-Net introduces a novel attentional up-concatenation structure, which builds an auxiliary path for direct access to multi-level features.
In addition, a new structure loss is employed to bring better morphological awareness and to reduce the segmentation flaws caused by semantic inconsistencies.

Outline

Up-Net
Structure Loss
Results

1. Up-Net

A U-Net encoder-decoder structure is used, with an ImageNet-pretrained MobileNetV2 as the backbone.
An additional up-connection path, namely up-concatenation, is added to bridge the high-level semantics with the low-level details within the decoding path of U-Net.

**The attention block for attentional up-concatenation.**

At the end of the decoder, the multi-level features are concatenated and convolved through the attentional up-concatenation, so gradients can also be backpropagated directly to all decoder levels.
(Please feel free to read SENet for more details.)
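Since the attention block follows the SE idea, here is a minimal NumPy sketch of a standard SE block as described in SENet (squeeze by global average pooling, excitation by FC, ReLU, FC, sigmoid, then channel-wise scaling) on a single (C, H, W) feature map. The random weights and the reduction ratio r = 4 are assumptions for illustration; the exact layout of Up-Net's attention block may differ.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation reweighting of a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) are the two fully connected layers
    of the excitation step; r is the channel reduction ratio.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                    # (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1).
    h = np.maximum(w1 @ z + b1, 0.0)           # (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # (C,)
    # Scale: multiply every channel of x by its weight.
    return x * s[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 16, 16))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
print(y.shape)  # (8, 16, 16)
```

The output keeps the input shape; only the relative importance of the channels changes.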
Different versions of the proposed network are named Up-Net (N1) to Up-Net (N4). Up-Net (N4) uses all four levels of features at the end, while Up-Net (N1) uses only the last level.
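To make the N1 to N4 variants concrete, here is a hypothetical NumPy sketch of up-concatenation: the last n_levels decoder outputs are upsampled to the finest resolution (nearest-neighbour upsampling is an assumption), concatenated along the channel axis, passed through an optional attention callable such as an SE block (its placement is an assumption), and fused by a 1x1 convolution. The function names, channel sizes and fusion layer are illustrative and not the paper's code.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def up_concatenate(decoder_feats, n_levels, w_fuse, attention=None):
    """Hypothetical up-concatenation over the last n_levels decoder outputs.

    decoder_feats: list of (C_k, H_k, W_k) maps ordered coarsest to finest,
                   where each finer size is an integer multiple of the coarser ones.
    w_fuse: (C_out, total channels of the used levels) weights of a 1x1 conv.
    attention: optional callable applied to the concatenated map.
    """
    used = decoder_feats[-n_levels:]
    height = used[-1].shape[1]
    # Bring every used level to the finest resolution.
    ups = [upsample_nearest(f, height // f.shape[1]) for f in used]
    cat = np.concatenate(ups, axis=0)  # (sum of C_k, H, W)
    if attention is not None:
        cat = attention(cat)
    # A 1x1 convolution is a matrix product over the channel axis.
    return np.einsum("oc,chw->ohw", w_fuse, cat)

rng = np.random.default_rng(0)
channels, sizes = [32, 24, 16, 8], [4, 8, 16, 32]
feats = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
out_n4 = up_concatenate(feats, 4, rng.standard_normal((2, sum(channels))))  # all four levels
out_n1 = up_concatenate(feats, 1, rng.standard_normal((2, channels[-1])))   # last level only
print(out_n4.shape, out_n1.shape)  # (2, 32, 32) (2, 32, 32)
```

In this sketch, moving from N1 to N4 only changes how many decoder levels feed the fusion layer, which is why the fusion weights grow from 8 to 80 input channels here.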

2. Structure Loss

The total loss has two major terms: the Mixed Dice Loss and the Structure Loss.

2.1. Mixed Dice Loss

Cross-entropy loss is commonly used for such pixel-wise classification tasks, optimizing over both foreground and background pixels.
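For reference, the standard pixel-wise binary cross-entropy over N pixels, with ground truth g_i ∈ {0, 1} and predicted foreground probability p_i, can be written as below. These symbols are assumptions, since the paper's own notation is not reproduced here.

```latex
\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, g_i \log p_i + (1 - g_i)\log(1 - p_i) \Big]
```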

However, the imbalance between the numbers of foreground and background pixels can bias the model. Therefore, the Dice coefficient loss is used to alleviate this problem.
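A common soft-Dice form, with predicted foreground probability p_i and ground truth g_i ∈ {0, 1} for each of N pixels plus a small smoothing constant ε, is shown below. The symbols and ε are assumptions, and some variants square the terms in the denominator, so the paper's exact definition may differ.

```latex
\mathcal{L}_{Dice} = 1 - \frac{2\sum_{i=1}^{N} p_i\, g_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \epsilon}
```

Because the overlap is normalized by the foreground sizes, a small object contributes as much as a large one, which is what counters the foreground/background imbalance.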

L_Dice may sometimes fail to converge. Therefore, similar to [41], L_Dice is mixed with a small L_CE term to avoid this problem, giving the mixed Dice loss.

λ_CE and λ_Dice are empirically set to 0.01 and 1.0, respectively.
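Putting the two terms together, here is a minimal NumPy sketch of the mixed Dice loss as the weighted sum λ_CE · L_CE + λ_Dice · L_Dice with the reported weights 0.01 and 1.0. The binary (single foreground class) setting, the clipping constant and the Dice smoothing term eps are assumptions.

```python
import numpy as np

def bce_loss(p, g, eps=1e-7):
    """Pixel-wise binary cross-entropy averaged over all pixels."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(g * np.log(p) + (1.0 - g) * np.log(1.0 - p)))

def dice_loss(p, g, eps=1.0):
    """Soft Dice loss: 1 minus the soft Dice coefficient."""
    inter = np.sum(p * g)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(g) + eps))

def mixed_dice_loss(p, g, lam_ce=0.01, lam_dice=1.0):
    """Weighted sum: the small CE term stabilizes training of the Dice term."""
    return lam_ce * bce_loss(p, g) + lam_dice * dice_loss(p, g)

g = np.zeros((8, 8))
g[2:6, 2:6] = 1.0                      # a square foreground object
p_perfect = g.copy()
p_uncertain = np.full_like(g, 0.5)
print(dice_loss(p_perfect, g))         # 0.0
print(mixed_dice_loss(p_uncertain, g))
```

With λ_CE = 0.01, the cross-entropy term only nudges the gradients, while the Dice term still dominates the objective.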

2.2. Structure Loss

The edge-aware loss encourages the model to discover distinct differences between neighboring pixels.

When the ground-truth labels of a central pixel i and its neighboring pixel j belong to different classes (C_{i,j} = 1), L_Edge pushes the two pixels toward contrary predictions.
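The role of C_{i,j} can be illustrated with a small NumPy sketch that builds C_{i,j} from the ground truth over 4-connected neighbor pairs. The penalty below is a hypothetical stand-in, not the paper's L_Edge formula: on pairs with C_{i,j} = 1 it charges 1 - |p_i - p_j|, so it vanishes only when the two predictions are fully contrary.

```python
import numpy as np

def neighbour_pairs(a):
    """(centre, neighbour) slices covering every horizontal and vertical 4-neighbour pair once."""
    return [(a[:, :-1], a[:, 1:]), (a[:-1, :], a[1:, :])]

def edge_indicator(g):
    """C_ij = 1 where a pixel and its neighbour carry different ground-truth labels."""
    return [(gi != gj).astype(float) for gi, gj in neighbour_pairs(g)]

def edge_penalty(p, g):
    """Hypothetical edge-aware term: mean of 1 - |p_i - p_j| over pairs with C_ij = 1."""
    total, count = 0.0, 0.0
    for (pi, pj), c in zip(neighbour_pairs(p), edge_indicator(g)):
        total += np.sum(c * (1.0 - np.abs(pi - pj)))
        count += np.sum(c)
    return total / max(count, 1.0)

g = np.zeros((4, 4))
g[:, 2:] = 1.0                                  # left half background, right half foreground
print(sum(c.sum() for c in edge_indicator(g)))  # 4.0 boundary pairs
print(edge_penalty(g.copy(), g))                # 0.0: contrary predictions across the edge
print(edge_penalty(np.full_like(g, 0.5), g))    # 1.0: identical predictions across the edge
```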
In contrast to L_Edge, the connection-aware loss encourages the network to discover homogeneous similarities between neighboring regions.