Review — Up-Net: Towards Better Semantic Consistency of 2D Medical Image Segmentation
Towards Better Semantic Consistency of 2D Medical Image Segmentation,
Up-Net, by University of Electronic Science and Technology of China, and King’s College London
2021 Elsevier J. VCIR (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net
4.2. Biomedical Image Segmentation
2015 … 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] 2022 [UNETR]
- A novel attentional up-concatenation structure is proposed to build an auxiliary path for direct access to multi-level features.
- In addition, a new structural loss is employed to bring better morphological awareness and reduce the segmentation flaws caused by the semantic inconsistencies.
Outline
- Up-Net
- Structure Loss
- Results
1. Up-Net
- A U-Net encoder-decoder structure is used, with an ImageNet-pre-trained MobileNetV2 as the backbone.
- An additional up-connection path, namely up-concatenation, is added to bridge the high-level semantics with the low-level details within the decoding path of U-Net.
- At the end, by concatenating and convolving the multi-level features using the attentional up-concatenation, gradients can also be backpropagated to all levels of decoders.
- (Please feel free to read SENet for more details.)
- Different versions of the proposed network are named Up-Net (N1) to (N4). Up-Net (N4) uses all four levels of features at the end, while Up-Net (N1) uses only the last level.
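To make the idea of up-concatenation more concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes nearest-neighbour upsampling and a heavily simplified SE-style channel gate (global average pooling followed by a sigmoid, without the learned excitation layers of SENet). The function names `upsample_nearest`, `channel_gate` and `attentional_up_concat` are hypothetical, and the final convolution mentioned above is left out.

```python
import math

def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one 2D channel by an integer factor."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def channel_gate(channel):
    """Simplified SE-style gate: global average pool, then a sigmoid.
    Assumption: the learned excitation layers of SENet are omitted."""
    flat = [v for row in channel for v in row]
    mean = sum(flat) / len(flat)
    return 1.0 / (1.0 + math.exp(-mean))

def attentional_up_concat(decoder_feats, n_levels):
    """Fuse the first n_levels decoder feature maps along the channel axis.

    decoder_feats is ordered from the last (finest) decoder level to the
    first (coarsest); each entry is a list of 2D channels (lists of rows).
    n_levels=1 mimics Up-Net (N1), n_levels=4 mimics Up-Net (N4).
    """
    target_h = len(decoder_feats[0][0])  # height of the finest level
    fused = []
    for feat in decoder_feats[:n_levels]:
        factor = target_h // len(feat[0])  # integer scale to the finest size
        for ch in feat:
            up = upsample_nearest(ch, factor)
            g = channel_gate(up)
            fused.append([[g * v for v in row] for row in up])
    return fused  # in a real network, a convolution would follow here
```

Because every selected level ends up in the fused output, a loss computed on that output would send gradients to each of those decoder levels directly, which is the point of the auxiliary path.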
2. Structure Loss
- The total loss has two major terms: Mixed Dice Loss and Structure Loss.
2.1. Mixed Dice Loss
- Cross-entropy loss is commonly used for such a pixel-wise classification task by optimizing both foreground and background pixels:
- The imbalance in the number of foreground and background pixels can introduce bias into the model. Therefore, the Dice coefficient loss is used to solve this problem, defined as
- LDice may sometimes fail to converge. So, similar to [41], LDice is mixed with a small amount of LCE to avoid this problem. The result, namely the mixed Dice loss, is the weighted sum LMixedDice = 𝜆CE · LCE + 𝜆Dice · LDice.
- 𝜆CE and 𝜆Dice are empirically set as 0.01 and 1.0.
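As a rough illustration of how the two terms combine, below is a small pure-Python sketch for the binary (foreground/background) case. The exact forms of the cross-entropy and Dice terms used in the paper are not reproduced in this story, so the standard binary cross-entropy and the soft Dice loss are assumed here. The function names and the `eps` smoothing constant are my own choices; only the weights 0.01 and 1.0 come from the paper.

```python
import math

def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all pixels (foreground and background)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, labels, eps=1e-7):
    """Soft Dice loss: 1 - 2 * overlap / (sum of predictions + sum of labels)."""
    inter = sum(p * y for p, y in zip(probs, labels))
    denom = sum(probs) + sum(labels)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def mixed_dice_loss(probs, labels, lam_ce=0.01, lam_dice=1.0):
    """Weighted sum of the two terms, with the weights reported in the paper."""
    return lam_ce * cross_entropy(probs, labels) + lam_dice * dice_loss(probs, labels)
```

Here `probs` and `labels` are flat lists of per-pixel foreground probabilities and 0/1 ground-truth labels. With 𝜆CE = 0.01, the cross-entropy term only nudges the optimization, and the Dice term dominates.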
2.2. Structure Loss
- The edge-aware loss encourages the model to discover distinct differences between neighboring pixels, defined as:
- When the ground-truth labels of a central pixel 𝑖 and its neighboring pixel 𝑗 belong to different classes (𝐶𝑖,𝑗 = 1), LEdge urges them to have opposite predictions.
- Different from LEdge, the connection-aware loss encourages the network to discover homogeneous similarities between neighboring regions, defined as:
- The structural loss is defined as the sum of LConnection and LEdge.
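The exact formulas of LEdge and LConnection are not reproduced in this story, so the following pure-Python sketch only shows one plausible form of the idea and should not be read as the paper's definition. It assumes 4-connected neighbors, a linear penalty of 1 − |p𝑖 − p𝑗| for pairs with different labels (𝐶𝑖,𝑗 = 1), and a penalty of |p𝑖 − p𝑗| for pairs with the same label. The names `neighbor_pairs` and `structure_loss` are hypothetical.

```python
def neighbor_pairs(h, w):
    """Yield index pairs of horizontally and vertically adjacent pixels."""
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                yield (r, c), (r, c + 1)
            if r + 1 < h:
                yield (r, c), (r + 1, c)

def structure_loss(probs, labels):
    """Return (l_edge, l_connection, total) for 2D lists of probabilities and labels.

    Assumed forms (not taken from the paper):
      edge term       = 1 - |p_i - p_j|  when the labels differ (C_ij = 1)
      connection term = |p_i - p_j|      when the labels agree  (C_ij = 0)
    """
    h, w = len(labels), len(labels[0])
    edge_terms, conn_terms = [], []
    for (r1, c1), (r2, c2) in neighbor_pairs(h, w):
        diff = abs(probs[r1][c1] - probs[r2][c2])
        if labels[r1][c1] != labels[r2][c2]:
            edge_terms.append(1.0 - diff)  # push predictions apart
        else:
            conn_terms.append(diff)        # pull predictions together
    l_edge = sum(edge_terms) / len(edge_terms) if edge_terms else 0.0
    l_conn = sum(conn_terms) / len(conn_terms) if conn_terms else 0.0
    return l_edge, l_conn, l_edge + l_conn
```

Under these assumptions, a prediction that matches the mask exactly gives zero loss, while a flat prediction (the same probability everywhere) is penalized only by the edge term, since it satisfies every connection constraint but misses every boundary.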
3. Results
3.1. Optic Disc/Cup Segmentation
From the comparison, the proposed Up-Net performs better than the state-of-the-art OC/OD segmentation methods on all four datasets.
3.2. Cellular Segmentation
Up-Net (N4) outperforms the state-of-the-art methods on all three metrics: F1-score, Dice, and accuracy.
3.3. Lung Segmentation
Up-Net obtains better semantic consistency and successfully avoids the overfilling flaw seen in the result of DeepLabv3+.
3.4. Visual Quality
Up-Net (N4) obtains the most accurate segmentation.
(There are other experimental results as well, e.g. the ablation study; please feel free to read the paper directly.)