Review — SK-Unet: An Improved U-Net Model With Selective Kernel for the Segmentation of LGE Cardiac MR Images

SK-Unet, Add SE-Res Module & SK Module to U-Net

Sik-Ho Tsang
Mar 24, 2023

SK-Unet: An Improved U-Net Model With Selective Kernel for the Segmentation of LGE Cardiac MR Images,
SK-Unet, by Sichuan University, Tencent AI Lab, Chinese University of Hong Kong, Chinese Academy of Medical Sciences, and Peking Union Medical College
2021 IEEE Sensors Journal, Over 10 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net

Biomedical Image Segmentation
2015 … 2021
[Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] [Up-Net] 2022 [UNETR]

  • SK-Unet is proposed, which augments the original U-Net model by adding a squeeze-and-excitation residual (SE-Res) module in the encoder and a selective kernel (SK) module in the decoder.
  • The SE-Res module applies a channel attention mechanism to emphasize informative features and suppress redundant ones.
  • The SK module offers the ability to adaptively learn task-relevant multi-scale spatial features.

1. SK-Unet

SK-Unet Model Architecture

1.1. Encoder

  • Each resolution level uses a series of operations for feature transformation, which include three convolution layers with batch normalization (BN), one rectified linear unit (ReLU), and one SE-Res module. The kernel sizes of the three convolution layers are 1 × 1, 3 × 3, and 1 × 1 in sequence.

1.2. SE-Res Module

SE-Res Module
  • (Please feel free to read SENet for more details.)
  • The SE-Res module applies a squeeze operator that compresses global spatial information into a channel descriptor and an excitation operator that maps the input descriptor to a set of channel weights, as above.

In this way, the network can dynamically model the interdependencies between feature channels, thereby strengthening the representational power of the whole network.
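The squeeze and excitation steps can be sketched in a few lines of plain Python. This is only a minimal illustration following the general SENet recipe, not the authors' implementation: the weight matrices W1 and W2, the ReLU between them, and the sigmoid gate are assumptions taken from SENet, and biases, BN, and the residual connection are left out.

```python
import math

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def se_recalibrate(feature, W1, W2):
    """Rescale each channel of `feature` by a learned weight in (0, 1).

    feature: list of C channels, each an H x W list of lists.
    W1: (C/r) x C and W2: C x (C/r) are hypothetical FC weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeeze = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid (assumed, as in SENet).
    hidden = [max(0.0, h) for h in matvec(W1, squeeze)]
    weights = [1.0 / (1.0 + math.exp(-v)) for v in matvec(W2, hidden)]
    # Scale: multiply every pixel of a channel by that channel's weight.
    scaled = [[[w * px for px in row] for row in ch]
              for ch, w in zip(feature, weights)]
    return scaled, weights
```

Calling se_recalibrate on a two-channel feature map returns the rescaled map together with the two channel weights, so the effect of the attention can be inspected directly.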

1.3. Decoder

  • The decoder part of the SK-Unet consists of several upsampling stages to gradually reconstruct a segmentation map.
  • The feature map is subsequently passed through a convolution layer, a BN layer, and a ReLU layer.

After that, an SK module is applied to adaptively learn feature representations with different receptive field sizes in order to better capture multi-scale information.

1.4. SK Module

SK Module
  • (Please feel free to read SKNet for more details.)
  • First, the input feature map X is split into three branches, which perform dilated convolutions with three different kernel sizes to obtain three new feature maps U’, U’’, and U’’’.
  • Then, an element-wise summation for U’, U’’, and U’’’ is computed to produce an integrated feature map U.
  • Global channel-wise feature information S, with one value per channel (C: the number of feature channels), is then obtained by a global average pooling operation on U.
  • A fully connected layer is further applied to reduce the feature dimension to obtain an aggregated weight vector Z.
  • The adaptive attention weight matrix Z’ is then obtained from Z by a fully connected layer W.
  • By reshaping Z’, a new weight matrix with 3 rows (A’, B’, and C’) and C columns can be produced. Softmax attention is then used to transform the weight matrix into probability weighting factors for each channel.
  • The final weighted feature map is computed as the sum of U’, U’’, and U’’’, each multiplied channel-wise by its weighting factor, where the three factors for each channel sum to 1.
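The fuse-and-select steps listed above can be sketched as follows. This is a rough, dependency-free illustration rather than the paper's code: the three branch outputs are passed in directly (the dilated convolutions are skipped), W_reduce and W_expand are hypothetical names for the two fully connected layers, and the ReLU after the first one is an assumption borrowed from SKNet.

```python
import math

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sk_fuse_select(branches, W_reduce, W_expand):
    """branches: [U', U'', U'''], each a list of C channels (H x W lists).
    W_reduce: d x C, W_expand: (3*C) x d (hypothetical FC weights)."""
    n = len(branches)
    C = len(branches[0])
    H, Wd = len(branches[0][0]), len(branches[0][0][0])
    # Fuse: element-wise sum of the branch outputs gives U.
    U = [[[sum(b[c][i][j] for b in branches) for j in range(Wd)]
          for i in range(H)] for c in range(C)]
    # S: global average pooling, one value per channel.
    S = [sum(map(sum, U[c])) / (H * Wd) for c in range(C)]
    # Z: reduced descriptor (ReLU assumed).
    Z = [max(0.0, v) for v in matvec(W_reduce, S)]
    # Z': expand to 3*C values and reshape to 3 rows x C columns.
    Zp = matvec(W_expand, Z)
    logits = [Zp[k * C:(k + 1) * C] for k in range(n)]
    # Softmax across the branches, separately for each channel.
    attn = [[0.0] * C for _ in range(n)]
    for c in range(C):
        m = max(logits[k][c] for k in range(n))
        exps = [math.exp(logits[k][c] - m) for k in range(n)]
        total = sum(exps)
        for k in range(n):
            attn[k][c] = exps[k] / total
    # Select: channel-wise weighted sum of the branch outputs.
    V = [[[sum(attn[k][c] * branches[k][c][i][j] for k in range(n))
           for j in range(Wd)] for i in range(H)] for c in range(C)]
    return V, attn
```

Because the softmax runs across the three branches, the weights for each channel always sum to 1, so each output channel is a convex combination of the three receptive fields.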

1.5. Output

  • The aggregated feature map is then passed through a sequence of 1 × 1 convolution, BN, ReLU, and another 1 × 1 convolution to get the final segmentation map. Softmax is used at the end.
  • The total loss function combines a weighted multi-class cross-entropy loss and a multi-class Dice loss.
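To make the combination concrete, here is a small sketch of a per-pixel softmax followed by a cross-entropy plus Dice loss. The exact formulation is not given in the text above, so several choices here are assumptions: the per-class weights class_weights, the mixing factor lam, the soft Dice form averaged over classes, and the smoothing constant eps are all hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one pixel's class scores.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    # Mean over pixels of -w[y] * log p[y]; p is clamped to avoid log(0).
    losses = [-class_weights[y] * math.log(max(p[y], eps))
              for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

def multiclass_dice_loss(probs, labels, n_classes, eps=1e-7):
    # Soft Dice per class, averaged over classes (assumed form).
    loss = 0.0
    for k in range(n_classes):
        inter = sum(p[k] for p, y in zip(probs, labels) if y == k)
        denom = sum(p[k] for p in probs) + sum(1 for y in labels if y == k)
        loss += 1.0 - (2.0 * inter + eps) / (denom + eps)
    return loss / n_classes

def total_loss(logits, labels, class_weights, lam=1.0):
    # logits: one list of class scores per pixel; labels: class index per pixel.
    probs = [softmax(l) for l in logits]
    ce = weighted_cross_entropy(probs, labels, class_weights)
    dice = multiclass_dice_loss(probs, labels, len(class_weights))
    return ce + lam * dice
```

The cross-entropy term penalizes each pixel individually, while the Dice term measures region overlap per class, which is a common reason for pairing the two when the foreground classes are small relative to the background.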

2. Results

2.1. Ablation Study

Ablation Study on Cardiac Segmentation Task
  • When the SE-Res module is added to the original encoder, the network accuracy clearly improves.
  • The SK module provides even larger performance improvements.

Adding both the SE-Res module in the network encoder and the SK module in the network decoder produces the best segmentation accuracy.

2.2. SOTA Comparison

SOTA Comparison on Cardiac Segmentation Task

The proposed SK-Unet method produced the best accuracy.

Visualization of representative segmentation results from different neural network methods. The red, white, and blue regions denote the LV, LVM, and RV, respectively.

The segmentation results closely matched the ground truth.

2.3. MS-CMRSEG 2019 Challenges

Comparison with Top 10 Teams in the MS-CMRSEG 2019 Challenges

The proposed method obtained the highest total score and ranked first in the competition.

