Review — SK-Unet: An Improved U-Net Model With Selective Kernel for the Segmentation of LGE Cardiac MR Images

SK-Unet, Add SE-Res Module & SK Module to U-Net

Mar 24, 2023

SK-Unet: An Improved U-Net Model With Selective Kernel for the Segmentation of LGE Cardiac MR Images,
SK-Unet, by Sichuan University, Tencent AI Lab, Chinese University of Hong Kong, Chinese Academy of Medical Sciences, and Peking Union Medical College
2021 IEEE Sensors Journal, Over 10 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net
Biomedical Image Segmentation
2015 … 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] [Up-Net] 2022 [UNETR]

SK-Unet is proposed, which augments the original U-Net model by adding a squeeze-and-excitation residual (SE-Res) module in the encoder and a selective kernel (SK) module in the decoder.
The SE-Res module applies a channel attention mechanism to enhance informative features and suppress redundant ones.
The SK module offers the ability to adaptively learn task-relevant multi-scale spatial features.

1. SK-Unet

1.1. Encoder

Each resolution level uses a series of operations for feature transformation, which include three convolution layers with batch normalization (BN), one rectified linear unit (ReLU), and one SE-Res module. The kernel sizes of the three convolution layers are 1 × 1, 3 × 3, and 1 × 1 in sequence.

1.2. SE-Res Module

(Please feel free to read SENet for more details.)
The SE-Res module applies a squeeze operator, which compresses the global spatial information of each channel into a channel descriptor, and an excitation operator, which maps that descriptor to a set of channel weights.

In this way, the network can dynamically model the interdependencies between feature channels, thereby strengthening the representation power of the whole network.
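The squeeze and excitation steps can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the two fully connected layers with ReLU and sigmoid follow the standard SENet design, the residual wiring follows the SE-ResNet pattern, and the channel count `C`, the reduction ratio `r`, and the random weights are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def se_weights(f, w1, w2):
    """Per-channel weights in (0, 1) for a feature map f of shape (C, H, W)."""
    # Squeeze: global average pooling compresses each channel to one number.
    s = f.mean(axis=(1, 2))                  # shape (C,)
    # Excitation: FC + ReLU, then FC + sigmoid (SENet-style, assumed here).
    h = np.maximum(w1 @ s, 0.0)              # shape (C // r,)
    return 1.0 / (1.0 + np.exp(-(w2 @ h)))   # shape (C,)

def se_res(shortcut, f, w1, w2):
    """Rescale the residual branch f channel-wise, then add the shortcut."""
    scale = se_weights(f, w1, w2)
    return shortcut + f * scale[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 4                                   # hypothetical sizes
x = rng.standard_normal((C, 16, 16))          # input to the block (shortcut)
f = rng.standard_normal((C, 16, 16))          # stand-in for the conv branch output
w1 = rng.standard_normal((C // r, C)) * 0.1   # random stand-ins for learned weights
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_res(x, f, w1, w2)
```

Each channel of `f` is multiplied by a single scalar between 0 and 1, which is how informative channels are emphasized and redundant ones damped.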

1.3. Decoder

The decoder part of the SK-Unet consists of several upsampling stages to gradually reconstruct a segmentation map.
The feature map is subsequently passed through a convolution layer, a BN layer, and a ReLU layer.

After that, an SK module is applied to adaptively learn feature representations with different receptive field sizes in order to better capture multi-scale information.

1.4. SK Module

(Please feel free to read SKNet for more details.)
First, the input feature map X is split into three branches, which perform dilated convolutions with three different kernel sizes to obtain three new feature maps U’, U’’, and U’’’.
Then, an element-wise summation for U’, U’’, and U’’’ is computed to produce an integrated feature map U.
Global channel-wise feature information S, a vector of length C (C: the number of feature channels), is then obtained by a global average pooling operation on U.
A fully connected layer is further applied to reduce the feature dimension to obtain an aggregated weight vector Z.
The adaptive attention weight matrix Z’ is then obtained by applying a fully connected layer W to Z.

By reshaping Z’, a new weight matrix with 3 rows (A’, B’, and C’) and C columns can be produced. Softmax attention across the three rows is then used to transform the weight matrix into probability weighting factors for each channel, so the three factors of a channel sum to 1.

The final weighted feature map is computed as the sum of U’, U’’, and U’’’, each multiplied channel-wise by its weighting factor.
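The SK steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's code: the three dilated convolutions are omitted and their outputs are taken as inputs, the ReLU after the reducing layer is an assumption borrowed from SKNet, and `C`, `d`, and the random weights are hypothetical.

```python
import numpy as np

def sk_fuse(branches, w_reduce, w_expand):
    """Fuse M same-shaped branch outputs, each (C, H, W), with per-channel softmax attention."""
    u_stack = np.stack(branches)                   # (M, C, H, W): U', U'', U'''
    u = u_stack.sum(axis=0)                        # element-wise sum -> U
    s = u.mean(axis=(1, 2))                        # global average pooling -> S, shape (C,)
    z = np.maximum(w_reduce @ s, 0.0)              # reducing FC -> Z, shape (d,); ReLU is assumed
    z_prime = w_expand @ z                         # FC layer W -> Z', shape (M * C,)
    logits = z_prime.reshape(len(branches), -1)    # rows A', B', C'; C columns
    logits = logits - logits.max(axis=0, keepdims=True)  # for numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)  # softmax across the branches
    v = (attn[:, :, None, None] * u_stack).sum(axis=0)   # weighted sum of the branches
    return v, attn

rng = np.random.default_rng(0)
C, d, H, W = 8, 4, 16, 16                          # hypothetical sizes
u1, u2, u3 = (rng.standard_normal((C, H, W)) for _ in range(3))
w_reduce = rng.standard_normal((d, C)) * 0.1       # random stand-ins for learned weights
w_expand = rng.standard_normal((3 * C, d)) * 0.1
v, attn = sk_fuse([u1, u2, u3], w_reduce, w_expand)
```

Because the softmax runs across the three branches separately for every channel, each channel can favor a different receptive field size.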

1.5. Output

The aggregated feature map is then passed through a sequence of 1 × 1 convolution, BN, ReLU, and 1×1 convolution again to get the final segmentation map. Softmax is used at the end.
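As a rough illustration of this output head, the NumPy sketch below treats a 1 × 1 convolution as the same linear map applied at every pixel. The class count, channel sizes, and random weights are hypothetical, and the `batch_norm` helper is a simplified single-image stand-in (per-channel normalization without learned scale and shift), not real batch normalization.

```python
import numpy as np

def conv1x1(x, w, b):
    """1 x 1 convolution. x: (C_in, H, W), w: (C_out, C_in), b: (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def batch_norm(x, eps=1e-5):
    """Simplified stand-in for BN: normalize each channel over its spatial positions."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax_channels(x):
    """Softmax over the class (channel) axis, independently at every pixel."""
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def output_head(x, w1, b1, w2, b2):
    """1 x 1 conv -> BN -> ReLU -> 1 x 1 conv -> softmax."""
    h = np.maximum(batch_norm(conv1x1(x, w1, b1)), 0.0)
    return softmax_channels(conv1x1(h, w2, b2))

rng = np.random.default_rng(0)
c_in, c_mid, n_classes, H, W = 16, 8, 4, 12, 12    # hypothetical sizes
x = rng.standard_normal((c_in, H, W))
w1, b1 = rng.standard_normal((c_mid, c_in)) * 0.1, np.zeros(c_mid)
w2, b2 = rng.standard_normal((n_classes, c_mid)) * 0.1, np.zeros(n_classes)
probs = output_head(x, w1, b1, w2, b2)             # (n_classes, H, W)
labels = probs.argmax(axis=0)                      # predicted class per pixel
```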
The total loss function combines a weighted multi-class cross-entropy loss and a multi-class Dice loss.
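One possible form of such a combined loss is sketched below in NumPy. The balancing factor `lam`, the class weights, and the smoothing constant `eps` are assumptions for illustration, since the exact weighting used in the paper is not given here.

```python
import numpy as np

def weighted_cross_entropy(probs, onehot, class_weights, eps=1e-7):
    """Class-weighted cross-entropy. probs, onehot: (K, H, W); class_weights: (K,)."""
    per_pixel = -(class_weights[:, None, None] * onehot * np.log(probs + eps)).sum(axis=0)
    return per_pixel.mean()

def dice_loss(probs, onehot, eps=1e-7):
    """Multi-class soft Dice loss: 1 minus the mean per-class Dice score."""
    inter = (probs * onehot).sum(axis=(1, 2))
    denom = probs.sum(axis=(1, 2)) + onehot.sum(axis=(1, 2))
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

def total_loss(probs, onehot, class_weights, lam=1.0):
    """Sum of the two terms; lam is a hypothetical balancing factor."""
    return weighted_cross_entropy(probs, onehot, class_weights) + lam * dice_loss(probs, onehot)

K, H, W = 3, 6, 6                                   # hypothetical sizes
labels = np.arange(H * W).reshape(H, W) % K         # toy ground-truth label map
onehot = np.eye(K)[labels].transpose(2, 0, 1)       # (K, H, W)
class_weights = np.ones(K)                          # uniform weights as a placeholder
uniform = np.full((K, H, W), 1.0 / K)               # maximally uncertain prediction
loss_perfect = total_loss(onehot, onehot, class_weights)
loss_uniform = total_loss(uniform, onehot, class_weights)
```

A perfect prediction drives both terms to roughly zero, while the uniform prediction is penalized by both. The Dice term is less sensitive to class imbalance than cross-entropy, which is a common reason for pairing the two.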

2. Results

2.1. Ablation Study

**Ablation Study on Cardiac Segmentation Task**

When the SE-Res module is added to the original encoder, the network accuracy is clearly improved.
The SK module provides even larger performance improvements.