Review — SK-Unet: An Improved U-Net Model With Selective Kernel for the Segmentation of LGE Cardiac MR Images
SK-Unet, Add SE-Res Module & SK Module to U-Net
SK-Unet: An Improved U-Net Model With Selective Kernel for the Segmentation of LGE Cardiac MR Images (SK-Unet), by Sichuan University, Tencent AI Lab, Chinese University of Hong Kong, Chinese Academy of Medical Sciences, and Peking Union Medical College
2021 IEEE Sensors Journal, Over 10 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net
Biomedical Image Segmentation
2015 … 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] [Up-Net] 2022 [UNETR]
- SK-Unet is proposed, which augments the original U-Net model by adding a squeeze-and-excitation residual (SE-Res) module in the encoder and a selective kernel (SK) module in the decoder.
- The SE-Res module applies an attention mechanism to enhance the extraction of informative features and suppress redundant ones.
- The SK module offers the ability to adaptively learn task-relevant multi-scale spatial features.
1. SK-Unet
1.1. Encoder
- Each resolution level uses a series of operations for feature transformation, which include three convolution layers with batch normalization (BN), one rectified linear unit (ReLU), and one SE-Res module. The kernel sizes of the three convolution layers are 1 × 1, 3 × 3, and 1 × 1 in sequence.
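The 1 × 1, 3 × 3, 1 × 1 convolution sequence can be sketched in NumPy as follows. This is only an illustration under several assumptions that are not stated in the review: the channel widths are made up, `channel_norm` is a single-sample stand-in for BN, the single ReLU is placed at the end, and the SE-Res module that follows is left out. The names `conv2d_same`, `channel_norm`, and `encoder_convs` are hypothetical.

```python
import numpy as np

def conv2d_same(x, weight):
    """Naive stride-1 convolution with zero padding that keeps H and W.

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input for kernel offset (i, j).
            window = xp[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], window)
    return out

def channel_norm(x, eps=1e-5):
    # Stand-in for BN on one sample: normalize each channel over H x W.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def encoder_convs(x, w1, w2, w3):
    y = channel_norm(conv2d_same(x, w1))  # 1 x 1 convolution
    y = channel_norm(conv2d_same(y, w2))  # 3 x 3 convolution
    y = channel_norm(conv2d_same(y, w3))  # 1 x 1 convolution
    return np.maximum(y, 0.0)             # single ReLU (placement assumed)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))        # hypothetical 4-channel input
w1 = rng.standard_normal((2, 4, 1, 1))
w2 = rng.standard_normal((2, 2, 3, 3))
w3 = rng.standard_normal((6, 2, 1, 1))
features = encoder_convs(x, w1, w2, w3)
print(features.shape)
```

The spatial size is preserved at each step, so only the channel count changes across the three convolutions.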
1.2. SE-Res Module
- (Please feel free to read SENet for more details.)
- The SE-Res module applies a squeeze operator that compresses global spatial information into a channel descriptor, and an excitation operator that maps this descriptor to a set of channel weights.
By this means, the network can dynamically model the interdependencies between feature channels, thereby strengthening the representational power of the whole network.
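The squeeze and excitation steps can be illustrated with a short NumPy sketch. It assumes the standard SENet formulation (global average pooling, then FC, ReLU, FC, sigmoid) and assumes that the residual shortcut simply adds the input to the reweighted features. The function name `se_res`, the weight shapes, and the reduction to 2 hidden units are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def se_res(x, w_down, w_up):
    """x: (C, H, W); w_down: (C_reduced, C); w_up: (C, C_reduced)."""
    squeezed = x.mean(axis=(1, 2))               # squeeze: one value per channel
    hidden = np.maximum(w_down @ squeezed, 0.0)  # FC + ReLU on the descriptor
    weights = sigmoid(w_up @ hidden)             # excitation: channel weights in (0, 1)
    scaled = x * weights[:, None, None]          # reweight each channel
    return x + scaled, weights                   # residual shortcut (assumed form)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))               # hypothetical 8-channel feature map
w_down = rng.standard_normal((2, 8))
w_up = rng.standard_normal((8, 2))
out, weights = se_res(x, w_down, w_up)
print(out.shape, weights.shape)
```

Each channel gets a single scalar weight, so the module changes how much each channel contributes without altering the spatial layout.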
1.3. Decoder
- The decoder part of the SK-Unet consists of several upsampling stages to gradually reconstruct a segmentation map.
- The feature map is subsequently passed through a convolution layer, a BN layer, and a ReLU layer.
After that, an SK module is applied to adaptively learn feature representations with different receptive field sizes in order to better capture multi-scale information.
1.4. SK Module
- (Please feel free to read SKNet for more details.)
- First, the input feature map X is split into three branches, which perform dilated convolutions with three different kernel sizes to obtain three new feature maps U’, U’’, and U’’’.
- Then, an element-wise summation for U’, U’’, and U’’’ is computed to produce an integrated feature map U.
- Global channel-wise feature information S, a vector of length C (C: the number of feature channels), is then obtained by a global average pooling operation on U.
- A fully connected layer is further applied to reduce the feature dimension to obtain an aggregated weight vector Z.
- The adaptive attention weight matrix Z’ is then obtained from Z by a fully connected layer W.
- By reshaping Z, a new weight matrix with 3 rows (A’, B’, and C’) and C columns can be produced. Softmax attention is then used to transform the weight matrix into probability weighting factors for each channel, normalized across the three rows.
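- The final weighted feature map is computed as the channel-wise weighted sum of U’, U’’, and U’’’, using the softmax weighting factors, where the three factors for each channel sum to 1.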
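The fuse-and-select steps above can be sketched in NumPy as follows. The sketch starts from three already-computed branch outputs, so the dilated convolutions are not included. It assumes the SKNet-style formulation: a ReLU after the reducing layer, and a 3 × C weight matrix obtained by reshaping the output of the second fully connected layer. The names `sk_select`, `w_reduce`, and `w_expand`, and all sizes, are hypothetical.

```python
import numpy as np

def sk_select(u1, u2, u3, w_reduce, w_expand):
    """u1, u2, u3: (C, H, W) branch outputs; w_reduce: (d, C); w_expand: (3*C, d)."""
    c = u1.shape[0]
    u = u1 + u2 + u3                             # element-wise summation -> U
    s = u.mean(axis=(1, 2))                      # global average pooling -> S, shape (C,)
    z = np.maximum(w_reduce @ s, 0.0)            # reduced descriptor Z (ReLU assumed)
    logits = (w_expand @ z).reshape(3, c)        # rows play the role of A', B', C'
    logits -= logits.max(axis=0, keepdims=True)  # subtract max for numerical stability
    e = np.exp(logits)
    attn = e / e.sum(axis=0, keepdims=True)      # softmax over the 3 branches, per channel
    a_w, b_w, c_w = attn
    v = (a_w[:, None, None] * u1
         + b_w[:, None, None] * u2
         + c_w[:, None, None] * u3)              # weighted sum of the branches
    return v, attn

rng = np.random.default_rng(0)
c, d = 6, 3                                      # hypothetical channel sizes
u1, u2, u3 = (rng.standard_normal((c, 4, 4)) for _ in range(3))
w_reduce = rng.standard_normal((d, c))
w_expand = rng.standard_normal((3 * c, d))
v, attn = sk_select(u1, u2, u3, w_reduce, w_expand)
print(v.shape, attn.sum(axis=0))
```

Because the softmax runs across the three branches rather than across channels, each channel chooses its own mix of receptive field sizes.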
1.5. Output
2. Results
2.1. Ablation Study
- When the SE-Res module is added to the original encoder, the network accuracy clearly improves.
- The SK module provides even larger performance improvements.
Adding both the SE-Res module in the network encoder and the SK module in the network decoder produces the best segmentation accuracy.
2.2. SOTA Comparison
The proposed SK-Unet method produced the best accuracy.
The segmentation results closely matched the ground truth.
2.3. MS-CMRSEG 2019 Challenges
The proposed method obtained the highest total score and ranked first in the competition.