Brief Review — RefineU-Net: Segmentation with Attention
RefineU-Net: Improved U-Net with Progressive Global Feedbacks and Residual Attention Guided Local Refinement for Medical Image Segmentation,
RefineU-Net, by A*STAR,
2020 J. Pattern Recognition Letters, Over 20 Citations (Sik-Ho Tsang @ Medium)
Medical Image Analysis, Medical Imaging, Image Segmentation, U-Net
- RefineU-Net is proposed, which consists of three modules: an encoding module (EM), a global refinement module (GRM), and a local refinement module (LRM).
- EM uses a VGG-16 backbone pretrained on ImageNet.
- GRM progressively upsamples the top side output of EM and fuses the resulting upsampled features with the side outputs of EM at each resolution level.
- LRM uses a residual attention gate (RAG) to generate discriminative attentive features, which are concatenated with the decoded features in the expansive path of U-Net.
1.1. Encoding Module (EM) (Blue)
- ImageNet-pretrained VGG-16 is used as backbone, i.e. EM.
1.2. Global Refinement Module (GRM) (Yellow)
- GRM upsamples the feature by a factor of 2 using a 4-by-4 transposed convolution layer, then fuses the resulting upsampled feature with the side output from the previous adjacent block in EM using an L2-normalization fusion approach.
- Gl represents the fused feature corresponding to the l-th side output of EM, and fl represents the transposed convolution layer followed by the ReLU activation function.
- gl normalizes an input x by its L2 norm ||x||2 and rescales the result with a learnable scalar γl.
GRM generates the multi-level fused outputs Gl, which effectively summarize the global context and semantic information.
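To make the L2 normalization step concrete, here is a minimal sketch in plain Python. It assumes the common form in which the input is divided by its L2 norm and multiplied by a scale γ. The function name `l2_normalize`, the `eps` guard, and the choice to normalize over one flat vector are illustrative assumptions, not details taken from the paper, and γ is a fixed float here rather than a learned parameter.

```python
import math


def l2_normalize(x, gamma=1.0, eps=1e-12):
    """Scale vector x to unit L2 norm, then multiply by gamma.

    Assumed form: gamma * x / ||x||_2. In a trained network gamma would be
    a learnable scalar; here it is a plain float for illustration.
    """
    norm = math.sqrt(sum(v * v for v in x))
    # eps avoids division by zero for an all-zero input (an added safeguard).
    return [gamma * v / (norm + eps) for v in x]


feature = [3.0, 4.0]  # toy feature vector with L2 norm 5
normalized = l2_normalize(feature, gamma=2.0)
print(normalized)
```

After normalization, every feature has the same L2 norm (γ), so features from different levels sit on a comparable scale before they are fused.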
1.3. Local Refinement Module (LRM) (Green)
- Gl and the decoded features Dl+1 (l = 1, 2, 3, 4) first go through two 1-by-1 convolution layers.
- Second, the decoded features are bilinearly upsampled to the same resolution as the output features of GRM.
- Then, the resulting features from the two input paths are fused using the L2 normalization function gl.
- Another 3-by-3 convolution layer gives the fused feature the same resolution and depth as Gl.
- The output feature then goes through an element-wise sigmoid to generate an attention map. Gl is multiplied element-wise by the attention map to produce the residual attentive signal, which is added to Gl to produce the final attentive features. (This is the concept from Residual Attention Network.)
- The final attentive feature from RAG is concatenated with the decoded features.
- At the end of LRM, a 1-by-1 convolution layer and sigmoid are appended to produce the binary segmentation map.
The local refinement module (LRM) modifies the original decoding path of U-Net by incorporating an attention mechanism to enhance local refinement.
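The residual attention step described above (sigmoid attention map, element-wise multiplication with Gl, then adding Gl back) can be sketched on flat lists of numbers. This is only an illustration of the arithmetic: the convolutions, upsampling, and L2 fusion that produce the gate input are omitted, and the names `residual_attention`, `g`, and `fused` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def residual_attention(g, fused):
    """Return g + g * sigmoid(fused), computed element-wise.

    g     : GRM feature values (stand-in for Gl), as a flat list
    fused : same-length list standing in for the 3-by-3 conv output
    """
    if len(g) != len(fused):
        raise ValueError("g and fused must have the same length")
    attention = [sigmoid(z) for z in fused]
    # Residual form: the attended signal is added back onto g itself.
    return [gi + gi * a for gi, a in zip(g, attention)]


g = [1.0, -2.0, 0.5]
fused = [0.0, 10.0, -10.0]
out = residual_attention(g, fused)
print(out)
```

Because the attention map lies in (0, 1), each output value lies between Gl and 2·Gl: the gate can only emphasize features and never zeroes them out, which is the usual motivation for the residual form.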
1.4. Loss Function
- Two losses are exploited, namely binary cross-entropy loss Lbce and intersection over union loss Liou.
- The total loss L combines Lbce and Liou.
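As a rough illustration of combining the two losses, the sketch below computes a binary cross-entropy term and a soft IoU term on flattened predictions and adds them. Both the unweighted sum and the particular soft IoU formulation (1 minus soft intersection over soft union) are assumptions made for this example, since the exact equation is not reproduced in this review. All function names are hypothetical.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened pixels."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(pred)


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU, using probabilities instead of hard masks (assumed form)."""
    inter = sum(p * y for p, y in zip(pred, target))
    union = sum(p + y - p * y for p, y in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def total_loss(pred, target):
    """Assumed combination: plain unweighted sum of the two terms."""
    return bce_loss(pred, target) + soft_iou_loss(pred, target)


target = [1.0, 0.0, 1.0, 0.0]
good = [0.9, 0.1, 0.8, 0.2]  # close to the target mask
bad = [0.2, 0.8, 0.1, 0.9]   # nearly inverted mask
print(total_loss(good, target), total_loss(bad, target))
```

The cross-entropy term scores each pixel independently, while the IoU term scores the overlap of the whole region, which is why the two are often paired for segmentation.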
- Two polyp segmentation datasets from the MICCAI 2015 polyp detection challenge [22,23] and two skin lesion segmentation datasets from ISBI 2016 and ISBI 2017 are used.
2.2. Ablation Study
The variant using both GRM and LRM performs best.
Using RAG is better than using the Attention Gate (AG) from Attention U-Net.
2.3. SOTA Comparisons
- RefineU-Net produces the segmentation results most similar to the ground truth.
To summarize, both quantitatively and qualitatively, the proposed RefineU-Net generally outperforms the competing state-of-the-art U-Net methods.
The residual attention gate (RAG) contributes to the improvement of segmentation performance through local refinement.
RefineU-Net progressively refines the ROI segmentation by alternately paying attention to the predicted ROI and the background regions.