Review: Attention U-Net — Learning Where to Look for the Pancreas (Biomedical Image Segmentation)
With Attention Gates (AGs), the model automatically learns to focus on target structures of varying shapes and sizes.
In this story, Attention U-Net, by Imperial College London, Nagoya University & Aichi Cancer Center, University of Luebeck, HeartFlow, and Babylon Health, is briefly reviewed.
- With Attention Gates (AGs), the model automatically learns to focus on target structures of varying shapes and sizes.
It was published in 2018 MIDL with more than 40 citations. (Sik-Ho Tsang @ Medium)
Outline
- Attention U-Net
- Analysis
- Experimental Results
1. Attention U-Net
1.1. Framework
- The same as U-Net and 3D U-Net, there is a contraction path on the left and an expansion path on the right.
- Contraction path: a series of convolutions and max pooling to extract local features.
- Expansion path: a series of upsampling and convolutions to recover global features.
- Feature maps at the same level are concatenated using skip connections.
- Different from U-Net (but the same as 3D U-Net), 3D convolutions are used because the input is a 3D CT volume.
- Another difference is that there is an Attention Gate (AG) on the skip connection at each level, as in the sketch below.
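To make the layout concrete, here is a minimal, self-contained sketch in PyTorch (my own illustration, not the authors' code) of a tiny two-level 3D U-Net-style network; the name `TinyUNet3D` and the channel counts are arbitrary. In Attention U-Net, an AG (Section 1.2) would additionally gate `skip` before the concatenation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    """Two 3x3x3 convolutions with ReLU, used at every level (3D, since the input is a CT volume)."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    """Two-level contraction/expansion path with one skip connection."""
    def __init__(self, in_ch=1, base_ch=16, n_classes=2):
        super().__init__()
        self.enc1 = conv_block(in_ch, base_ch)                   # contraction: local features
        self.enc2 = conv_block(base_ch, base_ch * 2)
        self.dec1 = conv_block(base_ch * 2 + base_ch, base_ch)   # expansion: global features
        self.head = nn.Conv3d(base_ch, n_classes, kernel_size=1)

    def forward(self, x):
        skip = self.enc1(x)                          # finest-level features, kept for the skip connection
        x = self.enc2(F.max_pool3d(skip, 2))         # downsample, then convolve
        x = F.interpolate(x, size=skip.shape[2:], mode='trilinear', align_corners=False)
        x = self.dec1(torch.cat([x, skip], dim=1))   # concatenate features at the same level
        return self.head(x)
```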
1.2. Attention Gate (AG)
- The details of the AG are shown at the top of the above figure.
- First, each of the two input feature maps goes through its own 1×1×1 convolution; the outputs are added together and passed through ReLU.
- Second, another 1×1×1 convolution is performed, this time with sigmoid as the activation function.
- As the sigmoid output lies in the range [0, 1], it acts just like a mask.
- Unlike Residual Attention Network or SENet, which use channel-wise or class-wise masks, this is a voxel-wise mask.
- After the sigmoid, the mask goes through the resampler, which is actually trilinear interpolation, to make its size the same as the feature map it is element-wise multiplied with.
- Finally, the gated features are concatenated with the upsampled feature maps from the lower level.
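The steps above can be summarized in a short PyTorch sketch (my own illustration, not the authors' released code). The stride-2 convolution on the skip features (to match the coarser gating signal before the addition) and the channel arguments are assumptions on my part; the rest follows the bullets above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate: 1x1x1 convs -> add -> ReLU -> 1x1x1 conv -> sigmoid -> resample -> multiply."""
    def __init__(self, x_channels, g_channels, inter_channels):
        super().__init__()
        self.theta_x = nn.Conv3d(x_channels, inter_channels, kernel_size=1, stride=2)  # skip features (downsampled)
        self.phi_g = nn.Conv3d(g_channels, inter_channels, kernel_size=1)              # gating signal
        self.psi = nn.Conv3d(inter_channels, 1, kernel_size=1)                         # conv before the sigmoid

    def forward(self, x, g):
        # x: skip-connection features; g: coarser gating signal from one level below
        theta = self.theta_x(x)
        phi = F.interpolate(self.phi_g(g), size=theta.shape[2:], mode='trilinear', align_corners=False)
        f = F.relu(theta + phi)                      # individual 1x1x1 convs, added together, then ReLU
        alpha = torch.sigmoid(self.psi(f))           # second 1x1x1 conv + sigmoid: voxel-wise mask in [0, 1]
        alpha = F.interpolate(alpha, size=x.shape[2:], mode='trilinear', align_corners=False)  # resampler
        return x * alpha                             # element-wise multiplication with the skip features
```

The returned `x * alpha` is what gets concatenated with the upsampled decoder features, as in the last bullet above.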
2. Analysis
- From left to right (a-e, f-j): Axial and sagittal views of a 3D abdominal CT scan, attention coefficients, feature activations of a skip connection before and after gating.
- Similarly, (k-n) visualise the gating on a coarse scale skip connection.
- The filtered feature activations (d-e, i-j) are collected from multiple AGs; we can see that a subset of organs is selected by each gate.
- From the above figure, we can see that the model gradually learns to focus on the pancreas, kidney, and spleen during training.
3. Experimental Results
3.1. Datasets
- CT-150: 150 abdominal 3D CT scans acquired from patients diagnosed with gastric cancer (stomach cancer).
- CT-82: 82 contrast-enhanced 3D CT scans with manual pancreas annotations performed slice-by-slice; this is the TCIA CT Pancreas benchmark (61 training, 21 testing).
3.2. Comparison with U-Net
- (120/30): 120 images for training, and 30 for testing.
- (30/120): 30 images for training, and 120 for testing.
- With the above two settings, Attention U-Net consistently outperforms U-Net with a higher Dice Similarity Coefficient (DSC) for the different organs (a minimal DSC sketch is given after this list).
- The inference time is only slightly longer than that of U-Net.
- Since Attention U-Net has more parameters than U-Net without AGs, the authors also add more channels to U-Net to make its number of parameters close to that of Attention U-Net; these results are shown above as well.
- This wider U-Net's DSC is still not as good as that of Attention U-Net.
- Also, its inference time is even longer.
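For reference, the DSC measures the volume overlap between the predicted and ground-truth masks, DSC = 2|P ∩ G| / (|P| + |G|); here is a minimal NumPy sketch (my own illustration) for binary 3D masks:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks; 1.0 means perfect overlap, 0.0 means none."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```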
3.3. Fine-Tuning or Training From Scratch
- Initially, the models trained on the CT-150 dataset are directly applied to the CT-82 dataset to observe the applicability of the two models to a different dataset.
- BFT: Before fine-tuning, Attention U-Net outperforms U-Net.
- AFT: After fine-tuning, Attention U-Net still outperforms U-Net.
- SCR: When training the models from scratch, Attention U-Net still outperforms U-Net.
3.4. Comparison with State-of-the-art Approaches
- With only a few additional parameters, i.e. the AGs,
- With the use of a single model,
- Without any cascaded U-Nets within the model (which would need many more parameters),
- Without any post-processing,
- Attention U-Net achieves 81.48 ± 6.23 DSC on CT-82, which is better than or comparable to other state-of-the-art approaches.
3.5. Visualization
- (a): Ground-truth pancreas segmentation is highlighted in blue.
- (b): Ground-truth pancreas segmentation.
- (c): U-Net model prediction. The dense predictions missed by U-Net are highlighted with red arrows.
- (d): Attention U-Net prediction.
Reference
[2018 MIDL] [Attention U-Net]
Attention U-Net: Learning Where to Look for the Pancreas
My Previous Reviews
Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]
Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]
Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN]
Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net] [Attention U-Net]
Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]
Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]
Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]
Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN]
Generative Adversarial Network [GAN]