Review: Attention U-Net — Learning Where to Look for the Pancreas (Biomedical Image Segmentation)

With Attention Gates (AGs), the model automatically learns to focus on target structures of varying shapes and sizes.

Sik-Ho Tsang
5 min read · Aug 27, 2019

In this story, Attention U-Net, by Imperial College London, Nagoya University & Aichi Cancer Center, University of Luebeck, HeartFlow, and Babylon Health, is briefly reviewed.

  • With Attention Gates (AGs), the model automatically learns to focus on target structures of varying shapes and sizes.

It was published in 2018 MIDL with more than 40 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Attention U-Net
  2. Analysis
  3. Experimental Results

1. Attention U-Net

Top: Attention Gate (AG), Bottom: Attention U-Net

1.1. Framework

  • As in U-Net or 3D U-Net, there is a contracting path on the left and an expansive path on the right.
  • Contracting path: a series of convolutions and max pooling to extract local features.
  • Expansive path: a series of upsampling and convolutions to aggregate global features.
  • Feature maps at the same level are concatenated through skip connections.
  • Different from U-Net (but the same as 3D U-Net), 3D convolutions are used because the input is a 3D CT volume.
  • Another difference is that there is an Attention Gate (AG) on the skip connection at each level.
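A minimal sketch of these two paths and a plain skip connection, assuming PyTorch (the code below is mine, not the authors'; channel counts and the input size are illustrative only). The Attention Gate that Attention U-Net adds on the skip connection is covered in Section 1.2.

```python
import torch
import torch.nn as nn

# One contracting stage, a bottleneck, and one expansive stage of a
# 3D U-Net-style network. Channel counts and input size are illustrative.
enc = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(inplace=True))
pool = nn.MaxPool3d(2)                                    # downsample by 2
bottleneck = nn.Sequential(nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(inplace=True))
up = nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2)  # upsampling step
dec = nn.Sequential(nn.Conv3d(32, 16, 3, padding=1), nn.ReLU(inplace=True))

x = torch.randn(1, 1, 32, 64, 64)         # (batch, channel, depth, height, width)
skip = enc(x)                             # local features kept for the skip connection
deep = bottleneck(pool(skip))             # coarser features one level below
decoded = dec(torch.cat([skip, up(deep)], dim=1))   # concat skip + upsampled features
print(decoded.shape)                      # torch.Size([1, 16, 32, 64, 64])
```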

1.2. Attention Gate (AG)

  • The details of the AG are shown at the top of the above figure.
  • First, each of the two input feature maps goes through its own 1×1×1 convolution; the results are added and passed through a ReLU.
  • Second, another 1×1×1 convolution is applied, this time followed by a Sigmoid activation.
  • Since the sigmoid output lies in [0, 1], it acts just like a mask.
  • Unlike the Residual Attention Network or SENet, which use channel-wise or class-wise masks, this is a voxel-wise mask.
  • After the sigmoid, the mask goes through the resampler, which is simply trilinear interpolation, to match the size of the feature map it will be element-wise multiplied with.
  • Finally, the gated feature map is concatenated with the upsampled feature maps from the lower level.
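Below is a minimal PyTorch sketch of such an attention gate, written from the description above rather than from the authors' code. The channel counts are illustrative, and the strided 1×1×1 convolution the paper uses to bring the skip features down to the gating signal's resolution is approximated here with trilinear interpolation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate3D(nn.Module):
    # x: skip-connection features; g: coarser gating signal from one level below.
    def __init__(self, ch_x, ch_g, ch_int):
        super().__init__()
        self.theta_x = nn.Conv3d(ch_x, ch_int, kernel_size=1)  # 1×1×1 conv on skip features
        self.phi_g = nn.Conv3d(ch_g, ch_int, kernel_size=1)    # 1×1×1 conv on gating signal
        self.psi = nn.Conv3d(ch_int, 1, kernel_size=1)         # 1×1×1 conv before the sigmoid

    def forward(self, x, g):
        # bring the skip features to the gating signal's (coarser) resolution
        theta_x = F.interpolate(self.theta_x(x), size=g.shape[2:],
                                mode='trilinear', align_corners=False)
        f = F.relu(theta_x + self.phi_g(g))                    # add, then ReLU
        alpha = torch.sigmoid(self.psi(f))                     # voxel-wise mask in [0, 1]
        # resampler: trilinear interpolation back to the skip features' size
        alpha = F.interpolate(alpha, size=x.shape[2:],
                              mode='trilinear', align_corners=False)
        return x * alpha                                       # gate the skip features

# usage at one skip connection (shapes are illustrative)
x = torch.randn(1, 16, 32, 64, 64)     # skip features from the contracting path
g = torch.randn(1, 32, 16, 32, 32)     # coarser features from one level below
gated = AttentionGate3D(ch_x=16, ch_g=32, ch_int=8)(x, g)
upsampled = F.interpolate(g, scale_factor=2, mode='trilinear', align_corners=False)
decoder_input = torch.cat([gated, upsampled], dim=1)   # concatenation at the skip
```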

2. Analysis

Analysis
  • From left to right (a-e, f-j): axial and sagittal views of a 3D abdominal CT scan, attention coefficients, and feature activations of a skip connection before and after gating.
  • Similarly, (k-n) visualise the gating on a coarse-scale skip connection.
  • The filtered feature activations (d-e, i-j) are collected from multiple AGs; we can see that a subset of organs is selected by each gate.
The attention coefficients across different training epochs (3, 6, 10, 60, 150).
  • As shown in the above figure, the model gradually learns to focus on the pancreas, kidney, and spleen during training.
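A rough sketch of how such an overlay could be produced, assuming the attention coefficients have already been extracted (e.g. from the gate sketch above) and resampled to the input resolution; the array names and shapes below are hypothetical stand-ins.

```python
import numpy as np
import matplotlib.pyplot as plt

# `ct_volume` is a CT scan and `alpha` the attention coefficients resampled to the
# same (depth, height, width) grid; random arrays stand in for both here.
ct_volume = np.random.rand(32, 64, 64)
alpha = np.random.rand(32, 64, 64)

z = ct_volume.shape[0] // 2                    # pick a middle axial slice
plt.imshow(ct_volume[z], cmap='gray')          # CT slice in grayscale
plt.imshow(alpha[z], cmap='jet', alpha=0.4)    # attention coefficients as a heat map
plt.axis('off')
plt.title('Attention coefficients on an axial slice')
plt.show()
```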

3. Experimental Results

3.1. Datasets

  • CT-150: 150 abdominal 3D CT scans acquired from patients diagnosed with gastric cancer (stomach cancer).
  • CT-82: 82 contrast-enhanced 3D CT scans with pancreas annotations performed manually slice-by-slice; this is the TCIA CT Pancreas benchmark (61 training, 21 testing).

3.2. Comparison with U-Net

Comparison with U-Net on CT-150
  • (120/30): 120 images for training, and 30 for testing.
  • (30/120): 30 images for training, and 120 for testing.
  • With the above two settings, Attention U-Net consistently outperforms U-Net, with a higher Dice Similarity Coefficient (DSC) for the different organs (a minimal DSC sketch is given after this list).
  • The inference time is only slightly longer than that of U-Net.
  • Since Attention U-Net has more parameters than the U-Net without AGs, the authors also trained a U-Net with more channels so that its parameter count is close to that of Attention U-Net; those results are shown above as well.
  • Even so, its DSC is not as good as that of Attention U-Net.
  • Its inference time is also longer.
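For reference, a minimal sketch of the DSC used in these comparisons; the function name is mine, but the formula, 2|P∩G| / (|P| + |G|), is the standard one.

```python
import torch

def dice_coefficient(pred, target, eps=1e-6):
    # DSC = 2 * |P ∩ G| / (|P| + |G|) for binary masks P (prediction) and G (ground truth)
    pred = pred.float().flatten()
    target = target.float().flatten()
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# toy example: 2 voxels overlap out of 3 predicted and 3 ground-truth voxels
pred = torch.tensor([[0, 1, 1], [0, 1, 0]])
gt = torch.tensor([[0, 1, 0], [0, 1, 1]])
print(dice_coefficient(pred, gt))   # tensor(0.6667) = 2*2 / (3 + 3)
```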

3.3. Fine-Tuning or Training From Scratch

TCIA CT Pancreas benchmark dataset
  • Initially, the models trained on the CT-150 dataset are directly applied to the CT-82 dataset to observe the applicability of the two models to different datasets.
  • BFT: Before fine-tuning, Attention U-Net outperforms U-Net.
  • AFT: After fine-tuning on CT-82 (sketched after this list), Attention U-Net still outperforms U-Net.
  • SCR: When training the models from scratch, Attention U-Net still outperforms U-Net.
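A hedged sketch of the before/after fine-tuning protocol in PyTorch. Everything below is a stand-in kept trivial so the snippet runs; the actual network, checkpoint, data, loss, and optimiser used by the authors are not specified in this summary.

```python
import torch
import torch.nn as nn

# `net` stands in for the CT-150-trained network (a single 3D conv here only to keep
# the example runnable); the commented line shows where a hypothetical checkpoint
# trained on CT-150 would be loaded.
net = nn.Conv3d(1, 2, kernel_size=3, padding=1)
# net.load_state_dict(torch.load('attention_unet_ct150.pth'))   # hypothetical path

# BFT: evaluate on CT-82 with these weights as-is (no further training).
# AFT: continue training on the CT-82 training split, typically with a small learning rate.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
net.train()
for _ in range(2):                                 # a couple of illustrative steps
    volume = torch.randn(1, 1, 16, 32, 32)         # stand-in CT-82 volume
    mask = torch.randint(0, 2, (1, 16, 32, 32))    # stand-in voxel labels
    optimizer.zero_grad()
    loss = loss_fn(net(volume), mask)
    loss.backward()
    optimizer.step()
```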

3.4. Comparison with State-of-the-art Approaches

Indirect Comparison with State-of-the-art Approaches
  • With only a few additional parameters, i.e. the AGs,
  • with the use of a single model,
  • without any cascaded U-Nets within the model (which would mean many more parameters),
  • and without any post-processing,
  • Attention U-Net obtains a DSC of 81.48 ± 6.23 on CT-82, which is better than or comparable to other state-of-the-art approaches.

3.5. Visualization

Visualization
  • (a): Ground-truth pancreas segmentation is highlighted in blue.
  • (b): Ground-truth pancreas segmentation.
  • (c): U-Net prediction. The dense predictions missed by U-Net are highlighted with red arrows.
  • (d): Attention U-Net prediction.

Reference

[2018 MIDL] [Attention U-Net]
Attention U-Net: Learning Where to Look for the Pancreas
