Brief Review — RA-UNet: A Hybrid Deep Attention-Aware Network to Extract Liver and Tumor in CT Scans

RA-UNet, the First Work to Use an Attention Residual Mechanism for Tumor Segmentation from 3D Medical Volumetric Images.

Sik-Ho Tsang
Jan 31, 2023
Examples of typical 2D CT scans and the corresponding ground truth of liver/tumor extractions where red arrows indicate the tumor/lesion regions.

RA-UNet: A Hybrid Deep Attention-Aware Network to Extract Liver and Tumor in CT Scans,
RA-UNet, by Tianjin University, CSIRO Data61, Tianjin University of Traditional Chinese Medicine, and La Trobe University
2020 J. Front. Bioeng. Biotechnol., Over 220 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net

  • A 3D hybrid residual attention-aware segmentation method, RA-UNet, is proposed where attention residual modules are integrated into U-Net so that the attention-aware features change adaptively.
  • This is the first work to use an attention residual mechanism to segment tumors from 3D medical volumetric images.

Outline

  1. Residual Attention-aware U-Net (RA-UNet)
  2. Pipeline Details
  3. Results

1. Residual Attention-aware U-Net (RA-UNet)

Overview of the proposed pipeline of liver and tumor segmentation.

1.1. Overall Pipeline

  • The pipeline has three steps:
  1. RA-UNet-I: A 2D residual attention-aware U-Net (RA-UNet), named RA-UNet-I, first obtains a coarse liver boundary box.
  2. The first RA-UNet-II: Next, a 3D RA-UNet, called RA-UNet-II, is trained to obtain a precise liver volume of interest (VOI).
  3. The second RA-UNet-II: Finally, the obtained liver VOI is sent to a second RA-UNet-II to extract the tumor region.

1.2. Datasets and Materials

  • The public Liver Tumor Segmentation Challenge (LiTS) dataset is used, which has a total of 200 CT scans containing 130 scans as training data and 70 scans as test data.
  • Another dataset named 3DIRCADb is used as an external test dataset, which includes 20 enhanced CT scans.
  • Both datasets have the same 512×512 in-plane resolution but different numbers of axial slices in each scan.

1.3. RA-UNet Model Architecture

Sample of a residual block in the dashed window.
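  • In a traditional residual block, the output is the input plus a learned residual mapping: $O_R(x) = x + f(x)$,
  • where x denotes the first input of a residual block, $O_R$ denotes the output of a residual block, and f denotes the residual mapping to be learned.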
  • The residual block consists of three sets of combinations of a batch normalization (BN) layer, an activation (ReLU) layer, and a convolutional layer, as above.
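As a rough illustration of the residual block described above, the following NumPy sketch stacks three BN, ReLU, convolution sets and adds the input back at the end. It makes several simplifying assumptions that are not from the paper: the convolutions are reduced to 1×1×1 (per-voxel linear) layers, batch normalization has no learned scale or shift, and the names `bn_relu_conv` and `residual_block` are hypothetical.

```python
import numpy as np

def bn_relu_conv(x, weight, eps=1e-5):
    """One BN -> ReLU -> conv set (conv simplified to a 1x1x1 layer)."""
    # x has shape (voxels, channels); normalise each channel over the voxel axis.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    x = np.maximum(x, 0.0)   # ReLU activation
    return x @ weight        # a 1x1x1 convolution is a matrix multiply over channels

def residual_block(x, weights):
    """Return x + f(x), where f stacks one BN-ReLU-conv set per weight matrix."""
    out = x
    for w in weights:
        out = bn_relu_conv(out, w)
    return x + out           # identity (skip) connection

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                                       # 8 voxels, 4 channels
weights = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(3)]  # three sets
y = residual_block(x, weights)
```

If every weight matrix is zero, the block reduces to the identity mapping, which is the property that makes residual blocks easy to optimize.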
The architecture of the attention residual module. (A) The attention residual module (B) The soft mask branch contains a stack of encoder-decoder blocks.

In this paper, attention residual learning proposed by Residual Attention Network is used, as above.

  • The attention residual mechanism divides the attention module into a trunk branch and a soft mask branch, where the trunk branch is used to process the original features and the soft-mask branch is used to construct the identity mapping.
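  • The output $O_A$ of the attention module under attention residual learning can be formulated as $O_A(x) = (1 + S(x)) \, F(x)$, where F(x) is the output of the trunk branch and S(x) is the output of the soft mask branch.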

In brief, the output of the soft mask branch, S, passes through a sigmoid and therefore lies in the range [0,1]. For correlated features, S will be close to 1; for uncorrelated features, S will be close to 0.

When S is multiplied with F, features where S is close to 1 are magnified, and features where S is close to 0 are diminished.

  • The "1+" term provides the skip connection, so the trunk features F are still passed through where S is close to 0.
  • Therefore, this mechanism enhances good features and reduces noise from the trunk branch.
  • (Please feel free to read Residual Attention Network if interested.)
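The attention residual output described above can be sketched in a few lines of NumPy. This is only a toy illustration: the trunk features and mask logits are made-up numbers, and `attention_residual` is a hypothetical helper name, not a function from the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_residual(trunk_features, mask_logits):
    """Combine trunk features F and soft mask S as (1 + S) * F."""
    soft_mask = sigmoid(mask_logits)           # S lies in [0, 1]
    return (1.0 + soft_mask) * trunk_features  # the "1 +" term keeps F as a skip path

trunk = np.array([2.0, 2.0, 2.0])      # toy trunk features F
logits = np.array([-10.0, 0.0, 10.0])  # toy soft-mask logits before the sigmoid
out = attention_residual(trunk, logits)
```

With the "1 +" term, each output value lies between F (mask near 0) and 2F (mask near 1), so weakly attended features are passed through rather than zeroed out.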
Architecture of the proposed RA-UNet-II in liver localization stage.
  • The overall architecture of RA-UNet-II is shown as above.
  • Sigmoid is used at the output to generate the final probability map of liver segmentation.

1.4. Loss Function

  • Standard Dice loss is used:
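The exact expression is not reproduced in this text, so the sketch below uses one commonly seen soft Dice formulation, 1 − 2Σpᵢgᵢ / (Σpᵢ + Σgᵢ), where p is the predicted probability map and g is the binary ground truth. This is an assumption: the paper's variant may differ, for example by squaring the terms in the denominator. The smoothing constant `eps` is also an assumption.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss between a probability map and a binary ground-truth mask."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    denominator = np.sum(pred) + np.sum(target)
    # eps avoids division by zero when both masks are empty
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

perfect = dice_loss([1, 1, 0, 0], [1, 1, 0, 0])   # identical masks
disjoint = dice_loss([1, 1, 0, 0], [0, 0, 1, 1])  # no overlap
partial = dice_loss([1, 1, 0, 0], [1, 0, 0, 0])   # one of two predicted voxels is correct
```

The loss approaches 0 for a perfect overlap and 1 for no overlap, which makes it less sensitive to the large background class than a voxel-wise loss.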

2. Pipeline Details

2.1. Liver Localization Using RA-UNet-I

Liver localization using RA-UNet-I. (A) A typical slice from the LiTS validation dataset. (B) A typical slice from the 3DIRCADb dataset.
  • The first stage aimed to locate the 3D liver boundary box. A 2D version RA-UNet-I was introduced here to segment a coarse liver region, which can reduce the computational cost of the subsequent RA-UNet-II.
  • The slices are downsampled to 256×256 and fed into the trained RA-UNet-I. All the slices are stacked in their original sequence.
  • Afterwards, a 3D connected-component labeling is used for assigning a unique label to each connected component in an image.
  • Finally, the liver region is interpolated back to the original volume size, with a 512×512 in-plane resolution.
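The connected-component step above can be sketched as follows. The code labels 6-connected components of a stacked 3D mask with a simple breadth-first search and returns the bounding box of the largest one. Keeping the largest component as the coarse liver region, the choice of 6-connectivity, and the helper names are assumptions made for illustration.

```python
from collections import deque
import numpy as np

def label_components(mask):
    """Assign a unique positive label to each 6-connected component (0 = background)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue                      # voxel already belongs to a component
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:                      # breadth-first flood fill
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not labels[n]:
                    labels[n] = current
                    queue.append(n)
    return labels, current

def largest_component_bbox(mask):
    """Bounding box (as slices) of the largest connected component, or None."""
    labels, count = label_components(mask)
    if count == 0:
        return None
    sizes = np.bincount(labels.ravel())[1:]        # skip the background count
    largest = labels == (np.argmax(sizes) + 1)
    coords = np.nonzero(largest)
    return tuple(slice(int(c.min()), int(c.max()) + 1) for c in coords)

volume = np.zeros((4, 6, 6), dtype=bool)
volume[1:3, 1:4, 1:4] = True   # toy "liver" blob of 18 voxels
volume[3, 5, 5] = True         # isolated false-positive voxel
bbox = largest_component_bbox(volume)
```

In practice a library routine such as `scipy.ndimage.label` would typically replace the explicit loop; the pure-NumPy version is used here only to keep the sketch self-contained.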

The attention mechanism has successfully constrained the liver region.

2.2. Liver Segmentation Using RA-UNet-II

Liver segmentation results based on RA-UNet-II. (A) From the LiTS validation dataset and (B) is from the 3DIRCADb dataset.
  • The RA-UNet-II is employed on each CT patch to generate 3D liver probability patches in sequence. Then, those probability patches are interpolated and stacked to be restored to the original size of the boundary box.
  • A voting strategy is used to generate the final liver probability of the VOI from overlapped sub-patches.
  • A 3D connected-component labeling is applied to the merged VOI, and the largest component is chosen to yield the final liver region.
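The voting strategy is not specified in detail here, so the sketch below assumes the simplest variant: the probabilities of overlapping patches are averaged voxel by voxel. The helper name `merge_patches` and the toy patch layout are hypothetical.

```python
import numpy as np

def merge_patches(patches, starts, volume_shape):
    """Average overlapping probability patches into one probability volume."""
    prob_sum = np.zeros(volume_shape, dtype=float)
    votes = np.zeros(volume_shape, dtype=float)
    for patch, start in zip(patches, starts):
        region = tuple(slice(s, s + n) for s, n in zip(start, patch.shape))
        prob_sum[region] += patch   # accumulate predicted probabilities
        votes[region] += 1.0        # count how many patches cover each voxel
    # voxels not covered by any patch keep probability 0
    return np.divide(prob_sum, votes, out=np.zeros_like(prob_sum), where=votes > 0)

# Two overlapping patches along the last axis of a 1 x 1 x 6 volume
patches = [np.full((1, 1, 4), 0.8), np.full((1, 1, 4), 0.4)]
starts = [(0, 0, 0), (0, 0, 2)]
merged = merge_patches(patches, starts, (1, 1, 6))
```

In the overlap, the two predictions are averaged to 0.6, while voxels covered by a single patch keep that patch's value. Thresholding and largest-component selection would then follow on the merged volume.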

The liver region was precisely extracted by selecting the largest region.

2.3. Extraction of Tumors Based on RA-UNet-II

Tumor segmentation results based on RA-UNet-II. (A) From the LiTS validation dataset, and (B) is from the 3DIRCADb dataset.
  • Tumor region extraction is similar to liver segmentation, but without interpolation or resizing.
Tumor patch extraction results.
  • To address the data imbalance issue and learn more effective tumor features, patches from both tumor regions and their surrounding non-tumor regions are picked for training.
  • A voting strategy is used again on the merged VOI to yield the final tumor segmentation. Finally, voxels that are not in the liver region are filtered out.
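The final filtering step amounts to intersecting the tumor prediction with the liver mask. A minimal sketch, assuming a 0.5 probability threshold (the threshold is not stated in this text) and a hypothetical helper name:

```python
import numpy as np

def restrict_to_liver(tumor_prob, liver_mask, threshold=0.5):
    """Keep only tumor voxels that also fall inside the segmented liver."""
    tumor_mask = np.asarray(tumor_prob) >= threshold       # binarise tumor probabilities
    return tumor_mask & np.asarray(liver_mask, dtype=bool)  # drop voxels outside the liver

tumor_prob = np.array([0.9, 0.7, 0.2, 0.8])  # toy tumor probabilities
liver_mask = np.array([1, 0, 1, 1])          # toy liver segmentation
tumor_in_liver = restrict_to_liver(tumor_prob, liver_mask)
```

Here the second voxel is discarded even though its tumor probability is high, because it lies outside the liver mask.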

3. Results

3.1. Ablation on Loss Functions

Left: Evaluation results of the liver segmentation on the LiTS test dataset and the 3DIRCADb dataset. Right: Scores of the tumor segmentation on the LiTS test dataset and the 3DIRCADb dataset.
  • DC is Dice Coefficient Score.

Liver (Left): DC reached 0.961 and 0.977 on the LiTS test dataset and the 3DIRCADb dataset, respectively.

Tumor (Right): DC reached 0.595 and 0.830 on the LiTS test dataset and the 3DIRCADb dataset, respectively.

3.2. Qualitative Results

Automatic liver and tumor segmentation with RA-UNet. (A) From the LiTS dataset. (B) From the 3DIRCADb dataset.

Large liver regions are successfully segmented, and tiny, hard-to-detect tumors can also be identified by the proposed method.

Due to the low contrast with the surrounding liver tissue and the extremely small size of some tumors, the proposed method still produces some false positives and false negatives in tumor extraction.

3.3. Quantitative Results

Segmentation results compared with other methods on the LiTS test dataset.

The proposed method obtains precise segmentation of liver and tumor and outperforms two SOTA approaches.

3.4. Generalization of the Proposed RA-UNet

Segmentation results compared with other methods on the 3DIRCADb dataset.
  • To show the generalization of the proposed method, the weights trained on LiTS are used directly for testing on the 3DIRCADb dataset.

The proposed method reached a mean Dice score of 0.830 on livers with tumors compared to a mean Dice score of 0.56 for the method by Christ et al. (2017a).
