Review: Cascaded 3D U-Net — Multi-Organ Segmentation (Biomedical Image Segmentation)

3D Convolutions for Simultaneous Artery, Portal Vein, Liver, Spleen, Stomach, Gallbladder, and Pancreas Segmentation

Aug 25, 2019

In this story, Cascaded 3D U-Net, by Nagoya University, Nagoya University Graduate School of Medicine, and Aichi Cancer Center, is briefly reviewed.

In the first stage, an FCN model is trained to roughly delineate the organs of interest.
In the second stage, another FCN model is trained to produce a more detailed segmentation of those organs.

The approach is also evaluated on a test set acquired at a different hospital with a different scanner, where it achieves SOTA performance. It was first published as a 2017 arXiv tech report with more than 40 citations, then revised and published in 2018 JCMIG with 30 citations. (Sik-Ho Tsang @ Medium)

Outline

1. Coarse-to-Fine Cascaded U-Net
2. Loss Function
3. Training & Validation & Testing
4. Comparison with State-of-the-art Approaches

1. Coarse-to-Fine Cascaded U-Net

**Multi-stage cascaded training scheme**

**C1 Candidate Region** (Red) input to Stage 1 3D U-Net (Left), **C2 Candidate Region** (Red) input to Stage 2 3D U-Net (Right)

The first-stage FCN sees only around 40% of the image's voxels. This C1 Candidate Region is a simple mask of the body created by thresholding the image.
The C2 Candidate Region is derived from the output of the first-stage FCN.
For the second-stage FCN, the number of voxels to classify is reduced to around 10% of the image.
Each step narrows down and simplifies the search space for the FCN.
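To make the two candidate regions concrete, here is a minimal NumPy sketch on a toy volume. The threshold of -500 HU, the function names, and the toy "stage-1 output" are all my own assumptions for illustration; the paper's exact masking procedure may differ.

```python
import numpy as np

def body_mask(ct_hu, threshold_hu=-500.0):
    # C1: keep voxels above an assumed air/tissue threshold (Hounsfield units).
    return ct_hu > threshold_hu

def stage2_candidates(stage1_labels):
    # C2: every voxel the first stage assigned to any organ (label 0 = background).
    return stage1_labels > 0

def voxel_fraction(mask):
    # Share of the whole volume that a stage has to classify.
    return float(mask.mean())

# Toy volume: air everywhere, a "body" block, and a smaller "organ" block inside it.
ct = np.full((10, 10, 10), -1000.0)
ct[2:8, 2:8, 0:10] = 40.0                 # 6*6*10 = 360 of 1000 voxels
stage1 = np.zeros(ct.shape, dtype=np.int64)
stage1[3:7, 3:7, 2:8] = 3                 # 4*4*6 = 96 voxels; hypothetical stage-1 output

c1 = body_mask(ct)
c2 = stage2_candidates(stage1)
print(voxel_fraction(c1), voxel_fraction(c2))
```

In this toy case the first stage looks at 36% of the voxels and the second at under 10%, which mirrors the proportions quoted above.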

In this paper, the 2-stage FCNs used are 3D U-Net.
(For details about 3D U-Net, please feel free to read my review about it.)
At the end of the network, a 1×1×1 convolution reduces the number of output channels to the number of class labels (K=8: 7 organs plus 1 background). Each output channel has a size of 44×44×28.
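A 1×1×1 convolution is just a per-voxel linear map across channels, which a short NumPy sketch can show. The 64 input channels, the random weights, and the softmax step are assumptions of mine for illustration; only K=8 and the 44×44×28 output size come from the text above.

```python
import numpy as np

def conv1x1x1(features, weights, bias):
    # features: (C_in, D, H, W); weights: (K, C_in); bias: (K,)
    # Each voxel's channel vector is mixed independently of its neighbours.
    return np.einsum("kc,cdhw->kdhw", weights, features) + bias[:, None, None, None]

def softmax_over_classes(logits):
    # Numerically stable softmax along the class axis (axis 0).
    shifted = logits - logits.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
C_IN, K = 64, 8                            # C_IN is assumed; K=8 is from the text
features = rng.standard_normal((C_IN, 44, 44, 28))
weights = rng.standard_normal((K, C_IN)) * 0.1
bias = np.zeros(K)

probs = softmax_over_classes(conv1x1x1(features, weights, bias))
labels = probs.argmax(axis=0)              # one class index per output voxel
print(probs.shape, labels.shape)
```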

2. Loss Function

Ni is the number of voxels belonging to class i in the label set Ln.
NC is the number of voxels within the candidate region C1 or C2.
λi is a weight computed from Ni and NC, i.e. from how frequently class i occurs. The λi sum to 1.
Simply speaking, the loss is a class-weighted voxel-wise cross-entropy loss.
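Here is a minimal NumPy sketch of such a weighted cross-entropy. As an assumption, I use λi = (1 − Ni/NC)/(K − 1), which is one weighting that depends only on Ni and NC, gives rarer classes larger weights, and sums to 1. The function names are hypothetical, and the paper's exact formula should be checked against the original.

```python
import numpy as np

def class_weights(labels, candidate, num_classes):
    # Assumed form: lambda_i = (1 - N_i / N_C) / (K - 1).
    n_c = candidate.sum()
    counts = np.bincount(labels[candidate], minlength=num_classes)
    return (1.0 - counts / n_c) / (num_classes - 1)

def weighted_cross_entropy(probs, labels, candidate, num_classes):
    # probs: (K, D, H, W) softmax outputs; labels: (D, H, W) ints; candidate: bool mask.
    lam = class_weights(labels, candidate, num_classes)
    true_labels = labels[candidate]                        # shape (N_C,)
    p_true = probs[:, candidate][true_labels, np.arange(true_labels.size)]
    # Only voxels inside the candidate region contribute to the loss.
    return float(-(lam[true_labels] * np.log(p_true + 1e-12)).sum() / true_labels.size)

# Toy example with K=2: class 1 is rare (2 of 64 voxels).
labels = np.zeros((4, 4, 4), dtype=np.int64)
labels[0, 0, :2] = 1
candidate = np.ones(labels.shape, dtype=bool)
probs = np.full((2, 4, 4, 4), 0.5)                         # an uninformed prediction

lam = class_weights(labels, candidate, num_classes=2)
loss = weighted_cross_entropy(probs, labels, candidate, num_classes=2)
print(lam, loss)
```

With this weighting, the two rare voxels contribute as much to the loss as the 62 background voxels, which is the point of frequency-based balancing.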

3. Training & Validation & Testing

3.1 Training & Validation

Dataset: 331 contrast-enhanced abdominal clinical CT images.
Each CT volume consists of 460–1177 slices of 512×512 pixels. The voxel dimensions are [0.59–0.98, 0.59–0.98, 0.5–1.0] mm.
A random split of 281/50 patients is used for training and validating the network.

**Various examples of plausible random deformation**

Data Augmentation: Smooth B-spline deformations, random rotations.
200,000 iterations in the first stage and 115,000 in the second.

**Dice Similarity Score on Validation Set**

With 2 stages, the average Dice score improves from 71.7% to 79.2%.
That is an average gain of 7.5 percentage points in Dice score per organ.
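For readers who have not met the metric, the Dice score for one organ is 2|P∩T| / (|P| + |T|), where P and T are the predicted and ground-truth voxel sets. A minimal NumPy sketch follows; the function name and the toy arrays are mine.

```python
import numpy as np

def dice_score(pred, truth, label):
    # Dice for a single label: 2 * |P ∩ T| / (|P| + |T|).
    p = pred == label
    t = truth == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")                 # organ absent from both volumes
    return float(2.0 * np.logical_and(p, t).sum() / denom)

# Toy 1D "volumes": 4 voxels each, overlapping on 2.
truth = np.zeros(10, dtype=np.int64)
truth[0:4] = 1
pred = np.zeros(10, dtype=np.int64)
pred[2:6] = 1

score = dice_score(pred, truth, label=1)    # 2*2 / (4+4)
gain_points = round(79.2 - 71.7, 1)         # the stage-1 to stage-2 gain quoted above
print(score, gain_points)
```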

The segmentation is quite good as shown above.

3.2 Testing

Dataset: It originates from a different hospital, scanner, and research study with gastric cancer patients. 150 abdominal CT scans were acquired.
Each CT volume consists of 263–1061 slices of 512×512 pixels. Voxel dimensions are [0.55–0.82, 0.55–0.82, 0.4–0.80] mm.
The pancreas, liver, and spleen were semi-automatically delineated by three trained researchers and confirmed by a clinician.

The Dice similarity score improves at stage 2.
With overlapping tiles at test time, the score is even higher.
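Assuming "overlapping" here means that the network is run on overlapping sub-volume tiles and the predictions are combined, a minimal sketch looks like this. Averaging the probabilities is my assumption, and `toy_predict` is a hypothetical stand-in for the trained network. The sketch also assumes the volume is at least as large as one tile along every axis.

```python
import numpy as np

def tile_starts(size, tile, stride):
    # Start indices covering [0, size), with a final tile flush to the end.
    starts = list(range(0, max(size - tile, 0) + 1, stride))
    if starts[-1] + tile < size:
        starts.append(size - tile)
    return starts

def predict_tiled(volume, predict_fn, tile, stride, num_classes):
    # Sum per-class probabilities over tiles, then divide by the coverage count.
    acc = np.zeros((num_classes,) + volume.shape)
    hits = np.zeros(volume.shape)
    for z in tile_starts(volume.shape[0], tile[0], stride[0]):
        for y in tile_starts(volume.shape[1], tile[1], stride[1]):
            for x in tile_starts(volume.shape[2], tile[2], stride[2]):
                sl = (slice(z, z + tile[0]), slice(y, y + tile[1]), slice(x, x + tile[2]))
                acc[(slice(None),) + sl] += predict_fn(volume[sl])
                hits[sl] += 1
    return acc / hits

def toy_predict(tile_values):
    # Hypothetical 2-class "network": foreground wherever the intensity is positive.
    fg = (tile_values > 0).astype(float)
    return np.stack([1.0 - fg, fg])

rng = np.random.default_rng(0)
vol = rng.standard_normal((9, 8, 8))
probs = predict_tiled(vol, toy_predict, tile=(4, 4, 4), stride=(2, 2, 2), num_classes=2)
print(probs.shape)
```

Setting the stride equal to the tile size gives the non-overlapping case; a smaller stride means more forward passes per volume, so the extra accuracy costs more processing time.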

3.3 Computation

Training on 281 cases takes 2–3 days for 200k iterations on an NVIDIA GeForce GTX TITAN X with 12 GB of memory.
On the 150 cases of the test set, processing each volume took 1.4–3.3 minutes per stage, depending on the size of the candidate regions.

4. Comparison with State-of-the-art Approaches

Without any post-processing, the proposed approach outperforms others, such as the random forest (RF) approach and other FCN approaches.

**Multi-organ segmentation result. Each color represents an organ region on the unseen whole torso test set**

Reference

[2017 arXiv] [Cascaded 3D U-Net]
Hierarchical 3D fully convolutional networks for multi-organ segmentation

[2018 JCMIG] [Cascaded 3D U-Net]
An application of cascaded 3D fully convolutional networks for medical image segmentation
