Review: Cascaded 3D U-Net — Multi-Organ Segmentation (Biomedical Image Segmentation)
3D Convolutions for Simultaneous Artery, Portal Vein, Liver, Spleen, Stomach, Gallbladder, and Pancreas Segmentation
In this story, Cascaded 3D U-Net, by Nagoya University, Nagoya University Graduate School of Medicine, and Aichi Cancer Center, is briefly reviewed.
- In the first stage, an FCN model is trained to roughly delineate the organs of interest.
- In the second stage, an FCN model is trained to produce a more detailed segmentation of the organs.
A test set acquired at a different hospital with a different scanner is also evaluated, with SOTA performance. This work was first published as a 2017 arXiv tech report with more than 40 citations, then modified and published in 2018 JCMIG with 30 citations. (Sik-Ho Tsang @ Medium)
Outline
- Coarse-to-Fine Cascaded U-Net
- Loss Function
- Training & Validation & Testing
- Comparison with State-of-the-art Approaches
1. Coarse-to-Fine Cascaded U-Net
- The first-stage FCN sees only around 40% of the image voxels, the candidate region C1, which is obtained from a simple mask of the body created by thresholding the CT image.
- The candidate region C2 is the output of the first-stage FCN.
- For the second-stage FCN, the candidate region is reduced to around 10% of the image's voxels.
- This step narrows down and simplifies the search space for the FCN.
- In this paper, the FCNs used in both stages are 3D U-Nets (a simplified sketch of the whole pipeline follows this list).
- (For details about 3D U-Net, please feel free to read my review about it.)
- At the end of the network, the last layer is a 1×1×1 convolution that reduces the number of output channels to the number of class labels (K = 8: 7 organs plus 1 background), with each output channel having a size of 44×44×28.
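To make the two-stage flow concrete, here is a minimal PyTorch sketch of the coarse-to-fine inference. It is only a sketch under assumptions: `stage1` and `stage2` stand in for trained 3D U-Nets whose last 1×1×1 convolution outputs K = 8 channels, the HU threshold of the body mask is illustrative, and masking (rather than cropping to the candidate-region bounding box) is used for brevity.

```python
import torch
import torch.nn as nn

def body_mask(ct: torch.Tensor, hu_threshold: float = -200.0) -> torch.Tensor:
    """Candidate region C1: a simple body mask obtained by thresholding
    the CT volume (the threshold value is illustrative, not from the paper)."""
    return (ct > hu_threshold).float()

@torch.no_grad()
def cascaded_inference(ct: torch.Tensor, stage1: nn.Module, stage2: nn.Module) -> torch.Tensor:
    """Coarse-to-fine inference with two 3D U-Nets.
    ct: (B, 1, D, H, W) CT volume; returns voxel-wise labels in [0, 8)."""
    c1 = body_mask(ct)                                     # C1: ~40% of the image voxels
    coarse = stage1(ct * c1).argmax(dim=1, keepdim=True)   # coarse labels inside C1
    c2 = (coarse > 0).float()                              # C2: predicted organ region (~10%)
    fine = stage2(ct * c2).argmax(dim=1)                   # refined labels inside C2
    return fine
```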
2. Loss Function
- Ni is the number of voxels belonging to class i in the ground-truth labels.
- NC is the number of voxels within the candidate region C1 or C2.
- λi is a class weight computed from Ni and NC, i.e. from the occurrence frequency of the class; all λi sum to 1.
- Simply speaking, the loss is a weighted voxel-wise cross-entropy loss (a minimal sketch follows).
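Since only the ingredients (Ni, NC, λi) are named above, here is a hedged PyTorch sketch of such a loss; the inverse-frequency form of λi is an assumption for illustration, as the exact weighting formula is not reproduced here.

```python
import torch
import torch.nn.functional as F

def frequency_weights(labels: torch.Tensor, num_classes: int = 8) -> torch.Tensor:
    """Class weights lambda_i derived from occurrence frequencies inside the
    candidate region, normalized so that their sum equals 1. Inverse frequency
    (proportional to N_C / N_i) is assumed here for illustration."""
    n_i = torch.bincount(labels.flatten(), minlength=num_classes).float()  # N_i per class
    inv = 1.0 / n_i.clamp(min=1.0)
    return inv / inv.sum()                                                 # sum(lambda_i) == 1

def weighted_voxel_ce(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Weighted voxel-wise cross-entropy over the candidate region.
    logits: (B, K, D, H, W); labels: (B, D, H, W) with values in [0, K)."""
    lam = frequency_weights(labels, num_classes=logits.shape[1])
    return F.cross_entropy(logits, labels, weight=lam)
```

For example, with the 44×44×28 output size above: `weighted_voxel_ce(torch.randn(1, 8, 28, 44, 44), torch.randint(0, 8, (1, 28, 44, 44)))`.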
3. Training & Validation & Testing
3.1 Training & Validation
- Dataset: 331 contrast-enhanced abdominal clinical CT images.
- Each CT volume consists of 460–1177 slices of 512×512 pixels. The voxel dimensions are [0.59–0.98, 0.59–0.98, 0.5–1.0] mm.
- A random split of 281/50 patients is used for training and validating the network.
- Data Augmentation: smooth B-spline deformations and random rotations (a minimal deformation sketch follows this list).
- 200,000 iterations in the first stage and 115,000 in the second.
- With two stages, the average Dice score improves from 71.7% to 79.2%.
- On average, an improvement of 7.5% in Dice score per organ is achieved.
- The resulting segmentations are qualitatively quite good.
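As a rough illustration of the smooth deformations used for augmentation, the sketch below draws random displacements on a coarse control grid and upsamples them with cubic spline interpolation; the grid size and sigma are assumptions, since the paper's exact parameters are not listed here.

```python
import numpy as np
from scipy.ndimage import map_coordinates, zoom

def bspline_deform(volume: np.ndarray, labels: np.ndarray,
                   grid=(4, 4, 4), sigma=4.0, rng=None):
    """Smooth random deformation: coarse random displacement fields are
    upsampled with cubic spline interpolation and applied to both the image
    and the label volume (grid and sigma are illustrative values)."""
    if rng is None:
        rng = np.random.default_rng()
    coords = np.meshgrid(*[np.arange(s) for s in volume.shape], indexing="ij")
    warped = []
    for c in coords:
        disp = rng.normal(0.0, sigma, size=grid)  # coarse displacement field
        disp = zoom(disp, [s / g for s, g in zip(volume.shape, grid)], order=3)
        warped.append(c + disp)
    img = map_coordinates(volume, warped, order=1)   # trilinear for the image
    lab = map_coordinates(labels, warped, order=0)   # nearest-neighbor for the labels
    return img, lab
```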
3.2 Testing
- Dataset: 150 abdominal CT scans acquired at a different hospital, with a different scanner, in a different research study on gastric cancer patients.
- Each CT volume consists of 263–1061 slices of 512×512 pixels. Voxel dimensions are [0.55–0.82, 0.55–0.82, 0.4–0.8] mm.
- The pancreas, liver, and spleen were semi-automatically delineated by three trained researchers and confirmed by a clinician.
- The Dice similarity score improves at stage 2 (the per-organ Dice computation is sketched after this list).
- With overlapping sub-volume predictions at test time, the score is even higher.
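For reference, the per-organ Dice similarity coefficient reported throughout is DSC = 2|A ∩ B| / (|A| + |B|); a minimal NumPy version:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    """Dice similarity coefficient for one organ label:
    DSC = 2 * |A intersect B| / (|A| + |B|)."""
    a, b = pred == label, gt == label
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```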
3.3 Computation
- Training on 281 cases takes 2–3 days for 200k iterations on an NVIDIA GeForce GTX TITAN X with 12 GB of memory.
- On the 150 cases of the test set, the processing time for each volume was 1.4–3.3 minutes per stage, depending on the size of the candidate regions.
4. Comparison with State-of-the-art Approaches
- Without any post-processing, the proposed approach outperforms other methods, such as a random forest (RF) approach and other FCN approaches.
Reference
[2017 arXiv] [Cascaded 3D U-Net]
Hierarchical 3D fully convolutional networks for multi-organ segmentation
[2018 JCMIG] [Cascaded 3D U-Net]
An application of cascaded 3D fully convolutional networks for medical image segmentation
My Previous Reviews
Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]
Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]
Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN]
Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net]
Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]
Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]
Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]
Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN]