Review: VoxResNet — Deep Voxelwise Residual Networks for Volumetric Brain Segmentation (Biomedical Image Segmentation)

First place in the MICCAI MRBrainS challenge leaderboard out of 37 competitors

Sik-Ho Tsang
Sep 30, 2019 · 6 min read
First Row: Brain Images, Second Row: Segmentation annotations by experts

In this story, VoxResNet, a novel voxelwise residual network, by The Chinese University of Hong Kong (CUHK), The Hong Kong Polytechnic University (PolyU), and Chinese Academy of Sciences (Shenzhen), is reviewed. Segmentation of key brain tissues from 3D medical images is of great significance for brain disease diagnosis, progression assessment and monitoring of neurologic conditions.

  • It is built with 25 layers, and hence can generate more representative features to deal with the large variations of brain tissues.
  • Multi-modality and multi-level contextual information are integrated into the network, so that the complementary information of different modalities can be harnessed and features of different scales can be exploited.
  • The segmentation performance is further improved by combining the low-level image appearance features, implicit shape information, and high-level context together.

It first appeared as a 2016 arXiv tech report with more than 70 citations, and was later published in 2018 NeuroImage (Impact Factor 5.812) with more than 170 citations. (Sik-Ho Tsang @ Medium)

It also achieved first place out of 37 competitors in the well-known MRBrainS benchmark challenge.

Outline

  1. VoxResNet Architecture
  2. Multi-Modality Inputs
  3. Auto-Context VoxResNet
  4. Ablation Study
  5. Comparison with other methods

1. VoxResNet Architecture

(a) VoxResNet Architecture, (b) VoxRes module

1.1. VoxRes Module

  • Generally, a residual unit in ResNet can be expressed as: x_{l+1} = x_l + F_l(x_l).
  • F_l denotes the residual function, i.e., a stack of two convolutional layers with batch normalization (BN).
  • By unfolding the above equation recursively: x_L = x_l + Σ_{i=l}^{L-1} F_i(x_i).
  • The feature x_L of any deeper unit L can thus be represented as the feature x_l of a shallower unit l plus the summed residual functions.
  • During backpropagation, the chain rule gives: ∂loss/∂x_l = ∂loss/∂x_L · (1 + ∂(Σ_{i=l}^{L-1} F_i(x_i))/∂x_l).
  • This shows that the residual unit mechanism lets information propagate smoothly through the entire network in both the forward and backward passes. (A minimal code sketch of the VoxRes module follows this list.)
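
For concreteness, here is a minimal PyTorch sketch of a VoxRes module. It assumes a pre-activation ordering (BN-ReLU-Conv), 3 × 3 × 3 kernels, and a fixed channel count of 64; the class name VoxResModule and these hyperparameters are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class VoxResModule(nn.Module):
    """Sketch of a VoxRes module: two BN-ReLU-Conv3d stages whose output
    is added to the identity skip, i.e. x_{l+1} = x_l + F_l(x_l)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.residual = nn.Sequential(
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.residual(x)  # identity skip + residual function F_l

# quick shape check on a small sub-volume (batch, channels, D, H, W)
x = torch.randn(1, 64, 16, 32, 32)
print(VoxResModule(64)(x).shape)  # torch.Size([1, 64, 16, 32, 32])
```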

1.2. VoxResNet

  • VoxResNet architecture consists of stacked residual modules (i.e., VoxRes module) with a total of 25 volumetric convolutional/deconvolutional layers.
  • Small convolutional kernels (i.e., 1 × 3 × 3 or 3 × 3 × 3) are employed in the convolutional layers, which have demonstrated clear advantages in computational efficiency and representation capability.
  • In order to handle the large variation in the size of brain structures, multi-level contextual information (i.e., 4 auxiliary classifiers C1-C4 in the above figure) is fused with deep supervision in the network.
  • The whole network is trained by minimizing the following objective function with standard back-propagation: L = λ‖W‖² + Σ_a L_aux_a(x, y; W_a) + L_target(x, y; W).
  • First term: regularization term using the L2 norm of the weights.
  • Latter terms: the fidelity term, consisting of the auxiliary classifiers and the final target classifier.
  • This design is similar to FCN and CUMedVision1. (A minimal sketch of the deeply supervised loss follows this list.)
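
Below is a minimal sketch of such a deeply supervised objective, assuming voxelwise cross-entropy losses for both the auxiliary classifiers and the final classifier; the auxiliary weights aux_weights are illustrative, and the L2 regularization term is typically supplied via the optimizer's weight_decay rather than written explicitly.

```python
import torch.nn.functional as F

def deep_supervision_loss(aux_logits, final_logits, target,
                          aux_weights=(0.25, 0.5, 0.75, 1.0)):
    """Weighted sum of cross-entropy terms from the auxiliary classifiers
    C1-C4 and the final target classifier (voxelwise, over 3D volumes)."""
    # final_logits / aux_logits[i]: (N, num_classes, D, H, W); target: (N, D, H, W)
    loss = F.cross_entropy(final_logits, target)
    for w, logits in zip(aux_weights, aux_logits):
        loss = loss + w * F.cross_entropy(logits, target)
    return loss

# usage: the L2 term is provided through the optimizer, e.g.
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
```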
Multi-Modality Input & Auto-Context VoxResNet

2. Multi-Modality Inputs

  • The volumetric data is usually acquired with multiple imaging modalities for robustly examining different tissue structures.
  • In this paper, three imaging modalities including T1, T1-IR, and T2-FLAIR are available in the brain structure segmentation task.
  • The main reason for acquiring multi-modality images is that the information from multi-modality dataset can be complementary, which provides more robust diagnosis results.
  • Inspired by this clinical observation, these multi-modality data are concatenated as input channels into the neural network, as sketched below.
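
As an illustration, here is a sketch of the channel-wise concatenation, assuming the three modalities are already co-registered and resampled to the same grid; the volume dimensions are illustrative.

```python
import torch

# Each modality is one volume (D, H, W); stacking along a new channel axis
# yields the 3-channel input volume that VoxResNet consumes.
t1       = torch.randn(48, 240, 240)   # T1
t1_ir    = torch.randn(48, 240, 240)   # T1-IR
t2_flair = torch.randn(48, 240, 240)   # T2-FLAIR

multi_modal = torch.stack([t1, t1_ir, t2_flair], dim=0)  # (3, D, H, W)
print(multi_modal.shape)  # torch.Size([3, 48, 240, 240])
```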

3. Auto-Context VoxResNet

  • First, a VoxResNet classifier is trained on the original training sub-volumes with image appearance information.
  • Then, the discriminative probability maps generated by VoxResNet are used as context information: together with the original volumes (i.e., appearance information), they form the input to train a new classifier, Auto-context VoxResNet, which further refines the semantic segmentation results and removes outliers (a sketch of the input construction follows this list).
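
A sketch of how the auto-context input could be assembled, assuming the first-stage network outputs softmax probability maps; the function and variable names are illustrative.

```python
import torch

def build_auto_context_input(volume: torch.Tensor, prob_maps: torch.Tensor) -> torch.Tensor:
    """Concatenate the original multi-modality volume (appearance) with the
    first-stage probability maps (context) along the channel axis; the result
    is the input used to train the second-stage Auto-context VoxResNet."""
    # volume:    (N, 3, D, H, W) -- T1, T1-IR, T2-FLAIR channels
    # prob_maps: (N, K, D, H, W) -- softmax output of the first-stage network
    return torch.cat([volume, prob_maps], dim=1)  # (N, 3 + K, D, H, W)
```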

4. Ablation Study

4.1. Metrics

  • Dice coefficient (DC): measures the spatial overlap between the segmentation result and the ground truth; a larger value denotes higher segmentation accuracy.
  • 95th-percentile Hausdorff distance (HD): measures the distance between the segmentation result and the ground truth; a smaller value of HD(G, S) denotes higher proximity between ground truth and segmentation result.
  • Absolute volume difference (AVD): a smaller value of AVD(G, S) denotes better segmentation accuracy. (Minimal sketches of DC and AVD follow this list.)
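
Minimal NumPy sketches of DC and AVD on binary masks (one tissue class at a time); the 95th-percentile Hausdorff distance is usually computed from boundary distance transforms (e.g., with SciPy or MedPy) and is omitted here for brevity.

```python
import numpy as np

def dice_coefficient(seg: np.ndarray, gt: np.ndarray) -> float:
    """DC = 2|S ∩ G| / (|S| + |G|) for binary masks of one tissue class."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    denom = seg.sum() + gt.sum()
    return 2.0 * np.logical_and(seg, gt).sum() / denom if denom else 1.0

def absolute_volume_difference(seg: np.ndarray, gt: np.ndarray) -> float:
    """AVD as a percentage of the ground-truth volume: 100 * |V_S - V_G| / V_G."""
    v_s, v_g = float(seg.sum()), float(gt.sum())
    return 100.0 * abs(v_s - v_g) / v_g
```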

4.2. Multi-Modality Inputs

Cross-validation results of MRI brain segmentation using different image modalities
  • When combining the multi-modality information from all available image modalities, the segmentation performance is clearly improved on almost all evaluation metrics compared with any single image modality, especially on the DC metric.
  • It is also observed in the table that by integrating the auto-context information, the performance of DC can be further improved.
The example results of validation data using different image modalities
  • (a)-(c): Original T1, T1-IR, and T2-FLAIR MR images.
  • (d): Ground-truth label.
  • (e)-(g): Corresponding segmentation results using a single image modality.
  • (h): The result using all image modalities without auto-context information.
  • The results using all image modalities are visually more accurate than those of single image modality.
The qualitative results of brain segmentation with or without auto-context information
  • (a): Original T1 MR images.
  • (b): Results of VoxResNet.
  • (c): Results of Auto-context VoxResNet.
  • (d): Ground-truth labels.
  • Fusing auto-context information generates more accurate results than the network without it.

4.3. Multiple Classifiers Fusion

Results of MRI brain segmentation using different levels of contextual information
  • C1-C4: Fusing all levels of contextual information (C1-C4) achieved close to the best performance across the different tissue classes.

5. Comparison with other methods

Results of MICCAI MRBrainS challenge of different methods
  • CU_DL: VoxResNet.
  • CU_DL2: Auto-Context VoxResNet.
  • Overall, the proposed methods achieved first place on the challenge leaderboard out of 37 competitors, outperforming the other methods on most evaluation metrics.

References

[2016 arXiv] [VoxResNet]
VoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain Segmentation

[2018 NeuroImage] [VoxResNet]
VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images

My Previous Reviews

Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]

Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]

Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [DeepLabv3+]

Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net] [VoxResNet] [Attention U-Net] [RU-Net & R2U-Net]

Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet] [SR+STN]

Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]

Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN]

Generative Adversarial Network [GAN]
