Review — Dilated Dense U-Net for Infant Hippocampus Subfield Segmentation

Inserting Dilated Convolution & Dense Connection into U-Net

Sik-Ho Tsang
Dec 11, 2022

Dilated Dense U-Net for Infant Hippocampus Subfield Segmentation,
DUnet & ResDUnet, by University of North Carolina at Chapel Hill, Shaoxing University, Chang Gung University College of Medicine, and Korea University
2019 J Front. Neuroinform., Over 30 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net

  • A new fully convolutional network (FCN), named DUnet, is proposed for infant hippocampal subfield segmentation by embedding a dilated dense network in the U-Net. The embedded dilated dense network can generate multi-scale features while keeping high spatial resolution.
  • Every pair of convolutional layers in the DUnet is grouped with one residual connection, which gives the Residual DUnet (ResDUnet).

Outline

  1. Data
  2. Dilated Dense U-Net (DUnet)
  3. Residual Dilated Dense U-Net (ResDUnet)
  4. Ablation Studies
  5. Experimental Results

1. Data

Imaging protocol for acquiring infant T1w and T2w MR images.
T1w image and manual segmentation of a representative subject from the BCP dataset (top row) and Kulaga-Yoskovitz dataset (bottom row), respectively.
  • Two datasets are used. One is the BCP dataset and the other is the publicly available Kulaga-Yoskovitz dataset (https://www.nitrc.org/projects/mni-hisub25).
  • The first one consists of MRI scans (including T1- and T2-weighted structural MRI, DTI, and rs-fMRI) of 500 typically developing children, ages 0–5 years, acquired over the course of 4 years. In the experiments, 10 infant subjects (6 females/4 males) are used.
  • The imaging protocol for acquiring the T1w and T2w MR images is listed in the table above. Five hippocampal subfields were manually labeled for each subject by the consensus of two neuroradiologists.
  • The second one contains 25 adult subjects (31 ± 7 years, 12 males). Each subject has an isotropic 3D-MPRAGE T1-weighted image.
  • All T1w and T2w images underwent automated correction for intensity non-uniformity and intensity standardization.

2. Dilated Dense U-Net (DUnet)

2.1. Dilated Convolution

Illustration of dilated convolutional kernels: 1-dilated convolutional kernel (left); 2-dilated convolutional kernel (middle); 4-dilated convolutional kernel (right).
  • Left: Small convolutional kernels of size 3 × 3 × 3 reduce the number of parameters, but they also reduce the receptive field compared with larger kernels.
  • Middle & Right: With dilated convolutions, the feature maps can be computed at a high spatial resolution, and the size of the receptive field can be enlarged arbitrarily, as shown above.
  • The dilated convolution is controlled by a dilation rate l, i.e., the spacing between the kernel taps. When l=1, the dilated convolution becomes the normal convolution.
  • (Please feel free to read DeepLab & DilatedNet for more information about atrous/dilated convolution.)
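
To make the role of the dilation rate l concrete, here is a minimal 1D sketch in plain Python. It assumes the standard definition from the DilatedNet paper, (F *_l k)(p) = Σ_{s + l·t = p} F(s) k(t), but written in the cross-correlation form (index p + l·t) that deep learning libraries typically use. The function name, the signal, and the kernel are illustrative assumptions, not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D dilated correlation.

    out[p] = sum_t signal[p + dilation * t] * kernel[t]
    """
    # Extent covered by one dilated kernel (its receptive field).
    span = dilation * (len(kernel) - 1) + 1
    out = []
    for p in range(len(signal) - span + 1):
        out.append(sum(signal[p + dilation * t] * kernel[t]
                       for t in range(len(kernel))))
    return out


# Illustrative data (not from the paper).
signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]

plain = dilated_conv1d(signal, kernel, dilation=1)    # normal convolution
dilated = dilated_conv1d(signal, kernel, dilation=2)  # taps are 2 samples apart
```

With the same three weights, the l=2 kernel covers 5 samples instead of 3, which is how dilation enlarges the receptive field without adding parameters or pooling.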

2.2. Dilated Dense U-Net (DUnet)

The structure of the dilated dense network. The number in each operation rectangle is the number of kernels. All operations are implemented in a 3D manner, and “c” denotes the concatenation.
  • Dense block, as in DenseNet, is used.
  • Dilated convolution, as in DeepLab & DilatedNet, is used with different rates for different layers.
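
The combination of dense connections and per-layer dilation rates can be sketched with some simple bookkeeping. The snippet below tracks, for a stack of 3 × 3 × 3 stride-1 convolutions, how the channel count grows through concatenation and how the receptive field grows with dilation. The input channels, growth rate, and dilation rates (1, 2, 4) are hypothetical values chosen for illustration; the actual numbers are given in the paper's figure and are not reproduced here.

```python
def dense_block_summary(in_channels, growth_rate, dilations, kernel_size=3):
    """Track channels and receptive field through a dilated dense block.

    Each layer sees the concatenation of the block input and all previous
    layer outputs, and emits `growth_rate` new feature maps.
    """
    channels = in_channels
    receptive_field = 1
    layers = []
    for d in dilations:
        # A stride-1 conv with dilation d adds (kernel_size - 1) * d.
        receptive_field += (kernel_size - 1) * d
        layers.append({
            "dilation": d,
            "in_channels": channels,
            "out_channels": growth_rate,
            "receptive_field": receptive_field,
        })
        channels += growth_rate  # dense connection: concatenate the output
    return layers, channels


# Hypothetical configuration, for illustration only.
layers, out_channels = dense_block_summary(
    in_channels=64, growth_rate=32, dilations=(1, 2, 4))
```

Because every layer's output is kept in the concatenation, the block output mixes features with receptive fields of 3, 7, and 15 voxels per axis in this example, which is one way to read the "multi-scale features at high spatial resolution" claim.
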
The structure of proposed DUnet. The number in each operation rectangle is the number of kernels. All operations are implemented in a 3D manner.
  • The dilated dense network is embedded in the U-Net to obtain a new network (DUnet).
  • The feature maps before the second pooling layer are first input into the dilated dense network.
  • Then, the output features of the dilated dense network are concatenated to the corresponding feature maps in the expanding path.

3. Residual Dilated Dense U-Net (ResDUnet)

The structure of our proposed ResDUnet. The number in each operation rectangle is the number of kernels. “⊕” denotes the element-wise summation, and all operations are implemented in a 3D manner.
  • To further improve the performance, residual connections, as in ResNet, are used in DUnet to promote the information flow within the network.
  • Every pair of convolutional layers along the contracting path and the expanding path of DUnet is grouped with one residual connection, which gives the Residual DUnet (ResDUnet).
  • For all networks, the softmax loss is used.
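
The two ingredients above can be sketched in plain Python. This is a minimal illustration, assuming that a residual connection adds the input of a pair of layers to its output (the "⊕" in the figure) and that "softmax loss" means the voxel-wise cross-entropy of the softmax output, averaged over voxels. The function names and the toy layers are hypothetical, and the paper's exact normalization may differ.

```python
import math


def residual_pair(x, layer1, layer2):
    """Apply two layers, then add the input back element-wise."""
    y = layer2(layer1(x))
    return [xi + yi for xi, yi in zip(x, y)]


def softmax(logits):
    """Numerically stable softmax over one voxel's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits_per_voxel, labels):
    """Mean negative log-probability of the true class over voxels."""
    losses = [-math.log(softmax(logits)[label])
              for logits, label in zip(logits_per_voxel, labels)]
    return sum(losses) / len(losses)


# Toy stand-ins for two convolutional layers (illustration only).
out = residual_pair([1.0, 2.0],
                    lambda v: [2 * a for a in v],
                    lambda v: [a + 1 for a in v])

# One voxel, three classes, uniform scores: the loss equals log(3).
loss = softmax_loss([[0.0, 0.0, 0.0]], [1])
```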

4. Ablation Studies

4.1. Metrics

  • The Dice coefficient (Dice) and the Average Symmetric Surface Distance (ASSD) are used.
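
For reference, both metrics can be sketched in a few lines of Python using their usual definitions: Dice = 2|A ∩ B| / (|A| + |B|), and ASSD as the mean closest-point distance between the two surfaces, taken in both directions. The sketch assumes voxels are given as coordinate tuples with unit spacing and that the surface points have already been extracted; both are simplifying assumptions, not details from the paper.

```python
import math


def dice(seg, ref):
    """Dice overlap between two sets of voxel coordinates."""
    seg, ref = set(seg), set(ref)
    if not seg and not ref:
        return 1.0  # two empty masks agree perfectly
    return 2.0 * len(seg & ref) / (len(seg) + len(ref))


def assd(surface_a, surface_b):
    """Average symmetric surface distance between two point lists."""
    def dist_to_set(p, points):
        return min(math.dist(p, q) for q in points)

    total = (sum(dist_to_set(p, surface_b) for p in surface_a)
             + sum(dist_to_set(q, surface_a) for q in surface_b))
    return total / (len(surface_a) + len(surface_b))


# Toy example (illustration only).
seg = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
ref = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
dice_value = dice(seg, ref)  # 2 * 2 / (3 + 3)

assd_value = assd([(0, 0, 0), (0, 0, 2)], [(0, 0, 1), (0, 0, 2)])
```

Higher Dice is better (1 means perfect overlap), while lower ASSD is better (0 means identical surfaces).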

4.2. Patch Size

  • Five-fold cross validation was used in the experiment for the BCP dataset. In each fold, 7 subjects are used for training, 1 subject for validation, and 2 subjects for testing. Experiments were performed using an NVIDIA Titan Xp with 12 GB memory.
  • 1,300 patches are extracted from each subject.
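
The split described above can be sketched as follows. The 7/1/2 split and the 1,300 patches per subject come from the text; the subject IDs and the rule for picking the validation subject (the first non-test subject) are assumptions made only for illustration.

```python
def make_folds(subjects, n_folds=5, n_val=1):
    """Split subjects into train/val/test lists for each fold."""
    n_test = len(subjects) // n_folds
    folds = []
    for k in range(n_folds):
        test = subjects[k * n_test:(k + 1) * n_test]
        rest = subjects[:k * n_test] + subjects[(k + 1) * n_test:]
        # Assumption: the first remaining subject is held out for validation.
        val, train = rest[:n_val], rest[n_val:]
        folds.append({"train": train, "val": val, "test": test})
    return folds


subjects = [f"subject_{i:02d}" for i in range(10)]  # hypothetical IDs
folds = make_folds(subjects)

patches_per_subject = 1300
train_patches = len(folds[0]["train"]) * patches_per_subject
```

With 7 training subjects per fold, this gives 9,100 training patches per fold.
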
Mean (STD) values of Dice for each subfield segmentation using different patch sizes (R×R×R) on the BCP dataset by 3D U-Net.

The patch size was optimally set to 24 × 24 × 24 by comparing the results obtained by the baseline 3D U-Net method with different patch sizes.

  • A stride of 8 × 8 × 8 is used. A majority voting strategy is used for the overlapping regions.
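
A minimal sketch of sliding-window prediction with majority voting, assuming a patch size of 24 and a stride of 8 along each axis as stated above. The border handling (adding one last patch so the volume edge is covered) and the dictionary-based representation of predictions are assumptions for illustration, not details from the paper.

```python
from collections import Counter, defaultdict
from itertools import product


def patch_starts(length, patch=24, stride=8):
    """Start indices of patches along one axis."""
    if length < patch:
        raise ValueError("axis is shorter than the patch size")
    starts = list(range(0, length - patch + 1, stride))
    # Assumption: add a final patch so the border is always covered.
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def majority_vote(patch_predictions):
    """Fuse overlapping predictions; each one maps voxel -> label."""
    votes = defaultdict(Counter)
    for prediction in patch_predictions:
        for voxel, label in prediction.items():
            votes[voxel][label] += 1
    return {voxel: counts.most_common(1)[0][0]
            for voxel, counts in votes.items()}


# A 40 x 40 x 40 volume yields 3 starts per axis, i.e. 27 patches.
corners = list(product(patch_starts(40), repeat=3))

# Three overlapping patches disagree on one voxel; the majority wins.
fused = majority_vote([
    {(0, 0, 0): 1, (0, 0, 1): 2},
    {(0, 0, 0): 1},
    {(0, 0, 0): 2},
])
```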

4.3. Post-Processing

An example of isolated tiny blocks, outside the hippocampal region, that appear in the automated segmentation.
  • For example, a patch in the caudate (denoted by the pink circle) may look similar to the patches in the hippocampus, and may then be classified as hippocampal subfields in the testing stage.
  • To remove these artifacts automatically, the post-processing step searches the voxels of each automated segmentation for the non-zero neighbors of the current voxel, which yields several connected regions. Then, the two regions with the largest volumes are selected as the final left and right hippocampal subfields.
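
The post-processing step above can be sketched as a connected-component search followed by keeping the two largest regions. The sketch treats all non-zero labels as foreground, as the description suggests, and assumes 6-connectivity; the connectivity actually used in the paper is not stated here, so that choice is an assumption.

```python
from collections import deque

# Assumption: face-adjacent (6-connected) neighborhood.
NEIGHBORS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_regions(foreground):
    """Group a set of (x, y, z) voxels into connected regions via BFS."""
    remaining = set(foreground)
    regions = []
    while remaining:
        seed = remaining.pop()
        region, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS_6:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    region.add(n)
                    queue.append(n)
        regions.append(region)
    return regions


def keep_two_largest(segmentation):
    """Zero out every voxel outside the two largest foreground regions.

    `segmentation` maps voxel -> label, with 0 as background.
    """
    foreground = {v for v, label in segmentation.items() if label != 0}
    regions = sorted(connected_regions(foreground), key=len, reverse=True)
    kept = set().union(*regions[:2])
    return {v: (label if v in kept else 0)
            for v, label in segmentation.items()}


# Toy example: two hippocampus-like regions and one isolated voxel.
seg = {(0, 0, 0): 1, (1, 0, 0): 1, (2, 0, 0): 2,   # region of 3 voxels
       (10, 0, 0): 1, (11, 0, 0): 3,               # region of 2 voxels
       (5, 5, 5): 2,                               # isolated block
       (6, 6, 6): 0}                               # background
cleaned = keep_two_largest(seg)
```

Subfield labels inside the two kept regions are left unchanged; only the isolated block is reset to background.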

4.4. Multi-Modality

Mean (STD) values of Dice for each subfield segmentation using different modalities on the BCP dataset.
  • The efficacy of multi-modality is tested by comparing the segmentation results obtained using only single modality images (i.e., T1w or T2w) and multi-modality images (T1w+T2w), respectively.

The network trained with multi-modality images can generate more discriminative features, which improves the performance of hippocampal subfield segmentation.

5. Experimental Results

5.1. BCP Dataset

Results on BCP Dataset.

DUnet outperforms 3D U-Net in segmenting CA1, SUB, CA4/DG and Uncus, and ResDUnet outperforms 3D U-Net in segmenting CA1, CA2/3, SUB, and Uncus, according to the Wilcoxon signed rank tests with p<0.05.

The proposed ResDUnet achieves the highest Dice coefficient for the average of subfields.

5.2. KULAGA-YOSKOVITZ Dataset

Results on KULAGA-YOSKOVITZ Dataset.

DUnet outperforms 3D U-Net and 3D U-Net+ResNet in segmenting CA1–3 and SUB, and ResDUnet outperforms 3D U-Net and 3D U-Net+ResNet in segmenting all subfields, according to the Wilcoxon signed rank tests with p<0.05.

  • DUnet and ResDUnet also outperform the HIPS method, especially for the CA4/DG subfield, which is the most difficult to segment.

Reference

[2019 J Front. Neuroinform.] [DUnet & ResDUnet]
Dilated Dense U-Net for Infant Hippocampus Subfield Segmentation

