Review — HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation
HyperDense-Net, DenseNet Concept in 3D Network, With Multi-Modalities
HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation, HyperDense-Net, by École de technologie supérieure, Xidian University, 2019 TMI, Over 340 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation
- HyperDenseNet, a 3D fully convolutional neural network, is proposed. It extends the definition of dense connectivity to multi-modal segmentation problems.
- Each imaging modality has its own path, and dense connections occur not only between pairs of layers within the same path but also between layers across different paths, which significantly improves the learned representation.
Outline
- HyperDense-Net Motivations
- HyperDense-Net Architecture
- Results
1. HyperDense-Net Motivations
1.1. Densely-Connected Concept
- Let x_l be the output of the l-th layer, produced by a mapping H_l composed of a convolution followed by a non-linear activation function:
  x_l = H_l(x_{l-1})
- A densely connected network, as introduced in DenseNet, concatenates all preceding feature outputs in a feed-forward manner:
  x_l = H_l([x_{l-1}, x_{l-2}, ..., x_0])
- where [...] denotes a concatenation operation.
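The dense rule above can be illustrated with a toy, pure-Python sketch. This is not the paper's implementation: each "feature map" is reduced to a single float, and `make_layer` is a hypothetical stand-in for H_l (a trivial weighted sum followed by a ReLU-like clamp). The only point is that every layer receives the concatenation of all earlier outputs.

```python
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def make_layer(growth: int) -> Layer:
    """Hypothetical H_l: maps any number of input channels to `growth` outputs."""
    def layer(inputs: List[float]) -> List[float]:
        total = sum(inputs)  # stand-in for a convolution over all input channels
        return [max(0.0, total / (k + 1)) for k in range(growth)]  # ReLU-like
    return layer


def dense_forward(x0: List[float], layers: List[Layer]) -> List[List[float]]:
    """Apply x_l = H_l([x_{l-1}, ..., x_0]) and return [x_0, x_1, ..., x_L]."""
    outputs = [x0]
    for h in layers:
        # Concatenate every previous output, most recent first.
        concatenated = [v for x in reversed(outputs) for v in x]
        outputs.append(h(concatenated))
    return outputs


outputs = dense_forward([1.0, 2.0], [make_layer(2), make_layer(2)])
print(outputs)
```

In a plain feed-forward chain the second layer would see only x_1; here it also sees x_0, which is why its output differs from what the chain would give.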
1.2. Multi-Modal Motivation
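- For simplicity, consider the scenario of two image modalities, with x^1_l and x^2_l denoting the outputs of the l-th layer in streams 1 and 2.
- In general, the output of the l-th layer in a stream s can then be defined as follows:
  x^s_l = H^s_l([x^1_{l-1}, x^2_{l-1}, x^1_{l-2}, x^2_{l-2}, ..., x^1_0, x^2_0])
- Shuffling and interleaving feature map elements in a CNN was recently found to enhance efficiency and performance while serving as a strong regularizer. It is therefore beneficial for intermediate layers to offer a variety of information exchange while preserving the aforementioned deterministic functions:
  x^s_l = H^s_l(π^s_l([x^1_{l-1}, x^2_{l-1}, x^1_{l-2}, x^2_{l-2}, ..., x^1_0, x^2_0]))
- with π^s_l being a function that permutes the feature maps given as input. For instance, in the case of two image modalities, we could have:
  x^1_l = H^1_l([x^1_{l-1}, x^2_{l-1}, x^1_{l-2}, x^2_{l-2}, ..., x^1_0, x^2_0])
  x^2_l = H^2_l([x^2_{l-1}, x^1_{l-1}, x^2_{l-2}, x^1_{l-2}, ..., x^2_0, x^1_0])
- This ordering gives each stream an interleaved view of both modalities, so information is exchanged between them.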
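To make the interleaving concrete, the sketch below lists which feature maps feed layer l of a given stream, and in what order. The function name `hyperdense_inputs` and the choice of a simple rotation for the permutation are assumptions made for illustration. For two streams the rotation reduces to the swap described above, but the paper may use a different permutation for more streams.

```python
from typing import List, Tuple


def hyperdense_inputs(layer: int, stream: int,
                      num_streams: int = 2) -> List[Tuple[int, int]]:
    """Return (stream, layer) labels in concatenation order for x^stream_layer.

    Assumption: the permutation rotates the stream order so that the
    stream's own feature maps come first at every depth.
    """
    order = [((stream - 1 + k) % num_streams) + 1 for k in range(num_streams)]
    # Walk back from layer-1 to 0, interleaving the streams at each depth.
    return [(s, l) for l in range(layer - 1, -1, -1) for s in order]


stream1 = hyperdense_inputs(layer=3, stream=1)
stream2 = hyperdense_inputs(layer=3, stream=2)
print(stream1)
print(stream2)
```

Both streams receive the same set of feature maps; only the order differs.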
2. HyperDense-Net Architecture
- Each gray region represents a convolutional block.
- For simplicity, it is assumed that the red arrows indicate convolution operations only, whereas the black arrows represent the direct connections between feature maps from different layers, within and in-between the different streams.
- Thus, the input of each convolutional block (maps before the red arrow) is the concatenation of the outputs (maps after the red arrow) of all the preceding layers from both paths.
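One consequence of this wiring is that the number of input channels per block grows with the number of paths. The bookkeeping below is a rough sketch under stated assumptions: the block widths are hypothetical, every path is assumed to use the same widths, and the first block is assumed to see only its own modality (this summary does not say whether raw inputs are also forwarded to later blocks, so they are left out).

```python
from typing import List


def hyperdense_fan_in(widths: List[int], num_streams: int = 2,
                      in_channels: int = 1) -> List[int]:
    """Input channels of each conv block in one path.

    widths[i] is the (hypothetical) number of output maps of block i,
    assumed identical in every path.
    """
    fan_in = [in_channels]  # assumption: first block sees its own modality only
    produced = 0
    for w in widths[:-1]:
        produced += num_streams * w  # outputs of this depth from every path
        fan_in.append(produced)
    return fan_in


hypothetical_widths = [25, 25, 50]
two_paths = hyperdense_fan_in(hypothetical_widths, num_streams=2)
one_path = hyperdense_fan_in(hypothetical_widths, num_streams=1)
print(two_paths, one_path)
```

Setting `num_streams=1` recovers ordinary single-path dense connectivity, which makes the extra cost of the cross-path connections easy to compare.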
2.1. Multi-Modal Baselines
- Single Dense Path (Left): An early-fusion strategy is followed, in which MRI T1 and T2 are integrated at the input of the CNN and processed jointly along a single path.
- Dual Dense Path (Middle): A late-fusion strategy is followed, in which each modality is processed independently in a separate stream and the learned features are fused before the first fully connected layer.
- Early-Fusion (Right): An early-fusion model is used, which combines features from different streams after the first convolutional layer.
2.2. Some Details
- Sub-volumes of size 27×27×27 are used for training, and non-overlapping sub-volumes of size 35×35×35 are used during inference.
- Cross-entropy is used as the cost function.
- The network was trained for 30 epochs, each composed of 20 sub-epochs. At each sub-epoch, a total of 1000 samples were randomly selected from the training images and processed in batches of size 5.
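As a quick check of the numbers above, the sketch below computes a voxel-wise cross-entropy and the number of parameter updates implied by the schedule. Both function names are hypothetical, and averaging the loss over voxels is an assumption; the exact normalisation used in the paper is not reproduced here.

```python
import math
from typing import List, Tuple


def voxel_cross_entropy(probs: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-probability of the true class over voxels.

    probs[v][c] is the predicted probability of class c at voxel v,
    labels[v] is the true class index of voxel v.
    """
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)


def schedule_counts(epochs: int = 30, subepochs: int = 20,
                    samples: int = 1000, batch: int = 5) -> Tuple[int, int]:
    """Return (iterations per sub-epoch, total iterations)."""
    per_subepoch = samples // batch
    return per_subepoch, epochs * subepochs * per_subepoch


loss = voxel_cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
per_subepoch, total = schedule_counts()
print(loss, per_subepoch, total)
```

With the stated settings this gives 200 batches per sub-epoch and 120,000 batches over the whole run.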
3. Results
- The Dice similarity coefficient (DSC) and the modified Hausdorff distance (MHD) are measured.
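For readers unfamiliar with these metrics, here is a minimal sketch that treats each segmentation as a set of voxel coordinates. The DSC formula is standard. For MHD, the sketch assumes the common definition as the larger of the two mean directed nearest-neighbour distances; the challenge's evaluation code may differ in detail (for example, by using surface voxels only).

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|); 1.0 when both sets are empty."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def directed_mean_distance(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Average distance from each point of a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Assumed MHD: the larger of the two mean directed distances."""
    return max(directed_mean_distance(a, b), directed_mean_distance(b, a))


pred = {(0, 0, 0), (0, 0, 1)}
truth = {(0, 0, 1), (0, 0, 2)}
dsc = dice(pred, truth)
mhd = modified_hausdorff(pred, truth)
print(dsc, mhd)
```

A higher DSC means better overlap, whereas a lower distance means better boundary agreement.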
3.1. iSEG Challenge
HyperDenseNet obtains the best performance.
HyperDenseNet outperforms baselines in both cases, achieving better results than architectures with a similar number of parameters.
HyperDenseNet typically recovers thin regions better than the baselines.
- The proposed network ranked among the top-3 methods in 6 out of 9 metrics, considering the results of the first and second rounds of submissions.
3.2. MRBrainS Challenge
Comparing the different modality combinations, the two-modality versions of HyperDenseNet yielded competitive performance, although there is significant variability between the three configurations.
HyperDenseNet with three modalities yields significantly better segmentations, with the highest mean DSC values for all three tissues.
HyperDenseNet ranks first among competing methods, obtaining the best DSC and HD scores for GM and WM.
- HyperDenseNet using three modalities can handle thin regions better than its two-modality versions.
Reference
[2019 TMI] [HyperDense-Net]
HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation