Review — HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation

HyperDense-Net, DenseNet Concept in 3D Network, With Multi-Modalities

Jan 14, 2023

**Example of data from a training subject. Neonatal isointense brain images from a mid-axial T1 slice (left), the corresponding T2 slice (middle), and manual segmentation (right).**

HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation,
HyperDense-Net, by École de technologie supérieure, Xidian University,
2019 TMI, Over 340 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation

HyperDenseNet, a 3-D fully convolutional neural network, is proposed that extends the definition of dense connectivity to multi-modal segmentation problems.
Each imaging modality has its own path, and dense connections occur not only between pairs of layers within the same path but also between layers across different paths, which significantly increases the representation learning capacity.

Outline

HyperDense-Net Motivations
HyperDense-Net Architecture
Results

1. HyperDense-Net Motivations

1.1. Densely-Connected Concept

Let $x_l$ be the output of the $l$-th layer, produced by a mapping $H_l$ composed of a convolution followed by a non-linear activation function:

$$x_l = H_l(x_{l-1})$$

A densely-connected network, as introduced in DenseNet, concatenates all preceding feature outputs in a feed-forward manner:

$$x_l = H_l([x_{l-1}, x_{l-2}, \ldots, x_0])$$

where $[\ldots]$ denotes a concatenation operation.
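
To make the concatenation concrete, here is a minimal pure-Python sketch of a densely connected chain. It is only an illustration: `dense_forward` and `make_toy_layer` are hypothetical names, and the toy layer (a sum followed by a ReLU-like clamp) merely stands in for a real convolution plus activation.

```python
def dense_forward(x0, layers):
    """Run a toy densely connected chain.

    x0 is a list of floats (the input features); each entry of `layers`
    is a function H_l that maps the concatenated history to a new list.
    """
    outputs = [x0]  # x_0, x_1, ... in order of creation
    for H in layers:
        # Concatenate newest first: [x_{l-1}, x_{l-2}, ..., x_0]
        concat = [v for x in reversed(outputs) for v in x]
        outputs.append(H(concat))
    return outputs


def make_toy_layer(width):
    """Hypothetical stand-in for conv + ReLU: emits `width` features."""
    def H(concat):
        total = sum(concat)
        return [max(0.0, total + i) for i in range(width)]
    return H


outputs = dense_forward([1.0, -2.0], [make_toy_layer(3), make_toy_layer(3)])
# Width of the concatenated input seen by layer 1 and layer 2
input_widths = [sum(len(x) for x in outputs[:l]) for l in range(1, len(outputs))]
print(input_widths)  # [2, 5]
```

The input of each layer grows with depth because every earlier output is kept and re-used, which is the defining property of dense connectivity.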

1.2. Multi-Modal Motivation

For simplicity, consider the scenario of two image modalities.
In general, the output of the $l$-th layer in a stream $s$ can then be defined as follows:

$$x_l^s = H_l^s([x_{l-1}^1, x_{l-1}^2, x_{l-2}^1, x_{l-2}^2, \ldots, x_0^1, x_0^2])$$

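Shuffling and interleaving feature map elements in a CNN was recently found to enhance efficiency and performance while serving as a strong regularizer. It is therefore beneficial for intermediate layers to offer a variety of information exchange while preserving the aforementioned deterministic functions:

$$x_l^s = H_l^s(\pi_l^s([x_{l-1}^1, x_{l-1}^2, x_{l-2}^1, x_{l-2}^2, \ldots, x_0^1, x_0^2]))$$

with $\pi_l^s$ being a function that permutes the feature maps given as input. For instance, in the case of two image modalities, we could have:

$$x_l^1 = H_l^1([x_{l-1}^1, x_{l-1}^2, x_{l-2}^1, x_{l-2}^2, \ldots, x_0^1, x_0^2])$$

$$x_l^2 = H_l^2([x_{l-1}^2, x_{l-1}^1, x_{l-2}^2, x_{l-2}^1, \ldots, x_0^2, x_0^1])$$

so that information is exchanged between the two modalities, with each stream seeing the feature maps in a different order.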
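
One possible permutation can be sketched as a small ordering function. This is an illustration under assumptions, not the authors' code: `hyperdense_input_order` is a hypothetical helper, the labels `x{stream}_{layer}` are made up for readability, and the "own stream first" rule is just one example of an interleaving.

```python
def hyperdense_input_order(l, stream, num_streams=2):
    """Return labels of the feature maps fed to layer l of `stream`.

    Layers are visited newest first (l-1 down to 0). Within each layer the
    stream's own output comes first, followed by the other streams
    (an assumed example permutation).
    """
    own_first = [stream] + [s for s in range(1, num_streams + 1) if s != stream]
    return [f"x{s}_{k}" for k in range(l - 1, -1, -1) for s in own_first]


order_stream1 = hyperdense_input_order(2, stream=1)
order_stream2 = hyperdense_input_order(2, stream=2)
print(order_stream1)  # ['x1_1', 'x2_1', 'x1_0', 'x2_0']
print(order_stream2)  # ['x2_1', 'x1_1', 'x2_0', 'x1_0']
```

Both streams receive the same set of feature maps; only the order differs, which is what the permutation function expresses.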

2. HyperDense-Net Architecture

**A section of the proposed HyperDenseNet in the case of two image modalities.** Each gray region represents a convolutional block. Red arrows correspond to convolutions and black arrows indicate dense connections between feature maps.

For simplicity, it is assumed that the red arrows indicate convolution operations only, whereas the black arrows represent the direct connections between feature maps from different layers, within and in-between the different streams.

Thus, the input of each convolutional block (maps before the red arrow) is the concatenation of the outputs (maps after the red arrow) of all the preceding layers from both paths.
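
A quick way to see the consequence of this rule is to count input channels per block. The sketch below is a rough illustration: the kernel counts are hypothetical placeholders rather than the values from the paper's table, `input_channels` is a made-up helper, and treating the raw modalities as "layer 0" in the concatenation is an assumption.

```python
def input_channels(kernels_per_layer, num_streams, in_channels=1):
    """Input channel count of every conv block when each block receives
    the outputs of all preceding layers from `num_streams` streams.

    Assumption: the raw input modalities count as layer 0.
    """
    counts = []
    accumulated = num_streams * in_channels  # the raw modalities
    for k in kernels_per_layer:
        counts.append(accumulated)
        accumulated += num_streams * k  # every stream contributes k new maps
    return counts


kernels = [25, 25, 25, 50]  # hypothetical kernel counts per layer
hyper = input_channels(kernels, num_streams=2)   # connections across both paths
single = input_channels(kernels, num_streams=1)  # dense within one path only
print(hyper)   # [2, 52, 102, 152]
print(single)  # [1, 26, 51, 76]
```

Under these assumptions, each block in the hyper-dense setting sees twice as many feature maps as a block in an isolated dense path, which is where the richer cross-modality representation comes from.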

2.1. Multi-Modal Baselines

**Section of baseline architectures: single-path dense (left), dual-path dense (middle) with disentangled modalities and disentangled modalities with early fusion in a single path (right).**

Single Dense Path (Left): An early-fusion strategy is followed, in which MRI T1 and T2 are integrated at the input of the CNN and processed jointly along a single path.
Dual Dense Path (Middle): A late-fusion strategy is followed, in which each modality is processed independently in a separate stream and the learned features are fused before the first fully connected layer.
Early-Fusion (Right): An early fusion model is used, which combines features from different streams after the first convolutional layer.

2.2. Some Details

**Layers in Baseline and Proposed Architecture With Input 27×27×27.**

Sub-volumes of size 27×27×27 are used for training, and non-overlapping sub-volumes of size 35×35×35 are used during inference.
Cross-entropy is used as the cost function.
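
As a reminder of what a voxel-wise cross-entropy cost computes, here is a minimal sketch in plain Python. It shows the generic mean cross-entropy, not necessarily the exact formulation used in the paper; `mean_cross_entropy`, the four-class setup, and the probability values are all assumptions made for illustration.

```python
import math


def mean_cross_entropy(probs, labels):
    """Mean voxel-wise cross-entropy.

    probs:  list of per-voxel class-probability lists (each sums to 1).
    labels: list of ground-truth class indices, one per voxel.
    """
    eps = 1e-12  # guards against log(0)
    losses = [-math.log(max(p[y], eps)) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Two hypothetical voxels with four classes each
probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [0, 2]
loss = mean_cross_entropy(probs, labels)
print(round(loss, 4))  # 0.8715
```

The confident, correct prediction on the first voxel contributes a small loss, while the uniform prediction on the second contributes log(4).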

The network was trained for 30 epochs, each composed of 20 sub-epochs. At each sub-epoch, a total of 1000 samples were randomly selected from the training images and processed in batches of size 5.
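
The schedule above translates into the following totals. This is simple arithmetic on the stated numbers, assuming one weight update per batch (the variable names are mine).

```python
epochs = 30
subepochs_per_epoch = 20
samples_per_subepoch = 1000
batch_size = 5

# Assumption: one weight update per batch
iterations_per_subepoch = samples_per_subepoch // batch_size
total_subepochs = epochs * subepochs_per_epoch
total_iterations = total_subepochs * iterations_per_subepoch
total_samples = total_subepochs * samples_per_subepoch

print(iterations_per_subepoch)  # 200
print(total_iterations)         # 120000
print(total_samples)            # 600000
```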

3. Results

The Dice Similarity Coefficient (DSC) and the Modified Hausdorff Distance (MHD) are measured.
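
For readers unfamiliar with these metrics, the sketch below computes both on tiny sets of voxel coordinates. It is a simplified illustration: the MHD here follows the common "maximum of the two directed mean nearest-neighbour distances" definition applied to all voxels, which may differ from the exact evaluation script used by the challenges (which typically work on boundary voxels and physical spacing). The function names and the example masks are hypothetical.

```python
import math


def dice(pred, truth):
    """DSC between two sets of voxel coordinates: 2|A∩B| / (|A| + |B|)."""
    if not pred and not truth:
        return 1.0  # convention chosen here for two empty masks
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def directed_mean_distance(a, b):
    """Mean distance from every point of a to its nearest point of b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """MHD as the larger of the two directed mean distances."""
    return max(directed_mean_distance(a, b), directed_mean_distance(b, a))


# Hypothetical masks: two shared voxels, one mismatch on each side
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
truth = {(0, 0, 0), (0, 0, 1), (0, 0, 2)}
dsc = dice(pred, truth)
mhd = modified_hausdorff(pred, truth)
print(round(dsc, 3))  # 0.667
print(round(mhd, 3))  # 0.333
```

A higher DSC means better overlap, while a lower MHD means the predicted and reference regions lie closer together.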

3.1. iSEG Challenge

**Number of Parameters & Inference Time**

HyperDenseNet obtains the best performance.

**Left: Training accuracy plots, and Right: Validation accuracy plots for the proposed architecture and the baselines on the iSeg-2017 challenge data.**

HyperDenseNet outperforms baselines in both cases, achieving better results than architectures with a similar number of parameters.

**Qualitative results of segmentation achieved by the baselines and HyperDenseNet on two validation subjects (each row shows a different subject).**

HyperDenseNet typically recovers thin regions better than the baselines.

**iSEG-2017 for Proposed HyperDenseNet and Top-5 Ranked Methods at the First Round Submission**

The proposed network ranked among the top-3 methods in 6 out of 9 metrics, considering the results of the first and second rounds of submissions.

3.2. MRBrainS Challenge

**Comparisons with SOTA 3D Networks on the MRBrainS Challenge**

Comparing the different modality combinations, the two-modality versions of HyperDenseNet yielded competitive performance, although there is significant variability between the three configurations.
HyperDenseNet with three modalities yields significantly better segmentations, with the highest mean DSC values for all three tissues.