Review — HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation

HyperDense-Net, DenseNet Concept in 3D Network, With Multi-Modalities

Sik-Ho Tsang
5 min read · Jan 14


Example of data from a training subject. Neonatal isointense brain images from a mid-axial T1 slice (left), the corresponding T2 slice (middle), and manual segmentation (right).

HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation,
HyperDense-Net, by École de technologie supérieure, Xidian University,
2019 TMI, Over 340 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation

  • HyperDenseNet, a 3-D fully convolutional neural network, is proposed that extends the definition of dense connectivity to multi-modal segmentation problems.
  • Each imaging modality has its own path, and dense connections occur not only between pairs of layers within the same path but also between layers across different paths, which significantly enriches the learned representation.


  1. HyperDense-Net Motivations
  2. HyperDense-Net Architecture
  3. Results

1. HyperDense-Net Motivations

1.1. Densely-Connected Concept

  • Let $x_l$ be the output of the $l$-th layer, produced by a mapping $H_l$ composed of a convolution followed by a non-linear activation function: $x_l = H_l(x_{l-1})$.
  • A densely connected network, a concept that originated in DenseNet, concatenates all preceding feature outputs in a feed-forward manner: $x_l = H_l([x_{l-1}, x_{l-2}, \ldots, x_0])$,
  • where $[\ldots]$ denotes a concatenation operation.
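The dense connectivity rule above can be illustrated with a toy, framework-free sketch. The names `dense_forward` and `make_toy_layer` are hypothetical, and the toy layer is only a stand-in for a real convolution plus activation; the point is how each layer receives the concatenation of every earlier output.

```python
def dense_forward(x0, layers):
    """Run a toy densely connected stack.

    x0     : list of floats, the input "feature map" (one value per channel).
    layers : list of callables H_l, each mapping a list of floats to a list.
    Returns [x_0, x_1, ..., x_L].
    """
    outputs = [x0]  # keep every output so later layers can reuse it
    for H in layers:
        # Concatenate [x_{l-1}, x_{l-2}, ..., x_0], most recent first.
        concatenated = [v for x in reversed(outputs) for v in x]
        outputs.append(H(concatenated))
    return outputs


def make_toy_layer(growth):
    """Hypothetical stand-in for conv + ReLU that emits `growth` channels."""
    def H(inputs):
        total = sum(inputs)
        return [max(0.0, total / (k + 1)) for k in range(growth)]
    return H


layers = [make_toy_layer(2) for _ in range(3)]
outs = dense_forward([1.0, -1.0, 2.0], layers)
# Number of channels seen by layers 1, 2 and 3.
input_sizes = [sum(len(x) for x in outs[:l]) for l in range(1, len(outs))]
print(input_sizes)  # [3, 5, 7]
```

The input width grows by the previous layer's output width at every step, which is the defining cost and benefit of dense connectivity.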

1.2. Multi-Modal Motivation

  • For simplicity, consider the scenario of two image modalities.
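  • In general, with $x_l^s$ denoting the output of the $l$-th layer in stream $s$, we have: $x_l^s = H_l^s([x_{l-1}^1, x_{l-1}^2, x_{l-2}^1, x_{l-2}^2, \ldots, x_0^1, x_0^2])$.
  • Shuffling and interleaving feature map elements in a CNN was recently found to enhance efficiency and performance while serving as a strong regularizer. It is therefore beneficial for intermediate layers to offer a variety of information exchange while preserving the aforementioned deterministic functions: $x_l^s = H_l^s(\pi_l^s([x_{l-1}^1, x_{l-1}^2, x_{l-2}^1, x_{l-2}^2, \ldots, x_0^1, x_0^2]))$,
  • with $\pi_l^s$ being a function that permutes the feature maps given as input. For instance, in the case of two image modalities, we could have: $x_l^1 = H_l^1([x_{l-1}^1, x_{l-1}^2, x_{l-2}^1, x_{l-2}^2, \ldots, x_0^1, x_0^2])$ and $x_l^2 = H_l^2([x_{l-1}^2, x_{l-1}^1, x_{l-2}^2, x_{l-2}^1, \ldots, x_0^2, x_0^1])$,
  • so that information is exchanged between the two modalities.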
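To make the per-stream ordering concrete, here is a small sketch that lists which feature maps feed a given layer of a given stream. The function name `hyperdense_input_order` and the label format are hypothetical, and the permutation used (own stream first at each depth, then the others) is just one simple choice, assumed for illustration.

```python
def hyperdense_input_order(layer, stream, num_streams=2):
    """Labels of the feature maps fed to layer `layer` of stream `stream`.

    Labels look like "x{stream}_{layer}". At each depth the stream's own map
    comes first, followed by the maps of the other streams (assumed choice of
    the permutation pi).
    """
    others = [s for s in range(1, num_streams + 1) if s != stream]
    stream_order = [stream] + others
    # Walk from the most recent layer (layer - 1) down to the inputs (0).
    return [f"x{s}_{k}" for k in range(layer - 1, -1, -1) for s in stream_order]


print(hyperdense_input_order(2, 1))  # ['x1_1', 'x2_1', 'x1_0', 'x2_0']
print(hyperdense_input_order(2, 2))  # ['x2_1', 'x1_1', 'x2_0', 'x1_0']
```

Both streams see the same set of feature maps; only the order along the channel axis differs.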

2. HyperDense-Net Architecture

A section of the proposed HyperDenseNet in the case of two image modalities. Each gray region represents a convolutional block. Red arrows correspond to convolutions and black arrows indicate dense connections between feature maps.
  • For simplicity, it is assumed that the red arrows indicate convolution operations only, whereas the black arrows represent the direct connections between feature maps from different layers, within and in-between the different streams.

Thus, the input of each convolutional block (maps before the red arrow) is the concatenation of the outputs (maps after the red arrow) of all the preceding layers from both paths.
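One practical consequence is that the input width of each block grows with the number of streams. The sketch below counts input channels per block; `hyperdense_in_channels` is a hypothetical helper, and the block widths passed to it are illustrative values rather than the configuration reported in the paper.

```python
def hyperdense_in_channels(input_channels, out_channels, num_streams=2):
    """Input channel count of each conv block in a hyper-densely connected net.

    input_channels : channels of each modality's input volume (x_0 per stream).
    out_channels   : output channels of conv blocks 1..L (same in every stream).
    """
    counts = []
    available = num_streams * input_channels  # all x_0 maps from every stream
    for c_out in out_channels:
        counts.append(available)
        # Every stream contributes c_out new maps that later blocks will see.
        available += num_streams * c_out
    return counts


# Illustrative widths only (assumed, not taken from the paper's table).
print(hyperdense_in_channels(1, [25, 25, 50]))                 # [2, 52, 102]
print(hyperdense_in_channels(1, [25, 25, 50], num_streams=1))  # [1, 26, 51]
```

With two streams, each block sees twice as many channels as it would in a single dense path of the same widths.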

2.1. Multi-Modal Baselines

Section of baseline architectures: single-path dense (left), dual-path dense (middle) with disentangled modalities and disentangled modalities with early fusion in a single path (right).
  • Single Dense Path (Left): An early-fusion strategy is followed, in which MRI T1 and T2 are integrated at the input of the CNN and processed jointly along a single path.
  • Dual Dense Path (Middle): A late-fusion strategy is followed, in which each modality is processed independently in its own stream and the learned features are fused before the first fully connected layer.
  • Early-Fusion (Right): An early fusion model is used, which combines features from different streams after the first convolutional layer.

2.2. Some Details

Layers in Baseline and Proposed Architecture With Input 27×27×27.
  • Sub-volumes of size 27×27×27 are used for training, and non-overlapping sub-volumes of size 35×35×35 are used during inference.
  • Cross-entropy is used as the cost function.
  • The network was trained for 30 epochs, each composed of 20 sub-epochs. At each sub-epoch, a total of 1000 samples were randomly selected from the training images and processed in batches of size 5.
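As a rough illustration of these training details, the sketch below computes a voxel-wise cross-entropy and the sample and batch counts implied by the stated schedule. The function `voxelwise_cross_entropy` is a hypothetical, generic formulation (mean negative log-likelihood of the true class), not necessarily the exact expression used in the paper.

```python
import math


def voxelwise_cross_entropy(probs, labels):
    """Mean negative log-likelihood over voxels.

    probs  : list of per-voxel class-probability lists (each sums to 1).
    labels : list of integer ground-truth class indices, one per voxel.
    """
    eps = 1e-12  # guards against log(0)
    losses = [-math.log(max(p[y], eps)) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Schedule stated in the text: 30 epochs x 20 sub-epochs x 1000 samples, batch size 5.
EPOCHS, SUBEPOCHS, SAMPLES_PER_SUBEPOCH, BATCH_SIZE = 30, 20, 1000, 5
batches_per_subepoch = SAMPLES_PER_SUBEPOCH // BATCH_SIZE
total_batches = EPOCHS * SUBEPOCHS * batches_per_subepoch
total_samples = EPOCHS * SUBEPOCHS * SAMPLES_PER_SUBEPOCH

print(batches_per_subepoch, total_batches, total_samples)  # 200 120000 600000
print(voxelwise_cross_entropy([[0.5, 0.5], [1.0, 0.0]], [0, 0]))
```

Assuming one parameter update per batch, the schedule amounts to 120,000 updates over 600,000 sampled sub-volumes.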

3. Results

  • The Dice Similarity Coefficient (DSC) and the Modified Hausdorff Distance (MHD) are measured.
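For readers unfamiliar with these metrics, here is a minimal sketch of both. The helper names `dice` and `modified_hausdorff` are hypothetical. The distance uses the mean-of-minimum-distances form of Dubuisson and Jain, which is an assumption; challenge evaluations may use a different variant (for example a percentile-based one).

```python
import math


def dice(pred, truth, label):
    """Dice similarity coefficient for one label over flat label lists."""
    p = [v == label for v in pred]
    t = [v == label for v in truth]
    inter = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2.0 * inter / denom if denom else 1.0


def modified_hausdorff(points_a, points_b):
    """Dubuisson-Jain form: max of the two mean nearest-neighbour distances."""
    def directed(src, dst):
        return sum(min(math.dist(a, b) for b in dst) for a in src) / len(src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


print(dice([1, 1, 0, 0], [1, 0, 0, 0], label=1))
print(modified_hausdorff([(0, 0)], [(3, 4)]))  # 5.0
```

A higher DSC indicates better overlap, whereas a lower distance indicates better boundary agreement.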

3.1. iSEG Challenge

DSC on iSEG Challenge Test Set
Number of Parameters & Inference Time

HyperDenseNet obtains the best performance.

Left: Training accuracy plots, and Right: Validation accuracy plots for the proposed architecture and the baselines on the iSeg-2017 challenge data.

HyperDenseNet outperforms baselines in both cases, achieving better results than architectures with a similar number of parameters.

Qualitative results of segmentation achieved by the baselines and HyperDenseNet on two validation subjects (each row shows a different subject).

HyperDenseNet typically recovers thin regions better than the baselines.

iSEG-2017 for Proposed HyperDenseNet and Top-5 Ranked Methods at the First Round Submission
  • The proposed network ranked among the top-3 methods in 6 out of 9 metrics, considering the results of the first and second rounds of submissions.

3.2. MRBrainS Challenge

Comparisons with SOTA 3D Networks on the MRBrainS Challenge

Comparing the different modality combinations, the two-modality versions of HyperDenseNet yielded competitive performances, although there is a significant variability between the three configurations.

HyperDenseNet with three modalities yields significantly better segmentations, with the highest mean DSC values for all three tissues.

Comparison With Different Methods on MRBrainS Challenge

HyperDenseNet ranks first among competing methods, obtaining the best DSC and Hausdorff distance (HD) scores for gray matter (GM) and white matter (WM).

A typical example of the segmentations achieved by the proposed HyperDenseNet in a validation subject (Subject 1 in the training set) for 2 and 3 modalities.
  • HyperDenseNet using three modalities can handle thin regions better than its two-modality versions.


