Review: H-DenseUNet — 2D & 3D DenseUNet for Intra & Inter Slice Features (Biomedical Image Segmentation)

Outperforms U-Net, Ranked 1st on Lesion Segmentation

Sik-Ho Tsang
6 min read · Oct 2, 2019
Examples of contrast-enhanced CT scans showing the large variations in shape, size, and location of liver lesions.

In this story, H-DenseUNet, by The Chinese University of Hong Kong (CUHK), is reviewed. H-DenseUNet, hybrid densely connected U-Net,

  • consists of a 2D DenseUNet for efficiently extracting intra-slice features and a 3D counterpart for hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm for liver and tumor segmentation.
  • is formulated in an end-to-end manner, where the intra-slice representations and inter-slice features can be jointly optimized through a hybrid feature fusion (HFF) layer.
  • ranked 1st on lesion segmentation, achieved very competitive performance on liver segmentation on the 2017 LiTS Leaderboard, and also achieved state-of-the-art results on the 3DIRCADb dataset.

This is a 2018 TMI paper (current impact factor: 7.816) with more than 120 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Network Architecture & Details
  2. Experimental Results

1. Network Architecture & Details

1.1. 2D DenseUNet

  • The above figure (a) shows the pipeline of H-DenseUNet, elaborated below. H-DenseUNet brings the advantages of DenseNet and U-Net together.
  • Each input image I has size 224×224×12×1, i.e. 224×224 in-plane with 12 slices and only 1 channel. With batch size n, the input is n×224×224×12×1.
  • This volumetric data I is transformed into stacks of three adjacent slices, I2d: every three adjacent slices along the z-axis are stacked together, as shown in the above figure (b). (A minimal sketch of this transform follows this list.)
  • Each I2d has size 224×224×3; with batch size n, the resulting batch is 12n×224×224×3.
  • I2d is fed into the 2D DenseUNet. 2D DenseUNet-167, which has 167 layers, is used here, as shown in the above figure (c).
  • The dense block denotes the cascade of several micro-blocks, in which all layers are directly connected.
  • To change the size of the feature maps, a transition layer is employed, which consists of a batch normalization layer and a 1×1 convolution layer followed by an average pooling layer.
  • The upsampling layer is implemented by bilinear interpolation, followed by summation with the low-level features (i.e., the UNet skip connections) and a 3×3 convolutional layer.
  • Long skip connections are used.
  • The output X2d is 12n×224×224×64.
  • The 2D DenseUNet takes about 21 hours to train.
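As a minimal sketch of the slice-stacking transform (in NumPy; replicate padding at the volume boundary is an assumption, since the review above does not detail boundary handling):

```python
import numpy as np

def volume_to_adjacent_slices(volume):
    """Turn one CT volume of shape (H, W, D, 1) into D stacks of three
    adjacent slices, shape (D, H, W, 3).

    Boundary slices are handled by replicating the edge slice; this is
    an assumption of this sketch, not something stated in the paper.
    """
    h, w, d, _ = volume.shape
    vol = volume[..., 0]                                   # (H, W, D)
    padded = np.pad(vol, ((0, 0), (0, 0), (1, 1)), mode="edge")
    stacks = np.stack(
        [np.stack([padded[:, :, z],
                   padded[:, :, z + 1],
                   padded[:, :, z + 2]], axis=-1)          # slice z with its neighbors
         for z in range(d)],
        axis=0,
    )                                                      # (D, H, W, 3)
    return stacks

# Example: one 224×224×12×1 volume becomes a 12×224×224×3 batch of
# three-slice inputs for the 2D DenseUNet; with batch size n this
# yields 12n such inputs in total.
volume = np.random.rand(224, 224, 12, 1).astype(np.float32)
i2d = volume_to_adjacent_slices(volume)
print(i2d.shape)  # (12, 224, 224, 3)
```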

1.2. 3D DenseUNet

  • The output feature maps X2d and score maps from the 2D DenseUNet are transformed back to the volumetric shape X2d’ of n×224×224×12×64, i.e. 224×224 with 12 slices and 64 channels. (A shape-level sketch of this step is given below.)
  • This volume is fed into the 3D DenseUNet; 3D DenseUNet-65 is used.
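At the shape level, gathering the per-slice 2D features back into a volume can be sketched as follows (assuming the 12n batch is ordered subject-major, which is an assumption of this sketch):

```python
import numpy as np

# x2d: per-slice feature maps from the 2D DenseUNet, one 64-channel map
# per slice, i.e. shape (12n, 224, 224, 64) for batch size n.
n = 2
x2d = np.random.rand(12 * n, 224, 224, 64).astype(np.float32)

# Regroup the 12 slices of each subject and move them into a depth axis:
# (12n, H, W, C) -> (n, 12, H, W, C) -> (n, H, W, 12, C)
x2d_vol = x2d.reshape(n, 12, 224, 224, 64).transpose(0, 2, 3, 1, 4)
print(x2d_vol.shape)  # (2, 224, 224, 12, 64)
```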

1.3. Hybrid Feature Fusion (HFF)

  • X3d, the feature volume from the “upsampling layer 5” of 3D DenseUNet-65, and X2d’ (mentioned in 1.2) are added together to form Z (sketched after this list):
  • Z = X3d + X2d’
  • Z denotes the hybrid feature, i.e. the sum of the intra-slice and inter-slice features from the 2D and 3D networks.
  • This hybrid feature is jointly learned and optimized in the HFF layer.
  • This 3D counterpart of H-DenseUNet took only 9 hours to converge, significantly faster than training a 3D network on the original data alone (63 hours).
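A minimal sketch of the fusion step, assuming both feature volumes have already been brought to the same shape; the 3D convolution after the sum stands in for the HFF layer's learnable part and is an assumption of this sketch, not the paper's exact design:

```python
import torch
import torch.nn as nn

class HFFLayer(nn.Module):
    """Sketch of hybrid feature fusion: the hybrid feature
    Z = X3d + X2d' is refined by a learnable 3D convolution
    (the convolution and activation are assumptions; the paper's
    exact HFF design may differ)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x2d_vol, x3d):
        z = x3d + x2d_vol                 # sum of inter- and intra-slice features
        return torch.relu(self.conv(z))

# Usage with PyTorch's (n, C, D, H, W) convention; a smaller spatial
# size than 224×224 is used here just to keep the demo light.
hff = HFFLayer(channels=64)
z = hff(torch.randn(2, 64, 12, 64, 64), torch.randn(2, 64, 12, 64, 64))
print(z.shape)  # torch.Size([2, 64, 12, 64, 64])
```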

1.4. Details

  • Detailed architecture is as follows:
Details of 2D DenseUNet-167 and 3D DenseUNet-65
  • A weighted cross-entropy function is used as the loss function:
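The equation image is not reproduced here. A standard per-voxel weighted cross-entropy over N voxels, with ground-truth label y_i for voxel i, predicted probability p_{i,c} for class c, and class weight w_c, would read (a sketch, not necessarily the paper's exact notation):

\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} w_{y_i}\,\log p_{i,\,y_i}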
  • First, the 2D DenseUNet is optimized.
  • Then, the parameters of the 2D DenseUNet are fixed; only the 3D DenseUNet and the HFF layer are optimized.
  • Finally, the whole network is jointly fine-tuned with the following combined loss:
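The combined-loss equation image is also not reproduced. A plausible form, assuming a weighting factor λ between the 2D branch's auxiliary loss and the loss on the fused hybrid features (both λ and this decomposition are assumptions of this sketch):

\mathcal{L}_{\text{total}} = \lambda\,\mathcal{L}_{\text{2d}} + \mathcal{L}_{\text{hff}}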
  • To avoid holes in the liver, largest connected component labeling is performed to refine the liver result (see the sketch after this list).
  • After that, the final lesion segmentation result is obtained by removing lesions outside the refined liver region.
  • In the test phase, the total processing time of one subject depends on the number of slices, ranging from 30 seconds to 200 seconds.
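A minimal sketch of this post-processing with SciPy (filling holes via binary_fill_holes is an assumption of this sketch; the paper may handle holes differently):

```python
import numpy as np
from scipy import ndimage

def refine_liver_mask(liver_mask):
    """Keep only the largest connected component of a binary liver mask
    and fill holes inside it."""
    labeled, num = ndimage.label(liver_mask)
    if num == 0:
        return liver_mask
    # Component sizes, ignoring the background label 0.
    sizes = ndimage.sum(liver_mask, labeled, range(1, num + 1))
    largest = 1 + int(np.argmax(sizes))
    refined = labeled == largest
    return ndimage.binary_fill_holes(refined)

def remove_lesions_outside_liver(lesion_mask, liver_mask):
    """Discard lesion voxels that fall outside the refined liver region."""
    return np.logical_and(lesion_mask, liver_mask)
```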

2. Experimental Results

2.1. Ablation Study

Ablation study on the test dataset
  • H-DenseUNet performs the best among all ablation variants, confirming the benefit of fusing intra-slice and inter-slice features.
Examples on Validation Set

2.2. 2017 LiTS Challenge

Leaderboard of the 2017 Liver Tumor Segmentation (LiTS) Challenge (Note: “—” denotes that the team participated in the ISBI competition and that measurement was not evaluated.)
  • There were more than 50 submissions to the 2017 ISBI and MICCAI LiTS challenges.
  • H-DenseUNet achieved 1st place among all state-of-the-art approaches in lesion segmentation and a very competitive result against DeepX in liver segmentation.
  • Note that H-DenseUNet surpassed DeepX by a significant margin in the Dice-per-case evaluation for lesions.
  • Also, H-DenseUNet uses only 1 model, while DeepX uses a multi-model combination strategy.

2.3. 3DIRCADb Dataset

Tumor segmentation results on 3DIRCADb dataset
Liver segmentation results on 3DIRCADb dataset
  • H-DenseUNet outperforms U-Net with a 14.5% improvement in Dice for tumor segmentation.
3DIRCADb dataset

From around 2017 to 2018, following DenseNet, several papers borrowed the DenseNet architecture or its idea to improve segmentation accuracy in biomedical image segmentation, including this paper, UNet++, and DenseVoxNet.

Reference

[2018 TMI] [H-DenseUNet]
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes

