Review — Inception U-Net Architecture for Semantic Segmentation to Identify Nuclei in Microscopy Cell Images
Inception U-Net Architecture for Semantic Segmentation to Identify Nuclei in Microscopy Cell Images,
Inception U-Net, by Indian Institute of Information Technology Allahabad,
2020 ACM TOMM, Over 50 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation
- Inception U-Net is proposed, which consists of a switch normalization layer, convolution layers, and inception layers (concatenated 1×1, 3×3, and 5×5 convolution and the hybrid of a max and Hartley spectral pooling layer) connected in the U-Net fashion for generating the image masks.
- A novel segmentation loss is proposed as the objective function, based on the binary cross entropy (BCE), dice coefficient, and intersection over union (IoU) loss functions.
1. Inception U-Net
1.1. Overall Architecture
1.1.1. Switch Normalization
The input is first normalized with switch normalization. This technique learns, end to end, how to weight different normalization operations, such as batch norm, instance norm, and layer norm, rather than fixing one in advance.
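The idea of combining several normalizers can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of separate softmax weights for means and variances, and the omission of the learned scale and shift parameters are all assumptions made for brevity.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def switchable_norm(x, mean_logits, var_logits, eps=1e-5):
    """x: (N, C, H, W). Logits are length-3 arrays ordered (IN, LN, BN).

    In a real network the logits would be learned; here they are inputs.
    """
    # Instance-norm statistics: per sample, per channel.
    mean_in = x.mean(axis=(2, 3), keepdims=True)
    var_in = x.var(axis=(2, 3), keepdims=True)
    # Layer-norm statistics: per sample, over channels and space.
    mean_ln = x.mean(axis=(1, 2, 3), keepdims=True)
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    # Batch-norm statistics: per channel, over batch and space.
    mean_bn = x.mean(axis=(0, 2, 3), keepdims=True)
    var_bn = x.var(axis=(0, 2, 3), keepdims=True)

    # Softmax turns the logits into importance weights that sum to 1.
    w_mean = softmax(mean_logits)
    w_var = softmax(var_logits)
    mean = w_mean[0] * mean_in + w_mean[1] * mean_ln + w_mean[2] * mean_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
    return (x - mean) / np.sqrt(var + eps)
```

With equal logits the three normalizers contribute equally; pushing one logit far above the others recovers that single normalizer.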
1.1.2. Hybrid Pooling
Each convolution is also accompanied by a new hybrid pooling (concatenation of max pooling and Hartley spectral pooling).
- Hartley spectral pooling is a transform-domain variant of pooling. It applies the discrete Hartley transform (DHT) to the input feature map to obtain a frequency-domain representation, which provides a sparse basis for inputs with spatial structure, and then filters out the higher frequencies. According to the paper, it preserves more spatial information per parameter than other pooling methods.
- The figure compares the outcome of max pooling and Hartley spectral pooling at downsampling factors of 2×2, 8×8, 16×16, and 32×32 for an image from the KDSB18 dataset.
Hybrid pooling is implemented as a parameterized concatenation of Hartley spectral pooling and max pooling using 1×1 convolution.
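The two pooling branches and their 1×1 fusion can be sketched in NumPy as below. This is a rough sketch under stated assumptions, not the authors' code: the function names are hypothetical, the DHT is computed from the FFT (real part minus imaginary part), low frequencies are kept by cropping the centred spectrum, and the 1×1 convolution weights, which would be learned in practice, are passed in as a plain matrix.

```python
import numpy as np

def dht2(x):
    """2-D discrete Hartley transform over the last two axes, via the FFT."""
    f = np.fft.fft2(x)
    return f.real - f.imag

def hartley_spectral_pool(x, out_h, out_w):
    """Downsample x (..., H, W) to (..., out_h, out_w) by keeping low frequencies."""
    h, w = x.shape[-2:]
    spec = np.fft.fftshift(dht2(x), axes=(-2, -1))  # zero frequency at the centre
    top, left = h // 2 - out_h // 2, w // 2 - out_w // 2
    crop = spec[..., top:top + out_h, left:left + out_w]
    crop = np.fft.ifftshift(crop, axes=(-2, -1))
    # The DHT is its own inverse up to scaling; dividing by h * w
    # (the input size) keeps the mean intensity of the input.
    return dht2(crop) / (h * w)

def max_pool(x, k):
    """Non-overlapping k x k max pooling on x of shape (C, H, W)."""
    c, h, w = x.shape
    x = x[:, :h - h % k, :w - w % k]
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def hybrid_pool(x, k, weight):
    """Concatenate both pooled maps on the channel axis, then mix with a 1x1 conv.

    weight: (C_out, 2 * C) matrix standing in for the learned 1x1 convolution.
    """
    c, h, w = x.shape
    pooled = np.concatenate(
        [max_pool(x, k), hartley_spectral_pool(x, h // k, w // k)], axis=0
    )
    return np.einsum("oc,chw->ohw", weight, pooled)
```

Because the Hartley transform maps real inputs to real outputs, the pooled map is real without any of the conjugate-symmetry bookkeeping that Fourier-based spectral pooling needs.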
1.1.3. Inception Layer
- Following GoogLeNet, the inception layer concatenates the outputs of convolutions with varying filter sizes (1×1, 3×3, 5×5) and a hybrid pooling layer.
- The detailed overall architecture of Inception U-Net is as shown above.
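The branch-and-concatenate structure of the inception layer can be illustrated with a small NumPy sketch. Several things here are assumptions rather than details from the paper: the function names, the branch widths, the zero "same" padding, and in particular the pooling branch, which uses a plain stride-1 3×3 max pooling as a stand-in for the hybrid pooling so that all branches keep the same spatial size.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, weight):
    """Cross-correlation with zero 'same' padding.

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k.
    """
    k = weight.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("oikl,ihwkl->ohw", weight, windows)

def max_pool_same(x, k=3):
    """Stride-1 max pooling that keeps H and W (stand-in for hybrid pooling)."""
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))
    return windows.max(axis=(-2, -1))

def inception_layer(x, w1, w3, w5):
    """Run the 1x1, 3x3, 5x5 and pooling branches and concatenate on channels."""
    branches = [
        conv2d_same(x, w1),
        conv2d_same(x, w3),
        conv2d_same(x, w5),
        max_pool_same(x),
    ]
    return np.concatenate(branches, axis=0)
```

The output channel count is the sum of the branch widths, which is why inception-style layers are often followed by a 1×1 convolution to bring the width back down.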
1.2. Loss Functions
- The loss function is the weighted average of the binary cross-entropy loss (BCL), the dice coefficient loss (DCL), and the IoU loss (IoUL).
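A combined loss of this kind can be sketched in NumPy as below. The weights used in the paper are not given here, so equal weights are assumed purely for illustration; the smoothing constants and function names are likewise assumptions.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over all pixels."""
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def dice_loss(y_true, y_pred, smooth=1.0):
    """1 - dice coefficient, with a smoothing term for empty masks."""
    inter = np.sum(y_true * y_pred)
    return float(1 - (2 * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth))

def iou_loss(y_true, y_pred, smooth=1.0):
    """1 - soft intersection over union."""
    inter = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - inter
    return float(1 - (inter + smooth) / (union + smooth))

def segmentation_loss(y_true, y_pred, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of BCL, DCL, and IoUL (equal weights are an assumption)."""
    w_bce, w_dice, w_iou = weights
    total = (
        w_bce * bce_loss(y_true, y_pred)
        + w_dice * dice_loss(y_true, y_pred)
        + w_iou * iou_loss(y_true, y_pred)
    )
    return total / (w_bce + w_dice + w_iou)
```

The BCE term scores each pixel independently, while the dice and IoU terms score the overlap of the whole mask, which is a common motivation for combining them when foreground pixels are sparse.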
2. Results
2.1. Quantitative Results
The proposed segmentation loss function (BCL+DCL+IoUL) exhibits the best result on the KDSB18 dataset.
- Although the original U-Net model trained slightly faster (38 epochs) than the Inception U-Net model, it reached a lower IoU of 0.5981, as shown in the table.
- The U-Net++ model achieves results similar to the proposed model, but it has a large number of intermediate convolutions, with 8M trainable parameters and 50k non-trainable parameters, which results in heavy resource (memory, processing power) requirements. In contrast, the proposed Inception U-Net, a stack of convolution, inception, and hybrid pooling layers, has only 3.6M trainable parameters.
2.2. Qualitative Results