Review — Inception U-Net Architecture for Semantic Segmentation to Identify Nuclei in Microscopy Cell Images

Inception U-Net, Smaller Model Size Than U-Net++

Sik-Ho Tsang
Feb 14, 2023
Nuclei images with their respective segmentation masks. This data is part of the KDSB18 dataset.

Inception U-Net Architecture for Semantic Segmentation to Identify Nuclei in Microscopy Cell Images,
Inception U-Net, by Indian Institute of Information Technology Allahabad,
2020 ACM TOMM, Over 50 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation

  • Inception U-Net is proposed. It consists of a switch normalization layer, convolution layers, and inception layers, connected in the U-Net fashion to generate the image masks. Each inception layer concatenates 1×1, 3×3, and 5×5 convolutions with a hybrid pooling layer that combines max pooling and Hartley spectral pooling.
  • A novel objective function, the segmentation loss, is proposed. It combines the binary cross-entropy (BCE), Dice coefficient, and intersection over union (IoU) loss functions.


  1. Inception U-Net
  2. Results

1. Inception U-Net

1.1. Overall Architecture

Schematic representation of the Inception U-Net architecture.

Inception U-Net consists of three major modules: switch normalization, hybrid pooling, and the inception layer.

1.1.1. Switch Normalization

The input is normalized with switch normalization before it enters the model. This technique learns end to end how to combine different normalization operations, namely batch norm, instance norm [34], and layer norm, so it adapts automatically to the data.
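Switch normalization here presumably refers to Switchable Normalization (Luo et al.). As we understand that method, it mixes the instance-, layer-, and batch-norm statistics using softmax-normalized learnable weights. The NumPy sketch below shows the training-time forward pass for an NCHW tensor under that assumption. The function and parameter names are hypothetical. Running averages for inference are omitted.

```python
import numpy as np

def _softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def switchable_norm(x, mean_logits, var_logits, gamma, beta, eps=1e-5):
    """Switchable normalization sketch (training-mode statistics only).

    x: (N, C, H, W) array
    mean_logits, var_logits: length-3 learnable logits over (IN, LN, BN)
    gamma, beta: per-channel affine parameters, shape (C,)
    """
    # Instance norm statistics: per sample and channel, over H and W
    mu_in, var_in = x.mean(axis=(2, 3), keepdims=True), x.var(axis=(2, 3), keepdims=True)
    # Layer norm statistics: per sample, over C, H and W
    mu_ln, var_ln = x.mean(axis=(1, 2, 3), keepdims=True), x.var(axis=(1, 2, 3), keepdims=True)
    # Batch norm statistics: per channel, over N, H and W
    mu_bn, var_bn = x.mean(axis=(0, 2, 3), keepdims=True), x.var(axis=(0, 2, 3), keepdims=True)

    # Learned importance weights (sum to 1) for the three normalizers
    wm, wv = _softmax(mean_logits), _softmax(var_logits)
    mu = wm[0] * mu_in + wm[1] * mu_ln + wm[2] * mu_bn   # broadcasts to (N, C, 1, 1)
    var = wv[0] * var_in + wv[1] * var_ln + wv[2] * var_bn
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```

If the logits strongly favour one entry, the layer collapses to that normalizer. For example, `[0, 0, 50]` behaves essentially like batch norm. In training, the logits are learned together with the rest of the network.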

1.1.2. Hybrid Pooling

Each convolution is also accompanied by a new hybrid pooling layer, which concatenates max pooling and Hartley spectral pooling.

  • Hartley spectral pooling [37] is a transform-domain variant of pooling. It applies the discrete Hartley transform (DHT) to the input feature map. In the frequency domain, inputs with spatial structure have a sparse representation, so the map can be downsampled by filtering out the higher frequencies. It is reported to preserve more spatial information per parameter than other pooling methods.
Effect of downsampling (2×2, 8×8, 16×16, 32×32) on a normalized nuclei image from the KDSB18 dataset (left to right, respectively).
  • The figure shows the outcome of max pooling and Hartley spectral pooling at downsampling factors of 2×2, 8×8, 16×16, and 32×32 for an image from the KDSB18 dataset.
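The following minimal NumPy sketch makes the idea concrete. It makes two assumptions:

- the 2-D DHT is computed from the FFT as Re(F) - Im(F);
- downsampling keeps a centered crop of the shifted spectrum.

The function names are hypothetical, and this is not the reference implementation of [37]. The DHT is real-valued and is its own inverse up to a scale factor. As a result, the cropped spectrum maps back to a real image without the complex-conjugate symmetry that DFT-based spectral pooling has to handle.

```python
import numpy as np

def dht2(x):
    """2-D discrete Hartley transform over the last two axes.

    With numpy's exp(-i*theta) FFT convention, Re(F) - Im(F) equals
    sum x * cas(theta), where cas = cos + sin.
    """
    f = np.fft.fft2(x, axes=(-2, -1))
    return f.real - f.imag

def hartley_spectral_pool(x, out_h, out_w):
    """Downsample (..., H, W) to (..., out_h, out_w) by keeping only the
    lowest Hartley frequencies and discarding the higher ones."""
    h, w = x.shape[-2:]
    spec = np.fft.fftshift(dht2(x), axes=(-2, -1))        # zero frequency moved to the center
    top, left = (h - out_h) // 2, (w - out_w) // 2
    cropped = spec[..., top:top + out_h, left:left + out_w]  # low-frequency block
    cropped = np.fft.ifftshift(cropped, axes=(-2, -1))    # zero frequency back to index 0
    # Inverse DHT of size out_h x out_w is dht2(.) / (out_h * out_w); the extra
    # out_h * out_w / (h * w) rescaling keeps the mean intensity unchanged.
    return dht2(cropped) / (h * w)
```

A constant image stays constant after pooling. Pooling to the original size returns the input unchanged, since no frequencies are dropped.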

Hybrid pooling is implemented as a parameterized concatenation. The outputs of Hartley spectral pooling [37] and max pooling are concatenated along the channel axis and then combined by a 1×1 convolution.
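The NumPy sketch below illustrates this parameterized concatenation. It assumes:

- 2×2 downsampling on an (N, C, H, W) array with even H and W;
- the FFT-based Hartley pooling shown in the sketch;
- a 1×1 convolution written as a per-pixel linear map over channels.

The function names and the `weight`/`bias` layout are hypothetical, not taken from the paper.

```python
import numpy as np

def dht2(x):
    # 2-D discrete Hartley transform over the last two axes: Re(FFT) - Im(FFT)
    f = np.fft.fft2(x, axes=(-2, -1))
    return f.real - f.imag

def hartley_pool(x, out_h, out_w):
    # Hartley spectral pooling: keep only the lowest out_h x out_w frequencies
    h, w = x.shape[-2:]
    spec = np.fft.fftshift(dht2(x), axes=(-2, -1))
    top, left = (h - out_h) // 2, (w - out_w) // 2
    crop = np.fft.ifftshift(spec[..., top:top + out_h, left:left + out_w], axes=(-2, -1))
    return dht2(crop) / (h * w)

def max_pool2x2(x):
    # Non-overlapping 2x2 max pooling on an (N, C, H, W) array with even H and W
    n, c, h, w = x.shape
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

def hybrid_pool(x, weight, bias):
    """Concatenate max pooling and Hartley spectral pooling along channels,
    then fuse them with a learnable 1x1 convolution.

    x: (N, C, H, W); weight: (C_out, 2 * C); bias: (C_out,)
    """
    h, w = x.shape[-2:]
    pooled = np.concatenate([max_pool2x2(x), hartley_pool(x, h // 2, w // 2)], axis=1)
    # A 1x1 convolution is a linear map over channels applied at every pixel
    return np.einsum("oc,nchw->nohw", weight, pooled) + bias[None, :, None, None]
```

The 1×1 convolution lets the network learn how much to rely on each pooling branch. A weight matrix that selects only the first C input channels reduces hybrid pooling to plain max pooling.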

1.1.3. Inception Layer

  • Following GoogLeNet, the inception layer concatenates the outputs of convolutions with different filter sizes (1×1, 3×3, 5×5) and of a hybrid pooling layer.
Inception U-Net Layer Architecture.
  • The detailed layer-by-layer architecture of Inception U-Net is shown above.
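The NumPy sketch below shows the concatenation pattern of a single inception layer on a (C, H, W) feature map. Two simplifications are assumptions, not details from the review:

- Each convolution branch is followed by a ReLU.
- The pooling branch is a GoogLeNet-style stride-1 3×3 max pooling, standing in for the paper's hybrid pooling. It is used because all branches must keep the same spatial size before they are concatenated.

All names are hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b):
    """'Same'-padded 2-D convolution (cross-correlation, as in DL frameworks).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,)
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) at every output pixel
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def max_pool3x3_same(x):
    # Stride-1 3x3 max pooling with -inf padding, so H and W are unchanged
    h, w = x.shape[1:]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    return np.max([xp[:, i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)

def inception_layer(x, params):
    """Concatenate 1x1, 3x3 and 5x5 convolution branches and a pooling branch.

    params maps "1x1", "3x3", "5x5" to (weight, bias) pairs.
    """
    branches = [np.maximum(conv2d_same(x, *params[k]), 0.0)  # ReLU (assumed)
                for k in ("1x1", "3x3", "5x5")]
    branches.append(max_pool3x3_same(x))  # stand-in for the hybrid pooling branch
    return np.concatenate(branches, axis=0)
```

The output has as many channels as all branch outputs combined. For example, three 4-channel convolution branches on a 3-channel input give 4 + 4 + 4 + 3 = 15 channels.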

1.2. Loss Functions

  • The segmentation loss is the weighted average of the binary cross-entropy loss (BCL), Dice coefficient loss (DCL), and IoU loss (IoUL).
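Here is a minimal NumPy sketch of such a loss. Its assumptions are:

- BCL is the mean BCE over all pixels.
- DCL = 1 - Dice and IoUL = 1 - IoU, both computed from "soft" overlaps of the predicted probabilities and summed over the whole batch.
- The weights are equal. The actual weights used in the paper are not given here, so `weights=(1/3, 1/3, 1/3)` is a hypothetical default.

```python
import numpy as np

def segmentation_loss(y_true, y_pred, weights=(1/3, 1/3, 1/3), eps=1e-7):
    """Weighted average of BCE loss (BCL), Dice loss (DCL) and soft IoU loss (IoUL).

    y_true: binary mask; y_pred: predicted probabilities in [0, 1]; same shape.
    """
    y_true = y_true.astype(float)
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    bcl = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    inter = np.sum(y_true * y_pred)              # soft intersection
    total = np.sum(y_true) + np.sum(y_pred)      # |A| + |B|
    dcl = 1 - (2 * inter + eps) / (total + eps)  # 1 - Dice coefficient
    iou_l = 1 - (inter + eps) / (total - inter + eps)  # 1 - IoU, with union = |A| + |B| - |A n B|

    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, [bcl, dcl, iou_l]) / w.sum())
```

A perfect mask gives a loss near 0. An all-0.5 prediction is penalized by all three terms, and an inverted mask is penalized most strongly. BCE provides a smooth per-pixel gradient, while the Dice and IoU terms directly reward overlap. Overlap matters when nuclei occupy only a small fraction of the image.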

2. Results

2.1. Quantitative Results

Performance Comparison for Segmentation Loss Components on Different Models.

The proposed segmentation loss (BCL+DCL+IoUL) gives the best results on the KDSB18 dataset.

Training and validation assessment of the original U-Net, U-Net++, and Inception U-Net on the KDSB18 dataset using segmentation loss.
  • Although the original U-Net trained slightly faster (38 epochs) than Inception U-Net, it reached a lower IoU of 0.5981, as shown in the table.
  • U-Net++ achieves results similar to the proposed model. However, it has a huge number of intermediate convolutions, with 8M trainable and 50k non-trainable parameters, so it has heavy resource (memory, processing power) requirements. In contrast, the proposed Inception U-Net, a stack of convolution, inception, and hybrid pooling layers, has only 3.6M trainable parameters in total.
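A little arithmetic makes the size gap concrete. The sketch assumes 32-bit floating-point weights (4 bytes each), which the review does not state. It ignores U-Net++'s 50k non-trainable parameters and all activation memory, which can be substantial during training. `weight_memory_mb` is a hypothetical helper.

```python
# Trainable parameter counts as reported in the review
params = {"U-Net++": 8.0e6, "Inception U-Net": 3.6e6}

def weight_memory_mb(n_params, bytes_per_param=4):
    # Assumption: float32 weights, i.e. 4 bytes per parameter
    return n_params * bytes_per_param / 1e6

ratio = params["Inception U-Net"] / params["U-Net++"]
for name, n in params.items():
    print(f"{name}: {n / 1e6:.1f}M params, ~{weight_memory_mb(n):.1f} MB of weights")
print(f"Inception U-Net has {ratio:.0%} of U-Net++'s trainable parameters")
```

Under these assumptions, the weights alone take about 32 MB for U-Net++ versus about 14.4 MB for Inception U-Net.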

2.2. Qualitative Results

Comparison of segmentation results among U-Net, U-Net++, and Inception U-Net for KDSB18.


Reference

[2020 ACM TOMM] [Inception U-Net]
Inception U-Net Architecture for Semantic Segmentation to Identify Nuclei in Microscopy Cell Images
