Brief Review — ResUNet++, CRF and TTA for Colorectal Polyp Segmentation

ResUNet++, CRF and TTA

Sik-Ho Tsang
3 min read · Apr 5


Colorectal Polyp Segmentation

A Comprehensive Study on Colorectal Polyp Segmentation With ResUNet++, Conditional Random Field and Test-Time Augmentation,
ResUNet++, by SimulaMet, UiT The Arctic University of Norway, University of Oslo, Sahlgrenska University Hospital, Bærum Hospital, University of Gothenburg, and Oslo Metropolitan University,
2021 J. Biomedical and Health Informatics, Over 90 Citations, and
2019 ISM, Over 400 Citations
(Sik-Ho Tsang @ Medium)



  1. ResUNet++
  2. CRF and TTA
  3. Results

1. ResUNet++

ResUNet++ Model Architecture
  • The backbone of the ResUNet++ architecture is ResUNet, an encoder-decoder network based on U-Net that uses residual blocks.

Besides residual blocks (ResNet), the proposed architecture also benefits from squeeze-and-excitation blocks (SENet, dark gray), atrous spatial pyramid pooling (ASPP, dark red) (DeepLabv3), and attention blocks (Transformer, green).

  • (The paper does not describe these modules in detail either. If interested, please feel free to read the stories that I wrote about them.)
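To make one of the modules above concrete, here is a minimal NumPy sketch of a squeeze-and-excitation block. It is not taken from the ResUNet++ code: the function name `squeeze_excite`, the reduction ratio `r`, and the random weights `w1` and `w2` are assumptions for illustration only.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Recalibrate the channels of a feature map x with shape (C, H, W).

    w1: (C // r, C) bottleneck weights (hypothetical, random in this sketch)
    w2: (C, C // r) expansion weights (hypothetical, random in this sketch)
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                    # shape (C,)
    # Excite: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    h = np.maximum(w1 @ z, 0.0)                # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))        # shape (C,)
    # Scale: reweight each channel by its gate.
    return x * s[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 4                                    # assumed channel count and reduction ratio
x = rng.normal(size=(C, 16, 16))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
y = squeeze_excite(x, w1, w2)
print(y.shape)
```

The output has the same shape as the input. Only the relative weight of each channel changes, which is why such a block can be dropped into an existing encoder.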

2. CRF and TTA

2.1. Conditional Random Field (CRF)

Conditional Random Field (CRF)
  • A Conditional Random Field (CRF) is a popular statistical modeling method used when the class labels of different inputs are not independent (e.g., neighboring pixels in image segmentation tasks).
  • CRF can model useful geometric characteristics like shape, region connectivity, and contextual information.
  • The CRF concept is also used in DeepLabv1 and CRF-RNN.

Here, CRF acts as a post-processing step to refine the predicted segmentation map.
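As a rough illustration of CRF post-processing, the sketch below runs mean-field inference on a much simpler model than the one typically used for segmentation: a 4-connected grid CRF with a Potts pairwise term and no image-dependent kernels. This is a stand-in chosen for brevity, not the paper's CRF. The function name `refine_with_grid_crf` and the parameters `pairwise_weight` and `n_iters` are assumptions.

```python
import numpy as np

def refine_with_grid_crf(prob_fg, pairwise_weight=1.0, n_iters=5):
    """Refine a foreground probability map of shape (H, W) with mean-field inference.

    Simplified stand-in: a 4-connected grid CRF with a Potts pairwise term.
    """
    eps = 1e-6
    p = np.clip(prob_fg, eps, 1.0 - eps)
    # Unary energies: negative log-probability per label (0 = background, 1 = polyp).
    unary = -np.log(np.stack([1.0 - p, p]))            # shape (2, H, W)
    q = np.stack([1.0 - p, p])                         # initial marginals
    for _ in range(n_iters):
        # Message passing: sum the four neighbours' marginals for each label.
        padded = np.pad(q, ((0, 0), (1, 1), (1, 1)))   # zero padding at the border
        neighbours = (padded[:, :-2, 1:-1] + padded[:, 2:, 1:-1]
                      + padded[:, 1:-1, :-2] + padded[:, 1:-1, 2:])
        # Potts model: neighbours that agree with a label lower its energy.
        energy = unary - pairwise_weight * neighbours
        # Normalise with a numerically stable softmax over the label axis.
        energy -= energy.min(axis=0, keepdims=True)
        q = np.exp(-energy)
        q /= q.sum(axis=0, keepdims=True)
    return q[1]

# Toy network output: a 3x3 polyp region with a weak centre and one stray pixel.
prob = np.full((7, 7), 0.1)
prob[2:5, 2:5] = 0.9
prob[3, 3] = 0.4      # hole inside the polyp region
prob[0, 6] = 0.6      # isolated false positive
refined = refine_with_grid_crf(prob)
mask = refined > 0.5
```

In this toy example the hole at (3, 3) is filled and the isolated pixel at (0, 6) is suppressed, which is the kind of region-connectivity cleanup the bullets above describe.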

2.2. Test Time Augmentation (TTA)

  • Test Time Augmentation (TTA) is widely used in image classification models.
  • In TTA, augmentations are applied to each test image to create multiple augmented copies. The model then makes a prediction on each copy, and the average of these predictions is taken as the final output.

Here, only horizontal and vertical flips are applied for TTA.
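The flip-based TTA described above can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions: `predict_with_flip_tta` and `toy_model` are hypothetical names, the toy model stands in for a trained network, and the unflipped image is included in the average, which the text above does not confirm. For segmentation, each prediction has to be flipped back before averaging so that all maps are aligned with the input.

```python
import numpy as np

def predict_with_flip_tta(predict, image):
    """Average predictions over the original image and its two flips.

    predict: callable mapping an (H, W) image to an (H, W) probability map.
    """
    # Axis to flip along; None means the image is used unchanged.
    flip_axes = [None, 1, 0]   # identity, horizontal flip, vertical flip
    outputs = []
    for axis in flip_axes:
        aug = image if axis is None else np.flip(image, axis=axis)
        pred = predict(aug)
        # Undo the flip so that every prediction is aligned with the input.
        if axis is not None:
            pred = np.flip(pred, axis=axis)
        outputs.append(pred)
    return np.mean(outputs, axis=0)

# Hypothetical stand-in for a trained network: a per-pixel sigmoid.
def toy_model(image):
    return 1.0 / (1.0 + np.exp(-image))

image = np.random.default_rng(0).normal(size=(4, 6))
tta_prob = predict_with_flip_tta(toy_model, image)
tta_mask = tta_prob > 0.5
```

A per-pixel model like `toy_model` gives the same answer with or without TTA. The averaging only matters for real networks, whose predictions are not exactly flip-equivariant.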

3. Results

3.1. Datasets

Biomedical Segmentation Datasets

3.2. SOTA Comparisons

SOTA Comparisons on 6 Biomedical Image Segmentation Datasets

ResUNet++, with or without CRF and/or TTA, obtains the best results. In other words, on some datasets the plain ResUNet++ is already the best variant, so CRF and TTA are not always effective.

SOTA Comparisons on 1 Biomedical Video Segmentation Dataset

A similar observation holds here.

3.3. Visualizations

Qualitative results comparison of the proposed models with U-Net, ResUNet, and ResUNet++.

There is a high similarity between ground truth and predicted mask for ResUNet++.


