Brief Review — ResUNet++, CRF and TTA for Colorectal Polyp Segmentation
A Comprehensive Study on Colorectal Polyp Segmentation With ResUNet++, Conditional Random Field and Test-Time Augmentation,
ResUNet++, by SimulaMet, UiT The Arctic University of Norway, University of Oslo, Sahlgrenska University Hospital, Bærum Hospital, University of Gothenburg, and Oslo Metropolitan University,
2021 J. Biomedical and Health Informatics, Over 90 Citations, and
2019 ISM, Over 400 Citations (Sik-Ho Tsang @ Medium)
- ResUNet++ was originally proposed in the 2019 ISM paper, “ResUNet++: An Advanced Architecture for Medical Image Segmentation”.
- In this paper, ResUNet++ is extended with the use of Conditional Random Field (CRF) and Test-Time Augmentation (TTA).
1. ResUNet++
- The backbone of the ResUNet++ architecture is ResUNet, an encoder-decoder network based on U-Net that uses residual blocks.
Besides residual blocks (ResNet), the proposed architecture also takes advantage of squeeze-and-excitation blocks (SENet, dark gray), atrous spatial pyramid pooling (ASPP, dark red) (DeepLabv3), and attention blocks (Transformer, green).
- (The paper does not describe the above modules in detail either. If interested, please feel free to read the stories that I wrote for them.)
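As one illustration of the modules named above, a squeeze-and-excitation block can be sketched in a few lines of NumPy. This is only a minimal sketch of the general SENet idea, not the ResUNet++ implementation: the function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the omission of bias terms are all assumptions made for illustration.

```python
import numpy as np

def squeeze_excite(features, w1, w2):
    """Rescale the channels of a feature map (hypothetical helper, biases omitted).

    features: array of shape (H, W, C)
    w1: bottleneck weights of shape (C, C // r), r being the reduction ratio
    w2: expansion weights of shape (C // r, C)
    """
    # Squeeze: global average pooling gives one descriptor per channel, shape (C,)
    squeezed = features.mean(axis=(0, 1))
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel
    hidden = np.maximum(squeezed @ w1, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))
    # Scale: each channel is multiplied by its gate value in (0, 1)
    return features * gates
```

The gate vector broadcasts over the spatial dimensions, so the block changes how strongly each channel contributes without changing the shape of the feature map.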
2. CRF and TTA
2.1. Conditional Random Field (CRF)
- Conditional Random Field (CRF) is a popular statistical modeling method used when the class labels for different inputs are not independent (e.g., image segmentation tasks).
- CRF can model useful geometric characteristics like shape, region connectivity, and contextual information.
- The CRF concept is also used in DeepLabv1 and CRF-RNN.
Here, CRF acts as a post-processing step to refine the predicted segmentation map.
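To make the refinement idea concrete, the energy minimized by a fully connected CRF, in the standard form used by DeepLabv1, is written below. This review does not state the exact CRF variant or kernel settings used in the paper, so the formula is given only as an illustration of the general formulation; the symbols $x_i$, $\psi_u$, and $\psi_p$ are notation introduced here.

```latex
E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\qquad
\psi_u(x_i) = -\log P(x_i)
```

Here $x_i$ is the label of pixel $i$, $P(x_i)$ is the probability the network assigns to that label, and the pairwise term $\psi_p$ penalizes nearby or similar-looking pixels that receive different labels. Minimizing $E$ therefore keeps the network's confident predictions while smoothing the mask along consistent regions.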
2.2. Test Time Augmentation (TTA)
- Test Time Augmentation (TTA) is commonly used in image classification models.
- In TTA, augmentation is applied to each test image to create multiple augmented copies. The model then makes a prediction for each augmented copy, and the average of these predictions is taken as the final output prediction.
Here, only horizontal and vertical flips are applied for TTA.
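The flip-based TTA procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: `predict_fn` is a hypothetical callable that maps an (H, W, C) image to an (H, W) probability map, and including the unflipped image in the average is an assumption of this example. For segmentation, each prediction has to be flipped back before averaging so that the masks line up pixel by pixel.

```python
import numpy as np

def predict_with_flip_tta(predict_fn, image):
    """Average predictions over the image and its horizontal and vertical flips.

    predict_fn: hypothetical model wrapper, (H, W, C) image -> (H, W) probability map
    image: array of shape (H, W, C)
    """
    # (forward, inverse) transform pairs; a flip is its own inverse
    transforms = [
        (lambda x: x, lambda y: y),  # original image (assumed to be included)
        (np.fliplr, np.fliplr),      # horizontal flip
        (np.flipud, np.flipud),      # vertical flip
    ]
    predictions = []
    for forward, inverse in transforms:
        pred = predict_fn(forward(image))
        # Undo the flip so every prediction is in the original orientation
        predictions.append(inverse(pred))
    return np.mean(predictions, axis=0)
```

The averaged probability map can then be thresholded (or passed to CRF post-processing) to obtain the final binary mask.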
3.2. SOTA Comparisons
ResUNet++ obtains the best results, sometimes in its plain form and sometimes combined with CRF and/or TTA. In other words, CRF and/or TTA do not always bring a further improvement.
A similar observation is made.
The masks predicted by ResUNet++ are highly similar to the ground truth.