Brief Review — RCA-IUnet: A Residual Cross-Spatial Attention-Guided Inception U-Net Model for Tumor Segmentation in Breast Ultrasound Imaging
RCA-IUnet, for Breast Ultrasound Image Segmentation

RCA-IUnet: A Residual Cross-Spatial Attention-Guided Inception U-Net Model for Tumor Segmentation in Breast Ultrasound Imaging,
RCA-IUnet, by IIIT Allahabad, Prayagraj 211015, India
2022 Springer J. Machine Vision Applications, Over 15 Citations (Sik-Ho Tsang @ Medium)
Biomedical Image Segmentation
2015 … 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] [Up-Net] [SK-Unet] 2022 [UNETR] [Half-UNet] [BUSIS]
- An efficient residual cross-spatial attention-guided inception U-Net (RCA-IUnet) model is proposed, which follows U-Net topology with residual inception depth-wise separable convolution and hybrid pooling (max pooling and spectral pooling) layers.
- In addition, cross-spatial attention filters are added to suppress the irrelevant features and focus on the target structure.
Outline
- General Steps for Medical Image Segmentation
- RCA-IUnet
- Results
1. General Steps for Medical Image Segmentation

- In the data pre-processing phase, the aim is to transform the data into a trainable format by applying techniques such as normalization to reduce intensity variation, resizing to fit the model input layer, cropping to remove irrelevant features or noise, and data augmentation.
- The processed data is utilized to train the deep learning model and generate the desired segmentation mask.
- Finally, the generated mask is post-processed to refine the segmentation results: a flood fill algorithm fills minor holes, mask extraction removes small masked regions, and binary thresholding filters the masked regions.
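The pre- and post-processing steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' pipeline: the function names, the 0.5 threshold, and the `min_area` value are all assumptions made for the example, and hole filling by flood fill is left out for brevity.

```python
from collections import deque

import numpy as np


def normalize(image):
    """Min-max scale pixel intensities to [0, 1] to reduce intensity variation."""
    image = image.astype(np.float64)
    span = image.max() - image.min()
    if span == 0:
        return np.zeros_like(image)
    return (image - image.min()) / span


def remove_small_regions(mask, min_area):
    """Drop 4-connected foreground regions smaller than min_area pixels."""
    height, width = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    cleaned = np.zeros_like(mask, dtype=bool)
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                for y, x in region:
                    cleaned[y, x] = True
    return cleaned


def postprocess(probabilities, threshold=0.5, min_area=4):
    """Binarize a probability map (assumed threshold), then filter tiny regions."""
    return remove_small_regions(probabilities >= threshold, min_area)
```

Here `postprocess` takes the model's per-pixel probabilities and returns a boolean mask in which isolated specks below `min_area` pixels are discarded.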
2. RCA-IUnet
2.1. Overall Architecture

- The network follows the U-Net topology, but standard convolution and pooling operations are replaced by inception convolution with short skip connections and by hybrid pooling. A cross-spatial attention filter on the long skip connection focuses on the most relevant features.
- The network has four stages of encoding and decoding layers, where at each stage the spatial dimensions (width and height) of the feature map are reduced by 50% and the channel depth increases by 50%.
2.2. Depthwise Separable Convolution

- Depthwise separable convolution (DSC), as used in MobileNetV1, replaces the standard convolution.
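To see why DSC reduces the parameter count, the weight counts of the two layer types can be compared directly. The helper names below are hypothetical and biases are ignored; the channel sizes in the usage note are arbitrary example values, not taken from the paper.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_params(kernel, in_channels, out_channels):
    """Weights in a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = kernel * kernel * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels     # 1 x 1 conv mixes the channels
    return depthwise + pointwise
```

For a 3 × 3 kernel with 64 input and 128 output channels, this gives 73,728 weights for the standard convolution versus 8,768 for DSC, roughly an 8× reduction.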
2.3. Hybrid Pooling
- To better downsample the feature maps, hybrid pooling is introduced, in which the downsampled feature maps from max pooling and spectral pooling are merged using a 1 × 1 convolution.
- In spectral pooling, the discrete Fourier transform (DFT) of the input feature map is computed, the high-frequency values are truncated in the spectral domain, and the inverse DFT is then applied to convert back to the spatial domain.
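The two bullets above can be illustrated with a small NumPy sketch. It is only an approximation of the idea under stated assumptions: feature maps are `(channels, height, width)` arrays with even sides, the 1 × 1 convolution is modelled as a plain channel-mixing matrix `pointwise_weights` (a hypothetical parameter with no bias or activation), and the rescaling step is a choice made here so that mean intensity is preserved.

```python
import numpy as np


def max_pool_2x2(x):
    """2 x 2 max pooling on a (channels, height, width) array with even sides."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def spectral_pool(x, out_h, out_w):
    """Keep only the lowest out_h x out_w frequencies of each channel."""
    c, h, w = x.shape
    # Move the zero frequency to the centre so low frequencies are contiguous.
    spectrum = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    top, left = h // 2 - out_h // 2, w // 2 - out_w // 2
    cropped = spectrum[:, top:top + out_h, left:left + out_w]
    pooled = np.fft.ifft2(np.fft.ifftshift(cropped, axes=(-2, -1)), axes=(-2, -1))
    # Rescale so the mean intensity is preserved; discard any imaginary part.
    return pooled.real * (out_h * out_w) / (h * w)


def hybrid_pool(x, pointwise_weights):
    """Concatenate max- and spectral-pooled maps, then mix channels (1 x 1 conv)."""
    c, h, w = x.shape
    merged = np.concatenate(
        [max_pool_2x2(x), spectral_pool(x, h // 2, w // 2)], axis=0
    )
    # A 1 x 1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", pointwise_weights, merged)
```

With an input of `C` channels, `pointwise_weights` has shape `(out_channels, 2 * C)`, since the max-pooled and spectrally pooled maps are stacked along the channel axis before merging.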
2.4. Residual Inception Layer

- The inception convolution concatenates the feature maps extracted by parallel ReLU-activated DSCs with kernel sizes of 1 × 1, 3 × 3 and 5 × 5, together with hybrid pooling, while also using batch normalization to avoid the covariate shift problem.
- Finally, the concatenated feature maps undergo 1 × 1 convolution.
- Building on the inception convolution layers, the residual inception convolution block applies two inception convolution layers with a short skip connection that merges the extracted feature maps with the input using a 1 × 1 DSC.
2.5. Cross-Spatial Attention Block

- A cross-spatial attention block is introduced in the long skip connections.
- The attention filter uses the extracted feature maps from multiple encoded layers to develop better correlation in the spatial dimension of the feature maps.
- Feature maps from three different layers are considered to form the attention feature maps (output feature maps) which are later concatenated with the corresponding decoded layer, as above.
2.6. Loss Function

- Dice loss and binary cross entropy loss are used.
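As a rough illustration of these two losses, the sketch below computes a soft Dice loss and a binary cross entropy loss on a predicted probability map. How the two terms are combined is not stated here, so the equal-weight sum in `combined_loss` is an assumption, as are the function names and the smoothing constant `eps`.

```python
import numpy as np


def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2 * overlap / (sum of prediction + sum of target)."""
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return float(1.0 - dice)


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy; predictions are clipped to avoid log(0)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    losses = target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)
    return float(-np.mean(losses))


def combined_loss(pred, target):
    """Assumed combination: an unweighted sum of the Dice and BCE terms."""
    return dice_loss(pred, target) + bce_loss(pred, target)
```

The Dice term measures region overlap and is less sensitive to the foreground/background imbalance typical of tumor masks, while the BCE term gives a per-pixel signal.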
3. Results


The proposed model achieves the best segmentation scores and the lowest inference time among the compared models, while having considerably fewer trainable parameters.

- The effectiveness of each proposed component of the RCA-IUnet model is analyzed as above.
Each component contributes to improving the segmentation performance of the RCA-IUnet.

- Two cross-dataset settings are evaluated with fine-tuning: (1) the model pre-trained on the BUSIS dataset is tested on the BUSI dataset, and (2) the model pre-trained on the BUSI dataset is tested on the BUSIS dataset.
Similar results are achieved.