Brief Review — Breast Tumor Segmentation in Ultrasound Images Using Contextual-Information-Aware Deep Adversarial Learning Framework

cGAN+AC+CAW: Conditional GAN (cGAN), Atrous Convolution (AC), and Channel Attention with Weighting (CAW) are Used

Examples of Breast Ultrasound (BUS) Images
  • U-Net-like model is enhanced by using atrous convolution (AC) to capture spatial and scale context, and using the channel attention along with channel weighting (CAW) mechanisms to promote the tumor-relevant features.
  • Additionally, conditional GAN (cGAN) is used for adversarial learning.
  • This paper appears to be the journal version of the 2019 arXiv cGAN+AC+CAW, but without the random forest used there for the classification problem.


  1. cGAN+AC+CAW
  2. Results


1.1. Overall Architecture

The architecture of the proposed model, which consists of Generator (G) and Discriminator (D) networks.
  • The model consists of a generator network that extracts breast-tumor-relevant features, and a discriminator network that predicts whether a label mask is a real or fake segmentation of the input BUS image.

1.2. Generator

  • The plain encoder–decoder structure is modified by inserting an AC block, as in DeepLab, between Conv3 and Conv4, and a CAW block between Conv7 and DConv1. (Detailed diagrams of the AC and CAW blocks are shown below.)
  • Each layer in the encoder section is followed by batch normalization (BN) (except Conv1) and Leaky ReLU with slope 0.2. The decoder section is a sequence of transposed-convolutional layers, each followed by batch normalization and ReLU, with Dropout at rate 0.5 applied only in DConv1, DConv2 and DConv3.
  • The convolutional and transposed-convolutional layers use 4 × 4 kernels with a stride of 2.
  • Skip connections are employed between the corresponding layers in the encoder and decoder sections, as in U-Net.
  • After the last decoding layer (DConv7), the tanh activation function is used to generate a binary mask of the breast tumor.
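The encoder/decoder layer pattern described above can be sketched in PyTorch as follows. The helper names and the channel counts are illustrative assumptions; the paper's exact filter counts are not given in this summary.

```python
import torch
import torch.nn as nn

def enc_block(in_ch, out_ch, use_bn=True):
    """Encoder layer: 4x4 conv, stride 2, optional BN, Leaky ReLU(0.2)."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if use_bn:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2))
    return nn.Sequential(*layers)

def dec_block(in_ch, out_ch, dropout=False):
    """Decoder layer: 4x4 transposed conv, stride 2, BN, optional Dropout(0.5), ReLU."""
    layers = [nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
              nn.BatchNorm2d(out_ch)]
    if dropout:
        layers.append(nn.Dropout(0.5))
    layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# Conv1 has no BN; each encoder layer halves the spatial resolution.
conv1 = enc_block(1, 64, use_bn=False)
x = torch.randn(1, 1, 128, 128)
print(conv1(x).shape)  # (1, 64, 64, 64)
```

Each decoder layer doubles the spatial resolution back, so seven encoder layers paired with seven decoder layers reconstruct a mask at the input size.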

1.3. Atrous Convolution (AC)

The architecture of the AC block with different atrous rates.
AC is used to increase the receptive field in order to accommodate the variable sizes and shapes of breast tumors.
  • The first three convolutional layers have a kernel size of 3 × 3 and rates of 1, 6, and 9, respectively. The fourth convolutional layer has a kernel size of 1 × 1 followed by a global average pooling (GAP).
  • An up-sampling layer is employed after each branch and then all features are concatenated.
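One reading of the AC block is an ASPP-style module, sketched below; the branch channel count and input size are assumptions, and only the GAP branch needs explicit up-sampling here because the padded atrous convolutions already preserve the spatial size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACBlock(nn.Module):
    """ASPP-style block: three 3x3 atrous convs (rates 1, 6, 9) and a
    1x1-conv branch with global average pooling (GAP), concatenated."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, 3, padding=1, dilation=1)
        self.b2 = nn.Conv2d(in_ch, branch_ch, 3, padding=6, dilation=6)
        self.b3 = nn.Conv2d(in_ch, branch_ch, 3, padding=9, dilation=9)
        self.b4 = nn.Conv2d(in_ch, branch_ch, 1)  # 1x1 conv + GAP branch

    def forward(self, x):
        h, w = x.shape[2:]
        y4 = F.adaptive_avg_pool2d(self.b4(x), 1)            # GAP to 1x1
        y4 = F.interpolate(y4, size=(h, w), mode='nearest')  # up-sample back
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), y4], dim=1)

block = ACBlock(256, 64)
out = block(torch.randn(1, 256, 32, 32))
print(out.shape)  # (1, 256, 32, 32): 4 branches x 64 channels each
```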

1.4. CAW Block

The architecture of the CAW block.
  • The CAW block is an aggregation of a channel attention from DANet with a channel weighting module from SENet.
  • It has two branches: the channel attention process (top branch) and the channel weighting process (bottom branch). Since the CAW block is placed after the last encoder layer, the processed activation map has spatial dimensions (H × W) of 1 × 1: indeed, it is a vector of C = 512 scalars. Hence, the method works only on the channel feature space.
  • In brief, the channel attention part is similar to the idea in DANet (or self-attention in the Transformer), while the channel weighting part is similar to the SENet idea. (Please feel free to read those stories.)
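A minimal sketch of the CAW block is below, assuming the two branches are fused by addition and SENet's usual reduction ratio of 16; both choices are assumptions, since the summary does not state them. Because the input is 1 × 1 spatially, both branches operate purely on the channel vector.

```python
import torch
import torch.nn as nn

class CAWBlock(nn.Module):
    """Channel attention (DANet-style) aggregated with channel
    weighting (SENet-style) on a (B, C, 1, 1) activation map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learned scale, as in DANet
        self.fc = nn.Sequential(                   # SE squeeze-excite MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (B, C, 1, 1)
        b, c = x.shape[:2]
        v = x.view(b, c)                               # channel vector
        # channel attention branch: C x C similarity, softmax, re-weight
        energy = v.unsqueeze(2) @ v.unsqueeze(1)       # (B, C, C)
        attn = torch.softmax(energy, dim=-1)
        ca = self.gamma * (attn @ v.unsqueeze(2)).squeeze(2) + v
        # channel weighting branch: gate each channel by a learned scalar
        cw = v * self.fc(v)
        return (ca + cw).view(b, c, 1, 1)

caw = CAWBlock(512)
y = caw(torch.randn(2, 512, 1, 1))
print(y.shape)  # (2, 512, 1, 1)
```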

1.5. Discriminator

  • It comprises a set of five convolutional layers with kernels of size 4 × 4 and a stride of 2, except for Conv4 and Conv5 where the stride is 1.
  • Batch normalization is used after Conv2 to Conv4. Leaky ReLU with slope 0.2 is the non-linear activation after Conv1 to Conv4, while the sigmoid function is used after Conv5.
  • The input of the discriminator is the concatenation of the BUS image and a binary mask.
  • The output of the discriminator is a 10 × 10 matrix with values ranging from 0.0 (completely fake) to 1.0 (real).
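This is a PatchGAN-style discriminator, sketched below; the channel counts (64, 128, 256, 512) follow pix2pix conventions and are assumptions. With a 96 × 96 input, the stated layer strides do yield a 10 × 10 output map.

```python
import torch
import torch.nn as nn

def build_discriminator(in_ch=2):
    """Five 4x4 convs; stride 2 except Conv4/Conv5 (stride 1);
    BN after Conv2-Conv4; sigmoid after Conv5."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),   # Conv1 (no BN)
        nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1),     # Conv2
        nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 256, 4, stride=2, padding=1),    # Conv3
        nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
        nn.Conv2d(256, 512, 4, stride=1, padding=1),    # Conv4 (stride 1)
        nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
        nn.Conv2d(512, 1, 4, stride=1, padding=1),      # Conv5 (stride 1)
        nn.Sigmoid(),                                   # per-patch real/fake score
    )

D = build_discriminator()
# Input: BUS image concatenated with a mask along the channel axis.
img_and_mask = torch.cat([torch.randn(1, 1, 96, 96),
                          torch.randn(1, 1, 96, 96)], dim=1)
print(D(img_and_mask).shape)  # (1, 1, 10, 10)
```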

1.6. Loss Function

  • The loss function of the generator G comprises three terms: an adversarial loss (binary cross entropy), an L1-norm term to boost the learning process, and an SSIM term to improve the shape of the boundaries of the segmented masks:
  • where z is a random variable.
  • The loss function of the discriminator D is:
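The equations themselves are not reproduced in this summary. Under the standard pix2pix-style cGAN formulation they would plausibly take the following form, where x is the input BUS image, y the ground-truth mask, z the random variable, and the weights λ₁, λ₂ are assumed hyperparameters:

```latex
% Generator loss: adversarial + L1 + SSIM terms (weights are assumptions)
\mathcal{L}_G = \mathbb{E}_{x,z}\big[-\log D(x, G(x,z))\big]
  + \lambda_1\,\mathbb{E}_{x,y,z}\big[\lVert y - G(x,z)\rVert_1\big]
  + \lambda_2\,\mathbb{E}_{x,y,z}\big[1 - \mathrm{SSIM}\!\left(y, G(x,z)\right)\big]

% Discriminator loss: binary cross entropy on real and generated masks
\mathcal{L}_D = \mathbb{E}_{x,y}\big[-\log D(x,y)\big]
  + \mathbb{E}_{x,z}\big[-\log\!\big(1 - D(x, G(x,z))\big)\big]
```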

2. Results

2.1. Ablation Study

Analyzing different configurations of the proposed method with dataset A and dataset B.
The performance of the proposed model with different combinations of loss functions.

2.2. SOTA Comparisons

Comparison between the proposed model and six state-of-the-art methods in terms of accuracy, Dice, IoU, sensitivity and specificity, using datasets A and B.
  • Accuracy here should be the per-pixel accuracy rather than image-level classification accuracy.

2.3. Qualitative Results

Segmentation results of five models with the Dataset A.
Segmentation results of five models with the Dataset B.
  • Red: false negatives; green: false positives.
  • U-Net provides a reasonable segmentation, but its boundary around the tumor region is less accurate.
Examples of incorrect tumor segmentation and localization results.

2.4. Inference Time

  • The execution time (inference time) of each segmentation model. FCN, U-Net, SegNet, ERFNet, DCGAN, and the proposed model achieve 35.15, 20.33, 17.71, 78.78, 18.27, and 19.62 frames per second (FPS), respectively.
  • Although FCN achieves 35.15 FPS, its IoU and Dice values with both BUS image datasets are much lower than the ones of the proposed model.
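As a rough sketch, per-model FPS could be measured as below; the paper's exact timing protocol, input size, and hardware are not given in this summary, and the tiny convolution used here is only a stand-in model.

```python
import time
import torch

def measure_fps(model, input_shape=(1, 1, 128, 128), n_runs=100):
    """Average single-image inference throughput in frames per second."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(10):          # warm-up runs, excluded from timing
            model(x)
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        elapsed = time.perf_counter() - start
    return n_runs / elapsed

fps = measure_fps(torch.nn.Conv2d(1, 1, 3, padding=1))
print(fps)
```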


