Review — Breast Tumor Segmentation and Shape Classification in Mammograms Using Generative Adversarial and Convolutional Neural Network

cGAN for Segmentation, CNN for Classification

Sik-Ho Tsang
6 min read · Nov 27, 2022

Breast Tumor Segmentation and Shape Classification in Mammograms Using Generative Adversarial and Convolutional Neural Network,
cGAN JESWA’20, by Universitat Rovira i Virgili, A*STAR, and Hospital Universitari Sant Joan, 2020 JESWA, Over 130 Citations (Sik-Ho Tsang @ Medium) Medical Imaging, Medical Image Analysis, Multi-Task Learning, Image Segmentation, Image Classification

  • A conditional Generative Adversarial Network (cGAN) is used to segment a breast tumor within a region of interest (ROI) in a mammogram.
  • The generative network learns to recognize the tumor area and to create the binary mask that outlines it. In turn, the adversarial network learns to distinguish between real (ground truth) and synthetic segmentations, thus forcing the generative network to create binary masks that are as realistic as possible.


  1. Overall Framework
  2. Conditional GAN (cGAN) for Image Segmentation
  3. Shape classification model (CNN) for Image Classification
  4. Image Segmentation Results
  5. Image Classification Results

1. Overall Framework

1.1. Framework

Automatic workflow for breast tumor segmentation and shape classification system.
  • The proposed CAD system is divided into two stages: breast tumor segmentation and shape classification.

1.2. Detection

Mass detection accuracy of proposed method compared with the existing state-of-the-art methods.
  • Before the first stage, SSD (Single Shot Detector) is used to locate the tumor and fit a bounding box around it.
  • SSD is found to be the best among the methods in the above table.

1.3. Loose Frame

Three cropping strategies: (a) full mammogram, (b) loose frame, (c) tight frame.
  • A so-called “loose frame” method expands the original bounding box coordinates by adding extra space around the box.
  • The loose frame provides a convenient proportion between healthy and tumorous pixels.
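
The loose-frame idea can be sketched as a small box-expansion helper. This is only an illustration: the function name `loose_frame` and the `margin` fraction are hypothetical, since the amount of extra space used in the paper is not stated here.

```python
def loose_frame(box, image_size, margin=0.2):
    """Expand a tight (x_min, y_min, x_max, y_max) box on every side.

    `margin` is a hypothetical fraction of the box width/height; the
    expanded box is clamped so it never leaves the image.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    pad_x = int(round((x_max - x_min) * margin))
    pad_y = int(round((y_max - y_min) * margin))
    return (
        max(0, x_min - pad_x),
        max(0, y_min - pad_y),
        min(width, x_max + pad_x),
        min(height, y_max + pad_y),
    )


tight = (100, 150, 300, 350)  # made-up detector output
print(loose_frame(tight, image_size=(1000, 1200)))
```

A larger `margin` brings more healthy tissue into the crop, which is the knob that changes the healthy-to-tumorous pixel proportion.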

1.4. Pre-Processing

  • ROI images are scaled to 256 × 256 pixels, which is the optimal cGAN input size found experimentally.
  • After scaling, they are pre-processed for noise removal, and then contrast is enhanced using histogram equalization. Finally, pixel values are normalized to the range [0, 1].
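
The pre-processing chain above could look roughly like the NumPy sketch below. Several parts are assumptions for illustration: the input is taken to be an 8-bit grayscale ROI, the resize is a simple nearest-neighbour stand-in, and a 3 × 3 median filter plays the role of the noise removal step, whose exact filter is not named here.

```python
import numpy as np


def resize_nearest(img, size=256):
    """Nearest-neighbour resize to size x size (stand-in for a library resize)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows[:, None], cols]


def median3x3(img):
    """3 x 3 median filter with edge padding (assumed denoiser)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)


def equalize_hist(img):
    """Global histogram equalization for an 8-bit image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = max(cdf[-1] - cdf_min, 1)  # guard against constant images
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255)
    return lut.astype(np.uint8)[img]


def preprocess_roi(roi):
    """Scale, denoise, equalize, then normalize to [0, 1]."""
    scaled = resize_nearest(roi, 256)
    denoised = median3x3(scaled)
    equalized = equalize_hist(denoised)
    return equalized.astype(np.float32) / 255.0


rng = np.random.default_rng(0)
fake_roi = rng.integers(0, 256, size=(300, 400), dtype=np.uint8)  # synthetic data
print(preprocess_roi(fake_roi).shape)
```
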
Two examples that show the need for morphological post-processing after the segmentation.

1.5. Segmentation Then Classification

  • The prepared data is then fed to the cGAN to obtain a binary mask of the breast tumor, which is post-processed using morphological operations (filter sizes of 3 × 3 for closing, 2 × 2 for erosion, and 3 × 3 for dilation) to remove small speckles, as in the examples above.
  • The output binary mask is downsampled to 64 × 64 pixels and then fed to a multi-class CNN shape descriptor that categorizes it into four classes: irregular, lobular, oval and round.
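
The post-processing and downsampling steps can be sketched in plain NumPy as follows. This is a minimal sketch under assumptions: square structuring elements, an anchor at the top-left for the even-sized 2 × 2 kernel, borders that do not erode the mask, and nearest-neighbour downsampling. The helper names are hypothetical.

```python
import numpy as np


def _windows(mask, k, pad_value):
    """Stack the k*k shifted views covered by a k x k square element."""
    before = (k - 1) // 2
    after = k - 1 - before
    padded = np.pad(mask, ((before, after), (before, after)),
                    constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[r:r + h, c:c + w]
                     for r in range(k) for c in range(k)])


def dilate(mask, k):
    return _windows(mask, k, False).any(axis=0)


def erode(mask, k):
    return _windows(mask, k, True).all(axis=0)


def close(mask, k):
    """Closing = dilation followed by erosion; fills small holes."""
    return erode(dilate(mask, k), k)


def postprocess(mask):
    """3x3 closing, 2x2 erosion, 3x3 dilation, in that order."""
    return dilate(erode(close(mask, 3), 2), 3)


def downsample(mask, size=64):
    """Nearest-neighbour downsampling to size x size."""
    rows = np.arange(size) * mask.shape[0] // size
    cols = np.arange(size) * mask.shape[1] // size
    return mask[rows[:, None], cols]


raw = np.zeros((256, 256), dtype=bool)
raw[80:180, 80:180] = True   # synthetic "tumor"
raw[120, 120] = False        # one-pixel hole inside it
raw[10, 10] = True           # isolated speckle
clean = postprocess(raw)
print(clean[10, 10], clean[120, 120], downsample(clean).shape)
```

In this toy mask the closing fills the hole, the 2 × 2 erosion deletes the one-pixel speckle, and the final dilation restores the tumor outline.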

2. Conditional GAN (cGAN) for Image Segmentation

2.1. Architecture

Proposed cGAN architecture: generator G (top), and discriminator D (bottom).
  • The Generator G network of the cGAN is an FCN composed of encoding and decoding layers, which learn the intrinsic features of healthy and unhealthy (tumor) breast tissue, and generate a binary mask according to these features.
  • The discriminator D network of the cGAN assesses whether a given binary mask is likely to be a realistic segmentation or not.
  • (For architecture details, please feel free to read the above figure or paper.)

2.2. Loss Functions

Proposed cGAN framework based on dice and BCE losses.
  • Let x be a tumor ROI, y the ground truth mask, z a random variable, λ an empirical weighting factor, G(x, z) and D(x, G(x, z)) the outputs of G and D, respectively.
  • Then, the loss function of G is defined as:
  • where z is introduced as Dropout in the decoding layers Dn1, Dn2 and Dn3, and ℓ_Dice(y, G(x, z)) is the dice loss of the predicted mask with respect to the ground truth, which is defined as:
  • where ◦ is the pixel wise multiplication of the two images and |.| is the total sum of pixel values of a given image.
  • The loss function of D is:
  • These two terms compute BCE loss using both masks.
  • The optimization of G and D is done concurrently, i.e., one optimization step for both networks at each iteration.
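
For reference, the three losses described in the bullets above can be written out as below. This is a reconstruction from the symbol definitions given here, assuming standard BCE-style adversarial terms; the labels ℓ_Gen and ℓ_Dis are chosen for this sketch, and the paper's exact notation may differ.

```latex
% Generator loss: adversarial BCE term plus lambda-weighted dice loss
\ell_{Gen}(G, D) = \mathbb{E}_{x,y,z}\big[-\log D(x, G(x,z))\big]
                 + \lambda\, \mathbb{E}_{x,y,z}\big[\ell_{Dice}(y, G(x,z))\big]

% Dice loss: \circ is pixel-wise multiplication, |.| sums pixel values
\ell_{Dice}(y, G(x,z)) = 1 - \frac{2\,|y \circ G(x,z)|}{|y| + |G(x,z)|}

% Discriminator loss: BCE on the real mask and on the generated mask
\ell_{Dis}(G, D) = \mathbb{E}_{x,y}\big[-\log D(x, y)\big]
                 + \mathbb{E}_{x,y,z}\big[-\log\big(1 - D(x, G(x,z))\big)\big]
```
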
Dice and L1-norm loss comparison over iterations.
  • The above figure shows that the dice loss reaches lower values than the L1-norm loss over the iterations.
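
To make the two losses in the comparison concrete, here is a small NumPy sketch of each. The `eps` term and the toy mask are assumptions added for illustration. The example also shows one common intuition (my own reading, not a claim from the paper): on a small tumor, an all-background prediction looks nearly perfect under the L1-norm but is maximally penalized by the dice loss.

```python
import numpy as np


def dice_loss(y, pred, eps=1e-7):
    """1 - 2|y * pred| / (|y| + |pred|); eps avoids division by zero."""
    intersection = np.sum(y * pred)
    return 1.0 - 2.0 * intersection / (np.sum(y) + np.sum(pred) + eps)


def l1_loss(y, pred):
    """Mean absolute pixel difference."""
    return np.mean(np.abs(y - pred))


# Toy ground truth: a 20 x 20 tumor inside a 256 x 256 ROI.
y = np.zeros((256, 256))
y[100:120, 100:120] = 1.0
empty_pred = np.zeros_like(y)  # predicts "no tumor" everywhere

print("dice loss:", dice_loss(y, empty_pred))
print("L1 loss:  ", l1_loss(y, empty_pred))
```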

3. Shape classification model (CNN) for Image Classification

CNN architecture for tumor shape classification.
  • The CNN attempts to use only shape context to classify the tumor shapes. (For architecture details, please feel free to read the above figure or paper.)
  • A weighted categorical cross-entropy loss is used to mitigate the class imbalance in the dataset.
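
A weighted categorical cross-entropy can be sketched as below. The inverse-frequency weighting is an assumption for illustration (the weights actually used in the paper are not given here), and the labels are made up.

```python
import numpy as np

CLASSES = ["irregular", "lobular", "oval", "round"]


def class_weights(labels, num_classes):
    """Inverse-frequency weights: rare classes get larger weights (assumed scheme)."""
    counts = np.bincount(labels, minlength=num_classes)
    return len(labels) / (num_classes * np.maximum(counts, 1))


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Mean of w[class] * -log(p[class]) over the batch."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(weights[labels] * -np.log(picked + eps)))


# Made-up, imbalanced labels: class 0 is four times as frequent as class 3.
labels = np.array([0, 0, 0, 0, 1, 1, 2, 3])
weights = class_weights(labels, len(CLASSES))
probs = np.full((len(labels), len(CLASSES)), 0.25)  # an uninformed classifier
print(dict(zip(CLASSES, weights)))
print(weighted_cross_entropy(probs, labels, weights))
```

With these weights, each class contributes equally to the total loss regardless of how many samples it has.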

4. Image Segmentation Results

Dice and IoU metrics obtained with the proposed model with/without post-processing and ten alternatives evaluated on the testing sets of the private and INbreast datasets.

According to the results, the proposed method outperforms the compared state-of-the-art methods in all cases except for the IoU computed on tight crops of the private dataset. There, the SLSDeep approach yielded the best IoU (79.93%), whereas the proposed method yielded the second best result (79.87%), a very small difference of 0.06 percentage points.

  • Post-processing improved the results of the proposed model by 1% for all three cropping strategies.
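
As a reminder of how the two reported metrics are computed from pixel counts, here is a minimal sketch. The tiny masks are made up, and the convention of returning 1.0 when both masks are empty is an assumption.

```python
import numpy as np


def confusion_counts(truth, pred):
    """Pixel-level TP, FP, FN, TN for two binary masks."""
    truth, pred = truth.astype(bool), pred.astype(bool)
    tp = int(np.sum(truth & pred))
    fp = int(np.sum(~truth & pred))
    fn = int(np.sum(truth & ~pred))
    tn = int(np.sum(~truth & ~pred))
    return tp, fp, fn, tn


def dice_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def iou_score(tp, fp, fn):
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True   # 4 tumor pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 2:4] = True    # same size, shifted one column right

tp, fp, fn, tn = confusion_counts(truth, pred)
print(tp, fp, fn, tn)
print(dice_score(tp, fp, fn), iou_score(tp, fp, fn))
```

The two metrics are tied together by IoU = Dice / (2 − Dice), so IoU is never higher than Dice for the same mask.
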
Boxplot of dice (Top) and IoU (Bottom) score over five models compared to the proposed method on loose frames of the test subset of INbreast dataset (106 samples).

The cGAN that uses pre-trained ResNet101 layers produces many outliers, whereas the proposed cGAN trained from scratch produces only a few.

Segmentation results of two testing samples extracted from the INbreast dataset with the three cropping strategies. (True positives (TP: yellow), false negatives (FN: red), false positives (FP: green) and true negatives (TN: black).)
Segmentation results of seven models with the INbreast dataset and two cropping strategies. (True positives (TP: yellow), false negatives (FN: red), false positives (FP: green) and true negatives (TN: black).)

The proposed method clearly outperforms the rest for all tumors except for the second one.

5. Image Classification Results

Confusion matrix of the tumor shape classification of testing samples of the DDSM dataset.

The proposed method yielded a classification accuracy of around 73% for the irregular and lobular classes.

Shape classification overall accuracy with the DDSM dataset.

The proposed classifier based only on binary masks yields an overall accuracy of 80%, outperforming the second best results.
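
For readers who want to see how per-class and overall accuracy are read off a confusion matrix, here is a small sketch. The matrix below is a toy example with made-up counts, not the paper's results, and it assumes rows are true classes and columns are predicted classes.

```python
import numpy as np

CLASSES = ["irregular", "lobular", "oval", "round"]


def per_class_accuracy(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)


def overall_accuracy(cm):
    """Correct predictions (diagonal) over all samples."""
    return np.trace(cm) / cm.sum()


# Toy counts only; rows = true class, columns = predicted class.
toy_cm = np.array([
    [6, 2, 1, 1],
    [1, 7, 1, 1],
    [0, 1, 9, 0],
    [1, 0, 1, 8],
])

for name, acc in zip(CLASSES, per_class_accuracy(toy_cm)):
    print(f"{name}: {acc:.2f}")
print("overall:", overall_accuracy(toy_cm))
```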

Mean ROC curve of 5 folds, for TPR and FPR from shape classification result of 292 test images from DDSM dataset.

The ROC curve in the above figure shows that the proposed model attained an AUC of about 0.8.

Distribution of breast cancer molecular subtypes samples from the hospital dataset with respect to its predicted mask shape.

Most Luminal-A and Luminal-B samples (96/123 and 82/107, respectively) are assigned to the irregular and lobular shape classes. In turn, oval and round tumors are indicative of the Her-2 and Basal-like samples.

Three mis-segmented examples of non-full tumor shapes with the INbreast dataset. The red part is at the bottom-left border. (True positives (TP: yellow), false negatives (FN: red), false positives (FP: green) and true negatives (TN: black).)

These three samples are mis-segmented because each contains two tumors: one in the center, which is properly segmented, and another that is only partially visible at the bottom-left border of the image, which is wrongly treated as a non-tumor region (FN).


