Brief Review — Joint Segmentation and Fine-Grained Classification of Nuclei in Histopathology Images

U-Net with ResNet-34 as Encoder, Trained with a VGG-Based Perceptual Loss

Sik-Ho Tsang
Nov 11, 2022


Joint Segmentation and Fine-Grained Classification of Nuclei in Histopathology Images,
Qu ISBI’19, by Rutgers University, and Cancer Institute of New Jersey
2019 ISBI, Over 30 Citations (Sik-Ho Tsang @ Medium)
Medical Image Analysis, Multi-Task Learning, Image Segmentation, Image Classification

  • A unified framework is proposed for nuclei segmentation and classification, which can segment individual nuclei and classify them into tumor, lymphocyte and stroma nuclei.
  • (Yesterday, I reviewed Perceptual loss.) In this paper, a perceptual loss is used to improve the segmentation, and transfer learning is used to initialize the encoder.


  1. Dataset & Preprocessing
  2. Proposed Unified Framework
  3. Results

1. Dataset & Preprocessing

Example of an image and its labels. (a) Original image, (b) Ground-truth label, (c) Classification label: red, green and blue represent tumor, lymphocyte and stroma nuclei, respectively. (d) Segmentation label: distinct colors are different nuclei.
  • A dataset is annotated that consists of 40 H&E stained tissue images from 8 different lung adenocarcinoma or lung squamous cell carcinoma cases, and each case has 5 images of size about 900×900.
  • There are around 24000 annotated nuclei in the dataset and each nucleus is marked as one of the following three types: tumor nucleus, lymphocyte nucleus, stroma (fibroblasts, macrophages, neutrophils, endothelial cells, etc.) nucleus.
  • For each image, one label image encodes both the segmentation mask and the class of each nucleus. In a ground-truth label, pixels of value 0 are background. Pixels that share the same positive integer belong to one individual nucleus.
  • The integer value id also indicates the class of the nucleus: (1) tumor nucleus if mod(id, 3) = 0, (2) lymphocyte nucleus if mod(id, 3) = 1, (3) stroma nucleus if mod(id, 3) = 2, where mod is the modulo operation.
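The label encoding above can be decoded in a few lines. This is a minimal NumPy sketch, not the authors' code; the function names `decode_label` and `nuclei_classes`, and the use of -1 for background in the class map, are assumptions made for illustration.

```python
import numpy as np

# Class index follows the mod-3 rule described in the text.
CLASS_NAMES = {0: "tumor", 1: "lymphocyte", 2: "stroma"}


def decode_label(label):
    """Split a label image into a foreground mask and a per-pixel class map.

    Background pixels (value 0) get class -1 (an illustrative convention).
    """
    label = np.asarray(label)
    foreground = label > 0
    class_map = np.full(label.shape, -1, dtype=np.int64)
    class_map[foreground] = label[foreground] % 3
    return foreground, class_map


def nuclei_classes(label):
    """Map each nucleus id in the label image to its class name."""
    ids = np.unique(label)
    ids = ids[ids > 0]  # drop the background id
    return {int(i): CLASS_NAMES[int(i) % 3] for i in ids}
```

For example, a label image containing ids 3, 4 and 5 holds one tumor, one lymphocyte and one stroma nucleus.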

2. Proposed Unified Framework

Proposed Unified Framework
  • The proposed framework consists of two parts: the prediction network that generates the segmentation mask of each type of nuclei, and the perceptual loss network that computes the perceptual loss between the predicted label and ground-truth label.

2.1. Prediction Network

  • The prediction network is a standard encoder-decoder structure based on U-Net.
  • The encoder is from ResNet-34, without the average pooling and fully connected layers, and is initialized with the pretrained parameters from image classification tasks.
  • There are skip connections between encoder and decoder, which help to recover high-resolution feature maps.
  • The network outputs five probability maps: background, inner part of tumor nuclei, inner part of lymphocyte nuclei, inner part of stroma nuclei and contours of all nuclei.
  • The contour map mainly aims to capture the contours of crowded and touching nuclei. As a result, the predicted inner parts of adjacent nuclei are not connected to each other. The final nuclei mask is generated by a simple morphological dilation operation.
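The post-processing step described above could look roughly like the sketch below. It is written under several assumptions not stated in the review: the channel order (0 background, 1 tumor, 2 lymphocyte, 3 stroma, 4 contour), the dilation radius, the majority vote for the class of each nucleus, and the function name `postprocess` are all hypothetical.

```python
import numpy as np
from scipy import ndimage


def postprocess(prob, dilation_radius=1):
    """Turn five probability maps of shape (5, H, W) into instances and classes.

    Assumed channel order: 0 background, 1 tumor, 2 lymphocyte,
    3 stroma, 4 contour.
    """
    pred = np.argmax(prob, axis=0)
    inner = (pred >= 1) & (pred <= 3)  # inner parts of any nucleus type

    # Inner parts of touching nuclei are separated by contour pixels,
    # so connected components give one id per nucleus.
    instances, n = ndimage.label(inner)

    # Grow each nucleus back outwards; pixels already assigned keep their id.
    size = 2 * dilation_radius + 1
    grown = ndimage.grey_dilation(instances, size=(size, size))
    instances = np.where(instances > 0, instances, grown)

    # Class of each nucleus: majority vote over its original inner pixels.
    classes = {}
    for i in range(1, n + 1):
        votes = np.bincount(pred[(instances == i) & inner], minlength=5)[1:4]
        classes[i] = int(np.argmax(votes)) + 1
    return instances, classes
```

Where two grown nuclei meet, this sketch simply gives the shared pixel to the larger id; a real pipeline may resolve such ties differently.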

2.2. Perceptual Loss Network

  • The perceptual loss network is utilized to improve the segmentation accuracy of details in the image. It originates from Johnson et al.’s work [17], in which the authors compute loss between high-level features of the transformed image and the original image.
  • The pretrained VGG-16 model serves as a feature extractor and is kept fixed during training and testing. Four levels of features are extracted with this network for both the output of the prediction network and the ground-truth label, i.e., the feature maps after the last ReLU layer of the first, second, third and fourth blocks of the VGG-16 model, denoted as relu1_2, relu2_2, relu3_3, relu4_3.
  • The mean squared error is then computed between the feature sets of the two inputs.
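The idea of comparing two inputs in feature space rather than pixel space can be illustrated without a deep network. In the sketch below, `toy_features` is a hypothetical stand-in for the frozen VGG-16 (four levels of 2×2 average pooling instead of relu1_2 to relu4_3), and summing the per-level errors with equal weight is an assumption; only the structure of the computation mirrors the description above.

```python
import numpy as np


def perceptual_loss(pred_label, true_label, extract_features):
    """Sum of mean squared errors between matching feature levels.

    `extract_features` plays the role of the fixed feature extractor
    and must return a list of arrays, one per level.
    """
    feats_pred = extract_features(pred_label)
    feats_true = extract_features(true_label)
    return sum(
        float(np.mean((fp - ft) ** 2)) for fp, ft in zip(feats_pred, feats_true)
    )


def toy_features(x):
    """Hypothetical stand-in for VGG-16: four levels of 2x2 average pooling."""
    x = np.asarray(x, dtype=np.float64)
    levels = []
    for _ in range(4):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(x)
    return levels
```

In practice the extractor would be the pretrained VGG-16 with its parameters frozen, so that gradients flow only into the prediction network.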

2.3. Loss Function

  • The loss function of the method consists of two parts.
  • The first part is the cross entropy loss for five classes:
  • Larger weights are assigned to pixels of infrequent classes:
  • where d1, d2 are the distances to the nearest and the second nearest nuclei. σ=5 and ω0=10.
  • The second part is the perceptual loss. Let f denote the pretrained VGG-16 model.
  • where ŷ = arg max y is the prediction map obtained from the output probability map y.
  • The final loss function is:
  • where β=0.1.
  • For fine-grained classification, the authors measure accuracy only over true positives rather than over all ground-truth nuclei, because not every ground-truth nucleus has a corresponding predicted one.
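The formulas themselves are not reproduced in the text above. One plausible form, consistent with the definitions given (σ, ω0, d1, d2, f, ŷ, β), is sketched below. It assumes a U-Net-style weight map with a class-balancing term w_c, an equally weighted sum over the four VGG-16 feature levels, and a simple additive combination; the symbols t (ground-truth label), N (number of pixels), w_c, f_j and N_j (size of the j-th feature map) are introduced here for illustration, so the paper should be consulted for the exact expressions.

```latex
% Weighted cross-entropy over N pixels and five classes (assumed form)
L_{ce}(y, t) = -\frac{1}{N} \sum_{i=1}^{N} w_i \sum_{k=1}^{5} t_{i,k} \log y_{i,k}

% Pixel weight, U-Net style (assumed form): class term plus a term for pixels between close nuclei
w_i = w_c(i) + \omega_0 \exp\!\left( -\frac{(d_1(i) + d_2(i))^2}{2\sigma^2} \right),
\qquad \sigma = 5,\ \omega_0 = 10

% Perceptual loss over the four feature levels relu1_2, relu2_2, relu3_3, relu4_3 (assumed form)
L_{perc}(\hat{y}, t) = \sum_{j=1}^{4} \frac{1}{N_j} \left\| f_j(\hat{y}) - f_j(t) \right\|_2^2,
\qquad \hat{y} = \arg\max y

% Total loss (assumed form)
L = L_{ce}(y, t) + \beta \, L_{perc}(\hat{y}, t), \qquad \beta = 0.1
```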

3. Results

Example results: ground-truth labels, FCN-8s, U-Net and the proposed method.
Nuclei segmentation results on the test set
Nuclei fine-grained classification accuracies (%) on the test set.

All three model variants achieve relatively good segmentation and fine-grained classification results, showing that combining the two tasks is feasible.

  • Compared to FCN-8s and U-Net, the proposed method has improvements on the segmentation of all types of nuclei, especially on lymphocytes.
  • The proposed method also outperforms FCN-8s and U-Net on the fine-grained classification.

Both transfer learning and the perceptual loss improve segmentation and classification performance.


