Review — Deep Learning for Multi-task Medical Image Segmentation in Multiple Modalities

Single CNN for Different Modalities

Sik-Ho Tsang
Nov 15, 2022


Deep Learning for Multi-task Medical Image Segmentation in Multiple Modalities, Moeskops MICCAI’16, by University Medical Center Utrecht and Eindhoven University of Technology
2016 MICCAI, Over 300 Citations (Sik-Ho Tsang @ Medium)
Medical Image Analysis, Medical Imaging, Image Segmentation

  • A single CNN is trained to segment six tissues in MR brain images, the pectoral muscle in MR breast images, and the coronary arteries in cardiac CTA.
  • (Note that "multi-task" here does not mean simultaneous classification and segmentation. It means one network segmenting different anatomical structures in different imaging modalities.)


  1. Data
  2. Approach
  3. Results

1. Data

1.1. Brain MRI

  • 34 T1-weighted MR brain images from the OASIS project [9], acquired on a Siemens Vision 1.5 T scanner, were used as provided by the MICCAI challenge on multi-atlas labelling [8].
  • The images were acquired with voxel sizes of 1.0×1.0×1.25mm³ and resampled to isotropic voxel sizes of 1.0×1.0×1.0mm³.
  • The images were manually segmented, in the coronal plane, into 134 classes, and combined into six commonly used tissue classes: white matter, cortical grey matter, basal ganglia and thalami, ventricular cerebrospinal fluid, cerebellum, and brain stem.
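
The resampling step above can be sketched as a small calculation. This is only an illustration: the function name `isotropic_resampling_plan`, the axis order, and the 256×256×128 grid are assumptions, and the interpolation method actually used is not stated in the review.

```python
def isotropic_resampling_plan(shape, voxel_size_mm, target_mm):
    """Return per-axis zoom factors and the resulting grid shape.

    shape          -- voxels per axis of the acquired image (hypothetical here)
    voxel_size_mm  -- acquired voxel size per axis, in mm
    target_mm      -- desired isotropic voxel size, in mm
    """
    # A zoom factor above 1 means the axis gains voxels after resampling.
    zoom = tuple(size / target_mm for size in voxel_size_mm)
    new_shape = tuple(round(n * z) for n, z in zip(shape, zoom))
    return zoom, new_shape


# Brain MRI voxel size from the text; the 256 x 256 x 128 grid is made up.
zoom, new_shape = isotropic_resampling_plan(
    shape=(256, 256, 128),
    voxel_size_mm=(1.0, 1.0, 1.25),
    target_mm=1.0,
)
print(zoom)       # (1.0, 1.0, 1.25)
print(new_shape)  # (256, 256, 160)
```

The zoom factors could then be passed to an interpolation routine of choice to produce the resampled volume.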

1.2. Breast MRI

  • 34 T1-weighted MR breast images were acquired on a Siemens Magnetom 1.5T scanner with a dedicated double breast array coil [16].
  • The images were acquired with in-plane voxel sizes between 1.21 and 1.35 mm and slice thicknesses between 1.35 and 1.69 mm. All images were resampled to isotropic voxel sizes corresponding to their in-plane voxel size.
  • The pectoral muscle was manually segmented in the axial plane by contour drawing.

1.3. Cardiac CTA

  • Ten cardiac CTA scans were acquired on a 256-detector row Philips Brilliance iCT scanner.
  • The reconstructed images had between 0.4 and 0.5mm in-plane voxel sizes and 0.45/0.90mm slice spacing/thickness. All images were resampled to isotropic 0.45 × 0.45 × 0.45mm³ voxel size.
  • A human observer traversed the scan in the craniocaudal direction and painted voxels in the main coronary arteries and their branches in the axial plane.

2. Approach

2.1. Input

  • For each voxel, three orthogonal (axial, sagittal, and coronal) patches of 51×51 voxels centred at the target voxel were extracted. For each of these three patches, features were determined using a deep stack of convolution layers.
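
A minimal sketch of the patch extraction, using NumPy. The `(z, y, x)` axis order, the function name `orthogonal_patches`, and the zero-padding at the volume border are assumptions made for illustration; the review does not say how border voxels were handled.

```python
import numpy as np

PATCH = 51          # patch edge length in voxels, from the text
HALF = PATCH // 2   # 25 voxels on each side of the centre voxel


def orthogonal_patches(volume, z, y, x):
    """Return the axial, sagittal and coronal 51x51 patches centred at (z, y, x).

    Assumes `volume` is indexed (z, y, x) and that the centre lies at least
    HALF voxels away from every face of the array (pad beforehand if needed).
    """
    axial = volume[z, y - HALF:y + HALF + 1, x - HALF:x + HALF + 1]
    coronal = volume[z - HALF:z + HALF + 1, y, x - HALF:x + HALF + 1]
    sagittal = volume[z - HALF:z + HALF + 1, y - HALF:y + HALF + 1, x]
    return axial, sagittal, coronal


# Hypothetical volume; zero-padding is an assumption, not the paper's method.
rng = np.random.default_rng(0)
volume = rng.random((60, 70, 80))
padded = np.pad(volume, HALF, mode="constant")

z, y, x = 0, 10, 79  # a voxel on the border of the original volume
axial, sagittal, coronal = orthogonal_patches(padded, z + HALF, y + HALF, x + HALF)
print(axial.shape, sagittal.shape, coronal.shape)  # (51, 51) (51, 51) (51, 51)
```

Each patch has the target voxel at its centre, index `[25, 25]`.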

2.2. Convolutional Layers

Model Architecture
  • Each convolution layer contained 32 small (3×3 voxels) convolution kernels for a total of 25 convolution layers, similar to VGGNet [14].
  • No subsampling layers were used.
  • To reduce number of parameters and overfitting, the same stack of convolutional layers was used for the axial, sagittal and coronal patches.
  • The output of the convolution layers was 32 features for each of the three orthogonal input patches, hence 96 features in total.
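
The numbers above fit together if the convolutions are unpadded with stride 1, so that each 3×3 layer trims one voxel from every border. That is an assumption here (the review does not state the padding), but it is consistent with a 51×51 patch ending up as 32 features. The parameter count further assumes a single-channel input and one bias per kernel, and ignores Batch Normalization parameters.

```python
N_LAYERS = 25     # convolution layers, from the text
KERNEL = 3        # 3x3 kernels, from the text
N_KERNELS = 32    # kernels per layer, from the text
PATCH = 51        # input patch edge length, from the text


def valid_conv_output_size(input_size, kernel, n_layers):
    """Spatial size after n_layers unpadded, stride-1 convolutions."""
    return input_size - n_layers * (kernel - 1)


def conv_stack_parameters(in_channels, n_kernels, kernel, n_layers):
    """Weights plus biases of the stack (Batch Normalization not counted)."""
    total = 0
    channels = in_channels
    for _ in range(n_layers):
        total += kernel * kernel * channels * n_kernels + n_kernels
        channels = n_kernels
    return total


out_size = valid_conv_output_size(PATCH, KERNEL, N_LAYERS)
print(out_size)                                  # 1
print(3 * N_KERNELS * out_size * out_size)       # 96
print(conv_stack_parameters(1, N_KERNELS, KERNEL, N_LAYERS))  # 222272
```

Because the same stack is reused for the axial, sagittal and coronal patches, its parameters are counted once rather than three times.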

2.3. Fully Connected Layers

  • These features were input to two successive fully connected layers, each with 192 nodes, followed by a softmax classification layer. The softmax layer contained 2, 3, 7, 8 or 9 output nodes, depending on the number of classes. The fully connected layers were implemented as 1×1 voxel convolutions.
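
To see why a fully connected layer can be written as a 1×1 convolution, the following NumPy sketch applies one 192-node layer to a feature map in both ways and checks that the results agree. The 4×5 feature-map size and the random weights are made up for illustration, and the ELU activation, Batch Normalization and Dropout are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature map: 96 features at each position of a 4x5 grid.
features = rng.standard_normal((96, 4, 5))

# Weights and biases of one fully connected layer with 192 nodes.
weights = rng.standard_normal((192, 96))
bias = rng.standard_normal(192)


def dense_at_one_position(feature_vector):
    """Ordinary fully connected layer applied to a single 96-feature vector."""
    return weights @ feature_vector + bias


def conv1x1(feature_map):
    """The same layer written as a 1x1 convolution over the whole map."""
    return np.einsum("oc,chw->ohw", weights, feature_map) + bias[:, None, None]


out = conv1x1(features)
print(out.shape)  # (192, 4, 5)
print(np.allclose(out[:, 2, 3], dense_at_one_position(features[:, 2, 3])))  # True
```

A 1×1 convolution mixes only the feature channels at each position, which is exactly what a fully connected layer does for a single voxel.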

2.4. Other Training Details

Output classes included in each training experiment.
  • Exponential Linear Units (ELUs) [2] were used for all non-linear activation functions. Batch Normalization [5] was used on all layers and Dropout [15] was used on the fully connected layers.
  • Experiments 1–3: Three networks were trained to perform one task.
  • Experiments 4–6: Three networks were trained to perform two tasks.
  • Experiment 7: One network was trained to perform three tasks.
  • A mini-batch size of 210 samples was used.
  • The data for brain MRI, breast MRI and cardiac CTA were split into 14/20, 14/20 and 6/4 training/test images, respectively.
  • Each network was trained with 25000 mini-batches per task.
  • No post-processing steps.
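
The output-node counts (2, 3, 7, 8 or 9) and the training length per network can be reproduced with a short calculation. This sketch assumes one background class shared by all tasks in a network, and reads "25000 mini-batches per task" as 25000 times the number of tasks; both are interpretations rather than statements from the review. Which task pair corresponds to which of Experiments 4–6 is not given, so the pairs are simply enumerated.

```python
from itertools import combinations

# Foreground classes per task, from the text.
FOREGROUND = {"brain MRI": 6, "breast MRI": 1, "cardiac CTA": 1}
MINI_BATCHES_PER_TASK = 25_000


def output_nodes(tasks):
    """Foreground classes of all tasks plus one shared background class (assumption)."""
    return sum(FOREGROUND[task] for task in tasks) + 1


def total_mini_batches(tasks):
    """Training length if every task contributes 25000 mini-batches (assumption)."""
    return MINI_BATCHES_PER_TASK * len(tasks)


for n_tasks in (1, 2, 3):
    for tasks in combinations(FOREGROUND, n_tasks):
        print(" + ".join(tasks), output_nodes(tasks), total_mini_batches(tasks))
```

Under these assumptions the seven networks have 7, 2, 2, 8, 8, 3 and 9 output nodes, matching the set of values quoted for the softmax layer.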

3. Results

Learning curves showing Dice coefficients for tissue segmentation in brain MRI (top three rows), breast MRI (bottom left), and cardiac CTA (bottom right), reported at 1000 mini-batch intervals for experiments including that task.
  • As the networks learned, the obtained Dice coefficients increased and the stability of the results improved.
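
For reference, the Dice coefficient between a predicted and a reference segmentation of one class is twice the overlap divided by the sum of the two region sizes. A minimal sketch on flattened label maps follows; the function name and the toy labels are illustrative, and returning 1.0 when the class is absent from both maps is a convention chosen here.

```python
def dice_coefficient(predicted, reference, label):
    """Dice overlap for one class between two equally long label sequences."""
    if len(predicted) != len(reference):
        raise ValueError("label maps must have the same number of voxels")
    in_pred = sum(1 for p in predicted if p == label)
    in_ref = sum(1 for r in reference if r == label)
    overlap = sum(
        1 for p, r in zip(predicted, reference) if p == label and r == label
    )
    if in_pred + in_ref == 0:
        return 1.0  # convention chosen here: class absent from both maps
    return 2.0 * overlap / (in_pred + in_ref)


# Toy flattened label maps with three classes (0, 1, 2).
predicted = [0, 1, 1, 2, 2, 2]
reference = [0, 1, 2, 2, 2, 1]
print(dice_coefficient(predicted, reference, label=1))  # 0.5
print(round(dice_coefficient(predicted, reference, label=2), 3))  # 0.667
```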

For each segmentation task, the learning curves were similar for all experiments.

Example segmentations for (top to bottom) brain MRI, breast MRI, and cardiac CTA.

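For each of the three tasks, all four networks that included the task (one single-task network, two two-task networks, and the three-task network) were able to accurately segment the target tissues.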

The CNN backbone is shared among modalities.


[2016 MICCAI] [Moeskops MICCAI’16]
Deep Learning for Multi-task Medical Image Segmentation in Multiple Modalities



