Review — Deep Learning for Multi-task Medical Image Segmentation in Multiple Modalities

Single CNN for Different Modalities

4 min read · Nov 15, 2022

Deep Learning for Multi-task Medical Image Segmentation in Multiple Modalities, Moeskops MICCAI’16, by University Medical Center Utrecht and Eindhoven University of Technology
2016 MICCAI, Over 300 Citations (Sik-Ho Tsang @ Medium)
Medical Image Analysis, Medical Imaging, Image Segmentation

A single CNN is trained to segment six tissues in MR brain images, the pectoral muscle in MR breast images, and the coronary arteries in cardiac CTA.
(Note that this is not multi-task learning in the sense of simultaneous classification and segmentation; here, the tasks are segmentation problems in different modalities.)

Outline

Data
Approach
Results

1. Data

1.1. Brain MRI

34 T1-weighted MR brain images from the OASIS project [9], acquired on a Siemens Vision 1.5 T scanner, were used, as provided by the MICCAI challenge on multi-atlas labelling [8].
The images were acquired with voxel sizes of 1.0×1.0×1.25mm³ and resampled to isotropic voxel sizes of 1.0×1.0×1.0mm³.
The images were manually segmented, in the coronal plane, into 134 classes, and combined into six commonly used tissue classes: white matter, cortical grey matter, basal ganglia and thalami, ventricular cerebrospinal fluid, cerebellum, and brain stem.

1.2. Breast MRI

34 T1-weighted MR breast images were acquired on a Siemens Magnetom 1.5T scanner with a dedicated double breast array coil [16].
The images were acquired with in-plane voxel sizes between 1.21 and 1.35mm and slice thicknesses between 1.35 and 1.69 mm. All images were resampled to isotropic voxel sizes corresponding to their in-plane voxel size.
The pectoral muscle was manually segmented in the axial plane by contour drawing.

1.3. Cardiac CTA

Ten cardiac CTA scans were acquired on a 256-detector row Philips Brilliance iCT scanner.
The reconstructed images had between 0.4 and 0.5mm in-plane voxel sizes and 0.45/0.90mm slice spacing/thickness. All images were resampled to isotropic 0.45 × 0.45 × 0.45mm³ voxel size.
A human observer traversed the scan in the craniocaudal direction and painted voxels in the main coronary arteries and their branches in the axial plane.
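The isotropic resampling described for the three datasets comes down to a per-axis scale factor (original spacing divided by target spacing). The sketch below only works out those factors and the resulting array shape; it does not perform the interpolation, and the function names and the 256×256×128 input shape are hypothetical, chosen for illustration rather than taken from the paper.

```python
def zoom_factors(spacing_mm, target_mm):
    """Per-axis scale factors that map a grid with `spacing_mm` to `target_mm`."""
    return tuple(s / target_mm for s in spacing_mm)


def resampled_shape(shape, spacing_mm, target_mm):
    """Approximate array shape after resampling to isotropic `target_mm` voxels."""
    factors = zoom_factors(spacing_mm, target_mm)
    return tuple(int(round(n * f)) for n, f in zip(shape, factors))


# Brain MRI: 1.0 x 1.0 x 1.25 mm voxels resampled to 1.0 mm isotropic.
brain_spacing = (1.0, 1.0, 1.25)
brain_factors = zoom_factors(brain_spacing, 1.0)

# Hypothetical input shape (not from the paper), used only to show the effect.
brain_shape = resampled_shape((256, 256, 128), brain_spacing, 1.0)

print(brain_factors)
print(brain_shape)
```

In practice these factors would be passed to an interpolation routine; for the breast MRI data the target spacing would be each image's own in-plane voxel size, and for cardiac CTA it would be 0.45 mm.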

2. Approach

2.1. Input

For each voxel, three orthogonal (axial, sagittal, and coronal) patches of 51×51 voxels centred at the target voxel were extracted. For each of these three patches, features were determined using a deep stack of convolution layers.
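As a rough illustration of the tri-planar input, the following NumPy sketch extracts three orthogonal 51×51 patches centred at one voxel. It assumes the volume is indexed as (z, y, x), that fixing z gives the axial plane, fixing y the coronal plane and fixing x the sagittal plane, and that the volume is zero-padded so patches near the border fit. The function name and the toy volume are hypothetical, not from the paper.

```python
import numpy as np

PATCH = 51          # patch edge length stated in the text
HALF = PATCH // 2   # 25 voxels on each side of the centre voxel


def triplanar_patches(volume, z, y, x):
    """Return (axial, coronal, sagittal) 51x51 patches centred at (z, y, x).

    Assumes `volume` is indexed (z, y, x) and already padded by HALF voxels.
    """
    zs = slice(z - HALF, z + HALF + 1)
    ys = slice(y - HALF, y + HALF + 1)
    xs = slice(x - HALF, x + HALF + 1)
    axial = volume[z, ys, xs]      # plane of constant z
    coronal = volume[zs, y, xs]    # plane of constant y
    sagittal = volume[zs, ys, x]   # plane of constant x
    return axial, coronal, sagittal


# Toy volume with random intensities (illustration only).
rng = np.random.default_rng(0)
volume = rng.random((60, 60, 60))
padded = np.pad(volume, HALF, mode="constant")

z, y, x = 10, 20, 30  # target voxel in the unpadded volume
axial, coronal, sagittal = triplanar_patches(padded, z + HALF, y + HALF, x + HALF)
print(axial.shape, coronal.shape, sagittal.shape)
```

Each patch has the target voxel at its centre, so the three patches share exactly one voxel.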

2.2. Convolutional Layers

The stack consisted of 25 convolution layers, each containing 32 small (3×3 voxels) convolution kernels, similar to VGGNet [14].
No subsampling layers were used.
To reduce the number of parameters and the risk of overfitting, the same stack of convolutional layers was used for the axial, sagittal and coronal patches.
The output of the convolution layers was 32 features for each of the three orthogonal input patches, hence 96 features in total.
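The numbers in this subsection fit together if the 3×3 convolutions are unpadded with stride 1, which is an assumption here since the text does not state it explicitly. Under that assumption each layer trims one voxel from every border, so 25 layers reduce a 51×51 patch to a single 1×1 position carrying 32 features. A small arithmetic check, with a hypothetical helper name:

```python
def valid_conv_output_size(input_size, kernel_size, num_layers):
    """Spatial size after stacking unpadded, stride-1 convolutions (assumed setup)."""
    size = input_size
    for _ in range(num_layers):
        size -= kernel_size - 1  # a 3x3 kernel removes 2 voxels per layer
    return size


out_size = valid_conv_output_size(input_size=51, kernel_size=3, num_layers=25)

KERNELS_PER_LAYER = 32
NUM_PLANES = 3  # axial, sagittal, coronal
total_features = KERNELS_PER_LAYER * NUM_PLANES

print(out_size)        # spatial size per patch after the stack
print(total_features)  # features passed on to the fully connected layers
```

This also explains why no subsampling layers are needed: the patch size is chosen so that the stack alone collapses it to one position.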

2.3. Fully Connected Layers

These features were input to two subsequent fully connected layers, each with 192 nodes, followed by a softmax classification layer. The softmax layer contained 2, 3, 7, 8 or 9 output nodes, depending on the number of classes in the experiment. The fully connected layers were implemented as 1×1 voxel convolutions.
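One reading that reproduces the listed output sizes is that each network has one node per foreground class of its tasks plus a single shared background node. That interpretation is an assumption rather than something stated above, and the task keys below are hypothetical labels. The sketch enumerates every task combination and collects the resulting node counts:

```python
from itertools import combinations

# Foreground classes per task, taken from the task descriptions in the text.
FOREGROUND_CLASSES = {
    "brain_mri": 6,    # six brain tissue classes
    "breast_mri": 1,   # pectoral muscle
    "cardiac_cta": 1,  # coronary arteries
}


def num_output_nodes(tasks):
    """Foreground classes of all tasks plus one shared background node (assumed)."""
    return 1 + sum(FOREGROUND_CLASSES[t] for t in tasks)


node_counts = {}
for k in (1, 2, 3):
    for tasks in combinations(FOREGROUND_CLASSES, k):
        node_counts[tasks] = num_output_nodes(tasks)

for tasks, n in node_counts.items():
    print(tasks, n)

distinct_sizes = sorted(set(node_counts.values()))
print(distinct_sizes)
```

Under this reading, the distinct sizes are exactly the five values listed for the softmax layer.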

2.4. Other Training Details

**Output classes included in each training experiment.**

Exponential Linear Units (ELUs) [2] were used for all non-linear activation functions. Batch Normalization [5] was used on all layers and Dropout [15] was used on the fully connected layers.
Experiments 1–3: Three networks were trained, each to perform one task.
Experiments 4–6: Three networks were trained, each to perform two tasks.
Experiment 7: One network was trained to perform three tasks.
Mini-batches of 210 samples were used.
The data for brain MRI, breast MRI and cardiac CTA were split into 14/20, 14/20 and 6/4 training/test images, respectively.
Each network was trained with 25000 mini-batches per task.
No post-processing steps were applied.
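The training details above can be collected into a small configuration sketch. It checks that the training/test splits add up to the dataset sizes given in the Data section and works out the total number of mini-batches per network, assuming the per-task budget of 25000 simply adds up across tasks. The dictionary layout and names are hypothetical, not from the paper.

```python
# Dataset sizes and splits as stated in the text.
DATASETS = {
    "brain_mri": {"images": 34, "train": 14, "test": 20},
    "breast_mri": {"images": 34, "train": 14, "test": 20},
    "cardiac_cta": {"images": 10, "train": 6, "test": 4},
}

MINI_BATCHES_PER_TASK = 25000
MINI_BATCH_SIZE = 210


def total_mini_batches(tasks):
    """Mini-batches for one network, assuming the per-task budget is additive."""
    return MINI_BATCHES_PER_TASK * len(tasks)


# Every split should account for all images of its dataset.
splits_consistent = all(
    d["train"] + d["test"] == d["images"] for d in DATASETS.values()
)

one_task = total_mini_batches(["brain_mri"])
two_tasks = total_mini_batches(["brain_mri", "breast_mri"])
three_tasks = total_mini_batches(list(DATASETS))

print(splits_consistent)
print(one_task, two_tasks, three_tasks)
```

Under this assumption the three-task network of Experiment 7 is trained three times as long as each single-task network, so each task receives the same training budget in every experiment.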