Brief Review — Improving Cytoarchitectonic Segmentation of Human Brain Areas with Self-supervised Siamese Networks

A Pretext Task, Predicting 3D Distance & Coordinates, for a Downstream Task, Segmentation

5 min read · Aug 4, 2022

Improving Cytoarchitectonic Segmentation of Human Brain Areas with Self-supervised Siamese Networks, Spitzer MICCAI’18, by Forschungszentrum Jülich, and Heinrich Heine University Düsseldorf
2018 MICCAI, Over 50 Citations (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Medical Image Segmentation, Image Segmentation

Medical image AI approaches often suffer from a limited amount of expert annotations for training.
A self-supervised auxiliary task is designed: predicting the 3D distance between two patches sampled from the same brain. Fine-tuning from the network trained on this task yields significantly better segmentations.

Outline

1. Self-supervised Siamese Network on Auxiliary Distance Task
2. Results

1. Self-supervised Siamese Network on Auxiliary Distance Task

1.1. Auxiliary Distance Task

**(a) Sampling locations on pial and inflated left surface (red dots), (b) Example patches for areas hOc1–hOc4lp, (c) Challenging example of hOc2**

(a) Considering a dataset of unlabeled brain sections from one human brain, the self-supervised feature learning task is formulated:
Given two input patches sampled randomly from the cortex in arbitrary sections, learn to predict the geodesic distance along the surface of the brain between these two patches.
(b) Example patches (1019×1019 px) extracted from 2 μm resolution histological sections showing areas hOc1–hOc4lp.
Small variations in the laminar pattern distinguish the areas.
(c) Examples of hOc2. 1) Intra-area variability, 2) artifacts, 3) high curvature, and 4) oblique histological cuts make identification of areas a challenging task.
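To make the geodesic distance from (a) concrete, here is a minimal sketch that approximates it as the shortest path along the edges of a triangle mesh, found with Dijkstra's algorithm. The toy mesh, the function name `geodesic_distance`, and the edge-path approximation are illustrative assumptions, not the authors' implementation.

```python
import heapq
import math


def geodesic_distance(vertices, triangles, source, target):
    """Approximate surface distance as the shortest path along mesh edges."""
    # Build an adjacency map with Euclidean edge lengths.
    graph = {i: {} for i in range(len(vertices))}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            w = math.dist(vertices[u], vertices[v])
            graph[u][v] = w
            graph[v][u] = w

    # Standard Dijkstra search from the source vertex.
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(queue, (nd, v))
    return math.inf


# Hypothetical toy surface: a sheet folded back over itself, so vertices 0 and 4
# are close in 3D space but far apart along the surface.
vertices = [
    (0.0, 0.0, 0.0), (0.0, 1.0, 0.0),   # free end of the lower sheet
    (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),   # fold line
    (0.0, 0.0, 0.1), (0.0, 1.0, 0.1),   # free end of the upper sheet
]
triangles = [(0, 2, 1), (1, 2, 3), (2, 4, 3), (3, 4, 5)]

euclidean = math.dist(vertices[0], vertices[4])
geodesic = geodesic_distance(vertices, triangles, 0, 4)
print(f"euclidean={euclidean:.3f} geodesic={geodesic:.3f}")
```

The gap between the two numbers is the point of the task: patches on opposite banks of a fold are neighbors in 3D space but distant along the cortical surface.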

1.2. Network

**Siamese network architecture for the auxiliary distance task (left) and extended U-Net architecture for the area segmentation task (right)**

A Siamese network computes a regression function based on two input patches (𝑥1, 𝑥2). The network consists of two branches with identical CNN architecture and shared weights, computing features (𝑓(𝑥1), 𝑓(𝑥2)). The branch architecture corresponds to the texture filtering branch of the extended U-Net architecture of [10] with a 32-channel dense layer added on top of the last convolutional layer.

1.3. Distance Loss + Coordinate Loss

The predicted distance is defined as the squared Euclidean distance between the feature vectors 𝑓(𝑥1) and 𝑓(𝑥2), and the distance loss 𝑙dist compares this prediction with the ground-truth distance 𝑦dist.
The ground-truth distance 𝑦dist is computed by finding the closest points of the inputs on the brain surface and calculating their shortest distance along this surface.
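A minimal sketch of the distance branch, assuming a single linear layer as a stand-in for the CNN branch and a squared-error form for the distance loss. The names `embed`, `predicted_distance`, and `distance_loss`, and the toy input sizes, are hypothetical. What matters is that both inputs pass through the same weights.

```python
import random


def embed(patch, weights):
    """Shared branch f: one linear layer standing in for the CNN (assumption)."""
    return [sum(w * p for w, p in zip(row, patch)) for row in weights]


def predicted_distance(x1, x2, weights):
    """Squared Euclidean distance between the feature vectors f(x1) and f(x2)."""
    f1, f2 = embed(x1, weights), embed(x2, weights)
    return sum((a - b) ** 2 for a, b in zip(f1, f2))


def distance_loss(x1, x2, y_dist, weights):
    """Assumed form: squared error between predicted and ground-truth distance."""
    return (predicted_distance(x1, x2, weights) - y_dist) ** 2


rng = random.Random(0)
# 32 output features, mirroring the 32-channel dense layer; 16 toy input values.
weights = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(32)]
x1 = [rng.random() for _ in range(16)]
x2 = [rng.random() for _ in range(16)]

print(predicted_distance(x1, x2, weights))
print(distance_loss(x1, x2, y_dist=5.0, weights=weights))
```

Because the two branches share one set of weights, the predicted distance is symmetric in its inputs and is zero for identical patches.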
An additional dense layer 𝑑 calculates the predicted coordinate for each input 𝑥 based on 𝑓(𝑥), and the coordinate loss 𝑙coord compares this prediction with the patch's ground-truth coordinate.

The total loss combines 𝑙dist and 𝑙coord with a weight decay term, using the hyperparameters α=10 and λ=0.001.
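
One way to write the three losses that is consistent with the prose above is sketched here. The squared-error forms, the placement of α on the coordinate term, and the symbols y_coord (ground-truth coordinate) and w (network weights) are assumptions, not a transcription of the paper's equations.

```latex
\begin{align}
l_{\text{dist}}  &= \left( \lVert f(x_1) - f(x_2) \rVert_2^2 - y_{\text{dist}} \right)^2 \\
l_{\text{coord}} &= \sum_{i=1}^{2} \lVert d(f(x_i)) - y_{\text{coord},i} \rVert_2^2 \\
l                &= l_{\text{dist}} + \alpha \, l_{\text{coord}} + \lambda \lVert w \rVert_2^2
\end{align}
```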

1.4. Training Dataset

The dataset is generated from the BigBrain, a dataset of 7400 consecutive histological cell-body stained sections that were registered to a 3D volume at 20 μm resolution. A surface mesh is available at 200 μm resolution. 200k 1019×1019 px patches are sampled at 2 μm resolution from sections 0–3000.
200k pairs are built in such a way that each patch occurs at least once and the two patches of a pair always lie on the same hemisphere.
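The pairing constraints can be sketched as follows, assuming each patch is given a random partner from its own hemisphere. That produces as many pairs as patches and guarantees every patch appears at least once. The function `build_pairs` and the `(patch_id, hemisphere)` tuples are hypothetical; the review does not describe the authors' actual sampling procedure.

```python
import random


def build_pairs(patches, seed=0):
    """Pair every patch with a random, different patch from the same hemisphere."""
    rng = random.Random(seed)

    # Group patch ids by hemisphere so partners are drawn from the same side.
    by_hemisphere = {}
    for patch_id, hemisphere in patches:
        by_hemisphere.setdefault(hemisphere, []).append(patch_id)

    pairs = []
    for patch_id, hemisphere in patches:
        candidates = by_hemisphere[hemisphere]
        if len(candidates) < 2:
            continue  # no valid partner on this hemisphere
        partner = patch_id
        while partner == patch_id:
            partner = rng.choice(candidates)
        pairs.append((patch_id, partner))
    return pairs


# Hypothetical toy input: even ids on the left hemisphere, odd ids on the right.
patches = [(i, "left" if i % 2 == 0 else "right") for i in range(10)]
pairs = build_pairs(patches)
print(pairs)
```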

1.5. Target Task: Area Segmentation

For the two input types, the model has two separate downsampling branches that are joined before the upsampling branch. The dataset in [10] is used, comprising 111 cell-body stained sections from 4 different brains, partially annotated with 13 visual areas using the observer-independent method [8].
For training, 2025×2025 px patches with 2 μm resolution were randomly extracted from the dataset.

2. Results

A self-supervised model is trained on 10% of the training set (20k samples) for 50 epochs. The model performs best when combining 𝑙dist with 𝑙coord (rows 2–4 of the table). The inclusion of 𝑙coord doubles the performance on the distance task, showing that 𝑙coord has the expected effect of guiding the model towards a more representative feature embedding.
Compared to the randomly initialized network in [10], the Dice score increases to 0.80, while 𝑒𝑟𝑟seg drops from 21.2 to 14.4.

Thus, the combined loss enables the model to better allocate individual samples in the feature embedding and make fewer errors on the area segmentation task. Training on the full dataset (200k) moderately increases performance on the area segmentation task.
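
For readers unfamiliar with the Dice score reported above, here is a minimal per-class sketch on flattened label maps. The helper `dice_score` and the toy labels are hypothetical, and how the paper aggregates scores over areas is not stated in this review. The 𝑒𝑟𝑟seg metric is left out because its definition is not given here.

```python
def dice_score(pred, target, label):
    """Dice = 2 * |P and T| / (|P| + |T|) for a single class label."""
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    overlap = sum(p and t for p, t in zip(pred_mask, target_mask))
    total = sum(pred_mask) + sum(target_mask)
    # Convention (assumption): a class absent from both maps counts as a perfect match.
    return 2.0 * overlap / total if total else 1.0


# Hypothetical flattened label maps, one entry per pixel.
target = ["hOc1", "hOc1", "hOc2", "hOc2", "bg", "bg"]
pred = ["hOc1", "hOc2", "hOc2", "hOc2", "bg", "bg"]

for label in ("hOc1", "hOc2", "bg"):
    print(label, dice_score(pred, target, label))
```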

Left: Results on the area segmentation task with partial manual annotations in upper left corner.
Compared to the baseline [10], the proposed method predicts several areas significantly more accurately (circles) and has learned to deal much better with the “other cortex” class (arrows).
Right: Squared Euclidean distances between averaged feature vectors of neighboring image patches, visualized by colored points along the cortical ribbon. Blue values indicate lower distances than yellowish values. Large distances occur at the border between hOc1/hOc2 over consecutive sections (green, enlarged boxes), at regions of high curvature (red) and at oblique regions (blue).

The confusion matrices and example segmentations reveal that the fine-tuned model predicts more areas reliably, and overall exhibits less noise in the segmentation.

Reference

[2018 MICCAI] [Spitzer MICCAI’18]
Improving Cytoarchitectonic Segmentation of Human Brain Areas with Self-supervised Siamese Networks