Brief Review — TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation

Winning Solution (1st out of 735) in Kaggle: Carvana Image Masking Challenge

Sik-Ho Tsang
2 min read · Sep 10, 2023

TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation
TernausNet, by Lyft Inc. and Massachusetts Institute of Technology
2018 arXiv v1, Over 650 Citations (Sik-Ho Tsang @ Medium)

Image Segmentation
2014 … 2022
[YOLACT++] 2023 [Segment Anything Model (SAM)]

  • The classic U-Net is trained from scratch. In this paper, a VGG11 encoder pre-trained on ImageNet is used as the U-Net encoder instead.
  • This network architecture was part of the winning solution (1st out of 735) in the Kaggle Carvana Image Masking Challenge.

Outline

  1. TernausNet
  2. Results

1. TernausNet

U-Net Using VGG11 As Encoder
VGG-11
  • VGG11 is used as the U-Net encoder backbone.
  • Three weight initialization schemes are compared: LeCun uniform initialization of the whole network, an encoder initialized with VGG11 weights pre-trained on ImageNet, and a full network pre-trained on the Carvana dataset.
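
To make the encoder structure concrete, here is a minimal sketch in plain Python that traces feature-map shapes through the VGG11 convolutional layout (configuration "A" of the VGG paper). The function name `trace_encoder` and the 256×256 input size are assumptions chosen for illustration, and the sketch only covers the encoder. It does not reproduce the decoder wiring of TernausNet, which is not described here.

```python
# VGG11 layer layout (configuration "A" of the VGG paper):
# integers are output channels of a 3x3 conv (padding 1), "M" is a 2x2 max-pool.
VGG11_CFG = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def trace_encoder(height, width, in_channels=3):
    """Return the (channels, height, width) of the feature map before each pooling step.

    These pre-pool maps are the ones a U-Net style decoder would typically
    receive through skip connections (hypothetical helper, for illustration).
    """
    skips = []
    channels = in_channels
    for layer in VGG11_CFG:
        if layer == "M":
            skips.append((channels, height, width))  # saved for the decoder
            height, width = height // 2, width // 2  # 2x2 max-pool halves H and W
        else:
            channels = layer  # 3x3 conv with padding 1 keeps H and W
    return skips, (channels, height, width)


skips, bottleneck = trace_encoder(256, 256)
for i, shape in enumerate(skips, start=1):
    print(f"skip {i}: {shape}")
print("after last pool:", bottleneck)
```

The five pooling steps reduce the spatial resolution by a factor of 32, which is why a U-Net style decoder needs matching upsampling stages to recover a full-resolution mask.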

2. Results

Jaccard index (Intersection Over Union) Over Epochs
  • The Jaccard index (Intersection over Union, IoU) is used as the evaluation metric.
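
For two binary masks A and B, the Jaccard index is |A ∩ B| / |A ∪ B|. The following is a minimal sketch of that set definition in plain Python. The function name `jaccard_index`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration, and it is not the exact evaluation code used in the paper.

```python
def jaccard_index(pred, target):
    """Jaccard index (IoU) of two binary masks given as equally sized 2-D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0  # pixel set in both masks
            union += 1 if (p or t) else 0          # pixel set in at least one mask
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(jaccard_index(pred, target))  # intersection 2, union 4
```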

The validation learning curves in Fig. 3 show the benefits of the pre-training approach.

  • After 100 epochs, the IoU on the validation subset is:
  1. LeCun uniform initializer: IoU = 0.593
  2. Encoder pre-trained on ImageNet: IoU = 0.686
  3. Fully pre-trained U-Net on Carvana: IoU = 0.687
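
To put the numbers in the list above in perspective, the short sketch below computes the absolute and relative IoU gain of each scheme over the LeCun uniform baseline. The dictionary keys are labels chosen here for illustration, and the values are the ones quoted above.

```python
# Validation IoU after 100 epochs, as quoted in the list above.
iou = {
    "LeCun uniform": 0.593,
    "ImageNet pre-trained encoder": 0.686,
    "Fully pre-trained on Carvana": 0.687,
}

baseline = iou["LeCun uniform"]
gains = {}
for name, value in iou.items():
    absolute = value - baseline      # gain in IoU points
    relative = absolute / baseline   # gain as a fraction of the baseline
    gains[name] = (round(absolute, 3), round(100 * relative, 1))
    print(f"{name}: +{absolute:.3f} IoU ({100 * relative:.1f}% relative)")
```

Both pre-trained variants improve on the baseline by roughly 0.09 IoU, while the difference between the two of them is only 0.001.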
Visualizations

Pretrained models obtain better segmentation results.
