Brief Review — TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation

Winning Solution (1st out of 735) in Kaggle: Carvana Image Masking Challenge

Sik-Ho Tsang
2 min read · Sep 10, 2023

TernausNet, by Lyft Inc. and Massachusetts Institute of Technology
2018 arXiv v1, Over 650 Citations (Sik-Ho Tsang @ Medium)

Image Segmentation
2014 … 2022
[YOLACT++] 2023 [Segment Anything Model (SAM)]

  • The classic U-Net is trained from scratch. In this paper, a VGG11 encoder pre-trained on ImageNet is used as the U-Net encoder.
  • This network architecture was part of the winning solution (1st out of 735 teams) in the Kaggle Carvana Image Masking Challenge.

Outline

  1. TernausNet
  2. Results

1. TernausNet

U-Net Using VGG11 As Encoder
VGG-11
  • VGG11 is used as the U-Net encoder backbone.
  • Three weight-initialization schemes are compared: LeCun uniform initialization, an encoder initialized with VGG11 weights pre-trained on ImageNet, and the full network pre-trained on the Carvana dataset.

2. Results

Jaccard index (Intersection Over Union) Over Epochs
  • Jaccard index (Intersection Over Union) is used as evaluation metric.
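The metric is simple to state: the Jaccard index of two binary masks is the size of their intersection divided by the size of their union. A minimal NumPy sketch (function name and the `eps` smoothing term are illustrative assumptions, not from the paper):

```python
import numpy as np


def jaccard_index(pred, target, eps=1e-7):
    """Jaccard index (IoU) of two binary masks: |A ∩ B| / |A ∪ B|."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps guards against division by zero when both masks are empty
    return intersection / (union + eps)


# Toy 2x3 masks: intersection has 2 pixels, union has 4 → IoU = 0.5
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(jaccard_index(a, b))
```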

Validation learning curves in Fig. 3 show the benefits of the pretraining approach.

  • After 100 epochs, for validation subset:
  1. LeCun uniform initialization: IoU = 0.593
  2. Encoder pre-trained on ImageNet: IoU = 0.686
  3. U-Net fully pre-trained on Carvana: IoU = 0.687
Visualizations

Pretrained models obtain better segmentation results.

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.