Review — S⁴L: Self-Supervised Semi-Supervised Learning

S⁴L: Combines Self-Supervised Approach and Semi-Supervised Approach

May 19, 2022

S⁴L: Self-Supervised Semi-Supervised Learning
S⁴L, by Google Research, Brain Team
2019 ICCV, Over 400 Citations (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Semi-Supervised Learning, Image Classification

By unifying self-supervised learning and semi-supervised learning, a framework called self-supervised semi-supervised learning (S⁴L) is proposed.
Mix Of All Models (MOAM), which combines several of these techniques, is further proposed.

Outline

Proposed S⁴L
Experimental Results

1. Proposed S⁴L

**A schematic illustration of one of the proposed self-supervised semi-supervised techniques: S⁴L-Rotation**

1.1. Overall

The learning algorithm has access to a labeled training set Dl, sampled i.i.d. from p(X, Y), and an unlabeled training set Du, sampled i.i.d. from the marginal distribution p(X).
Minibatches drawn from Dl and Du are chosen to be of equal size.

The semi-supervised methods have a learning objective of the form:

$$\min_{\theta} \; \mathcal{L}_l(D_l, \theta) + w \, \mathcal{L}_u(D_u, \theta)$$

where Ll is the standard cross-entropy classification loss over all labeled images in the dataset and Lu is a loss defined on the unlabeled images.

w is a non-negative scalar weight (set to w = 1), and θ denotes the model parameters.
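As a minimal sketch of the objective Ll + w·Lu, the pure-Python code below computes Ll as the mean cross-entropy over a labeled minibatch and adds w times an unlabeled loss Lu that is assumed to be computed elsewhere. It assumes the model has already produced logits as plain lists; all function names are hypothetical and this is not the paper's implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: shift by the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Cross-entropy for a single example: -log p(label | x).
    return -log_softmax(logits)[label]


def semi_supervised_objective(
    labeled_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    unlabeled_loss: float,
    w: float = 1.0,
) -> float:
    # Ll + w * Lu: Ll is the mean cross-entropy over the labeled minibatch,
    # Lu (unlabeled_loss) comes from whichever unsupervised method is used.
    if w < 0.0:
        raise ValueError("w must be a non-negative weight")
    l_l = sum(cross_entropy(z, y) for z, y in zip(labeled_logits, labels)) / len(labels)
    return l_l + w * unlabeled_loss
```

For example, a two-class example with equal logits [0.0, 0.0] contributes Ll = log 2 ≈ 0.693, and w = 0 reduces the objective to plain supervised training.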

For the self-supervised loss, S⁴L can choose whether to also include the labeled minibatch xl, i.e., apply Lself to the union of the unlabeled minibatch xu and xl.

So, there are Ll, Lu, and Lself losses.
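The minibatch setup described above (equal-size labeled and unlabeled minibatches, with the labeled images optionally fed to Lself) can be sketched as follows. The helpers `sample_minibatches` and `self_supervised_inputs` are hypothetical names, and images are left abstract; this only illustrates which images each loss sees.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_minibatches(
    labeled: Sequence[Tuple[T, int]],
    unlabeled: Sequence[T],
    batch_size: int,
    rng: random.Random,
) -> Tuple[List[Tuple[T, int]], List[T]]:
    # Draw equal-size minibatches: x_l (image, label) pairs and x_u (images only).
    x_l = rng.sample(list(labeled), batch_size)
    x_u = rng.sample(list(unlabeled), batch_size)
    return x_l, x_u


def self_supervised_inputs(
    x_l: Sequence[Tuple[T, int]],
    x_u: Sequence[T],
    include_labeled: bool = True,
) -> List[T]:
    # Lself always sees x_u; optionally it also sees the labeled images
    # (their labels are dropped, since the self-supervised task makes its own targets).
    images = list(x_u)
    if include_labeled:
        images.extend(img for img, _ in x_l)
    return images
```

Ll is then computed on x_l only, while Lself is computed on the list returned by `self_supervised_inputs`.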

1.2. S⁴L-Rotation for Self-Supervised Learning

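The key idea of Rotation self-supervision (RotNet) is to rotate an input image and then predict its rotation angle:

$$\mathcal{L}_{rot} = \frac{1}{|\mathcal{R}|} \sum_{r \in \mathcal{R}} \sum_{x \in D_u} \mathcal{L}\big(f_{\theta}(x^{r}), r\big)$$

where R is the set of the 4 rotation angles {0°, 90°, 180°, 270°}, which results in a 4-class classification problem, x^r is image x rotated by r, f_θ is the model with parameters θ, and L is the cross-entropy loss.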

The self-supervised loss is also applied to the labeled images in each minibatch.
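A minimal sketch of the rotation loss Lrot = (1/|R|) Σ_{r∈R} Σ_{x∈Du} CE(fθ(x^r), r), assuming images are 2-D lists and `predict_logits` is a hypothetical stand-in for the network fθ that returns 4 logits. The rotation index k ∈ {0, 1, 2, 3} is used as the class label for the angle 90·k°.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[float]]
ANGLES = (0, 90, 180, 270)  # R: four rotation angles, i.e. a 4-class problem


def rot90(img: Image) -> Image:
    # Rotate a 2-D image (list of rows) by 90 degrees counter-clockwise.
    return [list(col) for col in zip(*img)][::-1]


def rotate(img: Image, k: int) -> Image:
    # Apply k quarter turns; class label k corresponds to ANGLES[k] degrees.
    for _ in range(k % 4):
        img = rot90(img)
    return img


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def rotation_loss(
    images: Sequence[Image],
    predict_logits: Callable[[Image], Sequence[float]],
) -> float:
    # (1/|R|) * sum over angles r and images x of CE(f(x^r), r).
    total = 0.0
    for k in range(len(ANGLES)):
        for x in images:
            total += -log_softmax(predict_logits(rotate(x, k)))[k]
    return total / len(ANGLES)
```

To also apply the loss to the labeled images, as described above, `images` would be the unlabeled minibatch plus the images of the labeled minibatch.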

1.3. S⁴L-Exemplar for Self-Supervised Learning

The idea of Exemplar self-supervision is to learn a visual representation that is invariant to a wide range of image transformations. Specifically, “Inception” cropping, random horizontal mirroring, and HSV-space color randomization are used to produce 8 different instances of each image in a minibatch.
Lu is implemented as the batch-hard triplet loss with a soft margin, which encourages transformed versions of the same image to have similar representations. Lu is applied to all eight instances of each image.
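A minimal sketch of a batch-hard triplet loss with a soft margin, in its standard formulation: for each anchor, the farthest instance of the same image is the hardest positive, the closest instance of any other image is the hardest negative, and the fixed hinge margin is replaced by softplus. Euclidean distance and all function names are assumptions, and the augmentations that create the 8 instances are omitted.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softplus(x: float) -> float:
    # Stable log(1 + exp(x)); using this instead of a hinge is the "soft margin".
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def batch_hard_triplet_loss(
    embeddings: Sequence[Sequence[float]],
    image_ids: Sequence[int],
) -> float:
    # image_ids[i] says which source image instance i was augmented from.
    n = len(embeddings)
    losses: List[float] = []
    for i in range(n):
        pos = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if j != i and image_ids[j] == image_ids[i]]
        neg = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if image_ids[j] != image_ids[i]]
        if not pos or not neg:
            continue  # this anchor has no valid triplet in the batch
        # Hardest positive: farthest instance of the same image.
        # Hardest negative: closest instance of any other image.
        losses.append(softplus(max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0
```

With 8 augmented instances per image, `image_ids` would simply repeat each image's id 8 times, so every instance serves as an anchor.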

1.4. Semi-Supervised Baselines

Virtual Adversarial Training (VAT), Conditional Entropy Minimization (EntMin), and Pseudo-Label (PL) are considered.
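Two of these baselines, EntMin and Pseudo-Label, are simple enough to sketch directly on model logits; VAT needs adversarial perturbations computed from gradients and is omitted here. The code below is an illustrative sketch with hypothetical function names, not the implementations used in the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entmin_loss(unlabeled_logits: Sequence[Sequence[float]]) -> float:
    # EntMin: mean conditional entropy H(p(y|x)) over unlabeled examples.
    # Minimizing it pushes the model towards confident predictions.
    total = 0.0
    for z in unlabeled_logits:
        p = softmax(z)
        total += -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return total / len(unlabeled_logits)


def pseudo_labels(unlabeled_logits: Sequence[Sequence[float]]) -> List[int]:
    # Pseudo-Label: use the model's most likely class as a hard target.
    return [max(range(len(z)), key=lambda k: z[k]) for z in unlabeled_logits]
```

The pseudo-labels would then be trained on with an ordinary cross-entropy loss, as if they were true labels.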

2. Experimental Results

2.1. ImageNet

**Top-5 accuracy (%) obtained by individual methods when training them on ILSVRC-2012 with a subset of labels.**

The proposed self-supervised semi-supervised approach is effective for both self-supervision methods used here (Rotation and Exemplar). The authors hypothesize that similar approaches can be designed for other self-supervision objectives.

**Comparing our MOAM to previous methods in the literature on ILSVRC-2012 with 10% of the labels**

Mix Of All Models (MOAM): First, S⁴L-Rotation and VAT+EntMin are combined to learn a 4× wider model. This model is then used to generate pseudo-labels (PL) for a second training step, followed by a final fine-tuning step.

Step 1) Rotation+VAT+EntMin: The model jointly optimizes the S⁴L-Rotation loss together with the VAT and EntMin losses.
Step 2) Retraining on Pseudo-Labels (PL): The model from Step 1 assigns pseudo-labels to the full dataset, and the model is retrained on them.
Step 3) Fine-tuning: The retrained model is fine-tuned.
The final model, “MOAM (full)”, achieves 91.23% top-5 accuracy, setting a new state of the art and outperforming methods such as UDA and CPCv2.

Interestingly, MOAM achieves promising results even in the high-data regime with 100% labels, outperforming the fully supervised baseline: +0.87% for top-5 accuracy and +1.6% for top-1 accuracy.

2.2. Places205

**Places205 learning curves of logistic regression on top of the features learned by pre-training**

Self-supervision methods are typically evaluated by how generally useful their learned representations are. This is done by treating the learned model as a fixed feature extractor and training a linear logistic regression model on top of the features it extracts from a different dataset, Places205.
As can be seen, when pre-training includes labeled data, logistic regression finds a good separating hyperplane within very few epochs and then plateaus, whereas in the purely self-supervised case it struggles for a very large number of epochs.

This indicates that the addition of labeled data leads to much more separable representations, even across datasets.
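The linear-evaluation protocol described above can be sketched as multinomial logistic regression trained by full-batch gradient descent on fixed feature vectors: only the weights W and biases b are updated, never the feature extractor. The toy features, learning rate, and epoch count are arbitrary assumptions for illustration, not the Places205 setup, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def scores(W: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    # Linear layer on top of a frozen feature vector x.
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]


def train_linear_probe(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    epochs: int = 100,
    lr: float = 0.5,
) -> Tuple[List[List[float]], List[float]]:
    # Features are precomputed once by the frozen model; only W and b are learned.
    dim, n = len(features[0]), len(features)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(features, labels):
            p = softmax(scores(W, b, x))
            for c in range(num_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                gb[c] += err
                for d in range(dim):
                    gW[c][d] += err * x[d]
        # Gradient step on the mean cross-entropy over the dataset.
        for c in range(num_classes):
            b[c] -= lr * gb[c] / n
            for d in range(dim):
                W[c][d] -= lr * gW[c][d] / n
    return W, b


def predict(W: List[List[float]], b: List[float], x: Sequence[float]) -> int:
    s = scores(W, b, x)
    return max(range(len(s)), key=lambda c: s[c])
```

Recording training accuracy after each epoch of such a probe is what produces learning curves like the ones in the figure above: well-separated features reach their plateau after very few epochs.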

Reference

[2019 ICCV] [S⁴L]
S⁴L: Self-Supervised Semi-Supervised Learning

Pretraining or Weakly/Semi-Supervised Learning

2004 … 2019 [VAT] [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] [MixMatch] [SWA & Fast SWA] [S⁴L] 2020 [BiT] [Noisy Student] [SimCLRv2]

Unsupervised/Self-Supervised Learning

1993 … 2019 [Ye CVPR’19] [S⁴L] 2020 [CMC] [MoCo] [CPCv2] [PIRL] [SimCLR] [MoCo v2] [iGPT] [BoWNet] [BYOL] [SimCLRv2] 2021 [MoCo v3] [SimSiam]


Written by Sik-Ho Tsang

No responses yet