Review — SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

SimCLR outperforms PIRL, MoCo, CMC, CPCv2, CPC, etc.

SimCLR: A simple framework for contrastive learning of visual representations
  • SimCLR, a Simple framework for Contrastive Learning of visual Representations, is proposed.
  • Recently proposed contrastive self-supervised learning algorithms are simplified, without requiring specialized architectures or a memory bank.
  • A few major components are systematically studied:
  1. Composition of data augmentations plays a crucial role.
  2. A learnable nonlinear transformation between the representation and the contrastive loss substantially improves the representation quality.
  3. Contrastive learning benefits from larger batch sizes and more training steps.
  • This is a paper from Prof. Hinton’s group, published in 2020 ICML.

Outline

  1. SimCLR Framework
  2. SOTA Comparison

1. SimCLR Framework

SimCLR: A simple framework for contrastive learning of visual representations
  • SimCLR learns representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space, as shown above.

1.1. Data Augmentation

  • A stochastic data augmentation module transforms any given data example randomly, resulting in two correlated views of the same example, denoted x̃i and x̃j, which are treated as a positive pair.
  • Three simple augmentations are applied sequentially: random cropping followed by resize back to the original size, random color distortions, and random Gaussian blur.
Random Crop
  • By randomly cropping images, contrastive prediction tasks are sampled that include global-to-local view (B→A) or adjacent view (D→C) prediction.
Illustrations of the studied data augmentation operators
  • The above data augmentation operators are studied.
Linear evaluation (ImageNet top-1 accuracy) under individual or composition of data augmentations, applied only to one branch
  • No single transformation suffices to learn good representations; the composition of random cropping and random color distortion stands out, as sketched below.
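As a concrete illustration, here is a minimal PyTorch/torchvision sketch of the two-view augmentation pipeline (random crop with resize back, random color distortion, random Gaussian blur). The jitter strengths, probabilities, and blur kernel size are commonly cited settings, taken as assumptions rather than the paper's verified hyperparameters.

```python
from torchvision import transforms

def simclr_augment(size=224, s=1.0):
    """One stochastic view; apply twice to the same image for a positive pair.
    s is the color-distortion strength (assumed value)."""
    color_jitter = transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)
    return transforms.Compose([
        transforms.RandomResizedCrop(size),                         # random crop, resize back
        transforms.RandomHorizontalFlip(),
        transforms.RandomApply([color_jitter], p=0.8),              # random color distortion
        transforms.RandomGrayscale(p=0.2),
        transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),  # random Gaussian blur
        transforms.ToTensor(),
    ])

# x̃i and x̃j are two independently augmented views of the same image:
augment = simclr_augment()
# xi, xj = augment(img), augment(img)
```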

1.2. Base Encoder

  • A neural network base encoder f(·) extracts representation vectors from augmented data examples. The commonly used ResNet is adopted, taking h = f(x̃) as the output after the average-pooling layer.
Linear evaluation of models with varied depth and width
  • Increasing both depth and width improves performance.
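A minimal sketch of the base encoder f(·), assuming torchvision's ResNet-50 with the supervised classifier head removed, so that h is the 2048-dimensional output after average pooling:

```python
import torch.nn as nn
from torchvision.models import resnet50

class BaseEncoder(nn.Module):
    """f(·): ResNet-50 up to global average pooling; h = f(x̃) is 2048-d."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)   # trained from scratch, no supervised weights
        backbone.fc = nn.Identity()         # remove the supervised classifier head
        self.f = backbone

    def forward(self, x):
        return self.f(x)                    # h, shape (batch, 2048)
```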

1.3. Projection Head

  • A small neural network projection head g(·) maps representations to the space where the contrastive loss is applied.
  • An MLP with one hidden layer is used to obtain zi = g(hi) = W(2)σ(W(1)hi), where σ is a ReLU nonlinearity.
Linear evaluation of representations with different projection heads g(·) and various dimensions of z = g(h). h is 2048-dimensional
  • A nonlinear projection head is better than a linear projection, and much better than no projection at all.
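The projection head follows directly from the formula above; a minimal PyTorch sketch, where the 2048 → 2048 → 128 dimensions are assumptions based on a common SimCLR configuration:

```python
import torch.nn as nn

class ProjectionHead(nn.Module):
    """g(·): zi = W(2)·σ(W(1)·hi) with σ = ReLU."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):  # dims assumed
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),   # W(1)
            nn.ReLU(inplace=True),           # σ
            nn.Linear(hidden_dim, out_dim),  # W(2)
        )

    def forward(self, h):
        return self.g(h)                     # z is used only by the contrastive loss
```

After pre-training, g(·) is thrown away, and the representation h (before the projection head) is used for downstream tasks.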

1.4. Contrastive Loss

  • A minibatch of N examples is randomly sampled.
Linear evaluation models (ResNet-50) trained with different batch size and epochs
  • The contrastive prediction task is defined on pairs of augmented examples derived from the minibatch, resulting in 2N data points.
  • Given a positive pair, the other 2(N−1) augmented examples within a minibatch are treated as negative examples.
  • The loss function for a positive pair of examples (i, j) is defined as:
  ℓ(i, j) = −log( exp(sim(zi, zj)/τ) / Σ_{k=1}^{2N} 1[k≠i] exp(sim(zi, zk)/τ) )
  • where sim(u, v) = uᵀv/(‖u‖‖v‖) is the cosine similarity and τ is the temperature parameter.
  • The final loss, termed NT-Xent (the normalized temperature-scaled cross entropy loss), is computed across all positive pairs, both (i, j) and (j, i), in a mini-batch.
  • (For NCE, please feel free to read NCE, Negative Sampling, CPC.)
  • (For temperature parameter, please feel free to read Distillation.)
Negative loss functions and their gradients.
Linear evaluation (top-1) for models trained with different loss functions. “sh” means using semi-hard negative mining
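A minimal, unoptimized PyTorch sketch of the NT-Xent loss defined above (tau=0.5 is a placeholder, not a recommended value):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i, z_j, tau=0.5):
    """NT-Xent over a batch of N positive pairs; z_i, z_j are (N, d) projections."""
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)  # 2N L2-normalized vectors
    sim = z @ z.t() / tau                                 # pairwise cosine similarity / τ
    sim.fill_diagonal_(float('-inf'))                     # drop the k = i terms (the 1[k≠i])
    n = z.shape[0]                                        # 2N
    # The positive for row k is its other view: k+N (first half) or k−N (second half).
    targets = torch.cat([torch.arange(n // 2, n), torch.arange(0, n // 2)])
    # Cross entropy per row = −log softmax at the positive index,
    # averaged over all 2N anchors, i.e. both ℓ(i, j) and ℓ(j, i).
    return F.cross_entropy(sim, targets)
```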

2. SOTA Comparison

2.1. Linear Evaluation on ImageNet

ImageNet accuracies against number of parameters
ImageNet accuracies of linear classifiers trained on representations
  • A linear classifier trained on SimCLR's self-supervised representations achieves 76.5% top-1 accuracy, a 7% relative improvement over the previous state of the art, matching the performance of a supervised ResNet-50.
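For reference, linear evaluation freezes the pre-trained encoder and trains only a linear classifier on top of h; a minimal sketch, where the optimizer and learning rate are placeholders:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Frozen pre-trained encoder (the SimCLR weights would be loaded here; omitted).
encoder = resnet50(weights=None)
encoder.fc = nn.Identity()
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False

classifier = nn.Linear(2048, 1000)                            # ImageNet: 1000 classes
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)  # placeholder lr
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    with torch.no_grad():
        h = encoder(x)                                        # frozen 2048-d representation
    loss = criterion(classifier(h), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```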

2.2. Few Labels Evaluation on ImageNet

ImageNet accuracy of models trained with few labels
  • Fine-tuned on only 1% of the labels, SimCLR achieves 85.8% top-5 accuracy, outperforming AlexNet with 100× fewer labels.

2.3. Transfer Learning

Comparison of transfer learning performance
  • The ResNet-50 (4×) model is used.
  • Across 12 natural image datasets, SimCLR outperforms the supervised baseline on 5 datasets, the supervised baseline is superior on 2, and on the remaining 5 datasets the models are statistically tied.
