Brief Review: Supervised Contrastive Learning

SupCon, Supervised Contrastive Loss, Outperforms Cross Entropy Loss?

Sik-Ho Tsang
3 min read · Nov 9, 2022


SupCon loss consistently outperforms cross-entropy with standard data augmentations

Supervised Contrastive Learning, by Google Research, Boston University, and MIT
2020 NeurIPS, Over 1200 Citations (Sik-Ho Tsang @ Medium)
Contrastive Learning, Supervised Learning, Image Classification


  1. Supervised Contrastive Learning (SupCon)
  2. Results

1. Supervised Contrastive Learning (SupCon)

Cross entropy, self-supervised contrastive loss and supervised contrastive loss
  • (a) The cross entropy loss (left): uses labels and a softmax loss to train a classifier;
  • (b) The self-supervised contrastive loss (middle): uses a contrastive loss and data augmentations to learn representations, as in SimCLR.
  • (Familiarity with contrastive learning is assumed.)
  • (c) The proposed supervised contrastive loss (right): also learns representations using a contrastive loss, but uses label information to sample positives in addition to augmentations of the same image.

This proposed loss contrasts the set of all samples from the same class as positives against the negatives from the remainder of the batch, using the labels.
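The difference in how positives are chosen can be sketched in a few lines. This is a minimal illustration, assuming a batch with two augmented views per image; the helper names `selfsup_positives` and `supcon_positives` and the toy data are hypothetical, not taken from the paper's code.

```python
def selfsup_positives(image_ids):
    """For each anchor i, positives are the other views of the same source image."""
    n = len(image_ids)
    return [[p for p in range(n) if p != i and image_ids[p] == image_ids[i]]
            for i in range(n)]


def supcon_positives(labels):
    """For each anchor i, positives are all other samples sharing its class label."""
    n = len(labels)
    return [[p for p in range(n) if p != i and labels[p] == labels[i]]
            for i in range(n)]


# Toy batch (hypothetical): 3 images, 2 augmented views each.
image_ids = [0, 1, 2, 0, 1, 2]
labels = ["dog", "dog", "cat", "dog", "dog", "cat"]

ss_pos = selfsup_positives(image_ids)
sup_pos = supcon_positives(labels)
print(ss_pos[0])   # only the other view of image 0
print(sup_pos[0])  # every other "dog" sample in the batch
```

In both cases, everything in the batch that is not a positive (and not the anchor itself) acts as a negative.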

  • Both contrastive methods (b) and (c) can have an optional second stage which trains a model on top of the learned representations.
Supervised vs. self-supervised contrastive losses

As the photo of the black and white puppy demonstrates, taking class label information into account yields an embedding space where elements of the same class are more closely aligned than in the self-supervised case, even when their appearance differs, because the supervised labels place them in the same class.

  • There are two variants, L^sup_out and L^sup_in:
  • L^sup_out: The summation over positives is located outside of the log.
  • L^sup_in: The summation over positives is located inside of the log.
  • Since log is a concave function, Jensen’s inequality implies that L^sup_in ≤ L^sup_out.
  • In other words, L^sup_out is an upper bound on L^sup_in.
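Written out, the two variants and the bound take the following form. The notation is an assumption made here for readability: I is the set of anchors in the multi-view batch, A(i) is every sample other than anchor i, P(i) ⊆ A(i) is the set of positives for i, z denotes the normalized embeddings, and τ is the temperature.

```latex
\begin{align}
\mathcal{L}^{sup}_{out} &= \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
  \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} \\
\mathcal{L}^{sup}_{in} &= \sum_{i \in I} -\log \left\{ \frac{1}{|P(i)|} \sum_{p \in P(i)}
  \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} \right\} \\
\mathcal{L}^{sup}_{in} &\le \mathcal{L}^{sup}_{out}
\end{align}
```

The last line follows per anchor: the log of an average is at least the average of the logs, and negating both sides flips the inequality.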
ImageNet Top-1 classification accuracy for supervised contrastive losses on ResNet-50 for a batch size of 6144
  • The results also show that L^sup_out outperforms L^sup_in, so only L^sup_out is used in the experiments.
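Both variants can be computed side by side to check the Jensen bound numerically. The following is a minimal pure-Python sketch, not the authors' implementation: the function name `supcon_losses`, the default temperature of 0.1, the toy features, and the choice to average over anchors (rather than sum) are all assumptions made for illustration.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_losses(features, labels, temperature=0.1):
    """Return (L_out, L_in), each averaged over anchors that have positives."""
    z = [_normalize(f) for f in features]
    n = len(z)
    loss_out, loss_in, anchors = 0.0, 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # no positive for this anchor in the batch; skip it
        # Temperature-scaled dot products between the anchor and all others.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in others}
        denom = sum(math.exp(v) for v in logits.values())
        probs = [math.exp(logits[p]) / denom for p in positives]
        # Sum over positives outside the log (L_out) vs. inside the log (L_in).
        loss_out += -sum(math.log(q) for q in probs) / len(positives)
        loss_in += -math.log(sum(probs) / len(positives))
        anchors += 1
    return loss_out / anchors, loss_in / anchors


# Toy embeddings and labels (hypothetical data).
features = [[1.0, 0.2], [0.9, 0.4], [-0.3, 1.0],
            [0.8, -0.1], [1.0, 0.5], [-0.2, 0.9]]
labels = [0, 0, 1, 0, 0, 1]

l_out, l_in = supcon_losses(features, labels)
print(l_in <= l_out)
```

When every anchor has exactly one positive, the two variants coincide, since the average over positives then has a single term.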

2. Results

2.1. Image Classification

Top-1 classification accuracy on ResNet-50 for various datasets

SupCon generalizes better than cross-entropy, margin classifiers (which also use labels), and unsupervised contrastive learning techniques on the CIFAR-10, CIFAR-100, and ImageNet datasets.

Top-1/Top-5 accuracy results on ImageNet for AutoAugment with ResNet-50 and for Stacked RandAugment with ResNet-101 and ResNet-200

A new state-of-the-art top-1 accuracy of 78.7% is achieved on ResNet-50 with AutoAugment.

2.2. Robustness

Training with supervised contrastive loss makes models more robust to corruptions in images

The SupCon models have lower mCE (mean Corruption Error) values across different corruptions, and they show less degradation in accuracy as corruption severity increases.
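For readers unfamiliar with the metric, mCE can be sketched as follows, assuming the usual ImageNet-C convention: for each corruption type, the model's error summed over severity levels is divided by the same sum for a baseline model (AlexNet in that benchmark), and the ratios are averaged. The function name and all error values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def mean_corruption_error(model_err, baseline_err):
    """Average, over corruption types, of the model's summed error across
    severities divided by the baseline's summed error, as a percentage."""
    ratios = []
    for corruption, errs in model_err.items():
        ratios.append(sum(errs) / sum(baseline_err[corruption]))
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical top-1 error rates at two severity levels per corruption.
model_err = {"noise": [0.2, 0.3], "blur": [0.1, 0.3]}
baseline_err = {"noise": [0.4, 0.6], "blur": [0.4, 0.4]}

mce = mean_corruption_error(model_err, baseline_err)
print(round(mce, 1))
```

Lower is better: a value below 100 means the model degrades less than the baseline on average.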

2.3. Ablation Study

Accuracy against (a) Hyperparameters, (b) Batch Size, (c) Epochs, and (d) Temperature

2.4. Transfer Learning

Numbers are mAP for VOC2007; mean-per-class accuracy for Aircraft, Pets, Caltech, and Flowers; and top-1 accuracy for all other datasets.

SupCon is on par with cross-entropy and self-supervised contrastive loss on transfer learning performance when trained on the same architecture.

  • Understanding the connection between training objective and transfer performance is left to future work.


[2020 NeurIPS] [SupCon]
Supervised Contrastive Learning
