Brief Review: Supervised Contrastive Learning
SupCon, Supervised Contrastive Loss, Outperforms Cross Entropy Loss?
Supervised Contrastive Learning
SupCon, by Google Research, Boston University, and MIT
2020 NeurIPS, Over 1200 Citations (Sik-Ho Tsang @ Medium)
Contrastive Learning, Supervised Learning, Image Classification
- Supervised Contrastive Learning (SupCon)
1. Supervised Contrastive Learning (SupCon)
- (a) The cross entropy loss (left): uses labels and a softmax loss to train a classifier;
- (b) The self-supervised contrastive loss (middle): uses a contrastive loss and data augmentations to learn representations, as in SimCLR.
- (It is assumed that contrastive learning is known already.)
- (c) The proposed supervised contrastive loss (right): also learns representations using a contrastive loss, but uses label information to sample positives in addition to augmentations of the same image.
This proposed loss contrasts the set of all samples from the same class as positives against the negatives from the remainder of the batch, using the labels.
- Both contrastive methods (b) and (c) can have an optional second stage which trains a model on top of the learned representations.
As the photo of the black and white puppy illustrates, taking class label information into account results in an embedding space where elements of the same class are more closely aligned than in the self-supervised case, even when their appearances differ, because the supervised labels mark them as belonging to the same class.
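To make the difference between (b) and (c) concrete, here is a minimal NumPy sketch of how the positive set changes once labels are used. The function name `positive_masks` and the toy batch (two augmented views each of three images) are assumptions for illustration only, not code from the paper.

```python
import numpy as np

def positive_masks(labels, image_ids):
    """Return boolean (N, N) masks of positives for each anchor (row).

    Hypothetical helper: `image_ids` marks which source image each
    augmented view came from, `labels` gives the class of each view.
    """
    labels = np.asarray(labels)
    image_ids = np.asarray(image_ids)
    not_self = ~np.eye(len(labels), dtype=bool)  # an anchor is never its own positive
    # Self-supervised: positives are the other views of the same image.
    self_sup = (image_ids[:, None] == image_ids[None, :]) & not_self
    # Supervised: positives are all other samples with the same label.
    sup = (labels[:, None] == labels[None, :]) & not_self
    return self_sup, sup

# Toy batch: images 0 and 1 share class 0, image 2 is class 1.
image_ids = [0, 0, 1, 1, 2, 2]
labels = [0, 0, 0, 0, 1, 1]
self_sup, sup = positive_masks(labels, image_ids)
print(self_sup.sum(axis=1))  # [1 1 1 1 1 1]
print(sup.sum(axis=1))       # [3 3 3 3 1 1]
```

In this toy batch, each view of a class-0 image gains two extra positives (the views of the other class-0 image), which is what pulls same-class samples together in the embedding space.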
- There are two variants, L^sup_out and L^sup_in:
- L^sup_out: The summation over positives is located outside of the log.
- L^sup_in: The summation over positives is located inside of the log.
- Since the log function is concave, Jensen’s Inequality implies that L^sup_in ≤ L^sup_out,
- i.e. L^sup_out is an upper bound on L^sup_in.
- The results also show that L^sup_out outperforms L^sup_in, so only L^sup_out is used in the experiments.
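The two variants and the Jensen bound can be checked numerically. Below is a minimal NumPy sketch under a few assumptions: the function name `supcon_losses` is hypothetical, the temperature `tau=0.1` is an arbitrary default, embeddings are L2-normalized inside the function, and the loss is averaged over anchors rather than summed. It is not the authors' implementation.

```python
import numpy as np

def supcon_losses(z, labels, tau=0.1):
    """Return (loss_out, loss_in) for embeddings z of shape (N, D).

    For anchor i, P(i) is the set of other samples with the same label
    and A(i) is every sample except i. With
        p_ip = exp(z_i . z_p / tau) / sum_{a in A(i)} exp(z_i . z_a / tau):
        out variant: -(1/|P(i)|) * sum_{p in P(i)} log p_ip
        in  variant: -log( (1/|P(i)|) * sum_{p in P(i)} p_ip )
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # assumed L2 normalization
    n = z.shape[0]
    sim = z @ z.T / tau
    not_self = ~np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self

    # Log of the denominator over A(i), computed stably (self excluded).
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom  # log p_ip; diagonal entries are never used

    n_pos = pos.sum(axis=1)
    valid = n_pos > 0  # skip anchors with no positive in the batch
    mean_log = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    mean_prob = np.where(pos, np.exp(log_prob), 0.0).sum(axis=1)[valid] / n_pos[valid]

    loss_out = -mean_log.mean()          # summation outside the log
    loss_in = -np.log(mean_prob).mean()  # summation inside the log
    return loss_out, loss_in

# Random embeddings, only to compare the two variants.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
labels = np.array([0, 0, 1, 1, 2, 2, 0, 1])
loss_out, loss_in = supcon_losses(z, labels)
print(loss_in <= loss_out)  # Jensen: the in-log variant never exceeds the out-log one
```

When every anchor has exactly one positive, the average over positives has a single term and the two variants coincide; the gap only appears once labels supply several positives per anchor.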
2. Results
2.1. Image Classification
SupCon generalizes better than cross-entropy, margin classifiers (with use of labels) and unsupervised contrastive learning techniques on CIFAR-10, CIFAR-100 and ImageNet datasets.
A new state-of-the-art top-1 accuracy of 78.7% is achieved on ImageNet using ResNet-50 with AutoAugment.
2.2. Robustness
The SupCon models have lower mCE (mean Corruption Error) values across different corruptions, and their accuracy degrades less as corruption severity increases.
2.3. Ablation Study
2.4. Transfer Learning
SupCon is on par with cross-entropy and self-supervised contrastive loss on transfer learning performance when trained on the same architecture.
- Understanding the connection between training objective and transfer performance is left to future work.