Review: Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
An instance-based softmax embedding method that directly optimizes the ‘real’ instance features on top of the softmax function
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature, Ye CVPR’19, by Hong Kong Baptist University and Columbia University
2019 CVPR, Over 200 Citations (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Unsupervised Learning, Contrastive Learning, Representation Learning, Image Classification
- A novel instance-based softmax embedding method is proposed, which directly optimizes the ‘real’ instance features on top of the softmax function.
- It achieves significantly faster learning speed and higher accuracy than all the competing methods.
Outline
- Instance-wise Softmax Embedding
- Experimental Results
1. Instance-wise Softmax Embedding
- For each iteration, m instances {x_1, x_2, …, x_m} are randomly sampled.
- For each instance, a random data augmentation operation T(·) is applied to slightly modify the original image. The augmented sample T(x_i) is denoted by x̂_i; the embedding feature fθ(x_i) of the original image is denoted by f_i, and the embedding feature fθ(x̂_i) of the augmented sample is denoted by f̂_i.
- The probability of x̂_i being recognized as instance i is defined by:

P(i|x̂_i) = exp(f_i^T f̂_i / τ) / Σ_k exp(f_k^T f̂_i / τ),

where τ is the temperature parameter and the sum runs over all m instances in the batch.
- Maximizing the numerator exp(f_i^T f̂_i / τ) requires increasing the inner product (cosine similarity) between f_i and f̂_i, resulting in a feature that is invariant to data augmentation.
- On the other hand, the probability of x_j (j ≠ i) being recognized as instance i is defined by:

P(i|x_j) = exp(f_i^T f_j / τ) / Σ_k exp(f_k^T f_j / τ).

- Minimizing exp(f_i^T f_j / τ) aims at separating f_j from f_i, which further enhances the spread-out property.
- Correspondingly, the probability of x_j not being recognized as instance i is 1 − P(i|x_j).
- Assuming the events for different instances are independent, the negative log likelihood for instance i is given by:

J_i = −log P(i|x̂_i) − Σ_{j≠i} log(1 − P(i|x_j)).

- Thus, the sum of the negative log likelihood over all the instances within the batch is minimized:

J = −Σ_i log P(i|x̂_i) − Σ_i Σ_{j≠i} log(1 − P(i|x_j)).
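Below is a minimal PyTorch sketch of this batch-wise loss. The function name, tensor layout, and the eps stabilizer are my own assumptions; features are assumed to be already ℓ2-normalized, as in the paper.

```python
import torch
import torch.nn.functional as F

def instance_softmax_loss(f, f_hat, tau=0.1, eps=1e-7):
    """Instance-wise softmax embedding loss (sketch).

    f:     (m, d) l2-normalized features f_i of the original images x_i
    f_hat: (m, d) l2-normalized features f_hat_i of the augmented images x_hat_i
    """
    m = f.size(0)

    # P(i | x_hat_i): each augmented sample should be classified as its own instance.
    p_aug = F.softmax(f_hat @ f.t() / tau, dim=1)   # row i: P(k | x_hat_i) over all k
    pos = p_aug.diag()                              # P(i | x_hat_i)

    # P(i | x_j), j != i: other instances should NOT be classified as instance i.
    p_inst = F.softmax(f @ f.t() / tau, dim=1)      # row j: P(k | x_j) over all k
    off_diag = ~torch.eye(m, dtype=torch.bool, device=f.device)
    neg = p_inst[off_diag]                          # all P(i | x_j) with j != i

    # J = -sum_i log P(i|x_hat_i) - sum_i sum_{j!=i} log(1 - P(i|x_j))
    return (-torch.log(pos + eps).sum() - torch.log(1.0 - neg + eps).sum()) / m
```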
2. Experimental Results
2.1. Training
- The first setting is that the training and testing sets share the same categories (seen testing category). This protocol is widely adopted for general unsupervised feature learning.
- The second setting is that the training and testing sets do not share any common categories (unseen testing category).
2.2. Experiments on Seen Testing Categories
- ResNet-18 is used.
- Feature Embedding dimension is 128.
- The training batch size is set to 128 for all competing methods on both datasets (CIFAR-10 and STL-10).
- Four kinds of data augmentation are used: RandomResizedCrop, RandomGrayscale, ColorJitter, and RandomHorizontalFlip (see the transform sketch after this list).
- The proposed method achieves the best performance (83.6%) with kNN classifier.
Compared to NPSoftmax [46] and NCE [46] in Instance Discrimination [46], which use memorized features for optimization, the proposed method outperforms them by 2.8% and 3.2%, respectively.
The learning speed is also much faster than that of the competing methods.
- When only 5K training images (STL-10) are used for learning, the proposed method achieves the best accuracy with both classifiers (kNN: 74.1%, linear: 69.5%).
When the full 105K training images are used, the kNN accuracy increases to 81.6%, and the accuracy with the linear classifier increases from 69.5% to 77.9%.
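The four augmentations correspond directly to standard torchvision transforms; a possible composition is sketched below. The crop size and the jitter/grayscale parameters are assumptions for 32×32 CIFAR-10-sized inputs, not the paper's exact settings.

```python
from torchvision import transforms

# Assumed parameters for 32x32 CIFAR-10 inputs; the paper's exact values may differ.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.2, 1.0)),
    transforms.RandomGrayscale(p=0.2),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```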
2.3. Experiments on Unseen Testing Categories
- GoogLeNet (Inception-v1) pre-trained on ImageNet is used as the backbone.
- A 128-dim fully connected layer with ℓ2 normalization is added after the pool5 layer as the feature embedding layer (see the sketch after this list).
- All the input images are first resized to 256×256. For data augmentation, the images are randomly cropped to 227×227 with random horizontal flipping.
- The temperature parameter τ is set to 0.1. The training batch size is set to 64.
- Generally, the instance-wise feature learning methods (NCE [46], Exemplar [8], Proposed) outperform the non-instance-wise feature learning methods (DeepCluster [3], MOM [21]), especially on the Car196 and Product datasets.
This indicates that instance-wise feature learning methods generalize well to unseen testing categories.
- ResNet-18 without pre-training is used.
The proposed method is also a clear winner.
- Qualitative retrieval results (green: correct; red: incorrect): although there are some wrongly retrieved samples from other categories, most of the top retrieved samples are visually similar to the query.
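A minimal sketch of such an embedding head, using torchvision's googlenet as a stand-in for the GoogLeNet backbone; replacing the ImageNet classifier with nn.Identity and the 1024-dim pool5 size are my assumptions about the wiring, not the authors' code.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class EmbeddingNet(nn.Module):
    """ImageNet-pretrained backbone + 128-dim l2-normalized embedding layer (sketch)."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = models.googlenet(pretrained=True)  # GoogLeNet / Inception-v1
        self.backbone.fc = nn.Identity()                    # keep the pool5 output (1024-dim)
        self.embed = nn.Linear(1024, dim)                   # feature embedding layer

    def forward(self, x):
        h = self.backbone(x)            # (N, 1024) pool5 features
        z = self.embed(h)               # (N, 128) embedding
        return F.normalize(z, dim=1)    # l2 normalization
```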
2.4. Ablation Study
Among the four data augmentation methods, RandomResizedCrop contributes the most.
Without data augmentation (DA), performance drops from 83.6% to 37.4%.
2.5. Understanding of the Learned Embedding
The proposed method performs best at separating positive and negative samples.
It also performs well at separating other attributes.
This paper and Instance Discrimination provided a prototype for the contrastive learning framework. Later works include MoCo, PIRL, SimCLR, MoCo v2, and so on.
Reference
[2019 CVPR] [Ye CVPR’19]
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
Unsupervised/Self-Supervised Learning
1993 [de Sa NIPS’93] 2008–2010 [Stacked Denoising Autoencoders] 2014 [Exemplar-CNN] 2015 [Context Prediction] [Wang ICCV’15] 2016 [Context Encoders] [Colorization] [Jigsaw Puzzles] 2017 [L³-Net] [Split-Brain Auto] [Motion Masks] [Doersch ICCV’17] 2018 [RotNet/Image Rotations] [DeepCluster] [CPC/CPCv1] [Instance Discrimination] 2019 [Ye CVPR’19] 2020 [CMC] [MoCo] [CPCv2] [PIRL] [SimCLR] [MoCo v2]