Review — Learning a Similarity Metric Discriminatively, with Application to Face Verification
Contrastive Loss + LeNet-Like CNN Siamese Network for Face Recognition
In this story, Learning a Similarity Metric Discriminatively, with Application to Face Verification, by New York University, is briefly reviewed. This is a paper from Prof. LeCun. In this paper:
- A contrastive loss function is used to train a Siamese network for face verification/recognition.
- Specifically, a function is learnt to map input patterns into a target space such that the L1-norm in the target space approximates the “semantic” distance in the input space.
This is a 2005 CVPR paper with over 3100 citations. (Sik-Ho Tsang @ Medium) Contrastive learning is useful for self-supervised learning, and this is one of the early papers on it.
Outline
- Siamese Network Architecture
- Contrastive Loss Function
- Experimental Results
1. Siamese Network Architecture
- A Siamese network is used: the same convolutional network, with shared parameters W, processes both branches X1 and X2.
- The input to the system is a pair of images and a label.
- Gw(X) is a LeNet-like Convolutional Neural Network (CNN).
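To make the weight sharing concrete, here is a minimal sketch in plain Python. It is not the paper's network: the LeNet-like CNN is replaced by a single small linear map, and the names make_weights, g_w and energy are hypothetical ones chosen for this illustration. The only point is that one set of parameters serves both branches, and that the two outputs are compared with the L1 distance.

```python
import random


def make_weights(in_dim, out_dim, seed=0):
    """Hypothetical stand-in for the CNN parameters W: one small linear layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def g_w(weights, x):
    """Toy Gw(X): a linear map used here in place of the LeNet-like CNN."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def energy(weights, x1, x2):
    """L1 distance between the two mapped points.

    The same `weights` object is used for both branches, which is
    exactly what "shared" means in a Siamese network.
    """
    z1 = g_w(weights, x1)
    z2 = g_w(weights, x2)
    return sum(abs(a - b) for a, b in zip(z1, z2))


W = make_weights(in_dim=4, out_dim=2)
x1 = [0.1, 0.2, 0.3, 0.4]
x2 = [0.4, 0.3, 0.2, 0.1]
print(energy(W, x1, x1))  # 0.0, identical inputs map to the same point
print(energy(W, x1, x2))  # some non-negative distance
```

Because both branches use the same parameters, the comparison is symmetric: swapping X1 and X2 gives the same value.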
2. Contrastive Loss Function
Contrastive learning learns a function that pulls genuine/similar pairs (blue) closer to each other and makes impostor/dissimilar pairs (grey & orange) repel each other.
- Let Y be a binary label assigned to this pair.
- Y=0 if X1 and X2 are deemed genuine (similar).
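- Y=1 if they are deemed impostor (dissimilar).
- The energy is the distance between the two outputs of the shared network: Ew(X1, X2) = ||Gw(X1) - Gw(X2)||, where Gw is the CNN and Ew stands for the energy function that measures the compatibility between X1 and X2.
- Specifically, the contrastive loss function L(W) is the sum, over the training pairs i = 1, ..., P, of (1 - Y) LG(Ew) + Y LI(Ew), where each term is evaluated on the i-th pair (Y, X1, X2).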
- LG is the partial loss function for a genuine pair.
- LI is the partial loss function for an impostor pair.
- P is the number of training pairs.
- The constant Q is set to the upper bound of Ew.
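The loss can be sketched in a few lines of Python. The exact partial losses used here, LG = (2/Q) * Ew^2 and LI = 2Q * exp(-2.77 * Ew / Q), are the forms I take from my reading of the paper, so please check them against the original before relying on them. The function names pair_loss and total_loss are hypothetical, and the energy values are passed in as plain numbers rather than computed by a network.

```python
import math


def pair_loss(y, e_w, q):
    """Loss for one pair.

    y   : 0 for a genuine pair, 1 for an impostor pair
    e_w : energy Ew of the pair (non-negative)
    q   : the constant Q, set to the upper bound of Ew
    """
    l_g = (2.0 / q) * e_w ** 2                 # LG: increases with the energy
    l_i = 2.0 * q * math.exp(-2.77 * e_w / q)  # LI: decreases with the energy
    return (1 - y) * l_g + y * l_i


def total_loss(pairs, q):
    """L(W): sum of the per-pair losses over the P pairs, given as (y, e_w)."""
    return sum(pair_loss(y, e_w, q) for y, e_w in pairs)


print(pair_loss(0, 0.0, 2.0))  # 0.0, a genuine pair at zero energy costs nothing
print(pair_loss(1, 0.0, 2.0))  # 4.0, an impostor pair at zero energy costs 2Q
print(total_loss([(0, 0.5), (1, 1.5)], 2.0))
```

Minimizing this pushes the energy of genuine pairs down and the energy of impostor pairs up, which is the pull/repel behaviour described at the start of this section.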
3. Experimental Results
3.1. Datasets
- For the AT&T data, SET1 consisted of 350 images of the first 35 subjects and SET2 consisted of 50 images of the last 5 subjects.
- This way a total of 3500 genuine and 119000 impostor pairs were generated from SET1 and 500 genuine and 2000 impostor pairs were generated from SET2.
- For the AR/Feret data, SET1 contained all the Feret images and 2,496 images from 96 subjects in the AR database. SET2 contained the 1,040 images from the remaining 40 subjects in the AR database.
- Taking all combinations of 2 images resulted in 71,628 genuine and 11,096,376 impostor pairs.
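The AT&T pair counts can be checked with a little arithmetic. The sketch below assumes 10 images per subject (350/35 and 50/5) and counts ordered pairs, including an image paired with itself. That counting convention is my own inference from the fact that it reproduces the reported totals; it is not something stated in the text above. The helper name count_pairs is hypothetical.

```python
def count_pairs(num_subjects, images_per_subject):
    """Count ordered image pairs, self-pairs included (assumed convention).

    genuine  : both images come from the same subject
    impostor : the two images come from different subjects
    """
    n = num_subjects * images_per_subject
    genuine = num_subjects * images_per_subject ** 2
    impostor = n * n - genuine
    return genuine, impostor


print(count_pairs(35, 10))  # (3500, 119000) for SET1
print(count_pairs(5, 10))   # (500, 2000) for SET2
```

I did not try the same check on the AR/Feret numbers, since the per-subject image counts are not given here.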
3.2. Results
- The verification rates obtained from testing the AT&T database and the AR/Purdue database are strikingly different (see the above table and the above 2 figures), underlining the differences in difficulty in the two databases.
- The AT&T dataset is relatively small, and the proposed system required only 5000 training samples to achieve very high performance on the test set.
- The AR/Purdue dataset is very large and diverse, with huge variations in expression, lighting, and added occlusions. The higher error rates obtained on it reflect this level of difficulty.
I read this paper because I wanted to learn about contrastive learning. Later on, the authors also used this technique for dimensionality reduction, in a paper published at 2006 CVPR. That paper has a clearer description of the contrastive loss function. I will write a story about it.
Reference
[2005 CVPR] [Chopra CVPR’05] Learning a Similarity Metric Discriminatively, with Application to Face Verification