Review: Deep Learning Face Representation by Joint Identification-Verification (DeepID2)
DeepID2 Feature for Face Identification & Face Verification, Outperforms DeepFace
In this story, Deep Learning Face Representation by Joint Identification-Verification (DeepID2), by The Chinese University of Hong Kong, SenseTime Group, and Chinese Academy of Sciences, is briefly reviewed. In this paper:
- The face identification task increases the inter-personal variations by drawing DeepID2 features extracted from different identities apart, while the face verification task reduces the intra-personal variations by pulling DeepID2 features extracted from the same identity together.
This is a paper in 2014 NeurIPS with over 2100 citations. (Sik-Ho Tsang @ Medium)
- DeepID2 Network Architecture
- Experimental Results
1. DeepID2 Network Architecture
- The CNN contains four convolutional layers, with local weight sharing in the third and fourth convolutional layers, as shown above.
- The ConvNet extracts a 160-dimensional DeepID2 feature vector at its last layer (DeepID2 layer) of the feature extraction cascade.
- The DeepID2 layer to be learned is fully connected to both the third and fourth convolutional layers.
- ReLU is used.
- The input is an RGB image of size 55×47.
- The DeepID2 feature extraction process is denoted as f = Conv(x, θc),
- where Conv(·) is the feature extraction function defined by the ConvNet, x is the input, and θc denotes the network parameters.
- DeepID2 features are learned with two supervisory signals.
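As a rough illustration of how a layer can be fully connected to both the third and fourth convolutional layers, the sketch below concatenates the two flattened feature maps and applies a single fully-connected layer followed by ReLU. The function name `deepid2_layer`, the toy input sizes (12 and 8 values), and the random weights are assumptions for illustration only; the only figure taken from the text is the 160-dimensional output.

```python
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def deepid2_layer(conv3_flat, conv4_flat, weights, bias):
    """Fully-connected layer over the concatenation of two flattened
    feature maps, followed by ReLU (hypothetical helper, not the paper's code)."""
    joint = list(conv3_flat) + list(conv4_flat)
    if any(len(row) != len(joint) for row in weights):
        raise ValueError("each weight row must match the joint input length")
    pre_activation = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weights, bias)
    ]
    return relu(pre_activation)


# Toy sizes (assumed): 12 values from conv3, 8 values from conv4.
rng = random.Random(0)
conv3_flat = [rng.uniform(-1.0, 1.0) for _ in range(12)]
conv4_flat = [rng.uniform(-1.0, 1.0) for _ in range(8)]

feature_dim = 160  # DeepID2 feature length stated in the text
weights = [[rng.gauss(0.0, 0.1) for _ in range(20)] for _ in range(feature_dim)]
bias = [0.0] * feature_dim

f = deepid2_layer(conv3_flat, conv4_flat, weights, bias)
print(len(f))  # 160
```

Concatenating the two inputs before one weight matrix is equivalent to keeping a separate weight matrix per convolutional layer and summing their outputs, so either formulation would serve for this sketch.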
1.2. Face Identification Loss
- The first is the face identification signal, which classifies each face image into one of n (e.g., n = 8192) different identities.
- Identification is achieved by following the DeepID2 layer with an n-way softmax layer, which outputs a probability distribution over the n classes. The network is trained to minimize the cross-entropy loss, which is called the identification loss: Ident(f, t, θid) = −Σi pi log p̂i = −log p̂t,
- where f is the DeepID2 feature vector, t is the target class, and θid denotes the softmax layer parameters. pi is the target probability distribution, where pi = 0 for all i except pt = 1 for the target class t, and p̂i is the predicted probability distribution.
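The identification loss described above can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the softmax layer is a linear map (weights `W`, biases `b`, both hypothetical names standing in for the softmax layer parameters) followed by a softmax; because the target distribution is one-hot, the cross-entropy reduces to the negative log of the predicted probability of the target class.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identification_loss(f, t, W, b):
    """Cross-entropy of an n-way softmax on feature f with one-hot target t.

    W is an n x len(f) weight matrix and b an n-vector of biases
    (assumed parameterization of the softmax layer).
    """
    logits = [
        sum(w * x for w, x in zip(row, f)) + bias
        for row, bias in zip(W, b)
    ]
    predicted = softmax(logits)
    # One-hot target: only the term for class t survives the sum.
    return -math.log(predicted[t])


# Toy example with n = 4 identities and a 3-dimensional feature.
f = [0.5, 1.0, 0.0]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0, 0.0]

loss_correct = identification_loss(f, 1, W, b)  # class with the largest logit
loss_wrong = identification_loss(f, 3, W, b)    # class with a smaller logit
print(loss_correct < loss_wrong)  # True
```

The loss is smallest when the softmax assigns high probability to the true identity, which is what drives features of different identities apart.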
1.3. Face Verification Loss
- The second is the face verification signal, which encourages DeepID2 features extracted from faces of the same identity to be similar.
- The verification signal directly regularizes DeepID2 features and can effectively reduce the intra-personal variations.
- Commonly used constraints include the L1/L2 norm and cosine similarity.
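- L2 norm: Verif(fi, fj, yij, θve) = ½‖fi − fj‖₂² if yij = 1, and ½ max(0, m − ‖fi − fj‖₂)² if yij = −1,
- where fi and fj are the DeepID2 feature vectors extracted from the two face images being compared. yij = 1 means that fi and fj are from the same identity.
- In this case, the loss minimizes the L2 distance between the two DeepID2 feature vectors.
- yij = −1 means different identities; in this case, the loss pushes the distance to be larger than the margin.
- m is the distance margin.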
- The L1-norm version is analogous to the L2-norm version above.
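- The cosine similarity version is: Verif(fi, fj, yij, θve) = ½ (yij − σ(w·d + b))²,
- where σ is the sigmoid function, yij is the binary target of whether the two compared face images belong to the same identity, and θve = {w, b} are learnable scaling and shifting parameters.
- d = (fi · fj) / (‖fi‖₂ ‖fj‖₂) is the cosine similarity between the DeepID2 feature vectors.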
- The learning algorithm is shown above.
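The two verification constraints discussed in this section can be sketched as follows. This is a hedged sketch rather than the authors' implementation: the L2 version assumes a contrastive form (half the squared distance for same-identity pairs, half the squared hinge on the margin m for different-identity pairs), and the cosine version assumes a squared error between the target and a sigmoid of the scaled and shifted cosine similarity. The parameter names `margin`, `w`, and `b` are assumptions, as is encoding the binary target as 1 or 0 in the cosine case.

```python
import math


def l2_distance(fi, fj):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(fi, fj)))


def verif_loss_l2(fi, fj, y, margin):
    """Contrastive L2 loss: y = 1 for same identity, y = -1 for different."""
    dist = l2_distance(fi, fj)
    if y == 1:
        return 0.5 * dist ** 2                     # pull same-identity pairs together
    if y == -1:
        return 0.5 * max(0.0, margin - dist) ** 2  # push different pairs past the margin
    raise ValueError("y must be 1 or -1")


def cosine_similarity(fi, fj):
    """Cosine of the angle between two non-zero feature vectors."""
    norm_i = math.sqrt(sum(a * a for a in fi))
    norm_j = math.sqrt(sum(c * c for c in fj))
    if norm_i == 0.0 or norm_j == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return sum(a * c for a, c in zip(fi, fj)) / (norm_i * norm_j)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def verif_loss_cosine(fi, fj, y, w, b):
    """Squared error between the binary target y (assumed 1 or 0) and
    sigmoid(w * d + b), where d is the cosine similarity."""
    d = cosine_similarity(fi, fj)
    return 0.5 * (y - sigmoid(w * d + b)) ** 2


fi, fj = [0.0, 0.0], [3.0, 4.0]            # L2 distance is 5
print(verif_loss_l2(fi, fj, 1, 1.0))       # 12.5
print(verif_loss_l2(fi, fj, -1, 1.0))      # 0.0 (already farther apart than the margin)
print(verif_loss_l2(fi, fj, -1, 7.0))      # 2.0
```

Note that a different-identity pair contributes no loss once its distance exceeds the margin, so only pairs that are too close are pushed apart.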
2. Experimental Results
- The CelebFaces+ dataset is used for training. It contains 202,599 face images of 10,177 identities (celebrities) collected from the Internet.
- The LFW dataset is the de facto standard test set for face verification in unconstrained conditions. It contains 13,233 face images of 5,749 identities collected from the Internet.
- People in CelebFaces+ and LFW are mutually exclusive.
- (There are many other results in the paper; please feel free to read it if interested.)
As shown in the table and figure above, DeepID2 obtains the best results and improves on previous results, e.g., DeepFace, by a large margin.