Review: Deep Learning Face Representation by Joint Identification-Verification (DeepID2)

DeepID2 Feature for Face Identification & Face Verification, Outperforms DeepFace

In this story, Deep Learning Face Representation by Joint Identification-Verification (DeepID2), by The Chinese University of Hong Kong, SenseTime Group, and Chinese Academy of Sciences, is briefly reviewed. In this paper:

  • The face identification task increases the inter-personal variations by drawing DeepID2 features extracted from different identities apart, while the face verification task reduces the intra-personal variations by pulling DeepID2 features extracted from the same identity together.

This is a paper in 2014 NeurIPS with over 2100 citations. (Sik-Ho Tsang @ Medium)


  1. DeepID2 Network Architecture
  2. Experimental Results

1. DeepID2 Network Architecture

The ConvNet structure for DeepID2 feature extraction

1.1. Network

  • The CNN contains four convolutional layers, with local weight sharing in the third and fourth convolutional layers, as shown above.
  • The ConvNet extracts a 160-dimensional DeepID2 feature vector at its last layer (DeepID2 layer) of the feature extraction cascade.
  • The DeepID2 layer to be learned is fully-connected to both the third and fourth convolutional layers.
  • ReLU is used.
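  • The input is an RGB image of size 55×47.
  • The DeepID2 feature extraction process is denoted as f = Conv(x, θc),
  • where Conv(·) is the feature extraction function defined by the ConvNet, x is the input, f is the extracted DeepID2 feature vector, and θc denotes the network parameters.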
  • DeepID2 features are learned with two supervisory signals.
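
To make the connection pattern of the DeepID2 layer concrete, here is a minimal sketch in plain Python. It only illustrates the idea that the 160-dimensional layer sees the flattened outputs of both the third and fourth convolutional layers. The input sizes (12 and 8), the random weights, and the names deepid2_layer, conv3_flat and conv4_flat are assumptions made for illustration; they are not taken from the paper.

```python
import random

FEATURE_DIM = 160  # DeepID2 feature length stated in the review


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def deepid2_layer(conv3_flat, conv4_flat, weights, bias):
    """Fully-connected layer over the concatenated conv3 and conv4 outputs."""
    joint = conv3_flat + conv4_flat  # list concatenation of both inputs
    pre_activation = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weights, bias)
    ]
    return relu(pre_activation)


random.seed(0)
# Hypothetical flattened activations; the real sizes depend on the architecture.
conv3_flat = [random.random() for _ in range(12)]
conv4_flat = [random.random() for _ in range(8)]
n_inputs = len(conv3_flat) + len(conv4_flat)

weights = [[random.gauss(0.0, 0.1) for _ in range(n_inputs)]
           for _ in range(FEATURE_DIM)]
bias = [0.0] * FEATURE_DIM

feature = deepid2_layer(conv3_flat, conv4_flat, weights, bias)
print(len(feature))  # 160
```

Concatenating the two inputs before the fully-connected layer is one simple way to express "connected to both layers"; it is equivalent to summing two separate linear maps.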

1.2. Face Identification Loss

  • The first is the face identification signal, which classifies each face image into one of n (e.g., n = 8192) different identities.
  • Identification is achieved by following the DeepID2 layer with an n-way softmax layer, which outputs a probability distribution over the n classes. The network is trained to minimize the cross-entropy loss, which is called the identification loss:
    $$\mathrm{Ident}(f, t, \theta_{id}) = -\sum_{i=1}^{n} p_i \log \hat{p}_i = -\log \hat{p}_t$$
  • where f is the DeepID2 feature vector, t is the target class, and θid denotes the softmax layer parameters. pi is the target probability distribution, where pi = 0 for all i except pt = 1 for the target class t, and p̂i is the predicted probability distribution.
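
As a small worked example of the identification loss described above (cross-entropy after an n-way softmax with a one-hot target), the sketch below uses plain Python. The tiny feature vector, the 3-class weight matrix, and the function names are assumptions chosen for readability, not values from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_layer(feature, weights, bias):
    """Linear layer followed by softmax; weights and bias play the role of the softmax parameters."""
    logits = [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def identification_loss(probs, target):
    """Cross-entropy with a one-hot target reduces to -log of the target probability."""
    return -math.log(probs[target])


# Toy setup: a 2-dimensional feature and n = 3 identities (hypothetical values).
feature = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]

probs = softmax_layer(feature, weights, bias)  # logits are [1.0, 2.0, 0.0]
loss_likely = identification_loss(probs, 1)    # target is the most probable class
loss_unlikely = identification_loss(probs, 2)  # target is the least probable class
print(loss_likely < loss_unlikely)  # True
```

The loss is small when the softmax already puts most of its mass on the correct identity, and grows as that probability shrinks.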

1.3. Face Verification Loss

  • The second is the face verification signal, which encourages DeepID2 features extracted from faces of the same identity to be similar.
  • The verification signal directly regularizes DeepID2 features and can effectively reduce the intra-personal variations.
  • Commonly used constraints include the L1/L2 norm and cosine similarity.
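  • With the L2 norm, the verification loss is:
    $$\mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve}) = \begin{cases} \frac{1}{2}\,\|f_i - f_j\|_2^2 & \text{if } y_{ij} = 1 \\ \frac{1}{2}\,\max\left(0,\ m - \|f_i - f_j\|_2\right)^2 & \text{if } y_{ij} = -1 \end{cases}$$
  • where fi and fj are DeepID2 feature vectors extracted from the two face images in comparison. yij = 1 means that fi and fj are from the same identity.
  • In this case, it minimizes the L2 distance between the two DeepID2 feature vectors.
  • yij = -1 means different identities. In this case, the loss is non-zero only while the distance is smaller than the margin, so the two feature vectors are pushed at least m apart.
  • m is the distance margin; here the verification parameters θve consist of m.
  • The L1 norm version is defined in the same way, with the L1 norm replacing the L2 norm.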
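  • With cosine similarity, the verification loss is:
    $$\mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve}) = \frac{1}{2}\left(y_{ij} - \sigma(w d + b)\right)^2$$
  • where σ is the sigmoid function, yij is the binary target of whether the two compared face images belong to the same identity, and θve = {w, b} are learnable scaling and shifting parameters.
  • d = (fi · fj) / (‖fi‖2 ‖fj‖2) is the cosine similarity between DeepID2 feature vectors.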
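
The two verification constraints discussed above can be sketched in a few lines of plain Python. This is an illustrative reading of the text, not the authors' code: the function names, the toy vectors, and the choice of a 0/1 target for the cosine form are assumptions (the review only says the target is binary).

```python
import math


def l2_distance(fi, fj):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fi, fj)))


def verif_l2(fi, fj, y, margin):
    """L2 verification loss: y = 1 for the same identity, y = -1 otherwise."""
    d = l2_distance(fi, fj)
    if y == 1:
        return 0.5 * d ** 2                 # pull same-identity features together
    return 0.5 * max(0.0, margin - d) ** 2  # push different identities past the margin


def cosine_similarity(fi, fj):
    dot = sum(a * b for a, b in zip(fi, fj))
    norm_i = math.sqrt(sum(a * a for a in fi))
    norm_j = math.sqrt(sum(b * b for b in fj))
    return dot / (norm_i * norm_j)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def verif_cosine(fi, fj, y, w, b):
    """Cosine verification loss; y is assumed to be 1 (same) or 0 (different)."""
    d = cosine_similarity(fi, fj)
    return 0.5 * (y - sigmoid(w * d + b)) ** 2


same_loss = verif_l2([0.0, 0.0], [3.0, 4.0], 1, margin=1.0)   # distance is 5
diff_loss = verif_l2([0.0, 0.0], [3.0, 4.0], -1, margin=1.0)  # already beyond the margin
print(same_loss, diff_loss)  # 12.5 0.0
```

Note how the margin makes the loss for different identities vanish once the two features are far enough apart, so only pairs that are too close contribute.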
The DeepID2 feature learning algorithm
  • The algorithm above summarizes the learning procedure, in which the identification and verification signals are used together to train the network.
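
One way to picture the joint supervision is as a single objective for a pair of training faces: the identification loss of each face plus a weighted verification loss on the pair. The sketch below is a simplified illustration under stated assumptions. The weighting factor lam, the toy probabilities and features, and the function names are all hypothetical, and the real training procedure also updates the network parameters by back-propagation, which is not shown here.

```python
import math


def ident_loss(probs, target):
    """Identification loss: -log of the predicted probability of the target identity."""
    return -math.log(probs[target])


def verif_l2(fi, fj, y, margin):
    """L2 verification loss with y = 1 (same identity) or y = -1 (different)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(fi, fj)))
    if y == 1:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def joint_loss(fi, fj, probs_i, probs_j, ti, tj, lam, margin):
    """Sum of both identification losses plus lam times the verification loss."""
    y = 1 if ti == tj else -1  # the pair label follows from the two identities
    return (ident_loss(probs_i, ti)
            + ident_loss(probs_j, tj)
            + lam * verif_l2(fi, fj, y, margin))


# Toy pair: two faces of the same identity (class 0) with different features.
fi, fj = [1.0, 0.0], [0.0, 1.0]
probs_i, probs_j = [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]

only_ident = joint_loss(fi, fj, probs_i, probs_j, 0, 0, lam=0.0, margin=1.0)
with_verif = joint_loss(fi, fj, probs_i, probs_j, 0, 0, lam=0.05, margin=1.0)
print(with_verif > only_ident)  # True
```

With lam set to 0 the objective reduces to pure identification; increasing lam gives more weight to pulling same-identity features together.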

2. Experimental Results

Accuracy comparison with the previous best results on LFW
ROC comparison with the previous best results on LFW
  • The CelebFaces+ dataset is used for training. It contains 202,599 face images of 10,177 identities (celebrities) collected from the Internet.
  • The LFW dataset is the de facto standard test set for face verification in unconstrained conditions. It contains 13,233 face images of 5,749 identities collected from the Internet.
  • People in CelebFaces+ and LFW are mutually exclusive.
  • (There are many other results in the paper, please feel free to read the paper if interested.)

As shown in the table and figure above, DeepID2 obtains the best results and improves on previous results, e.g., DeepFace, by a large margin.
