Brief Review — DeeperCluster: Unsupervised Pre-Training of Image Features on Non-Curated Data

DeeperCluster: DeepCluster Combined With RotNet

Sik-Ho Tsang
Sep 1, 2023
DeeperCluster Outperforms RotNet

Unsupervised Pre-Training of Image Features on Non-Curated Data
DeeperCluster, by Facebook AI Research, Univ. Grenoble Alpes, and Inria
2019 ICCV, Over 260 Citations (Sik-Ho Tsang @ Medium)

Self-Supervised Learning
1993 … 2022 [BEiT] [BEiT V2] [Masked Autoencoders (MAE)] [DiT] [SimMIM]
==== My Other Paper Readings Are Also Over Here ====

  • DeeperCluster is proposed, which combines self-supervision and clustering to capture complementary statistics from large-scale non-curated data.
  • This is a paper by the authors of DeepCluster.

Outline

  1. Preliminaries
  2. DeeperCluster
  3. Results

1. Preliminaries

1.1. Self-Supervision Signals, e.g.: RotNet

  • A set of N images {x1, …, xN} is given and a pseudo-label yn in Y is assigned to each input xn. In this case, the pseudo-label is the image rotation angle in {0°, 90°, 180°, 270°}.

Given these pseudo-labels, the parameters θ of the convnet fθ are learnt jointly with a linear classifier V to predict the pseudo-labels by solving the problem:

$$\min_{\theta, V} \frac{1}{N} \sum_{n=1}^{N} \ell\left(y_n, V f_\theta(x_n)\right)$$

  • where ℓ is a loss function. The pseudo-labels yn are fixed during the optimization. A minimal sketch of this pretext task follows.
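As a concrete illustration, here is a minimal PyTorch sketch of the rotation pretext task. The VGG trunk, head layout, and image sizes are illustrative assumptions, not the paper's exact training code:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# f_theta: a convnet trunk (assumption: VGG-16 BN features, matching the paper's backbone)
trunk = models.vgg16_bn(weights=None).features
# V: a linear classifier over the 4 rotation pseudo-labels {0, 90, 180, 270}
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 4))
criterion = nn.CrossEntropyLoss()  # the loss l (negative log-softmax)

def rotation_batch(images):
    """Rotate each image by 0/90/180/270 degrees; the pseudo-label y_n is the rotation index."""
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return rotated, labels

images = torch.randn(8, 3, 96, 96)      # dummy batch
x, y = rotation_batch(images)
loss = criterion(head(trunk(x)), y)     # l(y_n, V f_theta(x_n))
loss.backward()
```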

1.2. Clustering-Based Approaches, e.g.: DeepCluster

  • Clustering-based approaches for deep networks typically build target classes by clustering visual features produced by convnets.

We have a latent pseudo-label zn in Z for each image n as well as a corresponding linear classifier W. These clustering-based methods alternate between learning the parameters θ and W, and updating the pseudo-labels zn. Between two reassignments, the pseudo-labels zn are fixed, and the parameters and classifier are optimized by solving:

$$\min_{\theta, W} \frac{1}{N} \sum_{n=1}^{N} \ell\left(z_n, W f_\theta(x_n)\right)$$

  • Then, the pseudo-labels zn can be reassigned by minimizing an auxiliary loss function.
  • In DeepCluster, the latent targets are obtained by clustering the activations with k-means. More precisely, the targets zn are updated by solving the following optimization problem:

$$\min_{C \in \mathbb{R}^{d \times k}} \frac{1}{N} \sum_{n=1}^{N} \min_{z_n \in \{0,1\}^k,\ z_n^\top \mathbf{1}_k = 1} \left\| C z_n - f_\theta(x_n) \right\|_2^2$$

  • where C is the d×k matrix whose columns are the centroids, k is the number of centroids, and zn is a binary vector with a single non-zero entry. This approach assumes that the number of clusters k is known a priori; in practice, it is set by validation on a downstream task. The latent targets are updated every T epochs. A sketch of this alternation follows.
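Below is a minimal sketch of this alternation, using a plain Lloyd's-iteration k-means stand-in on pre-extracted features. The feature tensor, dimensions, and cluster count are illustrative, not the paper's optimized implementation:

```python
import torch
import torch.nn as nn

def kmeans(feats, k, iters=10):
    """Plain Lloyd's iterations: returns a cluster assignment z_n for each row of feats."""
    C = feats[torch.randperm(feats.size(0))[:k]].clone()  # init centroids from the data
    for _ in range(iters):
        z = torch.cdist(feats, C).argmin(dim=1)           # assignment step
        for j in range(k):
            if (z == j).any():
                C[j] = feats[z == j].mean(dim=0)          # centroid update step
    return z

feats = torch.randn(1000, 256)   # stands in for f_theta(x_n) over the dataset
k = 10                           # number of clusters (set by downstream validation)
z = kmeans(feats, k)             # reassign the latent pseudo-labels z_n (every T epochs)

# Between reassignments, z_n is fixed and the classifier W (jointly with theta) is trained:
W = nn.Linear(256, k)
loss = nn.CrossEntropyLoss()(W(feats), z)   # l(z_n, W f_theta(x_n))
loss.backward()
```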

2. DeeperCluster

DeeperCluster

2.1. Combining Self-Supervision and Clustering

  • In this case, the inputs x1, …, xN are rotated images, each associated with a target label yn encoding its rotation angle and a cluster assignment zn.
  • Y is the set of possible rotation angles and Z is the set of possible cluster assignments.

The Cartesian product space Y×Z is used, which can potentially capture richer interactions between the two tasks:

$$\min_{\theta, W} \frac{1}{N} \sum_{n=1}^{N} \ell\left(y_n \otimes z_n, W f_\theta(x_n)\right)$$

Yet, its complexity grows rapidly if a large number of clusters is used or if the self-supervised task has a large output space.
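To make the combined target concrete: with 4 rotations and k clusters, each image gets a single label among 4k classes in Y×Z, which is why the output space blows up with k. The flat indexing below is an illustrative convention, not the paper's code:

```python
import torch

k = 10_000                          # number of clusters
num_rotations = 4                   # |Y|
y = torch.tensor([0, 1, 3])         # rotation pseudo-labels in {0, ..., 3}
z = torch.tensor([42, 7, 9_999])    # cluster assignments in {0, ..., k-1}

combined = y * k + z                # a single index into Y x Z
num_classes = num_rotations * k     # the classifier W needs 40,000 outputs
print(combined, num_classes)        # tensor([   42, 10007, 39999]) 40000
```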

2.2. Scaling Up to a Large Number of Targets

The target labels are partitioned into a 2-level hierarchy where we first predict a super-class and then a sub-class among its associated target labels.

The parameters θ and those of the linear classifiers (V, W1, …, WS) are jointly learned by minimizing the following loss function:

$$\min_{\theta, V, W_1, \ldots, W_S} \frac{1}{N} \sum_{n=1}^{N} \left[ \ell\left(y_n, V f_\theta(x_n)\right) + \sum_{s=1}^{S} \mathbb{1}_{\{y_n = s\}}\, \ell\left(z_n, W_s f_\theta(x_n)\right) \right]$$

  • where ℓ is the negative log-softmax function, yn is the super-class label of image n, and zn its sub-class label within that super-class.
  • We can see that it is a form of multi-task learning.
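A hedged PyTorch sketch of this hierarchical loss, assuming S super-classes, each with its own sub-class classifier W_s over n_sub sub-classes. All sizes, names, and the batch-averaging choice are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

S, d, n_sub = 8, 256, 5000      # super-classes, feature dim, sub-classes per super-class
V = nn.Linear(d, S)             # super-class classifier
W = nn.ModuleList(nn.Linear(d, n_sub) for _ in range(S))  # one W_s per super-class

feats = torch.randn(32, d)                # f_theta(x_n) for a batch
super_y = torch.randint(0, S, (32,))      # super-class targets y_n
sub_y = torch.randint(0, n_sub, (32,))    # sub-class targets z_n within each super-class

loss = F.cross_entropy(V(feats), super_y)   # l(y_n, V f_theta(x_n))
sub_loss = feats.new_zeros(())
for s in range(S):
    mask = super_y == s                     # the indicator 1{y_n = s}
    if mask.any():                          # sub-class loss only for images in super-class s
        sub_loss = sub_loss + F.cross_entropy(W[s](feats[mask]), sub_y[mask], reduction="sum")
loss = loss + sub_loss / feats.size(0)      # average the sub-class terms over the batch
loss.backward()
```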

2.3. Model Architecture

  • VGG-16 with Batch Norm is used, and it is trained on the 96M images from YFCC100M.
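For reference, a torchvision equivalent of this backbone (randomly initialized, since the pre-training here is unsupervised; dropping the classifier head is an assumption about how one would extract features):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# VGG-16 with batch normalization, as used in the paper; weights=None means random init
# (requires torchvision >= 0.13 for the `weights` argument).
backbone = models.vgg16_bn(weights=None)
backbone.classifier = nn.Identity()      # drop the ImageNet head, keep conv features

feats = backbone(torch.randn(2, 3, 224, 224))
print(feats.shape)                       # torch.Size([2, 25088])  (512 * 7 * 7)
```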

3. Results

3.1. PASCAL VOC

PASCAL VOC 2007

The gap with a supervised network is still significant when freezing the convolutions (6% for detection and 10% for classification), but it drops to less than 5% for both tasks with finetuning.

3.2. ImageNet & Places

ImageNet & Places

DeeperCluster matches the performance of a supervised network for all layers on Places205.

On ImageNet, it also matches supervised features up to the 4th convolutional block.

3.3. Clustering Visualizations

Clustering Visualizations

Some clustering visualizations are shown above.
