Brief Review — Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

ReLabel Noisy ImageNet Dataset

Nov 13, 2022

Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels,
ReLabel, by NAVER AI Lab,
2021 CVPR, Over 60 Citations (Sik-Ho Tsang @ Medium)
Image Classification, ImageNet

ImageNet labels are noisy: many images contain multiple classes but carry only a single label.
(Trimps-Soushen, the 2016 ImageNet winner, found that ImageNet is noisy. In 2019, ImageNet-V2 generated a new ImageNet test set. In 2020, ImageNet-ReaL re-assessed the ImageNet labels. ciFAIR likewise found duplicate images in the CIFAR datasets.)
In this paper, the ImageNet training set is re-labeled with multi-labels by a machine annotator.

Outline

1. ReLabel
2. Results

1. ReLabel

1.1. Conceptual Idea

The original ImageNet annotation is a single label (“ox”), whereas the image contains multiple ImageNet categories (“ox”, “barn”, and “fence”).
A random crop of an image may contain an entirely different object category from the one named by the global annotation.

ReLabel generates location-wise multi-labels, resulting in cleaner supervision per random crop.

1.2. LabelPooling

Although the machine annotators (some SOTA networks) are trained with single-label supervision on ImageNet, they still tend to make multi-label predictions.
EfficientNet-L2 is chosen as the machine annotator because it led to the best performance for ResNet-50 (78.9% top-1 accuracy).
The global average pooling layer of the classifier is removed and the following linear layer is turned into a 1×1 convolutional layer, thereby turning the classifier into a fully-convolutional network.
The output of the model then becomes f(x) with size of W×H×C. This output f(x) is treated as the label map annotations L with size of W×H×C.
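To make the conversion above concrete, here is a minimal NumPy sketch, not the authors' code. The sizes, the random features, and the function names `classify_with_gap` and `dense_label_map` are all made up for illustration, and arrays are stored as (H, W, C) rather than W×H×C. It shows that applying the final linear layer at every spatial location (which is what a 1×1 convolution does) yields a dense label map whose spatial average equals the original pooled prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: an H x W feature map with D channels and C classes.
H, W, D, C = 4, 5, 8, 3
features = rng.normal(size=(H, W, D))   # backbone output before pooling
weight = rng.normal(size=(C, D))        # weights of the final linear layer
bias = rng.normal(size=C)

def classify_with_gap(features, weight, bias):
    """Original head: global average pooling, then the linear layer."""
    pooled = features.mean(axis=(0, 1))          # shape (D,)
    return weight @ pooled + bias                # shape (C,)

def dense_label_map(features, weight, bias):
    """Same linear layer applied at every location, i.e. a 1x1 convolution."""
    return features @ weight.T + bias            # shape (H, W, C)

logits = classify_with_gap(features, weight, bias)
label_map = dense_label_map(features, weight, bias)

# Because the layer is linear, averaging the dense map recovers the logits.
print(np.allclose(label_map.mean(axis=(0, 1)), logits))
```

The dense map therefore carries the same information as the global prediction, plus where in the image each class score comes from.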
LabelPooling loads this pre-computed label map and conducts a regional pooling operation on the part of the label map that corresponds to the coordinates of the random crop, using the RoIAlign regional pooling approach (from Mask R-CNN).
Global average pooling and softmax operations are then performed on the pooled prediction maps to get a multi-label ground-truth vector in [0, 1]^C.
The model is trained based on the multi-label ground-truth vector.
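The LabelPooling steps can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the paper's implementation: it replaces RoIAlign's bilinear sampling with a plain nearest-cell crop, the names `label_pooling` and `soft_cross_entropy` are hypothetical, and using cross-entropy against the soft target is an assumption about how "trained based on the multi-label ground-truth vector" is realized.

```python
import numpy as np

def label_pooling(label_map, crop_box, image_size):
    """Turn a stored label map into a soft multi-label target for one crop.

    label_map:  array of shape (H, W, C) with per-location class scores.
    crop_box:   (x0, y0, x1, y1) of the random crop, in image pixels.
    image_size: (image_width, image_height) of the original image.
    """
    h, w, _ = label_map.shape
    img_w, img_h = image_size
    x0, y0, x1, y1 = crop_box

    # Map pixel coordinates to label-map cells. This nearest-cell crop is a
    # stand-in for RoIAlign, which would sample bilinearly instead.
    c0 = min(int(np.floor(x0 / img_w * w)), w - 1)
    r0 = min(int(np.floor(y0 / img_h * h)), h - 1)
    c1 = max(int(np.ceil(x1 / img_w * w)), c0 + 1)
    r1 = max(int(np.ceil(y1 / img_h * h)), r0 + 1)

    region = label_map[r0:r1, c0:c1]             # scores inside the crop
    pooled = region.mean(axis=(0, 1))            # global average pooling, (C,)

    # Softmax maps the pooled scores to a vector in [0, 1]^C that sums to 1.
    exp = np.exp(pooled - pooled.max())
    return exp / exp.sum()

def soft_cross_entropy(logits, target):
    """Cross-entropy between model logits and a soft target (assumed loss)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())

# Toy label map: class 0 scores high on the left half, class 1 on the right.
label_map = np.zeros((4, 4, 3))
label_map[:, :2, 0] = 5.0
label_map[:, 2:, 1] = 5.0

left = label_pooling(label_map, (0, 0, 100, 200), (200, 200))
right = label_pooling(label_map, (100, 0, 200, 200), (200, 200))
full = label_pooling(label_map, (0, 0, 200, 200), (200, 200))
print(left.argmax(), right.argmax(), np.round(full, 3))
```

In this toy case the left and right crops receive different dominant labels, while the full-image crop splits its weight between the two classes, which is the behavior the localized multi-labels are meant to provide.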

2. Results

ReLabel consistently achieves the best performance across all metrics, outperforming both Label Smoothing (from Inception-v3) and the label cleaning of ImageNet-ReaL.

ReLabel is applicable to a wide range of networks with different training recipes.

(There are ablation studies and downstream tasks not yet presented here. Please feel free to read the paper.)

A large amount of data is important, but clean data is also crucial.

Reference

[2021 CVPR] [ReLabel]
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
