Brief Review — Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

ReLabel Noisy ImageNet Dataset

Sik-Ho Tsang
3 min read · Nov 13, 2022


Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels,
by NAVER AI Lab,
2021 CVPR, Over 60 Citations (Sik-Ho Tsang @ Medium)
Image Classification, ImageNet

  • ImageNet samples are noisy. Many samples contain multiple classes.
  • (Trimps-Soushen, the 2016 ImageNet winner, found that ImageNet is noisy. In 2019, ImageNet-V2 introduced a new ImageNet test set. In 2020, ImageNet-ReaL re-assessed the ImageNet labels. The CIFAR dataset was also found by ciFAIR to contain duplicate images.)
  • In this paper, the ImageNet training set is re-labeled with multi-labels by a machine annotator.


  1. ReLabel
  2. Results

1. ReLabel

1.1. Conceptual Idea

Re-labeling ImageNet training data.
  • The original ImageNet annotation is a single label (“ox”), whereas the image contains multiple ImageNet categories (“ox”, “barn”, and “fence”).
  • Random crops of an image may contain an entirely different object category from the global annotation.

ReLabel generates location-wise multi-labels, resulting in cleaner supervision per random crop.

1.2. LabelPooling

Illustration of LabelPooling
  • Although machine annotators (some SOTA networks) are trained with single-label supervision on ImageNet, they still tend to make multi-label predictions.
  • EfficientNet-L2 is used as the machine annotator because it led to the best performance for ResNet-50 (78.9%).
  • The global average pooling layer of the classifier is removed and the following linear layer is turned into a 1×1 convolutional layer, thereby turning the classifier into a fully-convolutional network.
  • The output of the model then becomes f(x) of size W×H×C. This output f(x) is treated as the label map annotation L of size W×H×C.
  • LabelPooling loads this pre-computed label map and conducts a regional pooling operation on the part of the label map corresponding to the coordinates of the random crop, using the RoIAlign regional pooling approach (from Mask R-CNN).
  • Global average pooling and softmax operations are performed on the pooled prediction maps to get a multi-label ground-truth vector in [0, 1]^C.
  • The model is trained based on the multi-label ground-truth vector.
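
The LabelPooling steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the authors' implementation: the function names, the nested-list layout of the label map, and the normalized crop coordinates are all assumptions made for this example. In particular, RoIAlign's bilinear sampling is replaced here by a simple average over the label-map cells whose centers fall inside the crop, and the training loss is assumed to be a cross-entropy against the soft target.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_pooling(label_map, crop_box):
    """Pool a pre-computed label map over a random crop (simplified sketch).

    label_map: H x W x C nested lists of class scores (hypothetical layout).
    crop_box:  (x0, y0, x1, y1) in normalized [0, 1] image coordinates.
    """
    height, width = len(label_map), len(label_map[0])
    num_classes = len(label_map[0][0])
    x0, y0, x1, y1 = crop_box

    # Simplification: select cells whose centers lie inside the crop.
    # RoIAlign would instead sample the map with bilinear interpolation.
    rows = [r for r in range(height) if y0 <= (r + 0.5) / height <= y1]
    cols = [c for c in range(width) if x0 <= (c + 0.5) / width <= x1]
    # Fall back to the nearest cell when the crop is smaller than one cell.
    if not rows:
        rows = [min(int((y0 + y1) / 2 * height), height - 1)]
    if not cols:
        cols = [min(int((x0 + x1) / 2 * width), width - 1)]

    # Average the scores over the selected region, one value per class.
    pooled = [0.0] * num_classes
    for r in rows:
        for c in cols:
            for k in range(num_classes):
                pooled[k] += label_map[r][c][k]
    count = len(rows) * len(cols)
    pooled = [v / count for v in pooled]

    # Softmax turns the pooled scores into a soft multi-label target.
    return softmax(pooled)


def soft_cross_entropy(student_logits, soft_target):
    """Cross-entropy between a soft target and the student's prediction."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in student_logits))
    return -sum(t * (v - log_z) for t, v in zip(soft_target, student_logits))


# Toy 2 x 2 label map with 2 classes: the left half favors class 0,
# the right half favors class 1.
toy_map = [
    [[4.0, 0.0], [0.0, 4.0]],
    [[4.0, 0.0], [0.0, 4.0]],
]
left_target = label_pooling(toy_map, (0.0, 0.0, 0.5, 1.0))
full_target = label_pooling(toy_map, (0.0, 0.0, 1.0, 1.0))
loss = soft_cross_entropy([0.0, 0.0], full_target)
```

In this toy case, a crop covering only the left half yields a target dominated by class 0, while a crop covering the whole image yields an even split between the two classes. This is the behavior the bullets describe: the supervision follows what is actually inside each random crop rather than a single global label.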

2. Results

ImageNet classification.

ReLabel consistently achieves the best performance across all metrics, outperforming Label Smoothing (from Inception-v3) and Label Cleaning (from ImageNet-ReaL).

ReLabel on multiple architectures

ReLabel is applicable to a wide range of networks with different training recipes.

(There are ablation studies and downstream tasks not yet presented here. Please feel free to read the paper.)

  • A large amount of data is important. Clean data is also crucial.


[2021 CVPR] [ReLabel]
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
