Brief Review — Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
ReLabel Noisy ImageNet Dataset
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels,
ReLabel, by NAVER AI Lab,
2021 CVPR, Over 60 Citations (Sik-Ho Tsang @ Medium)
Image Classification, ImageNet
- ImageNet samples are noisy. Many samples contain multiple classes.
- (Trimps-Soushen, the 2016 ImageNet winner, found that ImageNet is noisy. In 2019, ImageNet-V2 generated a new ImageNet test set. In 2020, ImageNet-ReaL re-assessed the ImageNet labels. On the CIFAR datasets, duplicate images were also found by ciFAIR.)
- In this paper, the ImageNet training set is re-labeled with multi-labels by a machine annotator.
Outline
- ReLabel
- Results
1. ReLabel
1.1. Conceptual Idea
- The original ImageNet annotation is a single label (“ox”), whereas the image may contain multiple ImageNet categories (“ox”, “barn”, and “fence”).
- Random crops of an image may contain an entirely different object category from the global annotation.
- ReLabel generates location-wise multi-labels, resulting in cleaner supervision per random crop.
1.2. LabelPooling
- Although machine annotators (some SOTA networks) are trained with single-label supervision on ImageNet, they still tend to make multi-label predictions.
- EfficientNet-L2 is used as the machine annotator, as it leads to the best performance for ResNet-50 (78.9% top-1 accuracy).
- The global average pooling layer of the classifier is removed and the following linear layer is turned into a 1×1 convolutional layer, thereby turning the classifier into a fully-convolutional network.
- The output of the model f(x) then has size W×H×C, and is treated as the label map annotation L.
- LabelPooling loads this pre-computed label map and performs a regional pooling operation, via RoIAlign (from Mask R-CNN), on the region of the label map corresponding to the coordinates of the random crop.
- Global average pooling and softmax operations are performed on the pooled prediction maps to get a multi-label ground-truth vector in [0, 1]^C.
- The model is trained based on the multi-label ground-truth vector.
2. Results
ReLabel consistently achieves the best performance over all the metrics, outperforming Label Smoothing on Inception-v3 and Label Cleaning on ImageNet-ReaL.
ReLabel is applicable to a wide range of networks with different training recipes.
(There are ablation studies and downstream tasks not yet presented here. Please feel free to read the paper.)
- A large amount of data is important; clean data is also crucial.
Reference
[2021 CVPR] [ReLabel]
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
1.1. Image Classification
1989 … 2021 [ReLabel] … 2022 [ConvNeXt] [PVTv2] [ViT-G] [AS-MLP] [ResTv2]