Brief Review — Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels,
ReLabel, by NAVER AI Lab,
2021 CVPR, Over 60 Citations (Sik-Ho Tsang @ Medium)
Image Classification, ImageNet
- ImageNet samples are noisy. Many samples contain multiple classes.
- (Trimps-Soushen, the 2016 ImageNet winner, found that ImageNet is noisy. In 2019, ImageNet-V2 introduced a new ImageNet test set. In 2020, ImageNet-ReaL re-assessed the ImageNet labels. ciFAIR likewise found duplicate images in the CIFAR dataset.)
- In this paper, the ImageNet training set is re-labeled with multi-labels by a machine annotator.
1.1. Conceptual Idea
- The original ImageNet annotation is a single label (“ox”), whereas the image contains multiple ImageNet categories (“ox”, “barn”, and “fence”).
- Random crops of an image may contain an entirely different object category from the global annotation.
ReLabel generates location-wise multi-labels, resulting in cleaner supervision per random crop.
- Although machine annotators (some SOTA networks) are trained with single-label supervision on ImageNet, they still tend to make multi-label predictions.
- EfficientNet-L2 is used as the machine annotator because it led to the best performance for ResNet-50 (78.9% top-1 accuracy).
- The global average pooling layer of the classifier is removed and the following linear layer is turned into a 1×1 convolutional layer, thereby turning the classifier into a fully-convolutional network.
- The output of the model then becomes f(x) with size of W×H×C. This output f(x) is treated as the label map annotations L with size of W×H×C.
- LabelPooling loads this pre-computed label map and pools the region of the label map that corresponds to the coordinates of the random crop, using the RoIAlign regional pooling approach (from Mask R-CNN).
- Global average pooling and softmax operations are performed on the pooled prediction maps to get a multi-label ground-truth vector in [0, 1]^C.
- The model is trained based on the multi-label ground-truth vector.
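The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `dense_label_map` and `label_pooling` are hypothetical, the arrays are laid out as (H, W, C), the crop box is given in normalized [0, 1] coordinates, and RoIAlign is replaced by a simpler "snap the box to whole cells and average" step, which also stands in for the global average pooling.

```python
import numpy as np


def dense_label_map(features, weight, bias):
    """Apply the linear classifier at every spatial location.

    This is what a 1x1 convolution built from the linear layer computes.
    features: (H, W, D) feature map taken before global average pooling.
    weight:   (C, D) weights of the original linear layer.
    bias:     (C,) bias of the original linear layer.
    Returns an (H, W, C) map of class logits.
    """
    return features @ weight.T + bias


def label_pooling(label_map, box):
    """Turn the label map region under a crop into a soft multi-label target.

    box = (x0, y0, x1, y1) in normalized [0, 1] image coordinates.
    Assumption: the box is snapped to whole cells and averaged; the paper
    uses RoIAlign (bilinear sampling) followed by global average pooling.
    """
    h, w, _ = label_map.shape
    x0, y0, x1, y1 = box
    # Snap the box outward to cell boundaries, keeping at least one cell.
    c0 = min(int(np.floor(x0 * w)), w - 1)
    r0 = min(int(np.floor(y0 * h)), h - 1)
    c1 = max(int(np.ceil(x1 * w)), c0 + 1)
    r1 = max(int(np.ceil(y1 * h)), r0 + 1)
    pooled = label_map[r0:r1, c0:c1].mean(axis=(0, 1))
    # Numerically stable softmax over the C classes.
    exp = np.exp(pooled - pooled.max())
    return exp / exp.sum()


# Toy label map with two classes: class 0 on the left, class 1 on the right.
toy = np.zeros((2, 4, 2))
toy[:, :2, 0] = 5.0
toy[:, 2:, 1] = 5.0

left = label_pooling(toy, (0.0, 0.0, 0.5, 1.0))   # crop of the left half
right = label_pooling(toy, (0.5, 0.0, 1.0, 1.0))  # crop of the right half
full = label_pooling(toy, (0.0, 0.0, 1.0, 1.0))   # crop of the whole image

# Random features, to show the dense map is consistent with the classifier.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))
weight = rng.normal(size=(3, 8))
bias = rng.normal(size=3)
label_map = dense_label_map(features, weight, bias)
```

In this toy case the left crop puts most of its weight on class 0, the right crop on class 1, and the full-image crop weights both classes equally, which is the behaviour that gives each random crop its own supervision. Because the classifier is linear, averaging `label_map` over all locations gives the same logits as the original "global average pooling, then linear layer" classifier.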
ReLabel is applicable to a wide range of networks with different training recipes.
(There are ablation studies and downstream tasks not yet presented here. Please feel free to read the paper.)
- A large amount of data is important. Clean data is also crucial.