Brief Review — An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution
CoordConv Incorporates Position Information into the Conv Layer
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution, CoordConv, by Uber AI Labs and Uber Technologies,
2018 NeurIPS, Over 500 Citations (Sik-Ho Tsang @ Medium)
Image Classification
- CoordConv is proposed, which incorporates position information into the conv layer.
Outline
- CoordConv
- Not-so-Clevr Dataset & Results
- Other Results
1. CoordConv
- A CoordConv layer takes 2 to 3 more input channels than a standard conv layer; they are concatenated channel-wise to the input before the convolution is applied.
- These channels contain hard-coded coordinates, the most basic version of which is one channel for the i coordinate and one for the j coordinate.
- e.g., the i coordinate channel has its first row filled with 0’s, its second row with 1’s, its third with 2’s, and so on.
- Other derived coordinates may be supplied as well, such as the radius coordinate used for ImageNet.
- Finally, the i and j coordinates are linearly scaled to fall in the range [−1, 1].
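The construction described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `coord_channels` and `add_coords`, the list-of-channels layout, and the way the optional radius channel is computed (distance from the center in scaled units) are all assumptions made for this example.

```python
import math

def coord_channels(h, w, with_r=False):
    """Return hard-coded coordinate channels for an h x w feature map."""
    def scale(k, n):
        # Map index k in [0, n-1] linearly onto [-1, 1].
        return 2.0 * k / (n - 1) - 1.0 if n > 1 else 0.0

    # i channel: every entry in a row holds that row's scaled index.
    i_ch = [[scale(r, h) for _ in range(w)] for r in range(h)]
    # j channel: every entry in a column holds that column's scaled index.
    j_ch = [[scale(c, w) for c in range(w)] for _ in range(h)]
    channels = [i_ch, j_ch]
    if with_r:
        # Assumption: radius = distance from the center, in scaled units.
        r_ch = [[math.hypot(i_ch[r][c], j_ch[r][c]) for c in range(w)]
                for r in range(h)]
        channels.append(r_ch)
    return channels

def add_coords(feature_map, with_r=False):
    """Concatenate coordinate channels to a [channel][row][col] feature map."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return feature_map + coord_channels(h, w, with_r)

x = [[[0.0] * 3 for _ in range(3)]]  # a single 3x3 input channel
y = add_coords(x)                    # 1 input channel + i channel + j channel
print(len(y))   # 3
print(y[1])     # i channel: rows of -1.0, 0.0, 1.0
print(y[2])     # j channel: each row is [-1.0, 0.0, 1.0]
```

An ordinary convolution applied to `y` then sees, at every pixel, both the input features and where that pixel is located.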
2. Not-so-Clevr Dataset & Results
- Not-so-Clevr consists of 9×9 squares placed on a 64×64 canvas.
- The task maps coordinates to positions: given coordinates as input, a CNN should learn to output the correct position on the canvas.
- However, conventional convolutional models never achieve more than about 86% accuracy, and training is slow.
- CoordConv models learn several hundred times faster, attaining perfect accuracy in seconds.
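To make the setup concrete, here is a rough sketch of how such canvases could be generated. It assumes that a square is addressed by its center pixel and that only squares lying fully inside the canvas are used; the helper names `render_square` and `one_hot_pixel` are hypothetical and not taken from the paper.

```python
CANVAS, SIZE = 64, 9

def render_square(ci, cj, canvas=CANVAS, size=SIZE):
    """Draw a size x size square centered at (ci, cj) on a blank canvas."""
    half = size // 2
    if not (half <= ci < canvas - half and half <= cj < canvas - half):
        raise ValueError("square would fall off the canvas")
    img = [[0] * canvas for _ in range(canvas)]
    for i in range(ci - half, ci + half + 1):
        for j in range(cj - half, cj + half + 1):
            img[i][j] = 1
    return img

def one_hot_pixel(ci, cj, canvas=CANVAS):
    """Target for a coordinates-to-position task: one lit pixel at (ci, cj)."""
    img = [[0] * canvas for _ in range(canvas)]
    img[ci][cj] = 1
    return img

# Every center for which the square fits entirely on the canvas.
half = SIZE // 2
centers = [(i, j) for i in range(half, CANVAS - half)
           for j in range(half, CANVAS - half)]
print(len(centers))                          # (64 - 9 + 1) ** 2 = 3136
print(sum(map(sum, render_square(10, 20))))  # 81 lit pixels
```

Under these assumptions, a model receives a pair such as `(10, 20)` and must produce `one_hot_pixel(10, 20)` or `render_square(10, 20)`, which is trivial once coordinates are available to the filters.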
3. Other Results
3.1. ImageNet Classification
- As might be expected for tasks requiring straightforward translation invariance, CoordConv does not help significantly when tested with image classification.
- Adding a single extra 1×1 CoordConv layer with 8 output channels improves ResNet-50 Top-5 accuracy by a meager 0.04% averaged over five runs for each treatment; however, this difference is not statistically significant. It is at least reassuring that CoordConv doesn’t hurt the performance since it can always learn to ignore coordinates.
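The remark that CoordConv can always learn to ignore coordinates can be checked on a toy 1×1 layer. The sketch below is an assumption-laden illustration in plain Python (the helpers `conv1x1` and `coordconv1x1`, the absence of a bias term, and the tiny 2×2 input are all made up for this example): when the weights on the coordinate channels are zero, the layer reduces to an ordinary 1×1 convolution.

```python
def coord_channels(h, w):
    """i and j coordinate channels, linearly scaled to [-1, 1]."""
    def scale(k, n):
        return 2.0 * k / (n - 1) - 1.0 if n > 1 else 0.0
    i_ch = [[scale(r, h) for _ in range(w)] for r in range(h)]
    j_ch = [[scale(c, w) for c in range(w)] for _ in range(h)]
    return [i_ch, j_ch]

def conv1x1(x, weights):
    """1x1 convolution without bias; weights[k][c] maps input c to output k."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wk[c] * x[c][r][q] for c in range(len(x)))
              for q in range(w)] for r in range(h)] for wk in weights]

def coordconv1x1(x, weights):
    """Concatenate coordinate channels, then apply the 1x1 convolution."""
    h, w = len(x[0]), len(x[0][0])
    return conv1x1(x + coord_channels(h, w), weights)

x = [[[1.0, 2.0], [3.0, 4.0]]]                 # one 2x2 input channel
plain = conv1x1(x, [[0.5]])
ignoring = coordconv1x1(x, [[0.5, 0.0, 0.0]])  # zero weights on i and j
using = coordconv1x1(x, [[0.0, 1.0, 0.0]])     # output copies the i channel
print(plain == ignoring)  # True
print(using)              # [[[-1.0, -1.0], [1.0, 1.0]]]
```

With nonzero coordinate weights the same layer becomes position-dependent, which is the extra capacity that translation-invariant tasks such as ImageNet classification apparently do not need.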
3.2. Object Detection
- On a simple problem of detecting MNIST digits scattered on a canvas, the test intersection-over-union (IOU) of a Faster R-CNN network improved by 24% when using CoordConv.
- (The authors do not provide any figures or tables for this part.)
- CoordConv can therefore be useful for localization problems such as object detection.
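For readers unfamiliar with the metric, IOU is the area of overlap between a predicted and a ground-truth box divided by the area of their union. A minimal sketch follows, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` corner tuples; that box format and the function name `iou` are choices made for this example, not something specified in the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0, identical boxes
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0, disjoint boxes
```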
Reference
[2018 NeurIPS] [CoordConv]
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution