Brief Review — An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution
CoordConv: Incorporating Positions into the Conv Layer
--
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution, CoordConv, by Uber AI Labs, and Uber Technologies,
2018 NeurIPS, Over 500 Citations (Sik-Ho Tsang @ Medium)
Image Classification
- CoordConv is proposed, which feeds pixel coordinates into the convolutional layer as extra input channels.
Outline
- CoordConv
- Not-so-Clevr Dataset & Results
- Other Results
1. CoordConv
- A CoordConv layer has 2 or 3 more input channels than a standard conv layer.
- These channels contain hard-coded coordinates. The most basic version uses one channel for the i coordinate and one for the j coordinate, as shown above.
- For example, in the i-coordinate channel the first row is filled with 0's, the second row with 1's, the third with 2's, and so on. The j-coordinate channel does the same along the columns.
- Other derived coordinates may be input as well, such as the radius coordinate r (each pixel's distance from the image center) used in the ImageNet experiments.
- Finally, the coordinate values are scaled to fall in the range [−1, 1].
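The coordinate channels described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `scale_index` and `add_coords` are hypothetical, feature maps are plain nested lists in (C, H, W) order, and the radius channel is assumed to be the distance from the center computed on the already-scaled i and j values.

```python
import math
from typing import List

Grid = List[List[float]]  # one H x W channel as nested lists


def scale_index(k: int, size: int) -> float:
    """Map an integer index k in [0, size - 1] linearly onto [-1, 1]."""
    if size == 1:
        return 0.0
    return 2.0 * k / (size - 1) - 1.0


def add_coords(x: List[Grid], with_r: bool = False) -> List[Grid]:
    """Return x (C channels of H x W) with CoordConv channels appended.

    i channel: constant along each row, 0, 1, 2, ... down the rows before scaling.
    j channel: constant along each column, 0, 1, 2, ... across the columns.
    Optional r channel (assumed definition): distance from the center,
    computed from the scaled i and j values (so it lies in [0, sqrt(2)]).
    """
    h, w = len(x[0]), len(x[0][0])
    i_chan = [[scale_index(r, h) for _ in range(w)] for r in range(h)]
    j_chan = [[scale_index(c, w) for c in range(w)] for _ in range(h)]
    extra = [i_chan, j_chan]
    if with_r:
        r_chan = [[math.hypot(i_chan[r][c], j_chan[r][c]) for c in range(w)]
                  for r in range(h)]
        extra.append(r_chan)
    return x + extra
```

For a single-channel 3×4 input, `add_coords(image)` returns 3 channels: the i channel has rows of −1, 0 and 1, and each row of the j channel runs −1, −1/3, 1/3, 1. The convolution that follows is an ordinary conv over C + 2 (or C + 3) channels.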
2. Not-so-Clevr Dataset & Results
- Not-so-Clevr consists of 9×9 squares placed on a 64×64 canvas.
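- In the supervised coordinate classification task, the network is given the (i, j) coordinates of a position as input and must output a 64×64 map that highlights the correct pixel, so the CNN has to learn to map coordinates to positions.
- However, conventional convolutional models never exceed about 86% accuracy on this task, and they train slowly.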
- CoordConv models learn several hundred times faster, attaining perfect accuracy in seconds.
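To make the coordinate-classification setup concrete, here is a small sketch of how Not-so-Clevr-style samples and the accuracy metric could be built. It assumes each image holds one 9×9 square that must lie fully on the 64×64 canvas and that the target pixel is the square's center; all function names are hypothetical, and `render_square` corresponds to the rendering variant of the task rather than classification.

```python
from typing import List, Tuple

CANVAS = 64  # canvas side length (from the text)
SQUARE = 9   # square side length (from the text)
HALF = SQUARE // 2


def valid_centers() -> List[Tuple[int, int]]:
    """All (i, j) centers for which a 9x9 square lies fully on the canvas."""
    lo, hi = HALF, CANVAS - 1 - HALF
    return [(i, j) for i in range(lo, hi + 1) for j in range(lo, hi + 1)]


def render_square(i: int, j: int) -> List[List[int]]:
    """64x64 image with a filled 9x9 square centered at (i, j)."""
    img = [[0] * CANVAS for _ in range(CANVAS)]
    for r in range(i - HALF, i + HALF + 1):
        for c in range(j - HALF, j + HALF + 1):
            img[r][c] = 1
    return img


def one_hot_target(i: int, j: int) -> List[List[int]]:
    """Coordinate-classification target: a single 1 at pixel (i, j)."""
    target = [[0] * CANVAS for _ in range(CANVAS)]
    target[i][j] = 1
    return target


def argmax_2d(scores: List[List[float]]) -> Tuple[int, int]:
    """Position of the largest score, i.e. the model's predicted pixel."""
    h, w = len(scores), len(scores[0])
    return max(((r, c) for r in range(h) for c in range(w)),
               key=lambda rc: scores[rc[0]][rc[1]])


def accuracy(score_maps: List[List[List[float]]],
             centers: List[Tuple[int, int]]) -> float:
    """Fraction of samples whose predicted pixel equals the true position."""
    hits = sum(argmax_2d(s) == c for s, c in zip(score_maps, centers))
    return hits / len(centers)
```

Under these assumptions there are 56 × 56 = 3136 possible square positions. A model's prediction is counted as correct only when its highest-scoring pixel is exactly the target pixel.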
3. Other Results
3.1. ImageNet Classification
- As might be expected for a task that mainly requires translation invariance, CoordConv does not help significantly on image classification.
- Adding a single extra 1×1 CoordConv layer with 8 output channels improves ResNet-50 Top-5 accuracy by a meager 0.04%, averaged over five runs for each treatment; however, this difference is not statistically significant. It is at least reassuring that CoordConv does not hurt performance, since the layer can always learn to ignore the coordinate channels.
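A 1×1 CoordConv is simply a per-pixel linear map over the C input channels plus the two coordinate channels. The sketch below (pure Python, hypothetical function names, and no claim about where the extra layer sits inside ResNet-50) also illustrates the "learn to ignore coordinates" argument: when the two coordinate weights are zero, the output equals that of a plain 1×1 conv.

```python
from typing import List

Grid = List[List[float]]


def scale_index(k: int, size: int) -> float:
    """Map index k in [0, size - 1] onto [-1, 1]."""
    return 2.0 * k / (size - 1) - 1.0 if size > 1 else 0.0


def conv1x1(x: List[Grid], weights: List[List[float]], bias: List[float]) -> List[Grid]:
    """Plain 1x1 convolution: out[o][r][c] = bias[o] + sum_k weights[o][k] * x[k][r][c]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[bias[o] + sum(wk * x[k][r][c] for k, wk in enumerate(weights[o]))
          for c in range(w)]
         for r in range(h)]
        for o in range(len(weights))
    ]


def coordconv1x1(x: List[Grid], weights: List[List[float]], bias: List[float]) -> List[Grid]:
    """1x1 CoordConv: append scaled i and j channels, then a 1x1 conv over C + 2 inputs.

    Each row of `weights` has C + 2 entries; the last two weight the
    i and j coordinate channels.
    """
    h, w = len(x[0]), len(x[0][0])
    i_chan = [[scale_index(r, h) for _ in range(w)] for r in range(h)]
    j_chan = [[scale_index(c, w) for c in range(w)] for _ in range(h)]
    return conv1x1(x + [i_chan, j_chan], weights, bias)
```

With 8 rows in `weights`, the layer produces 8 output channels, matching the configuration described above. Training can drive the last two entries of each row toward zero if the coordinates are not useful, which is why the extra layer is not expected to hurt accuracy.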
3.2. Object Detection
- On a simple problem of detecting MNIST digits scattered on a canvas, the test intersection-over-union (IOU) of a Faster R-CNN network improves by 24% when using CoordConv.
- (The authors provide no figures or tables for this part.)
- CoordConv can therefore be useful for localization problems such as object detection.
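The IOU metric quoted above compares a predicted box with the ground-truth box. Below is a minimal sketch of the standard definition for axis-aligned boxes, assuming an (x_min, y_min, x_max, y_max) box format; the names `Box`, `box_area` and `iou` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def box_area(b: Box) -> float:
    """Area of a box, treating inverted boxes as empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 region, giving 1 / (4 + 4 − 1) = 1/7. Identical boxes score 1.0 and disjoint boxes score 0.0.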
Reference
[2018 NeurIPS] [CoordConv]
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution