Brief Review — An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution

CoordConv Incorporates Pixel Positions into the Conv Layer

An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution, CoordConv, by Uber AI Labs, and Uber Technologies,
2018 NeurIPS, Over 500 Citations (Sik-Ho Tsang @ Medium)
Image Classification

  • CoordConv is proposed, which feeds pixel positions into the conv layer as extra hard-coded coordinate channels.


  1. CoordConv
  2. Not-so-Clevr Dataset & Results
  3. Other Results

1. CoordConv

Comparison of 2D convolutional and CoordConv layers
  • A CoordConv layer has 2 to 3 more input channels than a standard conv layer.
  • These channels contain hard-coded coordinates, the most basic version of which is one channel for the i coordinate and one for the j coordinate, as shown above.
  • e.g., the i-coordinate channel has its first row filled with 0’s, its second row with 1’s, its third with 2’s, and so on.
  • Other derived coordinates may be input as well, such as the radius coordinate used in the ImageNet experiments.
  • Finally, scaling is done to make them fall in the range [−1, 1].
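The bullets above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `coord_channels` and `add_coords` are hypothetical, and the radius channel is assumed here to be the distance from the image centre computed on the already-scaled coordinates (so it is not itself rescaled to [−1, 1]).

```python
import numpy as np

def coord_channels(height, width, with_radius=False):
    """Return coordinate channels of shape (2 or 3, height, width)."""
    # i varies along rows, j along columns: row 0 of i is all 0's, row 1 all 1's, ...
    i, j = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    # Linearly rescale raw indices from [0, n-1] to [-1, 1].
    i = 2.0 * i / max(height - 1, 1) - 1.0
    j = 2.0 * j / max(width - 1, 1) - 1.0
    channels = [i, j]
    if with_radius:
        # Assumed definition: distance from the centre in scaled coordinates.
        channels.append(np.sqrt(i ** 2 + j ** 2))
    return np.stack(channels).astype(np.float32)

def add_coords(x, with_radius=False):
    """Concatenate coordinate channels onto a (batch, channels, height, width) array."""
    batch, _, height, width = x.shape
    coords = coord_channels(height, width, with_radius)
    coords = np.broadcast_to(coords, (batch,) + coords.shape)
    return np.concatenate([x, coords], axis=1)

x = np.zeros((4, 3, 5, 5), dtype=np.float32)
print(add_coords(x).shape)                    # (4, 5, 5, 5)
print(add_coords(x, with_radius=True).shape)  # (4, 6, 5, 5)
print(coord_channels(3, 3)[0])                # rows are -1, 0, 1
```

An ordinary convolution applied to the output of `add_coords` then behaves as a CoordConv layer, since its filters can read the position of each pixel directly.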

2. Not-so-Clevr Dataset & Results

The Not-so-Clevr dataset
  • Not-so-Clevr consists of 9×9 squares placed on a 64×64 canvas.
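As a rough sketch of what one sample looks like, the snippet below draws a single 9×9 square on a 64×64 canvas and enumerates placements. The helper `render_square` is hypothetical, and the assumption that each square must lie fully inside the canvas is mine rather than something stated above.

```python
import numpy as np

CANVAS = 64   # canvas side length, from the text
SQUARE = 9    # square side length, from the text

def render_square(top, left):
    """Draw one SQUARE x SQUARE block of ones on an otherwise empty canvas."""
    if not (0 <= top <= CANVAS - SQUARE and 0 <= left <= CANVAS - SQUARE):
        raise ValueError("square would fall outside the canvas")
    canvas = np.zeros((CANVAS, CANVAS), dtype=np.float32)
    canvas[top:top + SQUARE, left:left + SQUARE] = 1.0
    return canvas

# Every placement that keeps the square fully inside the canvas (an assumption).
positions = [(t, l)
             for t in range(CANVAS - SQUARE + 1)
             for l in range(CANVAS - SQUARE + 1)]
print(len(positions))                    # 3136 = 56 * 56
print(int(render_square(10, 20).sum()))  # 81 = 9 * 9
```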
Toy tasks considered in this paper
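  • Each toy task asks a CNN to convert between Cartesian coordinates and pixel positions. In Supervised Coordinate Classification, for example, the network takes (i, j) coordinates as input and must highlight the corresponding pixel on the canvas.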
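To see why explicit coordinate channels make a task like Supervised Coordinate Classification easy, here is a hand-written illustration. It is not the learned model from the paper: `predict_with_coords` is a hypothetical scorer in which each pixel simply compares its own (i, j) coordinates with the input coordinates, which is the kind of comparison a filter can only perform when the coordinates are available as channels.

```python
import numpy as np

SIZE = 64  # canvas side length

def one_hot_target(i, j):
    """Target image for the task: a single lit pixel at (i, j)."""
    target = np.zeros((SIZE, SIZE), dtype=np.float32)
    target[i, j] = 1.0
    return target

def predict_with_coords(i, j):
    """Hand-written scorer: each pixel compares its coordinates with the input."""
    ii, jj = np.meshgrid(np.arange(SIZE), np.arange(SIZE), indexing="ij")
    # The score is highest (zero) exactly where the pixel's coordinates match (i, j).
    logits = -((ii - i) ** 2 + (jj - j) ** 2)
    flat = int(np.argmax(logits))   # argmax over the flattened 64*64 classes
    return divmod(flat, SIZE)       # back to (row, column)

print(predict_with_coords(12, 40))  # (12, 40)
```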
Performance of convolution and CoordConv on Supervised Coordinate Classification
  • Conventional convolutional models never achieve more than about 86% accuracy, and training is slow.
  • CoordConv models learn several hundred times faster, attaining perfect accuracy in seconds.

3. Other Results

3.1. ImageNet Classification

  • As might be expected for tasks requiring straightforward translation invariance, CoordConv does not help significantly when tested with image classification.
  • Adding a single extra 1×1 CoordConv layer with 8 output channels improves ResNet-50 Top-5 accuracy by a meager 0.04% averaged over five runs for each treatment; however, this difference is not statistically significant. It is at least reassuring that CoordConv doesn’t hurt the performance since it can always learn to ignore coordinates.
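The remark that CoordConv "can always learn to ignore coordinates" can be checked numerically for the 1×1 case. The sketch below is an assumption-laden toy, not the ResNet-50 setup: the feature-map size, the 16 input channels, and the helper `conv1x1` are all hypothetical. It shows that when the weights on the two coordinate channels are zero, a 1×1 CoordConv with 8 output channels gives the same output as a plain 1×1 convolution.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

rng = np.random.default_rng(0)
features = rng.normal(size=(16, 7, 7))   # hypothetical feature map
i, j = np.meshgrid(np.linspace(-1, 1, 7), np.linspace(-1, 1, 7), indexing="ij")
with_coords = np.concatenate([features, i[None], j[None]], axis=0)  # (18, 7, 7)

w_feat = rng.normal(size=(8, 16))        # weights on the original channels
w_coord = np.zeros((8, 2))               # coordinate weights set to zero
w_full = np.concatenate([w_feat, w_coord], axis=1)  # (8, 18)

plain = conv1x1(features, w_feat)
coordconv = conv1x1(with_coords, w_full)
print(coordconv.shape)                 # (8, 7, 7)
print(np.allclose(plain, coordconv))   # True
```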

3.2. Object Detection

  • On a simple problem of detecting MNIST digits scattered on a canvas, the test intersection-over-union (IoU) of a Faster R-CNN network improves by 24% when using CoordConv.
  • (Authors do not have any figures and tables for this part.)
  • CoordConv can therefore be useful for localization problems such as object detection.
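For readers unfamiliar with the metric, intersection-over-union for two axis-aligned boxes can be computed as below. This is a generic sketch of the standard definition, not the evaluation code used in the paper, and the `(x_min, y_min, x_max, y_max)` box format is an assumption.

```python
def iou(box_a, box_b):
    """Boxes are (x_min, y_min, x_max, y_max) with positive width and height."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))    # 25 / 175, about 0.143
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```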


