Review — Striving for Simplicity: The All Convolutional Net

All-CNN: No Max Pooling, No Fully Connected Layers

Sik-Ho Tsang
3 min read · May 29, 2022

Striving for Simplicity: The All Convolutional Net
All-CNN, by University of Freiburg
2015 ICLR Workshop, Over 3800 Citations (Sik-Ho Tsang @ Medium)
Convolutional Neural Network, CNN, Image Classification

  • CNNs are commonly composed of alternating convolution and max-pooling layers, followed by a small number of fully connected layers.
  • In this paper, max-pooling is simply replaced by a convolutional layer with increased stride, without loss in accuracy.
  • Fully connected layers are replaced by a global average pooling layer.

Outline

  1. Proposed All-CNN
  2. Experimental Results

1. Proposed All-CNN

1.1. Two Means to Replace Pooling

Convolution With Stride Larger Than 1 to Reduce Spatial Size More Aggressively
  • There are two means suggested to replace pooling for spatial dimensionality reduction:
  1. Increase the stride of the convolutional layer that precedes the pooling layer. However, this significantly reduces the overlap between adjacent receptive fields and can result in less accurate recognition.
  2. Or replace the pooling layer by a normal convolution with a stride larger than one.

The second option increases the overall number of network parameters, yet comes without loss in accuracy (a minimal sketch of both options is shown below).
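As a minimal PyTorch sketch of the two options (PyTorch itself and the 96-channel, 3×3 layer sizes are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 32, 32)  # dummy feature map: batch 1, 96 channels, 32x32

# Baseline: convolution followed by max pooling.
baseline = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# Option 1: drop the pooling layer and increase the stride of the
# preceding convolution (reduces receptive-field overlap).
option1 = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

# Option 2: keep the stride-1 convolution and replace the pooling layer
# with an extra stride-2 convolution (adds parameters, no loss in accuracy).
option2 = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

for m in (baseline, option1, option2):
    print(m(x).shape)  # all produce torch.Size([1, 96, 16, 16])
```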

1.2. Global Average Pooling to Replace Fully Connected Layer

Global Average Pooling in NIN to Replace Fully Connected Layer (Image from NIN)
  • This was first suggested in NIN.

Using global average pooling to replace the fully connected layers removes a large number of parameters.
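As a rough sketch of the parameter savings (the 192×8×8 feature map and 10 classes are illustrative assumptions), compare a flatten + fully connected head with the NIN-style 1×1 convolution + global average pooling head:

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 192, 8, 8)  # dummy feature map

# Fully connected head: flatten + linear, 192*8*8*10 = 122,880 weights.
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(192 * 8 * 8, 10),
)

# All-convolutional head: 1x1 conv to one channel per class (192*10 = 1,920
# weights), then parameter-free global average pooling over the spatial map.
gap_head = nn.Sequential(
    nn.Conv2d(192, 10, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),  # global average pooling -> (1, 10, 1, 1)
    nn.Flatten(),
)

print(fc_head(feat).shape, gap_head(feat).shape)  # both torch.Size([1, 10])
```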

1.3. Three Base Models

The three base networks used for classification on CIFAR-10 and CIFAR-100
  • Overall, three base models (A, B, and C) are proposed, each consisting only of convolutional layers with rectified linear non-linearities, followed by a global averaging + softmax layer to produce predictions over the whole image.

1.4. Three Derived Models from Base Models

Model description of the three networks derived from base model C, used for evaluating the importance of pooling for classification on CIFAR-10 and CIFAR-100
  • Further enhanced models are derived from the base models. The derived models for base models A and B are built analogously but are not shown in the table above.

5×5 convolutions are replaced by 2 consecutive 3×3 convolutions, which cover the same receptive field with fewer parameters (see the sketch below).
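A small sketch of this substitution (the channel count is an illustrative assumption): two stacked 3×3 convolutions have the same effective 5×5 receptive field but use 18·c² weights instead of 25·c², with an extra non-linearity in between:

```python
import torch.nn as nn

c = 96  # illustrative channel count

# One 5x5 convolution: 5*5*c*c = 25*c*c weights, 5x5 receptive field.
conv5 = nn.Conv2d(c, c, kernel_size=5, padding=2)

# Two consecutive 3x3 convolutions: 2*3*3*c*c = 18*c*c weights,
# same effective 5x5 receptive field, plus an extra non-linearity.
conv3x2 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(c, c, kernel_size=3, padding=1),
)

n5 = sum(p.numel() for p in conv5.parameters())
n3 = sum(p.numel() for p in conv3x2.parameters())
print(n5, n3)  # the 5x5 layer has roughly 25/18x the weights of the 3x3 stack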

1.5. Detailed Architecture

Architecture of the Large All-CNN network for CIFAR-10
  • The above shows the detailed architecture for CIFAR-10.
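For concreteness, here is a hedged PyTorch sketch of the standard All-CNN-C configuration for CIFAR-10 (the "Large" network in the figure widens these layers; "same" padding is assumed throughout, so intermediate spatial sizes differ slightly from the paper's table):

```python
import torch
import torch.nn as nn

def conv(cin, cout, k, s=1):
    # convolution + ReLU; padding k//2 keeps spatial size for stride 1
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2),
                         nn.ReLU(inplace=True))

# Sketch of All-CNN-C for CIFAR-10, dropout placement following the paper.
all_cnn_c = nn.Sequential(
    nn.Dropout(0.2),              # 20% dropout on the input image
    conv(3, 96, 3), conv(96, 96, 3),
    conv(96, 96, 3, s=2),         # stride-2 conv replaces max pooling
    nn.Dropout(0.5),
    conv(96, 192, 3), conv(192, 192, 3),
    conv(192, 192, 3, s=2),       # stride-2 conv replaces max pooling
    nn.Dropout(0.5),
    conv(192, 192, 3),
    conv(192, 192, 1),            # 1x1 convolutions, as in NIN
    conv(192, 10, 1),             # one output channel per CIFAR-10 class
    nn.AdaptiveAvgPool2d(1),      # global average pooling
    nn.Flatten(),                 # class scores; softmax/cross-entropy on top
)

print(all_cnn_c(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```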
Architecture of the ImageNet network
  • The above shows the detailed architecture for ImageNet.

2. Experimental Results

2.1. Ablation Study

Among the base and derived models in the ablation study, All-CNN-C has the best performance.

2.2. SOTA Comparison

Test error on CIFAR-10 and CIFAR-100 for the All-CNN compared to the state of the art from the literature

On CIFAR-10, the reported All-CNN is All-CNN-C. It outperforms Maxout, NIN, and others. On CIFAR-100, All-CNN-C obtains competitive performance.

2.3. ImageNet

  • An upscaled version of the All-CNN-B network, which has 12 convolutional layers, is trained on ImageNet.

This network achieves a Top-1 validation error of 41.2% on ILSVRC-2012 when evaluating only on the center 224×224 patch, which is comparable to the 40.7% Top-1 error reported for AlexNet.

  • (The paper also contains sections visualizing the feature map responses using deconvolutions, similar to ZFNet; please feel free to read the paper for details.)


Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.