Review — Striving for Simplicity: The All Convolutional Net

All-CNN: No Max Pooling, No Fully Connected Layers

Sik-Ho Tsang
3 min read · May 29, 2022

Striving for Simplicity: The All Convolutional Net
All-CNN, by University of Freiburg
2015 ICLR Workshop, Over 3800 Citations (Sik-Ho Tsang @ Medium)
Convolutional Neural Network, CNN, Image Classification

  • CNNs are commonly composed of alternating convolution and max-pooling layers, followed by a small number of fully connected layers.
  • In this paper, max-pooling is simply replaced by a convolutional layer with increased stride, without loss in accuracy.
  • Fully connected layers are replaced by a global average pooling layer.

Outline

  1. Proposed All-CNN
  2. Experimental Results

1. Proposed All-CNN

1.1. Two Means to Replace Pooling

Convolution With Stride Larger Than 1 to Reduce Spatial Size More Aggressively
  • There are two means suggested to replace pooling for spatial dimensionality reduction:
  1. Increase the stride of the convolutional layer that precedes the pooling layer. However, this significantly reduces the overlap between adjacent receptive fields and can result in less accurate recognition.
  2. Or replace the pooling layer by a normal convolution with a stride larger than one.

The second option increases the overall number of network parameters, yet comes without loss in accuracy (a minimal sketch of both options is shown below).
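As a minimal PyTorch sketch of the two options (PyTorch itself and the 96-channel, 3×3 layer sizes are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 32, 32)  # dummy feature map: batch 1, 96 channels, 32x32

# Baseline: convolution followed by max pooling.
baseline = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# Option 1: drop the pooling layer and increase the stride of the
# preceding convolution (reduces receptive-field overlap).
option1 = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

# Option 2: keep the stride-1 convolution and replace the pooling layer
# with an extra stride-2 convolution (adds parameters, no loss in accuracy).
option2 = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

for m in (baseline, option1, option2):
    print(m(x).shape)  # all produce torch.Size([1, 96, 16, 16])
```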

1.2. Global Average Pooling to Replace Fully Connected Layer

Global Average Pooling in NIN to Replace Fully Connected Layer (Image from NIN)
  • This was first suggested in NIN.

Using global average pooling to replace the fully connected layers removes a large number of parameters.
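As a rough sketch of the parameter savings (the 192×8×8 feature map and 10 classes are illustrative assumptions), compare a flatten + fully connected head with the NIN-style 1×1 convolution + global average pooling head:

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 192, 8, 8)  # dummy feature map

# Fully connected head: flatten + linear, 192*8*8*10 = 122,880 weights.
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(192 * 8 * 8, 10),
)

# All-convolutional head: 1x1 conv to one channel per class (192*10 = 1,920
# weights), then parameter-free global average pooling over the spatial map.
gap_head = nn.Sequential(
    nn.Conv2d(192, 10, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),  # global average pooling -> (1, 10, 1, 1)
    nn.Flatten(),
)

print(fc_head(feat).shape, gap_head(feat).shape)  # both torch.Size([1, 10])
```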

1.3. Three Base Models

The three base networks used for classification on CIFAR-10 and CIFAR-100
  • Overall, three base models (A, B, and C) are proposed, each consisting only of convolutional layers with rectified linear non-linearities, followed by a global averaging + softmax layer to produce predictions over the whole image.

1.4. Three Derived Models from Base Models

Model description of the three networks derived from base model C, used for evaluating the importance of pooling for classification on CIFAR-10 and CIFAR-100
  • Further enhanced models are derived from the base models. The derived models for base models A and B are built analogously but are not shown in the table above.

5×5 convolutions are replaced by 2 consecutive 3×3 convolutions, which cover the same receptive field with fewer parameters (see the sketch below).
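A small sketch of this substitution (the channel count is an illustrative assumption): two stacked 3×3 convolutions have the same effective 5×5 receptive field but use 18·c² weights instead of 25·c², with an extra non-linearity in between:

```python
import torch.nn as nn

c = 96  # illustrative channel count

# One 5x5 convolution: 5*5*c*c = 25*c*c weights, 5x5 receptive field.
conv5 = nn.Conv2d(c, c, kernel_size=5, padding=2)

# Two consecutive 3x3 convolutions: 2*3*3*c*c = 18*c*c weights,
# same effective 5x5 receptive field, plus an extra non-linearity.
conv3x2 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(c, c, kernel_size=3, padding=1),
)

n5 = sum(p.numel() for p in conv5.parameters())
n3 = sum(p.numel() for p in conv3x2.parameters())
print(n5, n3)  # the 5x5 layer has roughly 25/18x the weights of the 3x3 stack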

1.5. Detailed Architecture

Architecture of the Large All-CNN network for CIFAR-10
  • The above shows the detailed architecture for CIFAR-10.
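For concreteness, here is a hedged PyTorch sketch of the standard All-CNN-C configuration for CIFAR-10 (the "Large" network in the figure widens these layers; "same" padding is assumed throughout, so intermediate spatial sizes differ slightly from the paper's table):

```python
import torch
import torch.nn as nn

def conv(cin, cout, k, s=1):
    # convolution + ReLU; padding k//2 keeps spatial size for stride 1
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2),
                         nn.ReLU(inplace=True))

# Sketch of All-CNN-C for CIFAR-10, dropout placement following the paper.
all_cnn_c = nn.Sequential(
    nn.Dropout(0.2),              # 20% dropout on the input image
    conv(3, 96, 3), conv(96, 96, 3),
    conv(96, 96, 3, s=2),         # stride-2 conv replaces max pooling
    nn.Dropout(0.5),
    conv(96, 192, 3), conv(192, 192, 3),
    conv(192, 192, 3, s=2),       # stride-2 conv replaces max pooling
    nn.Dropout(0.5),
    conv(192, 192, 3),
    conv(192, 192, 1),            # 1x1 convolutions, as in NIN
    conv(192, 10, 1),             # one output channel per CIFAR-10 class
    nn.AdaptiveAvgPool2d(1),      # global average pooling
    nn.Flatten(),                 # class scores; softmax/cross-entropy on top
)

print(all_cnn_c(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```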
Architecture of the ImageNet network
  • The above shows the detailed architecture for ImageNet.

2. Experimental Results

2.1. Ablation Study

Among the base and derived models in the ablation study, All-CNN-C has the best performance.

2.2. SOTA Comparison

Test error on CIFAR-10 and CIFAR-100 for the All-CNN compared to the state of the art from the literature

On CIFAR-10, the reported All-CNN is All-CNN-C. It outperforms Maxout, NIN, and others. On CIFAR-100, All-CNN-C obtains competitive performance.

2.3. ImageNet

  • An upscaled version of the All-CNN-B network, which has 12 convolutional layers, is trained on ImageNet.

This network achieves a Top-1 validation error of 41.2% on ILSVRC-2012 when evaluating only on the center 224×224 patch, which is comparable to the 40.7% Top-1 error reported for AlexNet.

  • (The paper also contains sections visualizing the feature map responses using deconvolutions, similar to ZFNet; please feel free to read the paper for details.)


Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.