Review — Striving for Simplicity: The All Convolutional Net
All-CNN: No Max Pooling, No Fully Connected Layers
Striving for Simplicity: The All Convolutional Net
All-CNN, by University of Freiburg
2015 ICLR Workshop, Over 3800 Citations (Sik-Ho Tsang @ Medium)
Convolutional Neural Network, CNN, Image Classification
- CNNs are commonly composed of alternating convolution and max-pooling layers, followed by a small number of fully connected layers.
- In this paper, max-pooling is simply replaced by a convolutional layer with increased stride, without loss in accuracy.
- Fully connected layers are replaced by a global average pooling layer.
Outline
- Proposed All-CNN
- Experimental Results
1. Proposed All-CNN
1.1. Two Means to Replace Pooling
- Two means are suggested to replace pooling for spatial dimensionality reduction:
- Increase the stride of the convolutional layer that precedes the pooling layer. However, this significantly reduces the overlap between the receptive fields of neighboring output units, and results in less accurate recognition.
- Or replace the pooling layer with a normal convolution with stride larger than one.
The second option increases the overall number of network parameters, yet incurs no loss in accuracy.
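- To make the two options concrete, below is a minimal PyTorch sketch (PyTorch, the 96-channel width, and the 2×2 pooling are my illustrative choices, not the paper's exact setup):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 32, 32)  # a CIFAR-10-sized feature map

# Baseline block: convolution followed by max pooling.
conv_pool = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# Option 1: fold the stride into the preceding convolution.
# No pooling layer and no extra parameters, but neighboring outputs
# now see far less overlapping input, which hurts accuracy.
strided_conv = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

# Option 2 (used in All-CNN): keep the dense convolution and replace
# the pooling layer itself with a stride-2 convolution. More
# parameters overall, but no loss in accuracy.
all_conv = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

for block in (conv_pool, strided_conv, all_conv):
    print(block(x).shape)  # torch.Size([1, 96, 16, 16]) for all three
```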
1.2. Global Average Pooling to Replace Fully Connected Layer
- This was first suggested in NIN.
Using global average pooling to replace the fully connected layers removes a large number of parameters.
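- A rough illustration of the parameter savings, with a hypothetical 10-class, 8×8 final feature map:

```python
import torch
import torch.nn as nn

feats = torch.randn(1, 10, 8, 8)  # final conv output: one map per class

# Fully connected head: parameter count scales with the flattened
# feature map size (10*8*8 inputs x 10 classes + biases = 6,410).
fc_head = nn.Linear(10 * 8 * 8, 10)

# Global average pooling head (as in NIN): zero parameters; each of
# the 10 feature maps is averaged into a single class score.
gap_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())

print(gap_head(feats).shape)  # torch.Size([1, 10]), then softmax
```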
1.3. Three Base Models
- Overall, three base models (A, B, and C) are suggested, which consist only of convolutional layers with rectified linear non-linearities and an averaging + softmax layer to produce predictions over the whole image.
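- As a concrete example, below is a minimal PyTorch sketch of base Model A, following the layer sequence reported for CIFAR-10 (an illustrative reimplementation, not the authors' code; padding choices are mine, and training details such as dropout are omitted):

```python
import torch
import torch.nn as nn

# Illustrative sketch of base Model A for CIFAR-10: plain convolutions
# with ReLU, 3x3 max pooling with stride 2, and a global-average-pooling
# + softmax head instead of fully connected layers.
model_a = nn.Sequential(
    nn.Conv2d(3, 96, 5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 192, 5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 192, 3, padding=1), nn.ReLU(),
    nn.Conv2d(192, 192, 1), nn.ReLU(),
    nn.Conv2d(192, 10, 1), nn.ReLU(),       # one feature map per class
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # averaging; softmax in the loss
)

print(model_a(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```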
1.4. Three Derived Models from Base Models
- Further enhanced models are derived from the base models. The derived models for base models A and B are built analogously but are not shown in the above table.
5×5 convolutions are replaced by two consecutive 3×3 convolutions.
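- A quick check of what this substitution buys (the 96-channel width is chosen for illustration): two stacked 3×3 convolutions cover the same 5×5 receptive field with fewer parameters and one extra non-linearity in between.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# One 5x5 convolution vs. two consecutive 3x3 convolutions: same 5x5
# receptive field, fewer parameters, one extra non-linearity.
five_by_five = nn.Conv2d(96, 96, kernel_size=5, padding=2)
two_three_by_three = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(96, 96, kernel_size=3, padding=1),
)

print(n_params(five_by_five))        # 96*96*5*5 + 96   = 230,496
print(n_params(two_three_by_three))  # 2*(96*96*3*3+96) = 166,080
```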
1.5. Detailed Architecture
- The above shows the detailed architecture for CIFAR-10.
- The above shows the detailed architecture for ImageNet.
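- Since the architecture tables appear as images in the original post, here is a PyTorch sketch of All-CNN-C for CIFAR-10 (an illustrative reimplementation; the 20% input / 50% post-downsampling dropout placement follows the paper, while padding is my choice):

```python
import torch
import torch.nn as nn

def conv(in_c, out_c, k, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, k, stride=stride, padding=k // 2),
        nn.ReLU(inplace=True),
    )

# Illustrative All-CNN-C for CIFAR-10: 3x3 convolutions throughout,
# stride-2 convolutions in place of max pooling, and a GAP head.
all_cnn_c = nn.Sequential(
    nn.Dropout(0.2),                 # 20% dropout on the input
    conv(3, 96, 3), conv(96, 96, 3),
    conv(96, 96, 3, stride=2),       # replaces 3x3 max pooling, stride 2
    nn.Dropout(0.5),
    conv(96, 192, 3), conv(192, 192, 3),
    conv(192, 192, 3, stride=2),     # replaces 3x3 max pooling, stride 2
    nn.Dropout(0.5),
    conv(192, 192, 3), conv(192, 192, 1),
    conv(192, 10, 1),                # one feature map per class
    nn.AdaptiveAvgPool2d(1),         # global average pooling
    nn.Flatten(),                    # logits; softmax applied in the loss
)

print(all_cnn_c(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```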
2. Experimental Results
2.1. Ablation Study
Among the base and derived variants compared, All-CNN-C achieves the best performance.
2.2. SOTA Comparison
On CIFAR-10, the All-CNN entry is All-CNN-C, which outperforms Maxout, NIN, and other approaches. On CIFAR-100, All-CNN-C obtains competitive performance.
2.3. ImageNet
- An upscaled version of the All-CNN-B network, with 12 convolutional layers, is trained on ImageNet.
This network achieves a Top-1 validation error of 41.2% on ILSVRC-2012 when evaluating only on the center 224×224 patch, which is comparable to the 40.7% Top-1 error reported for AlexNet.
- (There are also sections visualizing the feature map responses using deconvolutions, similar to ZFNet; please feel free to read the paper.)
Reference
[2015 ICLR Workshop] [All-CNN]
Striving for Simplicity: The All Convolutional Net