Review — BagNet: Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet (Image Classification)

BagNet: Bag-of-Feature (BoF) Models Using ResNet, Better Interpretability

  • BagNet, based on ResNet-50, is proposed; it classifies an image based on the occurrences of small local image features.
  • This strategy is closely related to bag-of-feature (BoF) models, which were very popular before the onset of deep learning.
  • Given the simplicity of BoF models, a bit of accuracy is traded for better interpretability, with applications such as diagnosing failure cases, benchmarking diagnostic tools, or serving as interpretable parts of a computer vision pipeline.

Outline

  1. Deep Bag-of-features Model (BagNet)
  2. Experimental Results

1. Deep Bag-of-features Model (BagNet)

Deep Bag-of-features Models (BagNets): Network Architecture & Performance

1.1. Network Architecture

  • (A) BagNet: First, a 2048-dimensional feature representation is inferred from each image patch of size q×q pixels using multiple stacked ResNet blocks; a linear classifier is then applied to infer the class evidence for each patch, yielding one logit heatmap per class.
  • These heatmaps are averaged across space and passed through a softmax to obtain the final class probabilities.
The BagNet architecture is almost identical to the ResNet-50 architecture.
  • The structure differs from ResNet only in the replacement of many 3×3 convolutions by 1×1 convolutions, thereby limiting the receptive field size of the topmost convolutional layer to q×q pixels.
  • The resulting architecture is denoted BagNet-q, and q ∈ {9, 17, 33} is tested.
  • The word linear here refers to the combination of a linear spatial aggregation (a simple average) and a linear classifier on top of the aggregated features.
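The pipeline above — per-patch features, a linear classifier producing one logit heatmap per class, spatial averaging, then a softmax — can be sketched in NumPy. This is a minimal illustration, not the paper's code: flattened raw pixels stand in for the 2048-dimensional features that the stacked ResNet blocks would produce, and the stride and weights are placeholders.

```python
import numpy as np

def bagnet_predict(image, W_cls, b_cls, q=33, stride=8):
    """BagNet-style inference sketch: compute class evidence for every
    q-by-q patch with a linear classifier, average the resulting logit
    heatmap over space, then apply a softmax."""
    H, W, _ = image.shape
    # Toy patch encoder: flattened raw pixels stand in for the 2048-d
    # features the stacked ResNet blocks would produce (an assumption
    # for illustration, not the paper's actual encoder).
    feats = np.stack([
        image[y:y + q, x:x + q].ravel()
        for y in range(0, H - q + 1, stride)
        for x in range(0, W - q + 1, stride)
    ])                                   # (num_patches, q*q*3)
    heatmap = feats @ W_cls + b_cls      # per-patch class logits
    logits = heatmap.mean(axis=0)        # linear spatial aggregation
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), heatmap
```

Because both the aggregation and the classifier are linear, the returned heatmap can later be inspected patch by patch to see exactly where the class evidence came from.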

1.2. Performance

  • (B) Top-5 ImageNet performance: With such a small receptive field for each sub-network, BagNet outperforms AlexNet, though it is still not as good as VGG-16.
  • The constraint on local features makes it straightforward to analyse how exactly each part of the image influences the classification.
  • (C) Correlation with logits of VGG-16: The correlation is higher for larger q.
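The claim that swapping 3×3 for 1×1 convolutions caps the receptive field at q×q can be checked with the standard receptive-field recursion. The layer configurations below are illustrative examples, not the exact ResNet-50/BagNet layer lists:

```python
def receptive_field(layers):
    """Receptive field of the topmost layer, given (kernel_size, stride)
    pairs ordered from input to output (standard recursion, not paper code)."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer grows the field by (k-1)*jump
        jump *= s              # stride compounds the effective step size
    return rf
```

For example, four stacked 3×3 convolutions (stride 1) see a 9×9 region, while replacing the last three with 1×1 convolutions shrinks that to 3×3 — depth and channel capacity stay the same, but the spatial extent each output "sees" is limited.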

2. Experimental Results

2.1. ImageNet

  • BagNets are directly trained on ImageNet. Surprisingly, patch sizes as small as 17×17 pixels suffice to reach AlexNet-level performance (80.5% top-5), while patch sizes of 33×33 pixels suffice to reach close to 87.6%.
  • Across all receptive field sizes, BagNets reach around 155 images/s, compared to 570 images/s for ResNet-50. The difference in runtime can be attributed to the reduced amount of downsampling in BagNets.

2.2. Explaining Decisions

Heatmaps showing the class evidence extracted from each part of the image. The spatial sum over the evidence is the total class evidence.
  • Clearly, most evidence lies around the shapes of objects (e.g. the crib or the paddles) or certain predictive image features like the glowing borders of the pumpkin.
  • Also, for animals, eyes or legs are important. It is also notable that background features (like the forest in the deer image) are largely ignored by the BagNets.
Most informative image patches for BagNets.
  • The top subrow shows the patches that caused the highest logit outputs for the given class across all validation images with that label.
  • The bottom subrow shows such patches drawn from images with a different label (highlighting errors).
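The reason these heatmaps are faithful explanations — rather than post-hoc approximations — is that BagNet's aggregation is exactly linear: averaging the per-patch logits reproduces the model's logits with no approximation. A small sketch with hypothetical feature and classifier shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 16))   # hypothetical per-patch features (7x7 grid)
W_cls = rng.normal(size=(16, 5))    # hypothetical linear classifier, 5 classes

heatmap = feats @ W_cls             # per-patch class evidence (the heatmap)
logits = heatmap.mean(axis=0)       # the model's aggregated logits
# Linearity means the heatmap decomposes the decision exactly:
# averaging features first and then classifying yields the same logits.
assert np.allclose(logits, feats.mean(axis=0) @ W_cls)
```

This is why each patch's contribution to the final decision can be read off the heatmap directly, which is not true for ordinary CNNs with non-linear spatial pooling.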

2.3. Explaining Misclassifications

Images misclassified by BagNet-33 and VGG-16 with heatmaps for true and predicted label and the most predictive image patches. Class probability reported for BagNet-33 (left) and VGG (right)
  • The figure above analyses images misclassified by both BagNet-33 and VGG-16. In the first example, the ground-truth class “cleaver” was confused with “granny smith” because of the green cucumber at the top of the image.
  • (There are many other analyses in the paper. Please feel free to read the paper.)
