# Reading: Deep Roots — Improving CNN Efficiency with Hierarchical Filter Groups (Image Classification)

## Similar or Higher Accuracy Than the Baseline Architectures Such as NIN, ResNet & GoogLeNet, with Much Less Computation & Smaller Model Size

In this story, **Deep Roots**, by the University of Cambridge and Microsoft Research, is briefly presented.

For a convolution, it is unlikely that every filter (or neuron) in a deep neural network needs to depend on the output of all the filters in the previous layer. In fact, reducing filter co-dependence in deep networks has been shown to benefit generalization.

In this paper:

- **By using hierarchical filter groups, a much smaller model and less computation are obtained.**
- **Various architectures are validated** by evaluating on the CIFAR10 and ILSVRC datasets.

This is a paper in **2017 CVPR** with more than **100 citations**. (Sik-Ho Tsang @ Medium)

# 1. **Convolution with Filter Groups in AlexNet**

- In AlexNet, ‘filter groups’ are used in the convolutional layers of the CNN, but their use was necessitated by the practical need to sub-divide the work of training a large network across multiple GPUs.
- The surprising side effect is that **the grouped AlexNet network has approximately 57% fewer connection weights** than the same network without filter groups.
- Despite the large difference in the number of parameters between the models, both achieve comparable accuracy on ILSVRC. In fact, the smaller grouped network gets 1% lower top-5 validation error.
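The saving from filter groups follows from counting weights: with `g` groups, each filter reads only `c_in / g` input channels. A minimal sketch of that count is given below. The function name and the example layer sizes are illustrative assumptions, not AlexNet's actual layer dimensions, so it does not reproduce the 57% figure.

```python
def conv_weights(c_in, c_out, k, groups=1):
    """Number of weights (biases ignored) in a k x k convolution with filter groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each of the c_out filters spans only c_in / groups input channels.
    return k * k * (c_in // groups) * c_out


# Hypothetical layer: 256 -> 256 channels with 3x3 kernels.
full = conv_weights(256, 256, 3)               # 589824
grouped = conv_weights(256, 256, 3, groups=2)  # 294912
print(full, grouped, grouped / full)           # the ratio is 1 / groups = 0.5
```

For a single layer the weight count scales as `1 / groups`. The saving for a whole network is smaller, because layers without groups are unaffected.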

# 2. **Root Module: Architecture**

- The filter groups as shown in (b) and (c) are used to **force the network to learn filters with only limited dependence on previous layers.**
- **This reduced connectivity also reduces computational complexity and model size**, since the size of the filters in filter groups is reduced drastically.
- **A root module** has a given number of filter groups: **the more filter groups, the fewer the number of connections** to the previous layer’s outputs.
- Each spatial convolutional layer is **followed by a low-dimensional embedding (1×1 convolution).**
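To make the cost of a root module concrete, the sketch below counts the weights and multiply-adds of a grouped spatial convolution followed by a 1×1 convolution. It is a rough illustration only: the function name, stride 1, "same" padding, the 32×32 feature map and the 192-channel widths are all assumptions made for the example.

```python
def root_module_cost(c_in, c_out, k, groups, height, width):
    """Return (weights, multiply_adds) for a grouped k x k conv followed by a 1x1 conv.

    Assumes stride 1 and 'same' padding, so both layers produce a
    height x width output; biases are ignored.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    spatial = k * k * (c_in // groups) * c_out  # grouped spatial convolution
    pointwise = c_out * c_out                   # 1x1 conv sees every group's output
    weights = spatial + pointwise
    # Every weight is used once per output position.
    return weights, weights * height * width


# Hypothetical 5x5 layer with 192 input and 192 output channels on a 32x32 map.
for g in (1, 2, 4, 8):
    weights, madds = root_module_cost(192, 192, 5, g, 32, 32)
    print(f"groups={g}: weights={weights}, multiply-adds={madds}")
```

The 1×1 convolution is not grouped, so it is the point where information from the separate filter groups is mixed again. Its cost stays fixed as the number of groups grows.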

# 3. **Root Module in NIN**

## 3.1. NIN Variant

- **NIN (Orig)**: It is composed of 3 spatial (5×5, 3×3) convolutional layers with a large number of filters (192).
- In the root variants, the original number of filters per layer is preserved, but the filters are subdivided into groups.

## 3.2. CIFAR-10

- Compared to the baseline architecture, **the root variants achieve a significant reduction in computation and model size without a significant reduction in accuracy.**
- For example, the **root-8** architecture gives **equivalent accuracy** with **only 46% of the floating point operations (FLOPS)** and **33% of the model parameters** of the original network, and **approximately 37% and 23% faster CPU and GPU timings**, respectively.

- The inter-layer correlation between the adjacent filter layers conv2c and conv3a in the network is shown above.
- The block-diagonalization enforced by the filter group structure is visible, more so with a larger number of filter groups.
- **This shows that the network learns an organization of filters such that the sparsely distributed strong filter relations are grouped into a denser block-diagonal structure.**
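The block-diagonal pattern comes from the connectivity that filter groups impose. The sketch below prints that hard connectivity mask for a toy layer; it shows which input channels each filter is allowed to read, not the learned correlations discussed above. The function name and the 8-channel sizes are assumptions for illustration.

```python
def group_connectivity(c_in, c_out, groups):
    """mask[o][i] is 1 if output filter o reads input channel i under filter groups."""
    in_per, out_per = c_in // groups, c_out // groups
    return [[1 if o // out_per == i // in_per else 0 for i in range(c_in)]
            for o in range(c_out)]


# Toy layer: 8 input channels, 8 filters, 4 filter groups.
for row in group_connectivity(8, 8, 4):
    print("".join("#" if v else "." for v in row))
# ##......
# ##......
# ..##....
# ..##....
# ....##..
# ....##..
# ......##
# ......##
```

With more groups the blocks on the diagonal get smaller, which matches the observation that the structure is more pronounced with a larger number of filter groups.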

## 3.3. Grouping Degree with Network Depth

- We might consider having the degree of grouping:
  - **(1) decrease with depth** after the first convolutional layer, e.g. 1–8–4 (‘**root**’);
  - **(2) remain constant with depth** after the first convolutional layer, e.g. 1–4–4 (‘**column**’); or
  - **(3) increase with depth**, e.g. 1–4–8 (‘**tree**’).
- The results show that the so-called **root topology gives the best performance**, providing the smallest reduction in accuracy for a given reduction in model size and computational complexity.
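The three topologies can be written as per-layer group counts and compared by weight count, as in the sketch below. The channel widths (3 input channels, then 192 filters per layer) and kernel sizes (5, 5, 3) are simplifying assumptions, not the exact NIN layer sizes, and the 1×1 layers are left out. The sketch only counts weights; which topology is most accurate is an empirical result of the paper.

```python
def spatial_weights(groups_per_layer, channels=(3, 192, 192, 192), kernels=(5, 5, 3)):
    """Total weights of the spatial conv layers for a given grouping per layer.

    channels[i] -> channels[i + 1] is the (assumed) width of layer i;
    biases and 1x1 layers are ignored.
    """
    total = 0
    for layer, g in enumerate(groups_per_layer):
        c_in, c_out, k = channels[layer], channels[layer + 1], kernels[layer]
        if c_in % g or c_out % g:
            raise ValueError("channel counts must be divisible by groups")
        total += k * k * (c_in // g) * c_out
    return total


topologies = {
    "baseline": (1, 1, 1),
    "root": (1, 8, 4),    # grouping decreases with depth
    "column": (1, 4, 4),  # grouping constant with depth
    "tree": (1, 4, 8),    # grouping increases with depth
}
for name, groups in topologies.items():
    print(f"{name:8s} {groups} -> {spatial_weights(groups)} weights")
```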

# 4. **Root Module in ResNet**

## 4.1. ResNet Variant

- ResNet-50 has 50 convolutional layers, of which one-third are spatial convolutions (non-1×1).
- The spatial convolutional layers of the original network are replaced with root modules.

## 4.2. ResNet-50 on ILSVRC

- The results are similar to those for NIN.
- For example, **the best result by accuracy (root-16) exceeds the baseline accuracy by 0.2%** while **reducing the model size by 27% and floating-point operations (multiply-add) by 37%.** **CPU timings were 23% faster**, while **GPU timings were 13% faster**.
- With a drop in accuracy of only 0.1%, however, the root-64 model reduces the model size by 40% and the floating point operations by 45%. CPU timings were 31% faster, while GPU timings were 12% faster.

## 4.3. ResNet-200 on ILSVRC

- The models trained with roots have comparable or lower error, with fewer parameters and less computation.
- **The root-64 model has 27% fewer FLOPS and 48% fewer parameters than ResNet-200.**

# 5. **Root Module in GoogLeNet**

## 5.1. GoogLeNet Variant

- For all of the networks, grouped filters are applied within each of the ‘spatial’ convolutions (3×3, 5×5).

## 5.2. GoogLeNet on ILSVRC

- **For many of the configurations the top-5 accuracy remains within 0.5% of the baseline model.**
- The highest-accuracy result is 0.1% off the top-5 accuracy of the baseline model, but has a 0.1% higher top-1 accuracy.
- While maintaining the same accuracy, this network has 9% faster CPU and GPU timings.
- However, a model with only 0.3% lower top-5 accuracy than the baseline has much higher gains in computational efficiency: 44% fewer floating point operations (multiply-add), 7% fewer model parameters, 21% faster CPU and 16% faster GPU timings.

It has been a long time since I last read a CVPR paper about image classification.

This is the 7th story this month!

## Reference

[2017 CVPR] [Deep Roots]

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

## Image Classification

[LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [Deep Roots] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2] [CondenseNet]