Reading: IGCV3 — Interleaved Low-Rank Group Convolutions (Image Classification)

Outperforms MobileNetV2, MobileNetV1, ShuffleNet V1, NASNet-A, IGCV2, & IGCNet / IGCV1

Sik-Ho Tsang
6 min read · Jun 13, 2020
Figure: Inspiration from (a) interleaved group convolutions and (b) bottleneck modules to form (c) IGCV3

In this story, IGCV3, by the University of Science and Technology of China and Microsoft Research Asia (MSRA), is briefly presented. In this paper:

  • Inspired by the composition of structured sparse kernels, e.g., interleaved group convolutions (IGC), and the composition of low-rank kernels, e.g., bottleneck modules,
  • IGCV3 combines these two design patterns, using a composition of structured sparse low-rank kernels to form a convolutional kernel.

This is a paper in 2018 BMVC with over 40 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Related Prior Arts
  2. Interleaved Low-Rank Group Convolutions: IGCV3
  3. Ablation Study
  4. SOTA Comparisons

1. Related Prior Arts

1.1. Interleaved Group Convolution (IGCV1)

IGCV1
  • The IGCV1 block consists of a primary and a secondary group convolution, which is mathematically formulated as a composition of permutation matrices and block-wise sparse kernel matrices (see the sketch after this list),
  • where P1 and P2 are permutation matrices, and the kernel matrices W1 and W2 are block-wise sparse.
  • Each block-wise sparse matrix Wi is exactly a group convolution with Gi groups.
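
A LaTeX sketch of this composition, reconstructed from the bullets above (the exact notation may differ slightly from the paper):

\hat{x} = P_2 W_2 P_1 W_1 x,
\qquad
W_i = \begin{bmatrix} W_i^{1} & & \\ & \ddots & \\ & & W_i^{G_i} \end{bmatrix}, \quad i \in \{1, 2\},

where each diagonal block W_i^{g} acts on one group of channels, so W_i realizes a group convolution with G_i groups.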

1.2. Interleaved Structured Sparse Convolution (IGCV2)

IGCV2
  • IGCV2 extends IGCV1 by decomposing the convolution kernel matrix into more structured sparse matrices (see the sketch after this list).
  • Here W1 corresponds to a channel-wise spatial convolution, and W2 to WL correspond to group point-wise convolutions.
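
A LaTeX sketch of the IGCV2 decomposition, reconstructed from the description above (notation may differ slightly from the paper):

\hat{x} = P_L W_L P_{L-1} W_{L-1} \cdots P_1 W_1 x,

where every W_l is block-wise sparse (a group convolution) and every P_l is a permutation matrix.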

1.3. MobileNetV1

Depthwise Separable Convolution in MobileNetV1
  • A MobileNetV1 block consists of a channel-wise spatial convolution followed by a point-wise convolution (see the sketch after this list),
  • where W1 and W2 correspond to the channel-wise and point-wise convolutions respectively.
  • It is an extreme case of IGCV1: both the channel-wise and point-wise convolutions are extreme group convolutions (one channel per group for the former, a single group for the latter).
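
A LaTeX sketch of the depthwise separable convolution, reconstructed from the bullets above:

\hat{x} = W_2 W_1 x,

where W_1 is block-diagonal with one 3×3 spatial kernel per channel (a group convolution with as many groups as channels) and W_2 is a dense point-wise matrix (a group convolution with a single group).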

1.4. MobileNetV2

Inverted Bottleneck in MobileNetV2
  • A MobileNetV2 block consists of a dense point-wise convolution, a channel-wise spatial convolution, and a second dense point-wise convolution (see the sketch after this list).
  • It uses an inverted bottleneck: the first point-wise convolution increases the width and the second one reduces it,
  • where W1 corresponds to the channel-wise 3×3 convolution with a kernel covering K = 9 spatial positions, and W0 and W2 are two low-rank matrices.
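
A LaTeX sketch of the inverted bottleneck, reconstructed from the bullets above:

\hat{x} = W_2 W_1 W_0 x,

where W_0 (a tall matrix) expands the width, W_1 is the channel-wise 3×3 convolution over the expanded channels (K = 9 spatial positions per kernel), and W_2 (a wide matrix) reduces the width; the ranks of W_0 and W_2 are bounded by the smaller width, hence "low-rank".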

2. Interleaved Low-Rank Group Convolutions: IGCV3

Interleaved Low-Rank Group Convolutions: IGCV3
  • The first group convolution is a group 1×1 convolution with G1 = 2 groups.
  • The second is a channel-wise spatial convolution.
  • The third is a group 1×1 convolution with G2 = 2 groups.
  • Overall, it follows MobileNetV2's inverted bottleneck: the low-rank group point-wise convolution with G1 groups expands the width, the channel-wise spatial convolution operates on the expanded features, and the low-rank group point-wise convolution with G2 groups projects the width back.
  • P1 and P2 are permutation matrices similar to those in IGCV1.
  • W1 corresponds to the channel-wise 3×3 convolution.
  • Ŵ0 and W2 are low-rank structured sparse matrices, mathematically formulated as in the sketch after this list.
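
A LaTeX sketch of the whole composition and of the two low-rank sparse matrices, reconstructed from the bullets above (notation may differ slightly from the paper):

\hat{x} = P_2 W_2 P_1 W_1 \hat{W}_0 x,
\qquad
\hat{W}_0 = \begin{bmatrix} \hat{W}_0^{1} & & \\ & \ddots & \\ & & \hat{W}_0^{G_1} \end{bmatrix},
\quad
W_2 = \begin{bmatrix} W_2^{1} & & \\ & \ddots & \\ & & W_2^{G_2} \end{bmatrix},

where each diagonal block of Ŵ0 is a tall (width-expanding) matrix and each diagonal block of W2 is a wide (width-reducing) matrix, so both factors are structured sparse and low-rank at the same time.

For concreteness, here is a minimal PyTorch sketch of an IGCV3-style block. It is an illustration under stated assumptions, not the authors' implementation: the names IGCV3Block and channel_shuffle are mine; the channel shuffle stands in for the permutation matrices P1/P2 (its placement relative to the channel-wise convolution is equivalent up to a relabeling of channels); and the ReLU6 placement follows the MobileNetV2 convention, one of the variants ablated in Section 3.2 below.

import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # Permutation P: interleave channels across groups (as in ShuffleNet).
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class IGCV3Block(nn.Module):
    # Assumes in_ch, out_ch, and in_ch * expansion are divisible by g1 and g2.
    def __init__(self, in_ch, out_ch, expansion=6, g1=2, g2=2, stride=1):
        super().__init__()
        mid = in_ch * expansion
        self.g1, self.g2 = g1, g2
        # Ŵ0: low-rank group 1×1 convolution expanding the width.
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, groups=g1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
        )
        # W1: channel-wise (depthwise) 3×3 spatial convolution.
        self.spatial = nn.Sequential(
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
        )
        # W2: low-rank group 1×1 convolution projecting the width back.
        self.project = nn.Sequential(
            nn.Conv2d(mid, out_ch, 1, groups=g2, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.expand(x)
        out = channel_shuffle(out, self.g1)  # plays the role of P1
        out = self.spatial(out)
        out = self.project(out)
        out = channel_shuffle(out, self.g2)  # plays the role of P2
        return x + out if self.use_residual else out

# Example: a stride-1 block with a residual connection.
block = IGCV3Block(24, 24)
y = block(torch.randn(1, 24, 32, 32))  # -> torch.Size([1, 24, 32, 32])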

3. Ablation Study

3.1. Deeper and wider networks

Accuracy on CIFAR and ImageNet
  • IGCV3 adopts two group convolutions with G1 = 2 and G2 = 2 for the deeper version (IGCV3-D), and G1 = 4 and G2 = 4 for the wider version (IGCV3-W). The widest version follows the strict complementary condition from IGCV1.
  • IGCV3-D performs the best because:
  • (i) there are redundancies in the feature dimensions, so further enlarging the width brings no additional gains;
  • (ii) networks built by stacking bottlenecks improve as depth increases.

3.2. ReLU Positions

Accuracy on CIFAR and ImageNet
  • The second block (the IGCV3 block) shows a clear advantage over the other blocks.
  • (The paper does not explain the reasons in much detail.)

3.3. Number of branches in group convolutions

Accuracy on CIFAR and ImageNet
  • It is found that the first group convolution prefers to be denser.
  • The third group convolution projects the high-dimensional features back to a low-dimensional space, which inevitably loses information, so making it sparser (removing more kernels) has little effect on performance.
  • In the experiments, G1 = 2 and G2 = 2 are adopted to reduce the memory cost while still achieving good performance.

4. SOTA Comparisons

4.1. Comparisons with IGCV1 and IGCV2

Accuracy on CIFAR and ImageNet
  • IGCV3 outperforms the prior works slightly on the CIFAR datasets, and achieves a significant improvement of about 1.5% on ImageNet.

4.2. Comparisons with Other Mobile Networks

Accuracy on CIFAR and ImageNet
  • “Network s×” means the number of parameters of “Network 1.0×” is scaled by a factor of s.
  • IGCV3 outperforms MobileNetV2 by a clear margin with a similar number of parameters.
  • Moreover, IGCV3 with 50% of the parameters, at the same depth as MobileNetV2, still achieves better performance. One possible reason is that it uses half as many ReLUs as MobileNetV2.
Accuracy on ImageNet

4.3. COCO Detection

mAP in MS COCO Detection
  • IGCV3 is used as the backbone of the detection network.
  • It follows the original SSDLite framework [31], but replaces all the feature extraction blocks with IGCV3 blocks, denoted “SSDLite2”.
  • IGCV3 is slightly better than MobileNetV2 with fewer parameters, and outperforms YOLOv2 by 0.6% mAP with far fewer parameters.

The ShuffleNet V1 above has since been extended to ShuffleNet V2. Also, I haven't covered SSDLite yet. I hope to review them in the near future.

This is the 16th story this month!
