Reading: IGCV2 — Interleaved Structured Sparse Convolution (Image Classification)

Outperforms Lightweight Models Like IGCNet / IGCV1 and MobileNetV1; Also Outperforms or Is On Par With Xception, DenseNet, ResNet, WRN, RiR, FractalNet

Sik-Ho Tsang
5 min read · Jun 11, 2020

In this story, IGCV2 (Interleaved Structured Sparse Convolution), by Sun Yat-Sen University, Guangdong Key Laboratory of Information Security Technology, Microsoft Research, Hefei University of Technology, and University of Central Florida, is briefly presented. In this paper:

  • IGCNet / IGCV1 is improved into IGCV2 with a more generalized form.
  • Thus, a lighter model is obtained: redundancy is further eliminated, and storage and time costs are reduced.

This is a paper in 2018 CVPR with over 50 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Motivations from IGCV1 and Xception
  2. IGCV2: Network Architecture
  3. Experimental Results
  4. IGCV3: Network Architecture & Preliminary Results

1. Motivations from IGCV1 and Xception

1.1. IGCV1

Interleaved Group Convolutions in IGCV1
  • In IGCNet / IGCV1, as shown above, each block is split into primary group convolutions and secondary group convolutions.
  • There are permutations before and after the secondary group convolutions.
  • The primary group convolutions are 3×3 spatial convolutions.
  • The secondary group convolutions are 1×1 point-wise convolutions.

The above operations can be further generalized: this interleaving process can be repeated more times.
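The permutation between primary and secondary group convolutions can be implemented as a channel interleave, i.e. a reshape-transpose on the channel axis. A minimal numpy sketch (the group count and channel count here are illustrative, not from the paper):

```python
import numpy as np

def interleave_channels(x, groups):
    """Permute channels so each new group of channels draws one channel
    from every original group (x has shape [N, C, H, W])."""
    n, c, h, w = x.shape
    # split channels into `groups` blocks, then swap the block/channel axes
    return x.reshape(n, groups, c // groups, h, w) \
            .transpose(0, 2, 1, 3, 4) \
            .reshape(n, c, h, w)

x = np.arange(6).reshape(1, 6, 1, 1)       # channels 0..5, from 2 primary groups
y = interleave_channels(x, groups=2)
print(y.ravel())  # [0 3 1 4 2 5] -- channels of the two groups interleaved
```

After the interleave, each secondary group sees channels coming from different primary groups, which is exactly what lets information mix across branches.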

1.2. Xception

Xception Block
  • The Xception block consists of a 1×1 convolution layer followed by a channel-wise convolution layer.

The 1×1 convolution layer can be made sparse as well, i.e. turned into a group convolution.
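This sparsified Xception block is straightforward to express in PyTorch: a grouped 1×1 convolution followed by a channel-wise (depthwise) 3×3 convolution. A minimal sketch, with channel and group counts chosen for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

channels = 32
block = nn.Sequential(
    # 1x1 layer made sparse as a group convolution (4 groups, assumed here)
    nn.Conv2d(channels, channels, kernel_size=1, groups=4, bias=False),
    # channel-wise (depthwise) 3x3 convolution: groups == channels
    nn.Conv2d(channels, channels, kernel_size=3, padding=1,
              groups=channels, bias=False),
)

x = torch.randn(1, channels, 8, 8)
y = block(x)
print(y.shape)  # torch.Size([1, 32, 8, 8])
```

Setting `groups=4` on the 1×1 layer cuts its parameter count by 4× relative to a dense 1×1 convolution, at the cost of blocking information flow across groups unless a permutation is added.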

2. IGCV2: Network Architecture

IGCV2: the Interleaved Structured Sparse Convolution
  • W1, W2, W3 (denoted as solid arrows) are sparse block matrices corresponding to group convolutions.
  • P1 and P2 (denoted as dashed arrows) are permutation matrices.
  • The resulting composed kernel W3P2W2P1W1 is ensured to satisfy the complementary condition, which guarantees that for each output channel there exists one and only one path connecting it to each input channel.
  • The bold line connecting gray feature maps shows such a path.
  • Mathematically, IGCV2 composes L sparse block matrices interleaved with permutations: y = W_L P_{L−1} W_{L−1} ⋯ P_1 W_1 x (with L = 3, this gives the composed kernel W3P2W2P1W1 above).
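The complementary condition can be checked numerically on the 0/1 connectivity patterns of the matrices: the product should have exactly one path from every input channel to every output channel, i.e. every entry of the composed pattern equals 1. A small numpy sketch with 4 channels and 2 groups (sizes chosen for illustration; W1 is diagonal in channel space because it represents the 3×3 channel-wise convolution):

```python
import numpy as np

def block_diag_pattern(c, groups):
    """0/1 channel-connectivity pattern of a grouped 1x1 convolution."""
    g = c // groups
    m = np.zeros((c, c), dtype=int)
    for b in range(groups):
        m[b*g:(b+1)*g, b*g:(b+1)*g] = 1
    return m

def shuffle_perm(c, groups):
    """Permutation matrix that interleaves channels across groups."""
    order = np.arange(c).reshape(groups, c // groups).T.ravel()
    p = np.zeros((c, c), dtype=int)
    p[np.arange(c), order] = 1
    return p

C, K = 4, 2
W1 = np.eye(C, dtype=int)             # 3x3 channel-wise conv: diagonal in channels
P1 = np.eye(C, dtype=int)             # no reordering needed after a diagonal W1
W2 = block_diag_pattern(C, groups=K)  # group 1x1 convolution
P2 = shuffle_perm(C, groups=K)        # interleaving permutation
W3 = block_diag_pattern(C, groups=K)  # group 1x1 convolution

composed = W3 @ P2 @ W2 @ P1 @ W1
print(composed)  # a 4x4 matrix of all ones
# complementary condition: exactly one path from each input to each output
assert (composed == 1).all()
```

Counting paths this way (entries of the matrix product of the sparsity patterns) makes the "one and only one path" guarantee concrete: the composed kernel is dense even though every factor is sparse.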
  • More detailed architecture is as shown below:
detailed architecture
  • x×(3×3, 1) means a 3×3 channel-wise convolution with the channel number being x.
  • L and K are the hyper-parameters of IGCV2. [L−1, x; (1×1, K)] denotes (L−1) group 1×1 convolutions, with each branch containing K channels.
  • For IGCV2 (Cx), L = 3.
  • For IGCV2*(Cx), K = 8 and L = ⌈log_K(x)⌉ + 1.
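The depth rule L = ⌈log_K(x)⌉ + 1 is easy to sanity-check. The sketch below reimplements it with integer arithmetic (repeated multiplication by K) rather than floating-point `math.log`, which can misbehave on exact powers; the example widths are chosen to match the K = 8 setting above:

```python
def num_layers(x, K=8):
    """L = ceil(log_K(x)) + 1, computed with integers: find the
    smallest L-1 such that K**(L-1) >= x."""
    L, cap = 1, 1
    while cap < x:
        cap *= K
        L += 1
    return L

for x in (64, 416):
    print(x, num_layers(x))
# x = 64  -> L = 3  (log_8 64 = 2)
# x = 416 -> L = 4  (log_8 416 ~ 2.9, rounded up to 3)
```

So for the C416 configuration, the rule yields L = 4 interleaved sparse layers.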

3. Experimental Results

3.1. Comparison with Xception

Comparison with Xception on CIFAR-100 and Tiny ImageNet
  • The results over 20-layer networks with various widths are shown above.
  • The channel number at the first convolutional layer in Xception is set to 35.
  • We can see that the IGCV2 network, despite having fewer parameters, performs better than Xception, which shows the power of the IGCV2 block.

3.2. Comparison with IGCV1

Comparison with IGCV1 on CIFAR-100 and Tiny ImageNet
  • IGCV2 outperforms IGCV1 with fewer parameters across models of various depths.

3.3. SOTA Comparison

SOTA Comparison on CIFAR-10, CIFAR-100, and Tiny ImageNet
  • e.g., DenseNet-BC (k = 12), with more parameters, achieves lower classification error on CIFAR-100 and CIFAR-10 than IGCV2* (C416).
  • We can also see that IGCV2 outperforms or is on par with many other networks, such as FractalNet, ResNet, RiR, ResNet-34, and WRN, while having a smaller model size.

3.4. Comparison with MobileNet on ImageNet

Left: MobileNetV1, Right: IGCV2
Detailed Architecture
Comparison of MobileNetV1 and IGCV2 on ImageNet classification. 1.0, 0.5, 0.25 are width multipliers.
  • The above result demonstrates that IGCV2 is also effective on a large-scale image dataset, outperforming MobileNetV1.

4. IGCV3: Network Architecture & Preliminary Results

Left: MobileNetV2, Right: IGCV3
Detailed Architecture
Comparison of MobileNetV2 and IGCV3 on ImageNet classification. 0.7, 1.0 are width multipliers.
  • IGCV3 is formed by combining the bottleneck with IGCV2.
  • Each 1×1 group convolution contains 2 (g1 = g2 = 8) branches.
  • In the constructed networks, there is a skip connection in each block except the downsampling blocks, and two IGCV3 blocks correspond to one MobileNetV2 block.
  • IGCV3 is on par with MobileNetV2 with a similar number of parameters.
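Combining the bottleneck with IGCV2, an IGCV3 block can be sketched in PyTorch as a grouped 1×1 expansion, a channel permutation, a depthwise 3×3, and a grouped 1×1 projection, with a skip connection. This is a simplified sketch: the group count, expansion ratio, and omitted BatchNorm/ReLU layers are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ChannelShuffle(nn.Module):
    """Interleave channels across groups (reshape-transpose trick)."""
    def __init__(self, groups):
        super().__init__()
        self.groups = groups
    def forward(self, x):
        n, c, h, w = x.shape
        return x.view(n, self.groups, c // self.groups, h, w) \
                .transpose(1, 2).reshape(n, c, h, w)

class IGCV3Block(nn.Module):
    """Sketch of an IGCV3 block: an inverted bottleneck whose 1x1 layers
    are group convolutions, interleaved by a permutation. Group count and
    expansion ratio are illustrative assumptions."""
    def __init__(self, channels, expand=6, groups=2):
        super().__init__()
        hidden = channels * expand
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, groups=groups, bias=False),  # group 1x1 expand
            ChannelShuffle(groups),                                     # permutation
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),                       # depthwise 3x3
            nn.Conv2d(hidden, channels, 1, groups=groups, bias=False),  # group 1x1 project
        )
    def forward(self, x):
        return x + self.body(x)  # skip connection (non-downsampling block)

x = torch.randn(1, 8, 4, 4)
y = IGCV3Block(8)(x)
print(y.shape)  # torch.Size([1, 8, 4, 4])
```

As in IGCV2, the permutation ensures that the two grouped 1×1 layers together connect every input channel to every output channel, while each layer stays sparse.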

IGCV3 was published in 2018 BMVC with more explanations and results. I hope I can review it in the near future!

This is the 13th story this month!
