Review — MobileNeXt: Rethinking Bottleneck Structure for Efficient Mobile Network Design
Rethinking Bottleneck Structure for Efficient Mobile Network Design,
MobileNeXt, by National University of Singapore, Yitu Technology, and Institute of Data Science, NUS
2020 ECCV, Over 110 Citations (Sik-Ho Tsang @ Medium)
- The inverted residual block (as in MobileNetV2), which adopts inverted residuals and linear bottlenecks, brings risks of information loss and gradient confusion.
- In this paper, the sandglass block is proposed, which flips the inverted bottleneck structure so that identity mapping and spatial transformation are performed in higher dimensions.
- The Proposed Sandglass Block
- MobileNeXt Model Architecture
1. The Proposed Sandglass Block
1.1. Conceptual Idea
- (a): Spatial convolution is performed on low-dimensional features, which risks information loss.
- (b): The skip connection links low-dimensional features, which risks gradient confusion.
- (c): The proposed sandglass block performs both the spatial convolution and the skip connection on high-dimensional features.
1.2. Sandglass Block
In detail, the two pointwise convolutions, for channel reduction and expansion, are kept in the middle of the residual path to save parameters and computation.
Two depthwise convolutions are placed at the two ends of the residual path. Since both depthwise convolutions are thereby conducted in high-dimensional spaces, richer feature representations can be extracted.
- There is no activation layer after the reduction layer.
- It is empirically found adding an activation layer after the last convolution can negatively influence the classification performance.
Therefore, activation layers are only added after the first depthwise convolutional layer and the last pointwise convolutional layer.
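As a rough sanity check on why keeping the two pointwise convolutions in the middle saves parameters, the per-block parameter counts of the two designs can be compared in a few lines of Python (a hedged sketch: 3×3 depthwise kernels, bias-free convolutions as is usual when batch normalization follows, and a reduction/expansion ratio t; the helper names are mine, not from the paper):

```python
def dw_params(channels, k=3):
    # Depthwise conv: one k x k kernel per channel.
    return k * k * channels

def pw_params(c_in, c_out):
    # Pointwise (1x1) conv: a c_in x c_out channel-mixing matrix.
    return c_in * c_out

def sandglass_params(m, t):
    # dw(m) -> pw reduce m -> m/t -> pw expand m/t -> m -> dw(m)
    mid = m // t
    return dw_params(m) + pw_params(m, mid) + pw_params(mid, m) + dw_params(m)

def inverted_residual_params(m, t):
    # pw expand m -> t*m -> dw(t*m) -> pw reduce t*m -> m
    mid = m * t
    return pw_params(m, mid) + dw_params(mid) + pw_params(mid, m)

print(sandglass_params(96, 6))          # 4800
print(inverted_residual_params(96, 6))  # 115776
```

Note that the two blocks are not computationally equivalent (the sandglass block convolves at the block width m, the inverted residual at t·m), so this only illustrates where the parameters live, not an apples-to-apples trade.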
- To the best of authors’ knowledge, this is the first work that attempts to investigate the advantages of the classic bottleneck structure over the inverted residual block for efficient network design.
2. MobileNeXt Model Architecture
2.1. Overall Architecture
- At the beginning of the network, there is a convolutional layer with 32 output channels. After that, the proposed sandglass blocks are stacked together.
- The expansion ratio used in the network is set to 6 by default.
- The output of the last building block is followed by a global average pooling layer to transform 2D feature maps to 1D feature vectors. A fully-connected layer is finally added to predict the final score for each category.
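The pooling-plus-classifier head described above can be sketched in plain Python (an illustrative toy, not the paper's code; feature maps are nested lists of shape [C][H][W], and the weights are made up for the example):

```python
def global_avg_pool(fmap):
    # Collapse each H x W feature map to its mean: one scalar per channel.
    h, w = len(fmap[0]), len(fmap[0][0])
    return [sum(sum(row) for row in ch) / (h * w) for ch in fmap]

def fully_connected(vec, weights):
    # weights: [num_classes][C]; returns one score per category.
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

features = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0 -> mean 1.0
            [[1.0, 2.0], [3.0, 4.0]]]   # channel 1 -> mean 2.5
pooled = global_avg_pool(features)       # [1.0, 2.5]
scores = fully_connected(pooled, [[1.0, 0.0], [0.0, 2.0]])  # [1.0, 5.0]
```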
2.2. Identity Tensor Multiplier
- There is no need to keep the whole identity tensor to combine with the residual path.
- α ∈ [0, 1] is introduced as the Identity Tensor Multiplier, which controls what portion of the channels in the identity tensor is preserved. Formally, for an input tensor F with M channels, the block output G is:
- G_{1:αM} = Φ(F)_{1:αM} + F_{1:αM}, and G_{αM:M} = Φ(F)_{αM:M},
- where Φ is the transformation function of the residual path in the block; the identity shortcut is applied only to the first αM channels.
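A minimal Python sketch of the identity tensor multiplier, treating each channel as a single scalar for clarity (illustrative only; the function name is mine, and the channel values are exact binary fractions so the arithmetic is exact):

```python
def partial_identity_add(residual, identity, alpha):
    # Add the identity shortcut only to the first alpha * M channels;
    # the remaining channels pass through the residual path unchanged.
    m = len(identity)
    keep = int(alpha * m)
    return [r + i if idx < keep else r
            for idx, (r, i) in enumerate(zip(residual, identity))]

phi_out = [0.25, 0.5, 0.75, 1.0]   # residual-path output, one value per channel
inp     = [1.0, 1.0, 1.0, 1.0]     # identity tensor
out_half = partial_identity_add(phi_out, inp, alpha=0.5)  # [1.25, 1.5, 0.75, 1.0]
```

With α = 0.5, only the first half of the channels carry the shortcut, which is exactly what lets the implementation skip loading the other half of the identity tensor.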
2.3. Model Variants
- Five width multipliers (1.4, 1.0, 0.75, 0.5, and 0.35) are used to create five model variants.
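How a width multiplier maps to actual channel counts is not spelled out here; MobileNet-family implementations commonly round each scaled width to a multiple of 8 with a small helper like the following (an assumption about the usual convention, not taken from this paper):

```python
def make_divisible(value, divisor=8):
    # Round to the nearest multiple of `divisor`, but never drop
    # more than 10% below the requested width.
    rounded = max(divisor, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded

# First-layer width (32 channels) under the five multipliers:
widths = [make_divisible(32 * m) for m in (1.4, 1.0, 0.75, 0.5, 0.35)]
print(widths)  # [48, 32, 24, 16, 16]
```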
3. Experimental Results
3.1. Comparison with MobileNetV2
The proposed MobileNeXt variants with different width multipliers all outperform MobileNetV2 at comparable parameter counts and computation.
When parameters and activations are quantized to 8 bits, MobileNeXt outperforms MobileNetV2 by 3.55% under the same quantization settings.
After adding one more depthwise convolution to MobileNetV2, its performance increases to 73%, which is still well below MobileNeXt (74%), even though it now has more learnable parameters and higher complexity.
3.2. SOTA Comparisons
- The EfficientNet-b0 architecture is used, with the inverted residual block replaced by the sandglass block.
With a comparable amount of computation and 20% parameter reduction, replacing the inverted residual block with sandglass block results in 0.4% top-1 classification accuracy improvement on ImageNet-1k dataset.
- When half of the identity representations are removed (α = 0.5), accuracy shows no drop while latency improves.
- When the multiplier is set to 1/6, accuracy decreases by 0.34%, with a further improvement in latency.
3.3. Object Detection
3.4. NAS Using Sandglass Block for DARTS
- With the sandglass block added as a new operator for NAS, the normal cell and reduction cell above are searched.
The resulting model achieves higher accuracy than the model with the original DARTS search space with about 25% parameter reduction.
- However, the searched model with the inverted residual block added in the search space decreases the original performance.
This demonstrates that the proposed sandglass block can generate more expressive representations than the inverted residual block.