Brief Review — ParNet: Non-Deep Networks
ParNet, Restricted to 12 Layers Only, Parallel Subnetworks Instead of Stacking One Layer After Another
- The authors start by asking: “Is it possible to build high-performing ‘non-deep’ neural networks?”
- Then, a non-deep network, ParNet (Parallel Networks), is proposed, which uses parallel subnetworks instead of stacking one layer after another.
1.1. (a) ParNet
- ParNet consists of parallel substructures that process features at different resolutions. These parallel substructures are referred to as streams. 3 streams are found to be optimal.
- Features from different streams are fused at a later stage in the network, and these fused features are used for the downstream task.
1.2. (b) ParNet Block
- The ParNet block consists of three parallel branches:
- 1×1 convolution, 3×3 convolution and Skip-Squeeze-and-Excitation (SSE).
- Once the training is done, the 1×1 and 3×3 convolutions can be fused together for faster inference. This reparameterization or fusion of blocks helps reduce latency during inference.
- The SSE branch increases the receptive field without adding depth: the Squeeze-and-Excitation operation is applied alongside the skip connection and uses a single fully-connected layer.
- This modified RepVGG block with the SSE module is referred to as RepVGG-SSE.
- The ReLU activation is replaced by SiLU.
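The fusion of the 1×1 and 3×3 branches can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it omits batch normalization (which a real RepVGG-style fusion would also have to fold into the kernels), assumes the SSE gate is global average pooling followed by one fully-connected layer and a sigmoid, and uses hypothetical names and tensor sizes throughout.

```python
import numpy as np

def conv2d(x, w, pad):
    """Naive stride-1 convolution. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]               # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=3)  # contract C_in, k, k
    return out

def fuse_1x1_into_3x3(w3, w1):
    """Add the 1x1 kernel to the centre tap of the 3x3 kernel."""
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]
    return fused

def sse(x, w_fc):
    """Assumed SSE form: global average pool -> single FC -> sigmoid -> scale input."""
    squeezed = x.mean(axis=(1, 2))                         # (C,)
    gate = 1.0 / (1.0 + np.exp(-(w_fc @ squeezed)))        # (C,)
    return x * gate[:, None, None]

def silu(z):
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))

def block_train(x, w3, w1, w_fc):
    # Three parallel branches, summed, then SiLU.
    return silu(conv2d(x, w3, 1) + conv2d(x, w1, 0) + sse(x, w_fc))

def block_infer(x, w3, w1, w_fc):
    # The two convolution branches are replaced by one fused 3x3 convolution.
    return silu(conv2d(x, fuse_1x1_into_3x3(w3, w1), 1) + sse(x, w_fc))

rng = np.random.default_rng(0)
C, H, W = 4, 6, 6                                          # hypothetical sizes
x = rng.standard_normal((C, H, W))
w3 = rng.standard_normal((C, C, 3, 3))
w1 = rng.standard_normal((C, C, 1, 1))
w_fc = rng.standard_normal((C, C))

y_train = block_train(x, w3, w1, w_fc)
y_infer = block_infer(x, w3, w1, w_fc)
max_abs_diff = float(np.abs(y_train - y_infer).max())
print("max abs difference:", max_abs_diff)
```

The two outputs agree up to floating-point rounding, because convolution is linear in the kernel: a 1×1 kernel is equivalent to a 3×3 kernel that is zero everywhere except its centre. The SSE branch is left unfused in this sketch.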
1.3. Downsampling and Fusion Blocks
- The Downsampling block reduces resolution and increases width to enable multi-scale processing, while the Fusion block combines information from multiple resolutions.
- Downsampling Block (Left): there is no skip connection. Instead, a single-layer SE module is added in parallel with the convolution layer. Additionally, 2D average pooling is added in the 1×1 convolution branch.
- To reduce the parameter count, convolution with 2 groups is used.
- The outputs of Downsampling blocks 2, 3, and 4 are fed respectively to streams 1, 2, and 3.
- Fusion Block (Right): similar to the Downsampling block, but with an extra concatenation layer.
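To see why convolution with 2 groups reduces the parameter count, the weight count of a convolution layer can be worked out directly. This is a small illustrative sketch: the function name and the channel sizes (64 in, 128 out) are hypothetical and not taken from the paper.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias.

    With grouped convolution, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return c_out * (c_in // groups) * k * k

dense = conv_params(64, 128, 3)              # ordinary 3x3 convolution
grouped = conv_params(64, 128, 3, groups=2)  # same layer with 2 groups
print(dense, grouped, dense / grouped)
```

With 2 groups the layer needs half the weights of the ungrouped version, at the cost of each output channel seeing only half of the input channels.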
1.4. Model Variants
- The above ParNet-S, ParNet-M, ParNet-L, and ParNet-XL are used for ImageNet.
- For CIFAR, a simpler network is used. (Please feel free to read the paper directly.)
ParNet-S outperforms ResNet-34 by over 1 percentage point with a lower parameter count (19M vs. 22M).
ParNet also achieves comparable performance to ResNet with the bottleneck design, while having 4 to 8 times less depth.
- For example, ParNet-L performs as well as ResNet-50 and gets a top-1 accuracy of 77.66% as compared to 77.53% achieved by ResNet-50.
- Similarly, ParNet-XL performs comparably to ResNet-101 and gets a top-5 accuracy of 94.13%, in comparison to 94.68% achieved by ResNet-101, while being 8 times shallower.
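The "4 to 8 times less depth" figure can be reproduced with a quick calculation, assuming ParNet's depth of 12 as stated in this review and taking the ResNet depths from the model names.

```python
PARNET_DEPTH = 12                                     # depth stated for ParNet
resnet_depths = {"ResNet-50": 50, "ResNet-101": 101}  # depths implied by the names

ratios = {name: depth / PARNET_DEPTH for name, depth in resnet_depths.items()}
for name, ratio in ratios.items():
    print(f"{name} is about {ratio:.1f}x deeper than ParNet")
```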
- ParNet compares favourably with ResNet in accuracy and speed, but at the cost of more parameters and FLOPs.
- For example, ParNet-L is faster and more accurate than both ResNet-34 and ResNet-50. Similarly, ParNet-XL is faster and more accurate than ResNet-50, again with more parameters and FLOPs.
- ParNet performs competitively with state-of-the-art deep networks like ResNets and DenseNets while using a much lower depth and a comparable number of parameters.
2.3. MS-COCO Object Detection
- The CSPDarknet53s backbone from YOLOv4-CSP is replaced with ParNet-XL, which is much shallower (depth 64 vs. 12). The head and reduced neck from the YOLOR-D6 model are used.
- A ParNet-XL-CSP variant, obtained by applying CSP to ParNet-XL, is also tested.
ParNet-XL and ParNet-XL-CSP are faster than the baseline even at higher image resolution. Even on a single GPU, ParNet achieves higher speed than strong baselines.
- (Please feel free to read the paper directly for other experimental results.)