Brief Review — ParNet: Non-Deep Networks

ParNet, Restricted to Only 12 Layers, Uses Parallel Subnetworks Instead of Stacking One Layer After Another

Sik-Ho Tsang
4 min read · Mar 17, 2023
Top-1 accuracy on ImageNet vs. depth (in log scale) of various models.

Non-Deep Networks,
ParNet, by Princeton University and Intel Labs
2021 arXiv v1, Over 20 Citations (Sik-Ho Tsang @ Medium)
Image Classification

Image Classification
1989 … 2022 [ConvNeXt] [PVTv2] [ViT-G] [AS-MLP] [ResTv2] [CSWin Transformer] [Pale Transformer] [Sparse MLP] [MViTv2] [S²-MLP] [CycleMLP] [MobileOne] [GC ViT] [VAN] [ACMix] [CVNets] [MobileViT] [RepMLP] 2023 [Vision Permutator (ViP)]
==== My Other Paper Readings Are Also Over Here ====

  • Authors start by asking the question: “Is it possible to build high-performing ‘non-deep’ neural networks?”
  • Then, a non-deep network, ParNet (Parallel Networks), is proposed, which uses parallel subnetworks instead of stacking one layer after another.

Outline

  1. ParNet
  2. Results

1. ParNet

Schematic representation of ParNet and the ParNet block. ParNet has depth 12 and is composed of parallel substructures.

1.1. (a) ParNet

  • ParNet consists of parallel substructures that process features at different resolutions. These parallel substructures are referred to as streams. Three streams are found to be optimal.
  • Features from different streams are fused at a later stage of the network, and these fused features are used for the downstream task. (A conceptual skeleton of this layout is sketched after this list.)
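
To visualize the stream layout, here is a conceptual, self-contained PyTorch skeleton: three streams branch off a short chain of downsampling stages and are fused pairwise near the end. Every sub-module below is a plain stand-in (not ParNet's actual blocks), and all names, widths, and block counts are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

def _down(cin, cout):
    # stand-in for a ParNet Downsampling block: halve resolution, raise width
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.SiLU())

def _block(ch):
    # stand-in for a ParNet block operating at a fixed resolution
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU())

class ParNetSkeleton(nn.Module):
    """Conceptual layout only: three parallel streams process features at
    different resolutions and are fused before the classification head."""
    def __init__(self, c=64, num_classes=1000):
        super().__init__()
        self.d1, self.d2 = _down(3, c), _down(c, 2 * c)
        self.d3, self.d4 = _down(2 * c, 4 * c), _down(4 * c, 8 * c)
        self.stream1 = nn.Sequential(*[_block(2 * c) for _ in range(4)])
        self.stream2 = nn.Sequential(*[_block(4 * c) for _ in range(4)])
        self.stream3 = nn.Sequential(*[_block(8 * c) for _ in range(4)])
        self.pool = nn.AvgPool2d(2)
        self.fuse12 = nn.Sequential(nn.Conv2d(6 * c, 4 * c, 1), nn.SiLU())
        self.fuse123 = nn.Sequential(nn.Conv2d(12 * c, 8 * c, 1), nn.SiLU())
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(8 * c, num_classes))

    def forward(self, x):
        x1 = self.d2(self.d1(x))              # feeds stream 1 (highest resolution)
        x2 = self.d3(x1)                      # feeds stream 2
        x3 = self.d4(x2)                      # feeds stream 3 (lowest resolution)
        s1, s2, s3 = self.stream1(x1), self.stream2(x2), self.stream3(x3)
        f12 = self.fuse12(torch.cat([self.pool(s1), s2], dim=1))     # fuse streams 1 and 2
        f123 = self.fuse123(torch.cat([self.pool(f12), s3], dim=1))  # fuse with stream 3
        return self.head(f123)
```

The real ParNet replaces these stand-in convolutions with the RepVGG-SSE, Downsampling, and Fusion blocks described in the next subsections.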

1.2. (b) ParNet Block

  • The ParNet block consists of three parallel branches: a 1×1 convolution, a 3×3 convolution, and a Skip-Squeeze-and-Excitation (SSE) branch.
  • Once training is done, the 1×1 and 3×3 convolutions can be fused into a single convolution for faster inference. This reparameterization, or fusion of branches, helps reduce latency during inference.
  • The SSE branch increases the receptive field without adding depth: it is applied alongside the skip connection and uses a single fully-connected layer.
  • This modified RepVGG block with the Skip-Squeeze-Excitation module is referred to as RepVGG-SSE.
  • The ReLU activation is replaced by SiLU. (A minimal sketch of this block is given after this list.)
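
To make the block structure concrete, below is a minimal PyTorch sketch of a RepVGG-SSE block written from the description above. The class names, the exact placement of BatchNorm, and the SSE wiring are my assumptions from the paper's figure, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSE(nn.Module):
    """Skip-Squeeze-and-Excitation branch (my reading of the figure): a skip
    connection gated by a single fully-connected (1x1 conv) layer, so the
    receptive field grows without adding depth."""
    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)  # single FC layer

    def forward(self, x):
        x = self.bn(x)
        scale = torch.sigmoid(self.fc(F.adaptive_avg_pool2d(x, 1)))
        return x * scale

class RepVGGSSEBlock(nn.Module):
    """RepVGG-SSE block: parallel 1x1 conv, 3x3 conv, and SSE branches are
    summed and passed through SiLU (names and BN placement are illustrative)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.sse = SSE(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.conv1x1(x) + self.conv3x3(x) + self.sse(x))

def fuse_1x1_into_3x3(w3x3, w1x1):
    """After training, the 1x1 branch can be folded into the 3x3 branch by
    zero-padding its kernel to 3x3 and summing (BN folding omitted here)."""
    return w3x3 + F.pad(w1x1, [1, 1, 1, 1])  # (out, in, 1, 1) -> (out, in, 3, 3)
```

Padding the 1×1 kernel with zeros and adding it to the 3×3 kernel produces the same output as summing the two branches, which is why the inference-time fusion is lossless once BatchNorm is folded in.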

1.3. Downsampling and Fusion Blocks

Schematic representation of the Fusion (left) and Downsampling (right) blocks used in ParNet.
  • The Downsampling block reduces resolution and increases width to enable multi-scale processing, while the Fusion block combines information from multiple resolutions.
  • Downsampling Block: there is no skip connection. Instead, a single-layer SE module is added in parallel to the convolution layers, and 2D average pooling is added in the 1×1 convolution branch (see the sketch after this list).
  • To reduce the parameter count, convolution with 2 groups is used.
  • The outputs of Downsampling blocks 2, 3, and 4 are fed respectively to streams 1, 2, and 3.
  • Fusion Block: similar to the Downsampling block, but with an extra concatenation layer.
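
As a rough illustration, here is a PyTorch sketch of a Downsampling block along the lines described above. It assumes the single-layer SE gate is computed from the block input and multiplies the sum of the two convolution branches; that wiring and the normalization placement are my assumptions, not code from the paper.

```python
import torch.nn as nn

class DownsamplingBlock(nn.Module):
    """Sketch of a ParNet Downsampling block: a stride-2 3x3 conv branch and an
    avg-pool + 1x1 conv branch are summed, gated by a single-layer SE branch,
    then passed through SiLU. Grouped convolutions (2 groups) cut parameters."""
    def __init__(self, in_ch, out_ch, groups=2):
        super().__init__()
        self.branch1x1 = nn.Sequential(
            nn.AvgPool2d(2),                                   # 2D average pooling in the 1x1 branch
            nn.Conv2d(in_ch, out_ch, 1, groups=groups, bias=False),
            nn.BatchNorm2d(out_ch))
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(out_ch))
        self.se = nn.Sequential(                               # single-layer SE, parallel to the convs
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Sigmoid())
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act((self.branch1x1(x) + self.branch3x3(x)) * self.se(x))
```

Per the bullets above, a Fusion block would look similar but would first concatenate the two incoming streams along the channel dimension.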

1.4. Model Variants

Specification of ParNet models used for ImageNet classification: ParNet-S, ParNet-M, ParNet-L, and ParNet-XL.
  • The above ParNet-S, ParNet-M, ParNet-L, and ParNet-XL are used for ImageNet.
  • For CIFAR, a more simplified network is used. (Please feel free to read the paper directly.)

2. Results

2.1. ImageNet

Depth vs. performance on ImageNet.

ParNet-S outperforms ResNet-34 by over 1 percentage point with a lower parameter count (19M vs. 22M).
ParNet also achieves comparable performance to ResNets with the bottleneck design, while having 4 to 8 times lower depth.

  • For example, ParNet-L performs as well as ResNet-50 and gets a top-1 accuracy of 77.66% as compared to 77.53% achieved by ResNet-50.
  • Similarly, ParNet-XL performs comparably to ResNet-101 and gets a top-5 accuracy of 94.13%, in comparison to 94.68% achieved by ResNet-101, while being 8 times shallower.
Speed and performance of ParNet vs. ResNet.
  • ParNet compares favourably to ResNet in terms of accuracy and speed, though with more parameters and FLOPs.
  • For example, ParNet-L achieves faster speed and better accuracy than ResNet-34 and ResNet-50. Similarly, ParNet-XL achieves faster speed and better accuracy than ResNet-50, albeit with more parameters and FLOPs.

2.2. CIFAR

Performance of various architectures on CIFAR10 and CIFAR100.
  • ParNet performs competitively with state-of-the-art deep networks like ResNets and DenseNets while using a much lower depth and a comparable number of parameters.

2.3. MS-COCO Object Detection

Non-deep networks can be used as backbones for fast and accurate object detection systems. Speed is measured on a single RTX 3090 using PyTorch 1.8.1 and CUDA 11.1.
  • The CSPDarknet53s backbone from YOLOv4-CSP is replaced with ParNet-XL, which is much shallower (depth 12 vs. 64). The head and reduced neck from the YOLOR-D6 model are used.
  • A ParNet-XL-CSP variant, obtained by applying the Cross Stage Partial (CSP) design, is also tested.

ParNet-XL and ParNet-XL-CSP are faster than the baseline even at higher image resolution. Even on a single GPU, ParNet achieves higher speed than strong baselines.

  • (Please feel free to read the paper directly for other experimental results.)
