Brief Review — DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs

Revitalized DenseNet (RDNet) Surpasses Swin Transformer, ConvNeXt, and DeiT-III, and Matches MogaNet

Sik-Ho Tsang

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
RDNet, by NAVER Cloud AI and NAVER AI Lab
2024 ECCV (Sik-Ho Tsang @ Medium)

Image Classification
1989 … 2023
[Vision Permutator (ViP)] [ConvMixer] [CrossFormer++] [FastViT] [EfficientFormerV2] [MobileViTv2] [ConvNeXt V2] [SwiftFormer] [OpenCLIP] 2024 [FasterViT] [CAS-ViT] [TinySaver]
==== My Other Paper Readings Are Also Over Here ====

  • DenseNets are revisited with architectural adjustments, block redesigns, and improved training recipes, aiming at model widening and better memory efficiency.
  • The resulting RDNet ultimately surpasses Swin Transformer, ConvNeXt, and DeiT-III, and matches MogaNet.

Outline

  1. RDNet
  2. Results

1. RDNet

1.1. Modern Training Setup

  • (It is better to understand DenseNet first before reading this story.)

1.1.1. Going wider and shallower

  • The network is widened by increasing the growth rate (GR) while reducing its depth. Specifically, GR is vastly increased from 32 to 120.
  • The number of blocks per stage is reduced from (6, 12, 48, 32) to a much smaller (3, 3, 12, 3) to adjust the depth (a small config sketch follows this list).
  • Training time and memory usage drop by around 35% and 18%, respectively. The marked increase in GFLOPs to 11.1 is addressed by the later changes.
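To make the change concrete, here is a minimal, hypothetical config sketch in plain Python contrasting the original DenseNet-201-style setting with the widened, shallower one (the names StageConfig, baseline, and widened are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class StageConfig:
    growth_rate: int          # channels each dense block adds via concatenation
    blocks_per_stage: tuple   # number of dense blocks in each of the four stages

# DenseNet-201-style baseline vs. the widened, shallower setting described above
baseline = StageConfig(growth_rate=32,  blocks_per_stage=(6, 12, 48, 32))
widened  = StageConfig(growth_rate=120, blocks_per_stage=(3, 3, 12, 3))
```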

1.1.2. Improved feature mixers

  • Layer Normalization (LN) replaces Batch Normalization (BN); post-activation is adopted; a depthwise convolution with a kernel size of 7 is used; and the numbers of normalizations and activations are reduced (a block sketch follows below).
  • This design improves accuracy by a large margin (+0.9%p) while slightly increasing computational costs.
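As a rough illustration of such a block, here is a minimal PyTorch sketch, not the official RDNet code; the class names LayerNorm2d and FeatureMixer and the exact layer ordering are assumptions consistent with the description above:

```python
import torch
import torch.nn as nn

class LayerNorm2d(nn.LayerNorm):
    """Channel-wise LayerNorm for NCHW feature maps (assumed helper)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.permute(0, 2, 3, 1)        # NCHW -> NHWC
        x = super().forward(x)
        return x.permute(0, 3, 1, 2)     # NHWC -> NCHW

class FeatureMixer(nn.Module):
    """Sketch of a mixer in the spirit described above: 7x7 depthwise conv,
    LN instead of BN, post-activation, one norm and one activation per block."""
    def __init__(self, in_chs: int, inter_chs: int, growth_rate: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(in_chs, in_chs, kernel_size=7, padding=3, groups=in_chs),  # depthwise, k=7
            LayerNorm2d(in_chs),                                                 # norm after conv
            nn.Conv2d(in_chs, inter_chs, kernel_size=1),
            nn.GELU(),                                                           # post-activation
            nn.Conv2d(inter_chs, growth_rate, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mix(x)  # new features, later concatenated with the block input
```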

1.1.3. Larger intermediate channel dimensions

  • A large input dimension for the depthwise convolution is crucial. The intermediate tensor size within the block is enlarged beyond the input dimension (e.g., the expansion ratio, ER, is tuned to 6).
  • GR can then be halved, e.g. from 120 to 60 (see the channel arithmetic below).
  • This achieves both a 21% faster training speed and a 0.4%p improvement in accuracy.
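For illustration only, assuming the expansion ratio scales the growth rate (the paper's exact formula may differ), the channel arithmetic works out as follows:

```python
# Illustrative channel arithmetic (assumption: ER scales the growth rate)
growth_rate = 60                               # halved from 120
expansion_ratio = 6
inter_chs = expansion_ratio * growth_rate      # 360-wide intermediate tensor inside the block
# e.g. mixer = FeatureMixer(in_chs=240, inter_chs=inter_chs, growth_rate=growth_rate)
# where 240 is just an example input width at some point inside a stage
```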

1.1.4. More transition layers

  • Transition layers are used within each stage, not only after each stage: one is inserted after every three blocks, with a stride of 1.
  • These in-stage transition layers focus on dimension reduction rather than downsampling (illustrated below).
  • This change often improves accuracy.
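A toy bookkeeping loop for one stage; the 0.5 compression factor and the 120-channel starting width are assumptions, used only to show how in-stage transitions keep the concatenated width in check:

```python
growth_rate, chs = 60, 120
for i in range(12):                    # e.g. the 12-block stage
    chs += growth_rate                 # each dense block concatenates growth_rate new channels
    if (i + 1) % 3 == 0:               # stride-1 transition after every three blocks
        chs = chs // 2                 # dimension reduction only; spatial size unchanged
    print(f"after block {i + 1}: {chs} channels")
```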

1.1.5. Patchification stem

  • Image patches are used as inputs via the stem: a patch size of 4 with a stride of 4 (a minimal stem sketch follows).
  • This yields a notable acceleration in computational speed without loss of accuracy.
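A minimal stem sketch; the 96-channel width is an illustrative choice, not a value from the paper:

```python
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=4, stride=4),  # non-overlapping 4x4 patches
    # a channel-wise LayerNorm (as in the mixer sketch above) would typically follow
)
# A 224x224 image becomes a 56x56 feature map in a single step,
# which is where the reported speed-up comes from.
```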

1.1.6. Refined transition layers

  • The average pooling is removed, and the convolution's kernel size and stride are adjusted so that it performs the downsampling itself; LN replaces BN (see the sketch below).
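A hedged sketch of such a transition, reusing the hypothetical LayerNorm2d helper from the mixer sketch above; the kernel size and stride of 2 are assumed values consistent with "adjusting the kernel size and stride":

```python
import torch.nn as nn

def refined_transition(in_chs: int, out_chs: int) -> nn.Sequential:
    """Sketch only: no average pooling; the convolution itself downsamples
    (kernel size and stride of 2 are assumptions), and channel-wise LN replaces BN."""
    return nn.Sequential(
        LayerNorm2d(in_chs),                                  # helper from the mixer sketch
        nn.Conv2d(in_chs, out_chs, kernel_size=2, stride=2),  # strided conv does the downsampling
    )
```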

1.1.7. Channel re-scaling

  • Channel re-scaling is required because the concatenated features have diverse variances (sketched below).
  • It brings a slight +0.2%p improvement.
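One plausible way to realize this is a LayerScale-style learnable per-channel scale; the sketch below is an assumption about the mechanism, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class ChannelRescale(nn.Module):
    """Assumed LayerScale-style mechanism: a learnable per-channel scale that
    compensates for the differing variances of concatenated features."""
    def __init__(self, chs: int, init_value: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(chs))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gamma.view(1, -1, 1, 1)  # scale each channel of the NCHW tensor
```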

1.2. Revitalized DenseNet (RDNet)

Revitalized DenseNet (RDNet)
  • A family of RDNets is constructed with different settings of GR and number of blocks (B), as shown above (a connectivity sketch follows).
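Putting the pieces together, the sketch below shows the dense-connectivity pattern RDNet keeps from DenseNet (concatenation rather than residual addition), reusing the hypothetical FeatureMixer from earlier and omitting in-stage transitions and channel re-scaling for brevity:

```python
import torch
import torch.nn as nn

class DenseStage(nn.Module):
    """Minimal sketch of DenseNet-style connectivity kept by RDNet: each block's
    output is concatenated with its input, so the width grows by the growth rate
    per block. FeatureMixer is the hypothetical block sketched earlier."""
    def __init__(self, in_chs: int, growth_rate: int, num_blocks: int, inter_chs: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            FeatureMixer(in_chs + i * growth_rate, inter_chs, growth_rate)
            for i in range(num_blocks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = torch.cat([x, block(x)], dim=1)  # concatenation, not residual addition
        return x
```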

2. Results

2.1. ImageNet

ImageNet
ImageNet

While RDNets fall slightly behind in accuracy, they make up for it significantly in speed.

  • For example, RDNet-S is competitive with lighter models such as SMT-S or MogaNet-S. Notably, RDNets do not require large memory, in line with their design goal, and achieve further efficiency.
ImageNet

RDNets surpass competitors in accuracy, with reasonable memory usage and faster speeds.

2.2. Zero-Shot ImageNet

Zero-Shot ImageNet

Following the training protocol of ConvNeXt-OpenCLIP to train CLIP, RDNet performs better.

2.3. Downstream Tasks

ADE20K

RDNet exhibits strong performance, demonstrating its effectiveness on dense prediction tasks.

COCO

RDNet exhibits competitive performance on COCO.

  • (There are still a lot of experiments not yet mentioned, please feel free to read the paper directly.)

