Review — RegNet: Designing Network Design Spaces
RegNet: Simple & Regular Networks Designed by Analyzing the Network Design Space
Designing Network Design Spaces
RegNet, by Facebook AI Research (FAIR)
2020 CVPR, Over 500 Citations (Sik-Ho Tsang @ Medium)
Image Classification, Convolutional Neural Network, CNN, Neural Architecture Search, NAS
- Authors design network design spaces that parametrize populations of networks. By exploring the structural aspect of network design at the design space level, the authors arrive at a low-dimensional design space consisting of simple, regular networks, called RegNet.
- RegNet Concept
- The AnyNet Design Space
- The RegNet Design Space
- RegNetX and RegNetY
- Experimental Results
1. RegNet Concept
- Manual design of convolutional blocks may obtain sub-optimal performance.
- NAS requires a lot of computation to search for an optimal block.
In this work, a new network design paradigm is presented that combines the advantages of manual design and NAS.
- Authors propose to design network design spaces, where a design space is a parametrized set of possible model architectures, elevated to the population level.
The quality of a design space is characterized by sampling models and inspecting their error distribution.
- For example, in the figure above we start with an initial design space A and apply two refinement steps to yield design spaces B then C. In this case (left):
- The error distributions are strictly improving from A to B to C (right).
The hope is that design principles that apply to model populations are more likely to be robust and generalize.
1.3. Conceptual Procedures
- Starting with a relatively unconstrained design space called AnyNet (e.g., widths and depths vary freely across stages), a human-in-the-loop methodology is applied to arrive at a low-dimensional design space consisting of simple “regular” networks, called RegNet.
- The core of the RegNet design space is simple: stage widths and depths are determined by a quantized linear function.
Compared to AnyNet, the RegNet design space has simpler models, is easier to interpret, and has a higher concentration of good models.
1.4. Tools for Design Space Design
To obtain a distribution of models, n models are sampled from a design space, and trained.
- For efficiency, a low-compute, low-epoch training regime is used. In particular, in this section, the 400 million FLOP (400MF) regime is used, and each sampled model is trained for 10 epochs on ImageNet.
- Each training run is fast: training 100 models at 400MF for 10 epochs is roughly equivalent in flops to training a single ResNet-50 model at 4GF for 100 epochs.
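The flop equivalence of the two regimes can be verified with quick arithmetic, using GF-epochs as a rough unit of training compute:

```python
# Total training compute in GF-epochs (flops per image x epochs; the dataset
# size is the same constant factor on both sides, so it cancels out).
many_small_models = 100 * 0.4 * 10   # 100 models at 400MF (0.4 GF) for 10 epochs
one_resnet50 = 1 * 4.0 * 100         # one ResNet-50 at 4GF for 100 epochs
assert many_small_models == one_resnet50 == 400.0
```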
- The design space quality is analyzed by the error empirical distribution function (EDF). The error EDF of n models with errors ei is given by:

F(e) = (1/n) Σi 1[ei < e]

- where F(e) gives the fraction of models with error less than e.
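As a minimal sketch in plain Python (the error values below are purely illustrative, not from the paper), the error EDF can be computed as:

```python
def error_edf(errors, e):
    """Error EDF F(e): fraction of sampled models with error less than e."""
    return sum(1 for ei in errors if ei < e) / len(errors)

# Illustrative top-1 errors (%) for five hypothetical sampled models.
sampled_errors = [40.0, 42.5, 45.0, 47.5, 50.0]
print(error_edf(sampled_errors, 46.0))  # 3 of 5 models are below 46.0 -> 0.6
```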
- Left: shows the error EDF for n=500 sampled models from the AnyNetX design space.
- Middle & Right: Various network properties versus network error for two examples taken from the AnyNetX design space.
Once insights are obtained, the design space is then refined.
2. The AnyNet Design Space
- Given an input image, a network consists of a simple stem, followed by the network body that performs the bulk of the computation, and a final network head that predicts the output classes.
- The network body consists of 4 stages. Each stage i consists of a sequence of identical blocks, parameterized by the number of blocks di, the block width wi, and any other block parameters (e.g., the bottleneck ratio bi and the group width gi).
- X block: the standard residual bottleneck block with group convolution. The AnyNet design space built on the X block is called AnyNetX.
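To make the per-stage degrees of freedom concrete, here is a hypothetical sampler for one AnyNetX configuration (a sketch assuming the sampling ranges reported in the paper: di ≤ 16; wi ≤ 1024, divisible by 8; bi ∈ {1, 2, 4}; gi ∈ {1, 2, ..., 32}):

```python
import random

def sample_anynetx(rng):
    """Sample one AnyNetX configuration: per stage, a depth d_i, a width w_i
    (multiple of 8), a bottleneck ratio b_i, and a group width g_i."""
    cfg = []
    for _ in range(4):  # the network body has 4 stages
        cfg.append({
            "d": rng.randint(1, 16),          # blocks per stage
            "w": 8 * rng.randint(1, 128),     # width <= 1024, divisible by 8
            "b": rng.choice([1, 2, 4]),       # bottleneck ratio
            "g": rng.choice([1, 2, 4, 8, 16, 32]),  # group width
        })
    return cfg

cfg = sample_anynetx(random.Random(0))
print(cfg)  # one random 4-stage configuration
```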
- AnyNetXA: Initial, unconstrained AnyNetX design space.
- AnyNetXB: We first test a shared bottleneck ratio bi=b for all stages i for the AnyNetXA design space.
- AnyNetXC: Starting with AnyNetXB, a shared group width gi=g is used for all stages to obtain AnyNetXC.
The EDFs are nearly unchanged. Overall, AnyNetXC has 6 fewer degrees of freedom than AnyNetXA, and reduces the design space size by nearly four orders of magnitude.
- AnyNetXD: A pattern emerges: good networks have increasing widths, i.e., wi+1 ≥ wi.
- AnyNetXE: The stage depths di likewise tend to increase for the best models.
The constraints on wi and di each reduce the design space by 4!, with a cumulative reduction of O(10⁷) from AnyNetXA.
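These size reductions can be counted directly. A rough sketch, assuming the sampling ranges reported in the paper (di ≤ 16; wi ≤ 1024, divisible by 8; bi ∈ {1, 2, 4}; gi ∈ {1, 2, ..., 32}):

```python
import math
from itertools import permutations

# Per-stage options in AnyNetXA (ranges as reported in the paper).
d_opts = 16          # depth d_i in 1..16
w_opts = 1024 // 8   # width w_i <= 1024, divisible by 8 -> 128 options
b_opts = 3           # bottleneck ratio b_i in {1, 2, 4}
g_opts = 6           # group width g_i in {1, 2, 4, 8, 16, 32}
stages = 4

size_A = (d_opts * w_opts * b_opts * g_opts) ** stages   # AnyNetXA: ~1e18 models

# AnyNetXB/C: sharing b and g across the 4 stages removes 3 copies of each
# choice, i.e. a factor of 3^3 * 6^3 = 5832 -- nearly 4 orders of magnitude.
shared_factor = b_opts ** 3 * g_opts ** 3
print(round(math.log10(shared_factor), 2))  # 3.77

# AnyNetXD/E: of the 4! orderings of four distinct widths (or depths),
# only one is non-decreasing, so each constraint cuts the space by ~4!.
orderings = list(permutations((8, 16, 32, 64)))
increasing = [p for p in orderings if all(p[i] <= p[i + 1] for i in range(3))]
print(len(orderings), len(increasing))  # 24 orderings, only 1 increasing
```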
3. The RegNet Design Space
- A linear parameterization is introduced for block widths, so that a different block width uj is generated for each block j < d:

uj = w0 + wa · j, for 0 ≤ j < d

- To quantize uj, an additional parameter wm > 0 is introduced that controls quantization, by computing sj such that:

uj = w0 · wm^sj

- Then, sj is rounded, giving the quantized per-block widths wj:

wj = w0 · wm^round(sj)
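A runnable sketch of this quantized linear parameterization (plain Python; the final rounding of widths to multiples of 8 follows the paper, and the example parameters are assumed to be close to the released RegNetX-200MF configuration):

```python
import math

def regnet_widths(w0, wa, wm, d, q=8):
    """Per-block widths from the quantized linear rule:
    uj = w0 + wa*j, sj = round(log_wm(uj / w0)), wj = w0 * wm**sj."""
    widths = []
    for j in range(d):
        uj = w0 + wa * j                        # linear parameterization
        sj = round(math.log(uj / w0, wm))       # quantization exponent
        wj = int(round(w0 * wm ** sj / q) * q)  # round width to a multiple of q
        widths.append(wj)
    # Collapse runs of equal widths into (stage width, stage depth) pairs.
    stage_ws, stage_ds = [], []
    for w in widths:
        if stage_ws and stage_ws[-1] == w:
            stage_ds[-1] += 1
        else:
            stage_ws.append(w)
            stage_ds.append(1)
    return stage_ws, stage_ds

# Parameters assumed to match RegNetX-200MF: 13 blocks grouped into 4 stages.
ws, ds = regnet_widths(w0=24, wa=36.44, wm=2.49, d=13)
print(ws, ds)
```

Note how the quantization automatically yields a small number of stages with increasing widths and (mostly) increasing depths, i.e., exactly the AnyNetXD/AnyNetXE structure.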
- Left: Models in RegNetX have better average error than AnyNetX while maintaining the best models.
- Middle: Using wm=2 (doubling the width between stages) slightly improves the EDF. Restricting w0 = wa performs even better.
Random search efficiency is much higher for RegNetX; searching over just 32 random models is likely to yield good models.
4. RegNetX and RegNetY
- Finally, RegNetX variants are formed as above.
- With Squeeze-and-Excitation (SE) operations, as introduced in SENet, RegNetY variants are formed as above.
- Authors also tried many other settings, such as the inverted bottleneck, but they did not improve results. Please read the paper directly for more details.
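The SE operation that distinguishes RegNetY from RegNetX can be sketched in a few lines of plain Python (an illustrative sketch, not the authors' implementation; the weight arguments w1 and w2 are hypothetical):

```python
import math

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on x, a list of C feature maps (2D lists).
    w1 (C x C/r) and w2 (C/r x C) are the reduction-MLP weights."""
    C = len(x)
    # Squeeze: global average pool each channel to one scalar.
    z = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in x]
    # Excite: FC -> ReLU -> FC -> sigmoid, giving one gate per channel.
    h = [max(0.0, sum(z[i] * w1[i][k] for i in range(C))) for k in range(len(w1[0]))]
    s = [1.0 / (1.0 + math.exp(-sum(h[k] * w2[k][j] for k in range(len(h)))))
         for j in range(C)]
    # Scale: reweight every activation in channel c by its gate s[c] in (0, 1).
    return [[[v * s[c] for v in row] for row in x[c]] for c in range(C)]

# With zero weights the gate is sigmoid(0) = 0.5, so activations are halved.
out = se_block([[[2.0, 2.0], [2.0, 2.0]]], w1=[[0.0]], w2=[[0.0]])
print(out)  # [[[1.0, 1.0], [1.0, 1.0]]]
```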
5. Experimental Results
- Much of the recent work on network design has focused on the mobile regime (~600MF).
RegNet has been compared by many later papers. I’ve already shortened a lot in this story. For more details, please read the paper directly.
[2020 CVPR] [RegNet]
Designing Network Design Spaces