# Brief Review — Differentiable Learning-to-Normalize via Switchable Normalization

## Switchable Normalization (SN), Learned Weighted Usage of Instance Norm, Layer Norm, & Batch Norm

Differentiable Learning-to-Normalize via Switchable Normalization, Switchable Normalization (SN), by The Chinese University of Hong Kong, SenseTime Research, and The University of Hong Kong, 2019 ICLR, Over 170 Citations (Sik-Ho Tsang @ Medium)

Image Classification, Normalization

**Switchable Normalization (SN)** is proposed, which **learns to select different normalizers for different normalization layers** of a deep neural network.

- SN employs three distinct scopes to compute statistics (means and variances): a channel, a layer, and a minibatch.

# Outline

1. **Switchable Normalization (SN)**
2. **Results**

# 1. Switchable Normalization (SN)

## 1.1. General Form of Normalization

- **Input data** of an arbitrary normalization layer is represented by **a 4D tensor (*N*, *C*, *H*, *W*)**.
- Let *h_ncij* and *ĥ_ncij* be a **pixel before and after normalization**, where *n*∈[1, *N*], *c*∈[1, *C*], *i*∈[1, *H*], and *j*∈[1, *W*]. Let *μ* and *σ* be a **mean** and a **standard deviation**. We have:

$$\hat{h}_{ncij}=\gamma\frac{h_{ncij}-\mu}{\sqrt{\sigma^2+\epsilon}}+\beta$$

- where *γ* and *β* are a **scale** and a **shift** parameter respectively.

Thus, each pixel is normalized using *μ* and *σ*, and then re-scaled and re-shifted by *γ* and *β*. **IN**, **LN**, and **BN** share this formulation, but **the numbers of their estimated statistics are different**:

$$\mu_k=\frac{1}{|I_k|}\sum_{(n,c,i,j)\in I_k}h_{ncij},\quad \sigma^2_k=\frac{1}{|I_k|}\sum_{(n,c,i,j)\in I_k}\left(h_{ncij}-\mu_k\right)^2$$

- where *k*∈{*in*, *ln*, *bn*} and *I_k* is **their corresponding set of pixels**.
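The shared formula and the three pixel sets can be made concrete in a minimal NumPy sketch (my own illustration, not the authors' code): only the axes over which *μ* and *σ²* are computed distinguish IN, LN, and BN.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((2, 3, 4, 4))  # input tensor, shape (N, C, H, W)
gamma, beta, eps = 1.0, 0.0, 1e-5      # scalar scale/shift for illustration

def normalize(h, axes):
    """Shared normalization formula: only the pixel set I_k (the axes) changes."""
    mu = h.mean(axis=axes, keepdims=True)
    var = h.var(axis=axes, keepdims=True)
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

h_in = normalize(h, (2, 3))     # IN: one (mu, var) per sample and channel
h_ln = normalize(h, (1, 2, 3))  # LN: one (mu, var) per sample
h_bn = normalize(h, (0, 2, 3))  # BN: one (mu, var) per channel, over the minibatch
```

In a real layer *γ* and *β* are learned per-channel vectors; scalars are used here only to keep the pixel-set comparison in focus.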

## 1.2. Switchable Normalization (SN)

- SN has an intuitive expression, combining the three normalizers' statistics with learned weights:

$$\hat{h}_{ncij}=\gamma\frac{h_{ncij}-\sum_{k\in\Omega}w_k\mu_k}{\sqrt{\sum_{k\in\Omega}w'_k\sigma^2_k+\epsilon}}+\beta,\quad \Omega=\{in,ln,bn\}$$

- However, this strategy leads to large redundant computations.
- In fact, the **three kinds of statistics of SN depend on each other**. Therefore, SN can **reduce redundancy by reusing computations**: the LN and BN statistics are derived from the IN statistics instead of being recomputed from the pixels:

$$\mu_{ln}=\frac{1}{C}\sum_{c=1}^{C}\mu_{in},\quad \sigma^2_{ln}=\frac{1}{C}\sum_{c=1}^{C}\left(\sigma^2_{in}+\mu^2_{in}\right)-\mu^2_{ln}$$

$$\mu_{bn}=\frac{1}{N}\sum_{n=1}^{N}\mu_{in},\quad \sigma^2_{bn}=\frac{1}{N}\sum_{n=1}^{N}\left(\sigma^2_{in}+\mu^2_{in}\right)-\mu^2_{bn}$$
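The reuse trick — computing IN statistics once over (*H*, *W*), then obtaining the LN and BN statistics by averaging them (with the identity Var[x] = E[x²] − E[x]²) — can be verified in a small NumPy sketch (my illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 8, 8))  # (N, C, H, W)

# IN statistics: one (mean, variance) per sample and channel, over H, W.
mu_in = x.mean(axis=(2, 3))            # shape (N, C)
var_in = x.var(axis=(2, 3))            # shape (N, C)

# LN statistics reused from IN statistics (average over channels).
mu_ln = mu_in.mean(axis=1, keepdims=True)                        # (N, 1)
var_ln = (var_in + mu_in**2).mean(axis=1, keepdims=True) - mu_ln**2

# BN statistics reused from IN statistics (average over the minibatch).
mu_bn = mu_in.mean(axis=0, keepdims=True)                        # (1, C)
var_bn = (var_in + mu_in**2).mean(axis=0, keepdims=True) - mu_bn**2

# Sanity check: the reused statistics match direct computation
# over the corresponding pixel sets I_ln and I_bn.
assert np.allclose(mu_ln[:, 0], x.mean(axis=(1, 2, 3)))
assert np.allclose(var_ln[:, 0], x.var(axis=(1, 2, 3)))
assert np.allclose(mu_bn[0], x.mean(axis=(0, 2, 3)))
assert np.allclose(var_bn[0], x.var(axis=(0, 2, 3)))
```

The averages over channels and samples are unweighted because every (*n*, *c*) cell covers the same number of pixels, *H*·*W*.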

- Each *w_k* is computed by using a **softmax** function with *λ_in*, *λ_ln*, and *λ_bn* as the control parameters:

$$w_k=\frac{e^{\lambda_k}}{e^{\lambda_{in}}+e^{\lambda_{ln}}+e^{\lambda_{bn}}},\quad k\in\{in,ln,bn\}$$
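Putting the pieces together, a forward pass of SN can be sketched as follows (a minimal NumPy illustration, assuming scalar *γ*, *β* and separate softmax control parameters `lam_mu` and `lam_var` for the means and variances; the real layer learns per-channel *γ*, *β* and trains the λ's by backpropagation):

```python
import numpy as np

def switchable_norm(h, lam_mu, lam_var, gamma=1.0, beta=0.0, eps=1e-5):
    """SN sketch: normalize with a softmax-weighted mixture of IN/LN/BN statistics."""
    # IN statistics over (H, W), kept 4D for broadcasting.
    mu_in = h.mean(axis=(2, 3), keepdims=True)   # (N, C, 1, 1)
    var_in = h.var(axis=(2, 3), keepdims=True)

    # LN and BN statistics reused from the IN statistics.
    mu_ln = mu_in.mean(axis=1, keepdims=True)    # (N, 1, 1, 1)
    var_ln = (var_in + mu_in**2).mean(axis=1, keepdims=True) - mu_ln**2
    mu_bn = mu_in.mean(axis=0, keepdims=True)    # (1, C, 1, 1)
    var_bn = (var_in + mu_in**2).mean(axis=0, keepdims=True) - mu_bn**2

    def softmax(lam):
        e = np.exp(lam - lam.max())
        return e / e.sum()

    w_mu = softmax(lam_mu)    # weights w_k for the means
    w_var = softmax(lam_var)  # separate weights w'_k for the variances
    mu = w_mu[0] * mu_in + w_mu[1] * mu_ln + w_mu[2] * mu_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
y = switchable_norm(rng.standard_normal((4, 3, 8, 8)),
                    lam_mu=np.zeros(3), lam_var=np.zeros(3))
```

With all λ's equal (as initialized here), the softmax gives each normalizer weight 1/3; training then shifts the weights toward whichever normalizer suits the layer and batch size.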

# 2. Results

## 2.1. ImageNet

SN prefers **BN** when the **minibatch is sufficiently large**, while it selects **LN** instead when a **small minibatch** is presented, as shown in the green and red bars.

SN outperforms **BN** and **GN** in almost all cases, demonstrating its **robustness to different batch sizes**.

## 2.2. Others

## Reference

[2019 ICLR] [Switchable Normalization (SN)] Differentiable Learning-to-Normalize via Switchable Normalization