Brief Review — Differentiable Learning-to-Normalize via Switchable Normalization

Switchable Normalization (SN), Learned Weighted Usage of Instance Norm, Layer Norm, & Batch Norm

Feb 13, 2023

(a) SN adapts to various networks and tasks by learning importance ratios to select normalizers. Each ratio is between 0 and 1, and the ratios of each task sum to 1. (b) Top-1 accuracies of ResNet50 trained with SN on ImageNet, compared with BN and GN in different batch settings.

Differentiable Learning-to-Normalize via Switchable Normalization,
Switchable Normalization (SN), by The Chinese University of Hong Kong, SenseTime Research, and The University of Hong Kong,
2019 ICLR, Over 170 Citations (Sik-Ho Tsang @ Medium)
Image Classification, Normalization

Switchable Normalization (SN) is proposed, which learns to select different normalizers for different normalization layers of a deep neural network.
SN computes statistics (means and variances) over three distinct scopes: a channel (as in IN), a layer (as in LN), and a minibatch (as in BN).

Outline

Switchable Normalization (SN)
Results

1. Switchable Normalization (SN)

1.1. General Form of Normalization

**The size of feature maps is N×C×H×W.**

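The input data of an arbitrary normalization layer is represented by a 4D tensor (N, C, H, W).
Let $h_{ncij}$ and $\hat{h}_{ncij}$ be a pixel before and after normalization, where n∈[1, N], c∈[1, C], i∈[1, H], and j∈[1, W]. Let μ and σ be a mean and a standard deviation. We have:

$$\hat{h}_{ncij} = \gamma \, \frac{h_{ncij} - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where γ and β are a scale and a shift parameter respectively, and ε is a small constant for numerical stability.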

**Illustration of different normalizations (figure from GN).**

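Thus, each pixel is normalized using μ and σ, and then re-scaled and re-shifted by γ and β. IN, LN, and BN share this formulation, but they estimate their statistics over different sets of pixels, so the numbers of estimated statistics differ:

$$\mu_k = \frac{1}{|I_k|} \sum_{(n,c,i,j) \in I_k} h_{ncij}, \qquad \sigma_k^2 = \frac{1}{|I_k|} \sum_{(n,c,i,j) \in I_k} \left(h_{ncij} - \mu_k\right)^2$$

where k∈{in, ln, bn} and $I_k$ is the corresponding set of pixels. IN averages over (H, W) for each sample and channel, LN over (C, H, W) for each sample, and BN over (N, H, W) for each channel.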
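As a rough illustration of how the three normalizers differ only in the axes they reduce over, here is a minimal NumPy sketch. The function names `norm_stats` and `normalize` are hypothetical, and γ and β are simplified to scalars here (in practice they are usually learned per channel).

```python
import numpy as np

# Axes of an (N, C, H, W) tensor that each normalizer averages over.
REDUCE_AXES = {"in": (2, 3), "ln": (1, 2, 3), "bn": (0, 2, 3)}

def norm_stats(h, kind):
    """Return mean and (biased) variance of h over the pixel set I_k."""
    axes = REDUCE_AXES[kind]
    mu = h.mean(axis=axes, keepdims=True)
    var = h.var(axis=axes, keepdims=True)
    return mu, var

def normalize(h, kind, gamma=1.0, beta=0.0, eps=1e-5):
    """Apply the shared formula: gamma * (h - mu) / sqrt(var + eps) + beta."""
    mu, var = norm_stats(h, kind)
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3, 5, 5))  # N=4, C=3, H=W=5

# Shapes of the estimated means show how many statistics each method keeps.
shapes = {k: norm_stats(h, k)[0].shape for k in ("in", "ln", "bn")}
print(shapes)
```

The shapes of the returned means make the difference concrete: IN keeps one mean per (sample, channel) pair, LN one per sample, and BN one per channel.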

1.2. Switchable Normalization (SN)

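SN has an intuitive expression:

$$\hat{h}_{ncij} = \gamma \, \frac{h_{ncij} - \sum_{k \in \Omega} w_k \mu_k}{\sqrt{\sum_{k \in \Omega} w'_k \sigma_k^2 + \epsilon}} + \beta$$

where Ω = {in, ln, bn}, and $w_k$ and $w'_k$ are the importance ratios that weight the means and the variances respectively.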

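However, computing the three sets of statistics independently leads to large redundant computations.
In fact, the three kinds of statistics of SN depend on each other. Therefore, SN reduces redundancy by reusing computations:

$$\mu_{in} = \frac{1}{HW} \sum_{i,j}^{H,W} h_{ncij}, \qquad \sigma_{in}^2 = \frac{1}{HW} \sum_{i,j}^{H,W} \left(h_{ncij} - \mu_{in}\right)^2$$

$$\mu_{ln} = \frac{1}{C} \sum_{c=1}^{C} \mu_{in}, \qquad \sigma_{ln}^2 = \frac{1}{C} \sum_{c=1}^{C} \left(\sigma_{in}^2 + \mu_{in}^2\right) - \mu_{ln}^2$$

$$\mu_{bn} = \frac{1}{N} \sum_{n=1}^{N} \mu_{in}, \qquad \sigma_{bn}^2 = \frac{1}{N} \sum_{n=1}^{N} \left(\sigma_{in}^2 + \mu_{in}^2\right) - \mu_{bn}^2$$

where the means and variances of LN and BN can be computed based on IN.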
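The reuse described above can be checked numerically. The sketch below (the helper name `sn_statistics` is hypothetical) derives the LN and BN statistics from the IN statistics via the identity Var[h] = E[h²] − (E[h])², and compares them against direct computation over the full pixel sets.

```python
import numpy as np

def sn_statistics(h):
    """Compute IN statistics once, then derive LN and BN statistics from them."""
    # IN: one mean/variance per (sample, channel), shape (N, C, 1, 1).
    mu_in = h.mean(axis=(2, 3), keepdims=True)
    var_in = h.var(axis=(2, 3), keepdims=True)

    # Per-(n, c) second moment E[h^2] = var + mean^2, reused below.
    second_moment = var_in + mu_in ** 2

    # LN: average the IN quantities over channels, shape (N, 1, 1, 1).
    mu_ln = mu_in.mean(axis=1, keepdims=True)
    var_ln = second_moment.mean(axis=1, keepdims=True) - mu_ln ** 2

    # BN: average the IN quantities over the minibatch, shape (1, C, 1, 1).
    mu_bn = mu_in.mean(axis=0, keepdims=True)
    var_bn = second_moment.mean(axis=0, keepdims=True) - mu_bn ** 2

    return {"in": (mu_in, var_in), "ln": (mu_ln, var_ln), "bn": (mu_bn, var_bn)}

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3, 5, 5))
stats = sn_statistics(h)

# Compare against statistics computed directly over each full pixel set.
ln_matches = np.allclose(stats["ln"][1], h.var(axis=(1, 2, 3), keepdims=True))
bn_matches = np.allclose(stats["bn"][1], h.var(axis=(0, 2, 3), keepdims=True))
print(ln_matches, bn_matches)
```

Only the IN pass touches all N×C×H×W pixels; the LN and BN statistics are then obtained from the much smaller (N, C) arrays.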

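Each $w_k$ is computed by a softmax function with $\lambda_{in}$, $\lambda_{ln}$, and $\lambda_{bn}$ as the control parameters:

$$w_k = \frac{e^{\lambda_k}}{\sum_{z \in \{in, ln, bn\}} e^{\lambda_z}}, \qquad k \in \{in, ln, bn\}$$

so that every ratio lies between 0 and 1 and the three ratios sum to 1. The variance ratios $w'_k$ are defined in the same way, using a separate set of control parameters $\lambda'_{in}$, $\lambda'_{ln}$, and $\lambda'_{bn}$.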
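To tie the pieces together, here is a minimal forward-pass sketch in NumPy. It is an illustration under simplifying assumptions, not the authors' implementation: the names `softmax` and `switchable_norm` are hypothetical, γ and β are scalars, the control parameters are plain arrays rather than learned parameters, and the statistics are computed directly for brevity.

```python
import numpy as np

# Reduction axes for IN, LN, BN on an (N, C, H, W) tensor, in that order.
AXES = ((2, 3), (1, 2, 3), (0, 2, 3))

def softmax(lam):
    """Numerically stable softmax over a 1D array of control parameters."""
    e = np.exp(lam - lam.max())
    return e / e.sum()

def switchable_norm(h, lam_mean, lam_var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize h with softmax-weighted IN/LN/BN means and variances."""
    w_mean = softmax(lam_mean)  # importance ratios for the means
    w_var = softmax(lam_var)    # importance ratios for the variances

    # Weighted sums broadcast to shape (N, C, 1, 1).
    mu = sum(w * h.mean(axis=ax, keepdims=True) for w, ax in zip(w_mean, AXES))
    var = sum(w * h.var(axis=ax, keepdims=True) for w, ax in zip(w_var, AXES))

    return gamma * (h - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3, 5, 5))

# Equal control parameters give equal ratios of 1/3 each.
uniform = softmax(np.zeros(3))

# Control parameters that strongly favor BN make SN behave like BN.
favor_bn = np.array([-1e3, -1e3, 0.0])
out_sn = switchable_norm(h, favor_bn, favor_bn)
out_bn = (h - h.mean(axis=(0, 2, 3), keepdims=True)) / np.sqrt(
    h.var(axis=(0, 2, 3), keepdims=True) + 1e-5
)
print(uniform, np.allclose(out_sn, out_bn))
```

Because the ratios come from a softmax, the selection among normalizers stays differentiable, so the control parameters can be trained by backpropagation along with the rest of the network.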

**Comparisons of normalization methods.**

2. Results

2.1. ImageNet

**Importance weights vs. batch sizes. The bracket ( , ) indicates (#GPUs, #samples per GPU). SN doesn’t have BN in (8, 1).**

SN prefers BN when the minibatch is sufficiently large, while it selects LN instead when the minibatch is small, as shown by the green and red bars.

**Comparisons of top-1 accuracies on the validation set of ImageNet, by using ResNet50 trained with SN, BN, and GN in different batch size settings.**

SN outperforms BN and GN in almost all cases, demonstrating its robustness to different batch sizes.

2.2. Others

**COCO. Left: Faster R-CNN+FPN using ResNet50. Right: Mask R-CNN using ResNet50 and FPN.**

Left: When finetuning the SN backbone, SN obtains a significant improvement of 1.1 AP over GN (39.3 vs. 38.2).
Right: SN improves over GN by 0.5 box AP and 0.4 mask AP, when finetuning the same BN+ backbone.

**Left: ADE20K validation set and Cityscapes test set by using ResNet50 with dilated convolutions. Right: Results of Kinetics dataset.**

Left: In ADE20K, SN outperforms SyncBN by a large margin in both testing schemes (38.7 vs. 36.4 and 39.2 vs. 37.7), and improves over GN by 3.0 and 2.9. In Cityscapes, SN also performs best compared to SyncBN and GN.
Right: SN works better than BN and GN in both batch sizes.

Reference

[2019 ICLR] [Switchable Normalization (SN)]
Differentiable Learning-to-Normalize via Switchable Normalization

Written by Sik-Ho Tsang
