# Brief Review — Differentiable Learning-to-Normalize via Switchable Normalization

## Switchable Normalization (SN), Learned Weighted Usage of Instance Norm, Layer Norm, & Batch Norm

Differentiable Learning-to-Normalize via Switchable Normalization, Switchable Normalization (SN), by The Chinese University of Hong Kong, SenseTime Research, and The University of Hong Kong, 2019 ICLR, Over 170 Citations (Sik-Ho Tsang @ Medium)

Image Classification, Normalization

**Switchable Normalization (SN)** is proposed, which **learns to select different normalizers for different normalization layers** of a deep neural network.

- SN employs three distinct scopes to compute statistics (means and variances): a channel, a layer, and a minibatch.

# Outline

1. **Switchable Normalization (SN)**
2. **Results**

# 1. Switchable Normalization (SN)

## 1.1. General Form of Normalization

- **Input data** of an arbitrary normalization layer is represented by **a 4D tensor (*N*, *C*, *H*, *W*)**.
- Let *h_ncij* and *ĥ_ncij* be a **pixel before and after normalization**, where *n*∈[1, *N*], *c*∈[1, *C*], *i*∈[1, *H*], and *j*∈[1, *W*]. Let *μ* and *σ* be a **mean** and a **standard deviation**. We have:

$$\hat{h}_{ncij}=\gamma\frac{h_{ncij}-\mu}{\sqrt{\sigma^2+\epsilon}}+\beta$$

- where *γ* and *β* are a **scale** and a **shift** parameter respectively.

Thus, each pixel is normalized using *μ* and *σ*, and then re-scaled and re-shifted by *γ* and *β*. **IN**, **LN**, and **BN** share this formulation, but **the numbers of their estimated statistics are different**:

$$\mu_k=\frac{1}{|I_k|}\sum_{(n,c,i,j)\in I_k}h_{ncij},\quad \sigma^2_k=\frac{1}{|I_k|}\sum_{(n,c,i,j)\in I_k}\left(h_{ncij}-\mu_k\right)^2$$

- where *k*∈{*in*, *ln*, *bn*} and *I_k* is **their corresponding set of pixels**.
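The shared formula and the three pixel sets can be made concrete in a minimal NumPy sketch (my own illustration, not the authors' code): only the axes over which *μ* and *σ²* are computed distinguish IN, LN, and BN.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((2, 3, 4, 4))  # input tensor, shape (N, C, H, W)
gamma, beta, eps = 1.0, 0.0, 1e-5      # scalar scale/shift for illustration

def normalize(h, axes):
    """Shared normalization formula: only the pixel set I_k (the axes) changes."""
    mu = h.mean(axis=axes, keepdims=True)
    var = h.var(axis=axes, keepdims=True)
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

h_in = normalize(h, (2, 3))     # IN: one (mu, var) per sample and channel
h_ln = normalize(h, (1, 2, 3))  # LN: one (mu, var) per sample
h_bn = normalize(h, (0, 2, 3))  # BN: one (mu, var) per channel, over the minibatch
```

In a real layer *γ* and *β* are learned per-channel vectors; scalars are used here only to keep the pixel-set comparison in focus.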

## 1.2. Switchable Normalization (SN)

- SN has an intuitive expression, combining the three normalizers' statistics with learned weights:

$$\hat{h}_{ncij}=\gamma\frac{h_{ncij}-\sum_{k\in\Omega}w_k\mu_k}{\sqrt{\sum_{k\in\Omega}w'_k\sigma^2_k+\epsilon}}+\beta,\quad \Omega=\{in,ln,bn\}$$

- However, this strategy leads to large redundant computations.
- In fact, the **three kinds of statistics of SN depend on each other**. Therefore, SN can **reduce redundancy by reusing computations**: the LN and BN statistics are derived from the IN statistics instead of being recomputed from the pixels:

$$\mu_{ln}=\frac{1}{C}\sum_{c=1}^{C}\mu_{in},\quad \sigma^2_{ln}=\frac{1}{C}\sum_{c=1}^{C}\left(\sigma^2_{in}+\mu^2_{in}\right)-\mu^2_{ln}$$

$$\mu_{bn}=\frac{1}{N}\sum_{n=1}^{N}\mu_{in},\quad \sigma^2_{bn}=\frac{1}{N}\sum_{n=1}^{N}\left(\sigma^2_{in}+\mu^2_{in}\right)-\mu^2_{bn}$$
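The reuse trick — computing IN statistics once over (*H*, *W*), then obtaining the LN and BN statistics by averaging them (with the identity Var[x] = E[x²] − E[x]²) — can be verified in a small NumPy sketch (my illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 8, 8))  # (N, C, H, W)

# IN statistics: one (mean, variance) per sample and channel, over H, W.
mu_in = x.mean(axis=(2, 3))            # shape (N, C)
var_in = x.var(axis=(2, 3))            # shape (N, C)

# LN statistics reused from IN statistics (average over channels).
mu_ln = mu_in.mean(axis=1, keepdims=True)                        # (N, 1)
var_ln = (var_in + mu_in**2).mean(axis=1, keepdims=True) - mu_ln**2

# BN statistics reused from IN statistics (average over the minibatch).
mu_bn = mu_in.mean(axis=0, keepdims=True)                        # (1, C)
var_bn = (var_in + mu_in**2).mean(axis=0, keepdims=True) - mu_bn**2

# Sanity check: the reused statistics match direct computation
# over the corresponding pixel sets I_ln and I_bn.
assert np.allclose(mu_ln[:, 0], x.mean(axis=(1, 2, 3)))
assert np.allclose(var_ln[:, 0], x.var(axis=(1, 2, 3)))
assert np.allclose(mu_bn[0], x.mean(axis=(0, 2, 3)))
assert np.allclose(var_bn[0], x.var(axis=(0, 2, 3)))
```

The averages over channels and samples are unweighted because every (*n*, *c*) cell covers the same number of pixels, *H*·*W*.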

- Each *w_k* is computed by using a **softmax** function with *λ_in*, *λ_ln*, and *λ_bn* as the control parameters:

$$w_k=\frac{e^{\lambda_k}}{e^{\lambda_{in}}+e^{\lambda_{ln}}+e^{\lambda_{bn}}},\quad k\in\{in,ln,bn\}$$
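Putting the pieces together, a forward pass of SN can be sketched as follows (a minimal NumPy illustration, assuming scalar *γ*, *β* and separate softmax control parameters `lam_mu` and `lam_var` for the means and variances; the real layer learns per-channel *γ*, *β* and trains the λ's by backpropagation):

```python
import numpy as np

def switchable_norm(h, lam_mu, lam_var, gamma=1.0, beta=0.0, eps=1e-5):
    """SN sketch: normalize with a softmax-weighted mixture of IN/LN/BN statistics."""
    # IN statistics over (H, W), kept 4D for broadcasting.
    mu_in = h.mean(axis=(2, 3), keepdims=True)   # (N, C, 1, 1)
    var_in = h.var(axis=(2, 3), keepdims=True)

    # LN and BN statistics reused from the IN statistics.
    mu_ln = mu_in.mean(axis=1, keepdims=True)    # (N, 1, 1, 1)
    var_ln = (var_in + mu_in**2).mean(axis=1, keepdims=True) - mu_ln**2
    mu_bn = mu_in.mean(axis=0, keepdims=True)    # (1, C, 1, 1)
    var_bn = (var_in + mu_in**2).mean(axis=0, keepdims=True) - mu_bn**2

    def softmax(lam):
        e = np.exp(lam - lam.max())
        return e / e.sum()

    w_mu = softmax(lam_mu)    # weights w_k for the means
    w_var = softmax(lam_var)  # separate weights w'_k for the variances
    mu = w_mu[0] * mu_in + w_mu[1] * mu_ln + w_mu[2] * mu_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
    return gamma * (h - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
y = switchable_norm(rng.standard_normal((4, 3, 8, 8)),
                    lam_mu=np.zeros(3), lam_var=np.zeros(3))
```

With all λ's equal (as initialized here), the softmax gives each normalizer weight 1/3; training then shifts the weights toward whichever normalizer suits the layer and batch size.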

# 2. Results

## 2.1. ImageNet

SN prefers **BN** when the **minibatch is sufficiently large**, while it selects **LN** instead when a **small minibatch** is presented, as shown in the green and red bars.

SN outperforms **BN** and **GN** in almost all cases, demonstrating its **robustness to different batch sizes**.

## 2.2. Others

## Reference

[2019 ICLR] [Switchable Normalization (SN)] Differentiable Learning-to-Normalize via Switchable Normalization