Review — Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

Weight Standardization (WS) and Batch-Channel Normalization (BCN) are Proposed

Jun 26, 2022


Micro-Batch Training with Batch-Channel Normalization and Weight Standardization,
WS, BCN, by Johns Hopkins University
2020 arXiv v2, Over 40 Citations (Sik-Ho Tsang @ Medium)
Image Classification, Group Normalization (GN), Weight Normalization (WN)

Weight Standardization (WS) standardizes the weights in convolutional layers.
Batch-Channel Normalization (BCN) combines batch and channel normalizations and leverages estimated statistics of the activations in convolutional layers.

Outline

1. Weight Standardization (WS)
2. Batch-Channel Normalization (BCN)
3. Experimental Results

1. Weight Standardization (WS)

Comparing normalization methods on activations (blue) and Weight Standardization (orange)

1.1. WS

Consider a standard convolutional layer with its bias term set to 0:

In Weight Standardization (WS), instead of directly optimizing the loss L on the original weights Ŵ, the weights Ŵ are reparameterized as a function of W, i.e. Ŵ = WS(W).

where:
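Written out in LaTeX, the layer, the reparameterization, and its statistics are, as far as I can reconstruct them from the paper (so please check against the original), roughly as below. Here O is the number of output channels, I is the fan-in of each output channel (input channels times kernel size), and ε is a small constant for numerical stability.

```latex
% Convolution with the bias set to 0; * denotes convolution.
y = \hat{W} * x, \qquad \hat{W} \in \mathbb{R}^{O \times I}

% Weight Standardization: \hat{W} is a function of the optimized weights W.
\hat{W} = \Big[\, \hat{W}_{i,j} \;\Big|\; \hat{W}_{i,j} = \frac{W_{i,j} - \mu_{W_{i,\cdot}}}{\sigma_{W_{i,\cdot}}} \,\Big]

% Per-output-channel statistics, taken over the I inputs of channel i.
\mu_{W_{i,\cdot}} = \frac{1}{I} \sum_{j=1}^{I} W_{i,j}, \qquad
\sigma_{W_{i,\cdot}} = \sqrt{ \frac{1}{I} \sum_{j=1}^{I} W_{i,j}^{2} - \mu_{W_{i,\cdot}}^{2} + \epsilon }
```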

The loss L is optimized on W by SGD:

Computation graph for WS in feed-forwarding and backpropagation

(Ẇ is the intermediate symbol used in the paper.)
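To make the reparameterization concrete, here is a minimal NumPy sketch of the forward part of WS. It is not the authors' code: the function name `weight_standardize`, the (out, in, kH, kW) weight layout, and placing eps inside the square root are my assumptions for illustration.

```python
import numpy as np


def weight_standardize(weight, eps=1e-5):
    """Standardize conv weights per output channel (hypothetical helper).

    weight: array of shape (out_channels, in_channels, kH, kW).
    Each output channel is shifted to zero mean and scaled to unit
    standard deviation over its in_channels * kH * kW entries.
    """
    out_channels = weight.shape[0]
    flat = weight.reshape(out_channels, -1)      # one row per output channel
    mean = flat.mean(axis=1, keepdims=True)      # per-channel mean
    var = flat.var(axis=1, keepdims=True)        # per-channel (biased) variance
    standardized = (flat - mean) / np.sqrt(var + eps)
    return standardized.reshape(weight.shape)


# The optimizer would keep updating W; the convolution would use W_hat.
rng = np.random.default_rng(0)
W = rng.normal(loc=0.3, scale=2.0, size=(8, 4, 3, 3))
W_hat = weight_standardize(W)
print(W_hat.reshape(8, -1).mean(axis=1))  # per-channel means, close to 0
print(W_hat.reshape(8, -1).std(axis=1))   # per-channel stds, close to 1
```

In a real framework the standardization sits inside the layer's forward pass, so gradients flow through it back to W, which is what the computation graph above illustrates.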

1.2. Comparing WS with WN and CWN

Weight Normalization (WN) is:

Later, Centered WN (CWN) adds a centering operation for WN:

(Please feel free to read WN and CWN for more details if interested.)
To compare with WN and CWN, WS is written for the weights of a single output channel, and the corresponding standardized weights are reformulated as:

The learnable length g is also removed.
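The relation between the three methods can be checked numerically on one output channel. The sketch below assumes the usual forms, WN: g·v/‖v‖ and CWN: g·(v − mean(v))/‖v − mean(v)‖, and the function names are mine. With eps set to 0, WS comes out as CWN without g, rescaled by the square root of the fan-in.

```python
import numpy as np


def weight_norm(v, g=1.0):
    # WN: keep the direction of v, set its length to g.
    return g * v / np.linalg.norm(v)


def centered_weight_norm(v, g=1.0):
    # CWN: subtract the mean first, then normalize the length to g.
    centered = v - v.mean()
    return g * centered / np.linalg.norm(centered)


def weight_standardization(v, eps=0.0):
    # WS: subtract the mean, divide by the standard deviation; no learnable g.
    centered = v - v.mean()
    return centered / np.sqrt(centered.var() + eps)


rng = np.random.default_rng(0)
v = rng.normal(loc=0.5, scale=2.0, size=36)   # e.g. 4 input channels x 3x3 kernel

wn = weight_norm(v)
cwn = centered_weight_norm(v)
ws = weight_standardization(v)

print(np.linalg.norm(wn), np.linalg.norm(cwn))        # both have length g = 1
print(np.allclose(ws, np.sqrt(v.size) * cwn))         # WS = sqrt(I) * CWN when eps = 0
```

So the length of a WS weight vector is fixed by the fan-in rather than learned, which matches the remark that g is removed.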

2. Batch-Channel Normalization (BCN)

Batch Normalization (BN) estimates its statistics across the batch. When the batch size is small, these estimates become noisy and BN harms the training.
Batch-Channel Normalization (BCN) is proposed, which can be used for micro-batch training.
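A quick simulation illustrates why small batches are a problem. This is only a toy example under my own assumptions: each sample contributes a single scalar activation drawn from N(1, 2²), and spatial positions are ignored. It measures how much the batch mean fluctuates from batch to batch.

```python
import numpy as np

rng = np.random.default_rng(0)


def batch_mean_spread(batch_size, trials=2000):
    """Standard deviation of the batch mean across many simulated batches."""
    # Each row is one batch of activations drawn from N(1, 2^2).
    batches = rng.normal(loc=1.0, scale=2.0, size=(trials, batch_size))
    return batches.mean(axis=1).std()


spread_micro = batch_mean_spread(batch_size=2)    # micro-batch setting
spread_large = batch_mean_spread(batch_size=64)   # large-batch setting
print(f"batch size  2: spread of batch mean = {spread_micro:.3f}")
print(f"batch size 64: spread of batch mean = {spread_large:.3f}")
```

The spread shrinks roughly like σ/√B, so with 1 or 2 samples per GPU the statistics used for normalization change a lot from one step to the next.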

Micro-Batch BCN

μ̂_c and σ̂_c are not updated by the gradients computed from the loss function; instead, they are updated towards more accurate estimates of those statistics (Step 3 and Step 4).

BCN has a channel normalization following the estimate-based normalization. This makes the previously unstable estimate-based normalization stable.
(Some details need to be confirmed by reading the code.)
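As a rough illustration of the idea, below is a forward-only NumPy sketch: an estimate-based batch normalization whose running statistics are moved by a simple update rate (not by gradients), followed by a group-wise channel normalization. This is my own simplification, not the official implementation. The class name `MicroBatchBCN`, the update rate, the order of update and normalization, and the use of GN-style groups for the channel part are all assumptions.

```python
import numpy as np


class MicroBatchBCN:
    """Forward-only sketch of estimate-based BN followed by channel norm."""

    def __init__(self, num_channels, num_groups, rate=0.1, eps=1e-5):
        assert num_channels % num_groups == 0
        self.groups, self.rate, self.eps = num_groups, rate, eps
        # Running estimates; updated by moving average, never by gradients.
        self.mu_hat = np.zeros(num_channels)
        self.var_hat = np.ones(num_channels)
        # Affine parameters of the batch part (b) and the channel part (c).
        self.gamma_b, self.beta_b = np.ones(num_channels), np.zeros(num_channels)
        self.gamma_c, self.beta_c = np.ones(num_channels), np.zeros(num_channels)

    def forward(self, x, training=True):
        # x has shape (B, C, H, W).
        per_channel = lambda p: p.reshape(1, -1, 1, 1)
        if training:
            # Move the estimates towards the statistics of the current batch.
            mu, var = x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3))
            self.mu_hat += self.rate * (mu - self.mu_hat)
            self.var_hat += self.rate * (var - self.var_hat)
        # Batch part: normalize with the estimates, not the batch statistics.
        x = (x - per_channel(self.mu_hat)) / np.sqrt(per_channel(self.var_hat) + self.eps)
        x = per_channel(self.gamma_b) * x + per_channel(self.beta_b)
        # Channel part: normalize each sample within groups of channels.
        b, c, h, w = x.shape
        g = x.reshape(b, self.groups, -1)
        g = (g - g.mean(axis=2, keepdims=True)) / np.sqrt(g.var(axis=2, keepdims=True) + self.eps)
        x = g.reshape(b, c, h, w)
        return per_channel(self.gamma_c) * x + per_channel(self.beta_c)


layer = MicroBatchBCN(num_channels=8, num_groups=4)
x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(2, 8, 4, 4))
y = layer.forward(x)
print(y.shape, layer.mu_hat.round(2))
```

Even when the running estimates are still far from the true statistics (as in the first few steps here), the channel normalization that follows keeps every sample's output on a fixed scale, which is the stabilizing effect described above.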

3. Experimental Results

3.1. Image Classification

Top-1 Accuracy on ImageNet

GN+WS can be used together to improve the top-1 accuracy on ImageNet.

Error Rate on CIFAR-10 and CIFAR-100

While GN+WS has good performance, BCN+WS is even better.

3.2. Object Detection and Instance Segmentation

Object detection and instance segmentation results on COCO val2017 of Mask R-CNN and FPN with ResNet-50 and ResNet-101 as backbone

Similar trends are observed for object detection and instance segmentation on MS COCO val2017.

Later, another arXiv paper uses WS on BYOL. Please stay tuned.

References

[2020 arXiv v2] [WS, BCN]
Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

[GitHub] https://github.com/joe-siyuan-qiao/WeightStandardization

Image Classification

2020 … [WS, BCN] … 2021 [Learned Resizer] [Vision Transformer, ViT] [ResNet Strikes Back] [DeiT] [EfficientNetV2] [MLP-Mixer] [T2T-ViT] [Swin Transformer] [CaiT] [ResMLP] [ResNet-RS] [NFNet] [PVT, PVTv1] [CvT] [HaloNet] [TNT] [CoAtNet] [Focal Transformer] [TResNet] [CPVT] 2022 [ConvNeXt]

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.
