Review: Wang APSIPA ASC’19 — CNN-Based Loop Filtering for Versatile Video Coding (Codec Filtering)
Using Squeeze & Excitation and Skip Connection
In this story, CNN-Based Loop Filtering (LF) for Versatile Video Coding (VVC), which uses squeeze & excitation together with a skip connection, is briefly reviewed, since I am working on video coding research. This is a paper in 2019 APSIPA ASC. (Sik-Ho Tsang @ Medium)
Outline
- Squeeze & Excitation (SE) Basic Block
- Network Architecture & Loss Function
- Experimental Results
1. Squeeze & Excitation (SE) Basic Block
- SE Block in SENet is utilized for building the basic block.
- Given a feature map X with shape H×W×C, two convolutional layers with a Rectified Linear Unit (ReLU) between them are first applied, producing a feature map Y2.
- Each channel of Y2 is then squeezed to a single numeric value using Global Average Pooling (GAP).
- Next, a fully connected layer followed by a ReLU adds the necessary nonlinearity; its number of output channels is reduced by a reduction ratio r, which is set to 4 in this paper.
- A second fully connected layer followed by a sigmoid activation gives each channel a smooth gating ratio Y5 in the range [0, 1].
- Each channel of Y2 is scaled by the corresponding gating ratio in Y5.
- Finally, a skip connection (as in ResNet) adds the input directly to the output, so that the block learns a residual.
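The SE basic block described above can be sketched in numpy as follows. The convolutional layers are passed in as plain functions, and the fully connected weights (w1, b1, w2, b2) and their shapes are illustrative assumptions, not the paper's trained parameters:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def se_rescale(y2, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map y2 of shape (H, W, C).

    w1: (C, C//r), w2: (C//r, C), with reduction ratio r = 4 as in the paper.
    Returns y2 with each channel scaled by a sigmoid gate in [0, 1].
    """
    z = y2.mean(axis=(0, 1))        # squeeze: GAP over H, W -> (C,)
    y4 = relu(z @ w1 + b1)          # FC + ReLU, channels reduced by r
    y5 = sigmoid(y4 @ w2 + b2)      # FC + sigmoid: gating ratios in [0, 1]
    return y2 * y5                  # excitation: per-channel rescaling

def se_basic_block(x, conv1, conv2, w1, b1, w2, b2):
    """Full basic block: conv-ReLU-conv, SE rescaling, then skip connection."""
    y2 = conv2(relu(conv1(x)))
    return x + se_rescale(y2, w1, b1, w2, b2)  # residual learning
```

With identity functions standing in for the two convolutions, a random (H, W, C) input passes through the block and keeps its shape, which is what the skip connection requires.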
2. Network Architecture & Loss Function
- A two-stage, three-branch CNN is designed.
2.1. First Stage
- In the first stage, the U/V components are upsampled to match the size of the Y component, since in YUV 4:2:0 format the width and height of the U/V components are only half those of the Y component.
- The QPmap is concatenated at this stage.
- QPmap is a feature map of the same size as the input, filled with the normalized QP value of the current frame.
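A minimal sketch of this first stage: nearest-neighbour 2× upsampling brings U/V to the Y resolution, and a constant QPmap is built at the same size. Normalizing by the VVC maximum QP of 63 is an assumption here; the paper's exact normalization may differ:

```python
import numpy as np

def upsample2x(plane):
    """Nearest-neighbour 2x upsampling so U/V match the Y plane (YUV 4:2:0)."""
    return plane.repeat(2, axis=0).repeat(2, axis=1)

def qp_map(h, w, qp, qp_max=63.0):
    """Constant map of the frame's normalized QP, same size as the input.
    Dividing by the VVC maximum QP (63) is an assumed normalization."""
    return np.full((h, w), qp / qp_max, dtype=np.float32)

def first_stage_input(y, u, v, qp):
    """Stack Y, upsampled U/V, and the QPmap into one (H, W, 4) tensor."""
    return np.stack([y, upsample2x(u), upsample2x(v),
                     qp_map(*y.shape, qp)], axis=-1)
```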
2.2. Second Stage
- In the second stage, the main pipeline is split into three branches.
- Each branch handles one component and is fused with its own CUmap.
- CUmap is a feature map in which coding unit (CU) boundary positions are filled with 1 and all other positions with 0.5, as shown in the above figure.
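Constructing such a CUmap is straightforward once the CU partitioning is known. The sketch below assumes a hypothetical representation of the partitioning as a list of (top, left, height, width) rectangles:

```python
import numpy as np

def cu_map(h, w, cu_rects):
    """Build a CUmap: 1 at CU boundary samples, 0.5 everywhere else.

    cu_rects: list of (top, left, height, width) CU rectangles -- an
    assumed representation of the decoder's CU partitioning.
    """
    m = np.full((h, w), 0.5, dtype=np.float32)
    for top, left, ch, cw in cu_rects:
        m[top, left:left + cw] = 1.0            # top edge
        m[top + ch - 1, left:left + cw] = 1.0   # bottom edge
        m[top:top + ch, left] = 1.0             # left edge
        m[top:top + ch, left + cw - 1] = 1.0    # right edge
    return m
```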
2.3. Loss Function
- Since improving the Y component matters more than improving the U/V components, a larger weight is assigned to the Y component in the loss function.
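A component-weighted MSE loss along these lines can be sketched as follows. The specific weight values (0.75 for Y, 0.125 each for U and V) are assumptions for illustration, not the paper's exact choices:

```python
import numpy as np

def weighted_mse(pred, target, weights=(0.75, 0.125, 0.125)):
    """Weighted MSE over (Y, U, V) predictions; the larger Y weight
    reflects its greater importance. Weight values are assumed."""
    return sum(w * np.mean((p - t) ** 2)
               for w, p, t in zip(weights, pred, target))
```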
3. Experimental Results
- Anchor: H.266/VVC anchor with DBF, SAO and ALF enabled.
- [26]: Filter located between DBF and SAO.
- [27]: Only replace DBF and SAO, but ALF is enabled.
- [28]: Located between DBF and SAO.
- The proposed approach, used with DBF, SAO and ALF all disabled, obtains 6.46%, 10.40% and 12.79% BD-rate reductions on the luma and two chroma components, respectively, outperforming [26–28], which are proposals submitted during VVC standardization.
Reference
[2019 APSIPA ASC] [Wang APSIPA ASC’19]
An Integrated CNN-based Post Processing Filter For Intra Frame in Versatile Video Coding
My Previous Reviews
Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]
Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]
Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [DeepLabv3+]
Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net] [Attention U-Net] [RU-Net & R2U-Net] [VoxResNet] [DenseVoxNet][UNet++] [H-DenseUNet] [DUNet]
Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]
Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet] [SR+STN]
Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM] [FCGN] [IEF] [Newell ECCV’16 & Newell POCV’16]
Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN] [Lu CVPRW’19] [Wang APSIPA ASC’19]
Generative Adversarial Network [GAN]