Reading: AResNet — Attention Residual Neural Network (Codec Filtering)

2.77%, 7.01%, and 8.64% BD-Rate Reductions for Luma and the Two Chroma Components With RA Configuration for JEM

Sik-Ho Tsang
4 min read · Jun 16, 2020
The Proposed AResNet used as CNNF (CNN Filter)

In this story, Attention Residual Neural Network (AResNet), by Hikvision Research Institute, is briefly presented. I read this because I work on video coding research. In this paper:

  • In-loop filter for inter frames is proposed to completely replace all the conventional filters in the codec.
  • Moreover, the greedy heuristic approach is used during training.

This is a paper in 2019 ICME. (Sik-Ho Tsang @ Medium)

Outline

  1. Attention Residual Neural Network (AResNet)
  2. Greedy Heuristic Approach
  3. Some Training Details
  4. Experimental Results

1. Attention Residual Neural Network (AResNet)

Attention Residual Neural Network (AResNet)
  • Based on CNNF, an attention net is added to control the filter strength according to the input content and the QP map.
  • The attention residual network consists of 7 blocks.
  • Each block contains a convolutional layer followed by batch normalization (BN) and a ReLU as the non-linear activation.
  • The kernel number of each block is set to 64, and the kernel size of each block is set to 3×3, except the first one, which is set to 5×5 to enlarge the receptive field.
  • For the last three convolutional layers, the kernel numbers are set to 256, 256 and 1 respectively, i.e. K_L = 256 and K_(L+1) = 1.
  • For I frames, CNNF is used.
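The layer description above can be sketched roughly in PyTorch. This is a hypothetical reconstruction, not the authors' code: the exact input channels, the form of the attention branch, and the QP-map normalization are my assumptions; only the block counts and kernel sizes come from the description.

```python
# Hypothetical AResNet-style sketch (not the authors' implementation).
# Block counts and kernel sizes follow the text; the attention branch is assumed.
import torch
import torch.nn as nn

def block(in_ch, out_ch, k):
    # each block: conv -> BN -> ReLU, as described above
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class AResNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # 7 blocks, 64 kernels each; first kernel 5x5 for a larger receptive field
        layers = [block(2, 64, 5)]  # assumed input: reconstructed frame + QP map
        layers += [block(64, 64, 3) for _ in range(6)]
        self.trunk = nn.Sequential(*layers)
        # attention branch scaling features by content/QP (exact form assumed)
        self.attn = nn.Sequential(nn.Conv2d(64, 64, 1), nn.Sigmoid())
        # last three conv layers with 256, 256 and 1 kernels
        self.tail = nn.Sequential(
            nn.Conv2d(64, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 3, padding=1),
        )

    def forward(self, recon, qp_map):
        x = torch.cat([recon, qp_map], dim=1)
        f = self.trunk(x)
        f = f * self.attn(f)           # attention-weighted features
        return recon + self.tail(f)    # residual connection to the input frame

net = AResNetSketch()
out = net(torch.randn(1, 1, 32, 32), torch.full((1, 1, 32, 32), 37.0 / 51))
print(out.shape)  # torch.Size([1, 1, 32, 32])
```

The residual connection at the end means the network only has to predict the filtering correction, which is the usual design choice for CNN in-loop filters.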

2. Greedy Heuristic Approach

  • To train the network, instead of the standard L2 loss, the authors use a weighted sum of two L2 terms,
  • where x represents the original content without compression, and x̂ represents the reconstructed content filtered by the loop filters.
  • x̃ represents the reconstruction filtered by the loop filter produced in the previous greedy training step, which is described below.
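From these definitions and the λ1/λ2 weights mentioned below, the loss plausibly takes the following form (a reconstruction from context, not the paper's verbatim equation; f denotes the CNN filter being trained):

```latex
L(\hat{x}) = \lambda_1 \,\lVert f(\hat{x}) - x \rVert_2^2
           + \lambda_2 \,\lVert f(\hat{x}) - \tilde{x} \rVert_2^2
```

The λ1 term pulls the output toward the original content, while the λ2 term keeps it close to the previous greedy step's output.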
Greedy Heuristic Approach
  • As shown in the algorithm above:
  • For the first training (Steps 06 to 11): all the conventional filters of JEM 7.0 are enabled to obtain the reconstructed data before and after filtering as the training data pair, which leads to faster convergence.
  • For each subsequent training (Steps 05 to 14): all the filters except the CNN filter are disabled, and the network is trained again in the next recurrent round.
  • After each training step, λ1 is increased and λ2 is decreased, so that the filtered output gradually approximates the original content.
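The weighted loss and the λ schedule above can be sketched as follows. This is a minimal illustration in NumPy; the schedule values are placeholders of my own, since the paper's actual values are not given in the post.

```python
import numpy as np

def greedy_loss(filtered, original, prev_filtered, lam1, lam2):
    # weighted sum of two L2 terms: lam1 pulls toward the original content,
    # lam2 keeps the output close to the previous greedy step's reconstruction
    return lam1 * np.mean((filtered - original) ** 2) \
         + lam2 * np.mean((filtered - prev_filtered) ** 2)

# illustrative schedule only: lam1 grows and lam2 shrinks after each greedy step
schedule = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (1.0, 0.0)]

x  = np.ones((8, 8))          # stand-in for the original content
xh = 0.5 * np.ones((8, 8))    # stand-in for the current filtered output
xt = 0.8 * np.ones((8, 8))    # stand-in for the previous step's output

for lam1, lam2 in schedule:
    print(lam1, lam2, greedy_loss(xh, x, xt, lam1, lam2))
```

With the final pair (1.0, 0.0) the loss degenerates to the plain L2 distance to the original content, matching the idea of approximating the original gradually.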

3. Some Training Details

  • 807 video sequences are compressed using JEM 7.0 with the RA configuration at four QPs (22, 27, 32, 37).
  • A model compression strategy is adopted to reduce the complexity of the AResNet filter.
  • First, the parameters of the BN layers are merged into the preceding convolutional layers.
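The BN-merge step can be illustrated with a small NumPy sketch. For brevity it uses a 1×1 convolution (a per-channel linear map); the same per-output-channel algebra applies to any convolution, since BN at inference is an affine transform per channel.

```python
import numpy as np

# per-output-channel conv parameters (1x1 conv == linear map, for brevity)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 output channels, 3 input channels
b = rng.normal(size=4)

# batch-norm parameters for the 4 output channels
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mu, var, eps = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4), 1e-5

# fold BN into the conv: y = gamma * (Wx + b - mu) / sqrt(var + eps) + beta
scale = gamma / np.sqrt(var + eps)
W_folded = W * scale[:, None]           # BN scale absorbed into the weights
b_folded = (b - mu) * scale + beta      # BN shift absorbed into the bias

x = rng.normal(size=3)                  # one input "pixel"
y_conv_bn = (W @ x + b - mu) * scale + beta   # conv followed by BN
y_folded  = W_folded @ x + b_folded           # single merged layer
print(np.allclose(y_conv_bn, y_folded))  # True
```

After folding, each conv+BN pair collapses into a single convolution, which removes the BN cost at inference without changing the output.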

4. Experimental Results

4.1. BD-Rate

Using L2 Norm Compared to JEM
  • With only the L2 norm as the loss function, a 7.77% BD-rate increase over JEM is obtained.
Using Proposed Loss Function Compared to L2 Norm
  • AResNet trained with the proposed loss achieves a 7.91% BD-rate reduction compared to the L2-norm version.
Using Proposed Loss Function Compared to JEM
  • A 2.77% BD-rate reduction is obtained by AResNet compared to JEM.
  • With ALF kept on, the reduction compared to JEM even reaches 4.95%.

4.2. PSNR After Each Training Step

PSNR After Each Training Step for BQSquare at QP37
  • With the greedy recurrent training approach, the PSNR of the reconstructions increases after each training step.

4.3. Visual Quality

  • AResNet produces a better image, with some compression artifacts removed.

This is the 24th story in this month!

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.
