Reading: AResNet — Attention Residual Neural Network (Codec Filtering)

2.77%, 7.01%, and 8.64% BD-Rate Reductions for Luma and the Two Chroma Components With RA Configuration for JEM

Sik-Ho Tsang
4 min read · Jun 16, 2020
The Proposed AResNet used as CNNF (CNN Filter)

In this story, Attention Residual Neural Network (AResNet), by Hikvision Research Institute, is briefly presented. I read this because I work on video coding research. In this paper:

  • In-loop filter for inter frames is proposed to completely replace all the conventional filters in the codec.
  • Moreover, the greedy heuristic approach is used during training.

This is a paper in 2019 ICME. (Sik-Ho Tsang @ Medium)

Outline

  1. Attention Residual Neural Network (AResNet)
  2. Greedy Heuristic Approach
  3. Some Training Details
  4. Experimental Results

1. Attention Residual Neural Network (AResNet)

Attention Residual Neural Network (AResNet)
  • Based on CNNF, an attention net is added to control the filter strength according to the input content and the QP map.
  • The attention residual network consists of 7 blocks.
  • Each block contains a convolutional layer followed by batch normalization (BN) and a ReLU as the non-linear activation.
  • The kernel number of each block is set to 64, and the kernel size of each block is set to 3×3, except the first one, which is set to 5×5 to enlarge the receptive field.
  • For the last three convolutional layers, the kernel numbers are set to 256, 256 and 1 respectively, i.e. K_L = 256 and K_(L+1) = 1.
  • For I frames, CNNF is used.
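The layer description above can be sketched roughly in PyTorch. This is a hypothetical reconstruction, not the authors' code: the exact input channels, the form of the attention branch, and the QP-map normalization are my assumptions; only the block counts and kernel sizes come from the description.

```python
# Hypothetical AResNet-style sketch (not the authors' implementation).
# Block counts and kernel sizes follow the text; the attention branch is assumed.
import torch
import torch.nn as nn

def block(in_ch, out_ch, k):
    # each block: conv -> BN -> ReLU, as described above
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class AResNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # 7 blocks, 64 kernels each; first kernel 5x5 for a larger receptive field
        layers = [block(2, 64, 5)]  # assumed input: reconstructed frame + QP map
        layers += [block(64, 64, 3) for _ in range(6)]
        self.trunk = nn.Sequential(*layers)
        # attention branch scaling features by content/QP (exact form assumed)
        self.attn = nn.Sequential(nn.Conv2d(64, 64, 1), nn.Sigmoid())
        # last three conv layers with 256, 256 and 1 kernels
        self.tail = nn.Sequential(
            nn.Conv2d(64, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 3, padding=1),
        )

    def forward(self, recon, qp_map):
        x = torch.cat([recon, qp_map], dim=1)
        f = self.trunk(x)
        f = f * self.attn(f)           # attention-weighted features
        return recon + self.tail(f)    # residual connection to the input frame

net = AResNetSketch()
out = net(torch.randn(1, 1, 32, 32), torch.full((1, 1, 32, 32), 37.0 / 51))
print(out.shape)  # torch.Size([1, 1, 32, 32])
```

The residual connection at the end means the network only has to predict the filtering correction, which is the usual design choice for CNN in-loop filters.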

2. Greedy Heuristic Approach

  • To train the network, instead of the standard L2 loss, the authors use a weighted sum of two L2 terms,
  • where x represents the original content without compression, and x̂ represents the reconstructed content filtered by the loop filters.
  • x̃ represents the reconstruction filtered by the loop filter produced in the previous greedy training step, which is described below.
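From these definitions and the λ1/λ2 weights mentioned below, the loss plausibly takes the following form (a reconstruction from context, not the paper's verbatim equation; f denotes the CNN filter being trained):

```latex
L(\hat{x}) = \lambda_1 \,\lVert f(\hat{x}) - x \rVert_2^2
           + \lambda_2 \,\lVert f(\hat{x}) - \tilde{x} \rVert_2^2
```

The λ1 term pulls the output toward the original content, while the λ2 term keeps it close to the previous greedy step's output.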
Greedy Heuristic Approach
  • As shown in the algorithm above:
  • For the first training (Steps 06 to 11): all the conventional filters of JEM 7.0 are enabled to obtain the reconstructed data before and after filtering as the training data pair, which leads to faster convergence.
  • For each subsequent training (Steps 05 to 14): all the filters except the CNN filter are disabled, and the network is trained again in the next recurrent round.
  • After each training step, λ1 is increased and λ2 is decreased, so that the filtered output gradually approximates the original content.
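The weighted loss and the λ schedule above can be sketched as follows. This is a minimal illustration in NumPy; the schedule values are placeholders of my own, since the paper's actual values are not given in the post.

```python
import numpy as np

def greedy_loss(filtered, original, prev_filtered, lam1, lam2):
    # weighted sum of two L2 terms: lam1 pulls toward the original content,
    # lam2 keeps the output close to the previous greedy step's reconstruction
    return lam1 * np.mean((filtered - original) ** 2) \
         + lam2 * np.mean((filtered - prev_filtered) ** 2)

# illustrative schedule only: lam1 grows and lam2 shrinks after each greedy step
schedule = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (1.0, 0.0)]

x  = np.ones((8, 8))          # stand-in for the original content
xh = 0.5 * np.ones((8, 8))    # stand-in for the current filtered output
xt = 0.8 * np.ones((8, 8))    # stand-in for the previous step's output

for lam1, lam2 in schedule:
    print(lam1, lam2, greedy_loss(xh, x, xt, lam1, lam2))
```

With the final pair (1.0, 0.0) the loss degenerates to the plain L2 distance to the original content, matching the idea of approximating the original gradually.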

3. Some Training Details

  • 807 video sequences are compressed using JEM 7.0 with the RA configuration at four QPs (22, 27, 32, 37).
  • A model compression strategy is adopted to reduce the complexity of the AResNet filter.
  • First, the parameters of the BN layers are merged into the preceding convolutional layers.
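The BN-merge step can be illustrated with a small NumPy sketch. For brevity it uses a 1×1 convolution (a per-channel linear map); the same per-output-channel algebra applies to any convolution, since BN at inference is an affine transform per channel.

```python
import numpy as np

# per-output-channel conv parameters (1x1 conv == linear map, for brevity)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 output channels, 3 input channels
b = rng.normal(size=4)

# batch-norm parameters for the 4 output channels
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mu, var, eps = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4), 1e-5

# fold BN into the conv: y = gamma * (Wx + b - mu) / sqrt(var + eps) + beta
scale = gamma / np.sqrt(var + eps)
W_folded = W * scale[:, None]           # BN scale absorbed into the weights
b_folded = (b - mu) * scale + beta      # BN shift absorbed into the bias

x = rng.normal(size=3)                  # one input "pixel"
y_conv_bn = (W @ x + b - mu) * scale + beta   # conv followed by BN
y_folded  = W_folded @ x + b_folded           # single merged layer
print(np.allclose(y_conv_bn, y_folded))  # True
```

After folding, each conv+BN pair collapses into a single convolution, which removes the BN cost at inference without changing the output.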

4. Experimental Results

4.1. BD-Rate

Using L2 Norm Compared to JEM
  • With only the L2 norm as the loss function, a 7.77% BD-rate increase over JEM is obtained.
Using Proposed Loss Function Compared to L2 Norm
  • AResNet trained with the proposed loss achieves a 7.91% BD-rate reduction compared to the L2-norm version.
Using Proposed Loss Function Compared to JEM
  • A 2.77% BD-rate reduction is obtained by AResNet compared to JEM.
  • With ALF kept on, the reduction compared to JEM even reaches 4.95%.

4.2. PSNR After Each Training Step

PSNR After Each Training Step for BQSquare at QP37
  • With the greedy recurrent training approach, the PSNR of the reconstructions increases after each training step.

4.3. Visual Quality

  • AResNet produces a better image, with some compression artifacts removed.

This is the 24th story in this month!

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.
