Reading: Lin PCS’19 — Residual in Residual Based Convolutional Neural Network In-loop Filter for AVS3 (Codec Filtering)

Using Residual Blocks Originating from ResNet, with a Long Skip Connection: 7.5%, 16.9% and 18.6% BD-Rate Reduction for Y, U, V (All Intra)

In this story, Residual in Residual Based Convolutional Neural Network In-loop Filter for AVS3 (Lin PCS’19), by Peking University and Hikvision Research Institute, is briefly described. I read this because I work on video coding research. This is a paper in PCS 2019. (Sik-Ho Tsang @ Medium)

Outline

  1. Network Architecture
  2. AVS3 Implementation
  3. Experimental Results

1. Network Architecture

1.1. Network Architecture for Luma

Left: Network Architecture for Luma, Right: Residual Block
  • N = 12 residual blocks, originating from ResNet, are used in the network.
  • Each residual block has two convolutional layers separated by a ReLU.
  • A long skip connection is used, as shown above.
  • The network is a fully convolutional network (FCN).
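The luma network above can be sketched in NumPy as follows. This is a minimal, untrained illustration and not the authors' code. The channel count (`ch=16`), the single head and tail convolutions, and the placement of the long skip from the network input to its output are assumptions made for the sketch. The conv → ReLU → conv residual block with an identity shortcut and N = 12 blocks follow the description. Because every layer is a 3x3 convolution, the network works on frames of any size.

```python
import numpy as np


def conv3x3(x, w, b):
    """Zero-padded 3x3 convolution (cross-correlation, as in CNN frameworks).

    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,) -> (C_out, H, W)
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # pad H and W by 1 with zeros
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, p):
    """conv -> ReLU -> conv, plus the block's identity shortcut."""
    y = conv3x3(x, p["w1"], p["b1"])
    y = conv3x3(relu(y), p["w2"], p["b2"])
    return x + y


def init_params(n_blocks=12, ch=16, scale=0.01, seed=0):
    """Random weights; ch=16 and the head/tail convs are hypothetical."""
    rng = np.random.default_rng(seed)

    def conv(c_out, c_in):
        return scale * rng.standard_normal((c_out, c_in, 3, 3)), np.zeros(c_out)

    params = {"blocks": []}
    params["head_w"], params["head_b"] = conv(ch, 1)
    for _ in range(n_blocks):
        w1, b1 = conv(ch, ch)
        w2, b2 = conv(ch, ch)
        params["blocks"].append({"w1": w1, "b1": b1, "w2": w2, "b2": b2})
    params["tail_w"], params["tail_b"] = conv(1, ch)
    return params


def luma_filter(rec, params):
    """rec: (1, H, W) reconstructed luma -> filtered luma of the same shape."""
    feat = conv3x3(rec, params["head_w"], params["head_b"])
    for p in params["blocks"]:
        feat = residual_block(feat, p)
    residual = conv3x3(feat, params["tail_w"], params["tail_b"])
    return rec + residual  # long skip (assumed input-to-output): predict a correction


rec = np.random.default_rng(1).random((1, 32, 32))
out = luma_filter(rec, init_params(n_blocks=12))
print(out.shape)  # (1, 32, 32)
```

With a long skip of this kind, the convolutional layers only need to learn the small correction between the decoded frame and the original. If the last layer outputs zeros, the filter reduces exactly to the identity.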

1.2. Network Architecture for Chroma

Network Architecture for Chroma
  • N = 6 residual blocks are used in the network.
  • A long skip connection is also used, as shown above.
  • As the color format is YUV420, the chroma planes are first upsampled by nearest-neighbor interpolation before convolution.
  • The luma component is also fed into the network as textural and structural guidance for chroma filtering, and is concatenated with the feature maps.
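The chroma input preparation can be sketched in NumPy as below. Nearest-neighbor upsampling and the concatenation of luma with the feature maps come from the description. Several details are assumptions for illustration. First, the chroma planes are upsampled 2x to match the luma resolution. Second, U and V are stacked and filtered jointly. Third, the upsampled planes stand in for the convolutional feature maps that the real network would compute. The helper names are hypothetical.

```python
import numpy as np


def upsample_nearest_2x(plane):
    """Nearest-neighbour 2x upsampling: (H/2, W/2) -> (H, W)."""
    return np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)


def chroma_input(u, v):
    """Upsample U and V to luma resolution and stack them as (2, H, W).

    Filtering U and V jointly is an assumption made for this sketch."""
    return np.stack([upsample_nearest_2x(u), upsample_nearest_2x(v)], axis=0)


def concat_luma_guidance(chroma_feat, y):
    """Append the luma plane as one extra channel: (C, H, W) + (H, W) -> (C+1, H, W)."""
    if chroma_feat.shape[1:] != y.shape:
        raise ValueError("chroma features must be at luma resolution")
    return np.concatenate([chroma_feat, y[None, :, :]], axis=0)


y = np.arange(16, dtype=float).reshape(4, 4)  # 4x4 luma
u = np.array([[1.0, 2.0], [3.0, 4.0]])        # 2x2 chroma (YUV420)
v = np.array([[5.0, 6.0], [7.0, 8.0]])
feat = chroma_input(u, v)  # stand-in for the conv feature maps of the real network
x = concat_luma_guidance(feat, y)
print(x.shape)  # (3, 4, 4)
```

Nearest-neighbor upsampling only repeats each chroma sample in a 2x2 block. It adds no new information, but it aligns every chroma sample with the co-located luma samples, so the two can be concatenated channel-wise.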

2. AVS3 Implementation

  • Two levels of on/off flags are used: frame-level flags and CTU-level flags.
  • The frame-level flags are applied to both the luma and chroma components, while the coding tree unit (CTU) flags are used for luma.
  • The frame-level on/off flags are selected based on rate distortion optimization (RDO), while the CTU-level on/off flags are selected based on distortion only.
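The two-level luma decision can be sketched in plain Python. This is a hypothetical simplification, not HPM code. CTU flags compare the sum of squared errors (SSE) with and without the CNN output. The frame flag compares rate-distortion costs J = D + λ·R. The rate R is approximated by the flag bits alone, with an assumed 1 bit per flag. The names `decide_luma_flags`, `frame_flag_bits` and `ctu_flag_bits` are illustrative.

```python
def sse(a, b):
    """Sum of squared errors between two equal-length sample lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def decide_luma_flags(orig, rec, filt, lam, frame_flag_bits=1.0, ctu_flag_bits=1.0):
    """Return (frame_on, ctu_flags) for one frame's luma.

    orig/rec/filt: lists of CTUs, each a flat list of luma samples.
    Rate is approximated by flag bits only (a hypothetical simplification).
    """
    # CTU level: distortion only, use the CNN output where it lowers SSE
    flags = [sse(o, f) < sse(o, r) for o, r, f in zip(orig, rec, filt)]

    # Frame level: rate-distortion cost J = D + lambda * R
    d_off = sum(sse(o, r) for o, r in zip(orig, rec))
    d_on = sum(sse(o, f) if on else sse(o, r)
               for o, r, f, on in zip(orig, rec, filt, flags))
    j_off = d_off + lam * frame_flag_bits
    j_on = d_on + lam * (frame_flag_bits + ctu_flag_bits * len(flags))

    frame_on = j_on < j_off
    return frame_on, (flags if frame_on else [False] * len(flags))


orig = [[10, 10], [20, 20]]
rec = [[12, 12], [20, 20]]
filt = [[10, 11], [21, 21]]
print(decide_luma_flags(orig, rec, filt, lam=1.0))   # (True, [True, False])
print(decide_luma_flags(orig, rec, filt, lam=10.0))  # (False, [False, False])
```

The usage shows why the frame level needs RDO. With a small λ, the distortion gain outweighs the signalling cost, so the filter is enabled. With a large λ, the same gain no longer pays for the CTU flag bits, and the whole frame is switched off.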

3. Experimental Results

3.1. Training

  • The DIV2K dataset is used for training and validation.
  • The AVS3 reference software HPM3.1 is used.
  • The mean squared error (MSE) loss is used for training.
  • A total of 12 models are trained to cover a wide range of QPs.
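With one model per QP range, the encoder and decoder must agree on which model to apply. The exact QP-to-model mapping is not described here. The sketch below therefore assumes a simple nearest-training-QP rule. The list of 12 training QPs (16, 20, ..., 60) is entirely hypothetical.

```python
def select_model(qp, model_qps):
    """Index of the model whose training QP is closest to `qp`.

    Ties are broken toward the lower training QP (an arbitrary choice)."""
    return min(range(len(model_qps)),
               key=lambda i: (abs(model_qps[i] - qp), model_qps[i]))


# Hypothetical training QPs for 12 models; the actual values are not given here.
MODEL_QPS = list(range(16, 64, 4))  # 16, 20, ..., 60
print(len(MODEL_QPS))               # 12
print(select_model(27, MODEL_QPS))  # 3  (QP 28)
```

Because the rule depends only on the QP, which the decoder already knows, no model index needs to be signalled in this sketch.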

3.2. BD-Rate

BD-rate (%) under All Intra (AI) configuration
  • The proposed approach achieves 7.51%, 16.88% and 18.59% BD-rate savings on average for Y, U and V, respectively, compared with the anchor.
BD-rate (%) under Random Access (RA) configuration
  • The proposed approach achieves 3.28%, 14.37% and 13.59% BD-rate savings on average for Y, U and V, respectively, compared with the anchor.
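For readers unfamiliar with the metric, BD-rate is the average bitrate difference between two rate-distortion curves at equal quality. The sketch below uses the classic Bjøntegaard cubic-fit formulation. It fits log-rate as a cubic in PSNR and compares the average over the overlapping PSNR range. It is not necessarily the exact script used in the paper, and the RD points in the usage are made up. For U and V, the same computation uses that component's PSNR.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = bitrate saving).

    Fits log(rate) as a cubic in PSNR for each curve and compares the
    average log-rate over the overlapping PSNR interval."""
    log_a = np.log(np.asarray(rate_anchor, dtype=float))
    log_t = np.log(np.asarray(rate_test, dtype=float))
    fit_a = np.polyfit(psnr_anchor, log_a, 3)
    fit_t = np.polyfit(psnr_test, log_t, 3)

    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(fit_a), np.polyint(fit_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Made-up four-point RD curves: the test codec needs 10% fewer bits everywhere
psnr = [30.0, 33.0, 36.0, 39.0]
anchor = [1000.0, 2000.0, 4000.0, 8000.0]
test = [0.9 * r for r in anchor]
print(round(bd_rate(anchor, psnr, test, psnr), 2))  # -10.0
```

Many implementations use log10 and 10**x instead of the natural log and exp. The result is the same, because only the ratio of rates matters.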

3.3. Subjective Quality

Subjective Quality
  • The face is much clearer with the proposed method. Artifacts such as blocking and ringing disappear in the image reconstructed by the proposed method, which is closer to the original image.

During the days of coronavirus, I have challenged myself to write 30 stories again this month. Is it good? This is the 22nd story this month. Thanks for visiting my story.

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG