Reading: CNNLF — Residual Convolutional Neural Network Based In-Loop Filter (AVS3 Codec Filtering)

8.66% and 8.75% BD-Rate Reduction on Y Component Under RA and LD Configurations, Respectively

Sik-Ho Tsang
3 min read · Aug 1, 2020

In this story, Residual Convolutional Neural Network Based In-Loop Filter with Intra and Inter Frames Processed Respectively for AVS3 (CNNLF), by Tencent, is presented. It is called CNNLF here because that is the name given to the network in the paper's subjective comparison. I read this because I work on video coding research. In this paper:

  • A deep residual convolutional neural network based in-loop filter is proposed to suppress compression artifacts for the third generation of Audio Video Standard (AVS3).

This is a paper in 2020 ICMEW. (Sik-Ho Tsang @ Medium)

Outline

  1. CNNLF: Network Architecture
  2. AVS3 Implementation
  3. Experimental Results

1. CNNLF: Network Architecture

CNNLF: Network Architecture
  • It is found that residual blocks and the residual-in-residual structure clearly improve on a plain network while adding little complexity.
  • Skip connections and residual learning are used to speed up information transfer, and they also make the network focus more on compression distortion.
  • Apart from the head and tail convolutional layers, the network contains M=10 residual blocks, where each layer has N=64 input channels and N output channels. Batch normalization layers are removed from these residual blocks.
  • A YUV420 block is converted into a YUV444 block before being fed into the network.
  • A QP map is also fed into the network together with the reconstructed frame.
  • Thus, the proposed model can be used to suppress compression related artifacts with different QPs, with no need to train multiple models for multiple QP bands.
  • A weighted average of the L1 losses on the Y, U and V components is used, with weights a1, a2 and a3.
  • Here a1 is greater than a2 and a3, because the chroma components are smoother and easier to converge than the luma component.
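
The input preparation and the loss described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the paper's implementation: nearest-neighbour chroma upsampling, the QP normalisation constant qp_max=63, the weight values (0.8, 0.1, 0.1) and all function names are hypothetical choices made for the example.

```python
def upsample_2x(plane):
    """Nearest-neighbour 2x upsampling of a 2-D list (assumed interpolation)."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def build_input(y, u, v, qp, qp_max=63):
    """Stack Y, upsampled U and V, and a constant QP plane into 4 channels.

    qp_max=63 is an assumed normalisation constant, not taken from the paper.
    """
    h, w = len(y), len(y[0])
    qp_map = [[qp / qp_max] * w for _ in range(h)]
    return [y, upsample_2x(u), upsample_2x(v), qp_map]


def l1(pred, target):
    """Mean absolute error between two 2-D lists of equal shape."""
    total = sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return total / (len(pred) * len(pred[0]))


def weighted_l1(pred_yuv, target_yuv, weights=(0.8, 0.1, 0.1)):
    """Weighted average of per-component L1 losses.

    The weights are placeholders that only respect a1 > a2, a3.
    """
    losses = [l1(p, t) for p, t in zip(pred_yuv, target_yuv)]
    return sum(a * l for a, l in zip(weights, losses)) / sum(weights)


# Tiny example: a 4x4 luma block with 2x2 chroma planes at QP 32.
y = [[0.5] * 4 for _ in range(4)]
u = [[0.25, 0.5], [0.75, 1.0]]
v = [[0.0, 0.0], [0.0, 0.0]]
block = build_input(y, u, v, qp=32)  # channels: Y, U444, V444, QP map
```

Because the QP travels with every block as an extra channel, a single set of weights can see which quantization level produced the distortion, which is what lets one model cover several QPs.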

2. AVS3 Implementation

  • Two models are trained to process intra and inter frames respectively.
  • (There is analysis/argument for the distortion difference between intra and inter frames. Please feel free to read the paper if interested.)
  • HPM5.0, the AVS3 reference software, is used.
  • DIV2K is used for training.
  • The proposed in-loop filter replaces the traditional deblocking filter (DF) and sample adaptive offset (SAO) filter.
  • Frame-level and CTU-level RDO are performed to decide whether the proposed filter is used.
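
The two-level on/off decision can be sketched as follows. This is a minimal illustration, assuming a rate-distortion cost of the form J = D + lambda * R with SSE distortion and one flag bit per CTU; the actual cost function and signalling in HPM are not described in this story, and the function names are hypothetical.

```python
def sse(a, b):
    """Sum of squared errors between two equally sized flat sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def choose_filter_flags(orig_ctus, unfiltered_ctus, filtered_ctus, lam, flag_bits=1):
    """Return (frame_flag, ctu_flags) minimising the assumed cost J = D + lam * R.

    Each argument is a list of CTUs; each CTU is a flat list of samples.
    The cost of the frame-level flag itself is ignored for simplicity.
    """
    ctu_flags = []
    cost_adaptive = 0.0
    for o, r, f in zip(orig_ctus, unfiltered_ctus, filtered_ctus):
        d_off, d_on = sse(o, r), sse(o, f)
        # The CTU flag costs the same bits either way, so compare distortion only.
        ctu_flags.append(d_on < d_off)
        cost_adaptive += min(d_on, d_off) + lam * flag_bits
    # Cost of disabling the filter for the whole frame: no CTU flags are sent.
    cost_off = sum(sse(o, r) for o, r in zip(orig_ctus, unfiltered_ctus))
    if cost_off <= cost_adaptive:
        return False, [False] * len(ctu_flags)
    return True, ctu_flags


# Two tiny "CTUs": the filter helps the first one and hurts the second one.
orig = [[10, 10], [20, 20]]
unfiltered = [[12, 12], [20, 20]]
filtered = [[10, 11], [22, 22]]
frame_on, flags = choose_filter_flags(orig, unfiltered, filtered, lam=1.0)
```

With a small lambda the filter is enabled for the frame and switched per CTU; with a large lambda the flag overhead outweighs the distortion gain and the filter is turned off for the whole frame.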

3. Experimental Results

3.1. BD-Rate

BD-Rate (%)
  • Under both RA and LD configurations, the proposed method achieves a larger BD-rate reduction than the compared methods, with moderate model size and computational complexity.

3.2. Subjective Quality

Subjective Quality
  • The proposed method improves subjective quality.

3.3. Generalization Ability

BD-Rate (%) With Different QP Bands
  • Thanks to the QP map, the proposed model can suppress compression artifacts across different QPs, with no need to train multiple models for multiple QP bands.

This is the 3rd story this month.

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.
