Reading: MGNLF — Multi-Gradient Convolutional Neural Network Based In-Loop Filter (VVC Filtering)
3.29% BD-Rate Reduction Compared With Conventional VVC, While VRCNN and CACNN-S Obtain No BD-Rate Reduction
In this story, Multi-Gradient Convolutional Neural Network Based In-Loop Filter For VVC (MGNLF), by Peking University, is presented. I read this because I work on video coding research. In this paper:
- A multi-gradient convolutional neural network based in-loop filter (MGNLF) for VVC is proposed.
- Divergence and second derivatives of the frame are utilized.
This is a paper in 2020 ICME. (Sik-Ho Tsang @ Medium)
Outline
- MGNLF: Network Architecture
- MGNLF: Loss Function
- Some Training Details
- Experimental Results
1. MGNLF: Network Architecture
- (a): The divergence reconstruction branch.
- (b): The image reconstruction branch.
- (c): The second derivative reconstruction branch.
- First, the divergence DI and second derivative LI of the input frame are obtained using the Sobel operator and the Laplace operator, which can be formulated as:
DI = S * I, LI = Δ * I
- where I is the input frame, S and Δ denote the Sobel and Laplace kernels, and * denotes the convolution operation.
- Afterwards, DI and LI will be the inputs of two separated residual learning networks (a) and (c).
- The structures of the three networks are the same.
- Convolutional layers with 3×3 kernels and 64 feature maps are used. Each convolutional layer is followed by a LeakyReLU activation except the last layer.
- Batch normalization is not used.
- The outputs of (a) and (c) are denoted as D’I and L’I.
- Then D’I and L’I are each transformed by a convolutional layer with 1×1 kernel and concatenated with the input image feature map into a feature map with 64 channels.
- By doing so, the feature map preserves more of the detailed information present in the original image, which promotes image reconstruction.
- Finally, the reconstruction network will map the input to the residual between the frame I and the ground truth.
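To make the branch inputs concrete, below is a minimal NumPy sketch of how DI and LI could be computed. The specific 3×3 Sobel/Laplace kernels and the summing of the two Sobel directions are assumptions (common defaults), since the paper's exact formulation is not reproduced here.

```python
import numpy as np

# Common 3x3 kernels; assumed, since the paper's exact kernels are not shown here.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T
LAPLACE = np.array([[0,  1, 0],
                    [1, -4, 1],
                    [0,  1, 0]], dtype=np.float32)

def conv2d(img, kernel):
    """Naive 'same' 2D convolution with zero padding (kernel is flipped)."""
    k = kernel[::-1, ::-1]              # flip for true convolution
    p = k.shape[0] // 2
    padded = np.pad(img, p, mode="constant")
    out = np.zeros_like(img, dtype=np.float32)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def multi_gradient_inputs(frame):
    """Return (D_I, L_I): first-order (Sobel) and second-order (Laplace) maps."""
    # Combining the x and y Sobel responses by summation is an assumption.
    d_i = conv2d(frame, SOBEL_X) + conv2d(frame, SOBEL_Y)
    l_i = conv2d(frame, LAPLACE)
    return d_i, l_i
```

These two maps would then feed branches (a) and (c), while the raw frame feeds branch (b).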
2. MGNLF: Loss Function
- The loss function is:
L = LR + λ·LE
- where LR is the MSE loss of the image and LE is the enhancement loss for the divergence and the second derivative.
- where λ is tuned based on experiments.
- It is found that λ=0.1 gives the best result.
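A minimal sketch of this combined loss, assuming the enhancement term LE is an MSE on the two gradient-branch outputs (the paper's exact form of LE is not reproduced here):

```python
import numpy as np

def mgnlf_loss(pred, target, d_pred, d_target, l_pred, l_target, lam=0.1):
    """L = L_R + lambda * L_E, with lambda = 0.1 as found best in the paper.
    L_E as MSE over the divergence and second-derivative branches is an assumption."""
    mse = lambda a, b: np.mean((a - b) ** 2)
    loss_r = mse(pred, target)                              # L_R: image MSE
    loss_e = mse(d_pred, d_target) + mse(l_pred, l_target)  # L_E: branch enhancement
    return loss_r + lam * loss_e
```

With λ=0.1, the gradient branches act as auxiliary supervision that is weighted well below the main reconstruction loss.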
3. Some Training Details
- The 800 training images of DIV2K are used for training.
- VTM 3.0 under AI configuration is used to compress the image to generate image pairs, with QPs (22, 27, 32, 37) used.
- The filters DBF, SAO and ALF are disabled when compressing the training data.
- The images are cropped into small 64×64 patches, 120K blocks for each QP. Blocks whose PSNR is larger than 50 dB are removed, then 50,000 blocks are randomly selected for training and 1,000 blocks for validation.
- The model for QP = 37 is trained first, then it is used as the starting point for training the networks at smaller QPs.
- During testing, the model replaces DBF and SAO.
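The patch-filtering step above can be sketched as follows; the function names and the list-of-pairs data layout are illustrative, not from the paper:

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """PSNR in dB between a ground-truth patch and its compressed version."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def filter_patches(pairs, thresh=50.0):
    """Drop nearly lossless patch pairs (PSNR above the threshold), mirroring
    the paper's data preparation. `pairs` is a list of (ref, rec) arrays."""
    return [(ref, rec) for ref, rec in pairs if psnr(ref, rec) <= thresh]
```

Dropping near-lossless patches keeps the training set focused on blocks where compression artifacts actually exist, so the network does not waste capacity on identity mappings.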
4. Experimental Results
4.1. Prior Arts
4.2. Ablation Study
- The multi-gradient network is compared with single-gradient using the Sobel operator, single-gradient using the Laplace operator, and no-gradient variants.
- Multi-gradient shows the best restoration ability, which demonstrates that the multi-gradient design can capture finer details and lead to performance improvement.
4.3. SOTA Comparisons
- MGNLF obtains the largest BD-rate reduction compared with the other three approaches submitted to the standard.
4.4. RD Curves
- The proposed approach performs better at low bit rate conditions.
4.5. Subjective Quality
- From the enlarged region in the sequence BasketballDrill, it can be observed that the texture of the floor and the straight lines are still severely blurred when compressed by DRNLF. In contrast, they become much clearer after being enhanced by MGNLF.
This is the 5th story this month.
Reference
[2020 ICME] [MGNLF]
Multi-Gradient Convolutional Neural Network Based In-Loop Filter For VVC
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [Jia TIP’19] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19] [CNNLF]
VVC [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN] [DRCNN] [Zhang ICME’20] [MGNLF]