Reading: MGNLF — Multi-Gradient Convolutional Neural Network Based In-Loop Filter (VVC Filtering)
3.29% BD-Rate Reduction Compared With Conventional VVC, While VRCNN and CACNN-S Obtain No BD-Rate Reduction
In this story, Multi-Gradient Convolutional Neural Network Based In-Loop Filter For VVC (MGNLF), by Peking University, is presented. I read this because I work on video coding research. In this paper:
- A multi-gradient convolutional neural network based in-loop filter (MGNLF) for VVC is proposed.
- Divergence and second derivatives of the frame are utilized.
This is a paper in 2020 ICME. (Sik-Ho Tsang @ Medium)
Outline
- MGNLF: Network Architecture
- MGNLF: Loss Function
- Some Training Details
- Experimental Results
1. MGNLF: Network Architecture
- (a): The divergence reconstruction branch.
- (b): The image reconstruction branch.
- (c): The second derivative reconstruction branch.
- First, the divergence DI and second derivative LI of the input frame are obtained using the Sobel operator and the Laplace operator, which can be formulated as:
DI = S * I, LI = Δ * I
- where I is the input frame, S and Δ denote the Sobel and Laplace kernels, and * denotes the convolution operation.
- Afterwards, DI and LI will be the inputs of two separated residual learning networks (a) and (c).
- The structures of the three networks are the same.
- Convolutional layers with 3×3 kernels and 64 feature maps are used. Each convolutional layer is followed by a LeakyReLU activation except the last layer.
- Batch normalization is not used.
- The outputs of (a) and (c) are denoted as D’I and L’I.
- Then D’I and L’I are each transformed by a convolutional layer with 1×1 kernel and concatenated with the input image feature map into a feature map with 64 channels.
- By doing so, the feature map preserves more of the detailed information present in the original image, which promotes image reconstruction.
- Finally, the reconstruction network will map the input to the residual between the frame I and the ground truth.
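To make the branch inputs concrete, below is a minimal NumPy sketch of how DI and LI could be computed. The specific 3×3 Sobel/Laplace kernels and the summing of the two Sobel directions are assumptions (common defaults), since the paper's exact formulation is not reproduced here.

```python
import numpy as np

# Common 3x3 kernels; assumed, since the paper's exact kernels are not shown here.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T
LAPLACE = np.array([[0,  1, 0],
                    [1, -4, 1],
                    [0,  1, 0]], dtype=np.float32)

def conv2d(img, kernel):
    """Naive 'same' 2D convolution with zero padding (kernel is flipped)."""
    k = kernel[::-1, ::-1]              # flip for true convolution
    p = k.shape[0] // 2
    padded = np.pad(img, p, mode="constant")
    out = np.zeros_like(img, dtype=np.float32)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def multi_gradient_inputs(frame):
    """Return (D_I, L_I): first-order (Sobel) and second-order (Laplace) maps."""
    # Combining the x and y Sobel responses by summation is an assumption.
    d_i = conv2d(frame, SOBEL_X) + conv2d(frame, SOBEL_Y)
    l_i = conv2d(frame, LAPLACE)
    return d_i, l_i
```

These two maps would then feed branches (a) and (c), while the raw frame feeds branch (b).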
2. MGNLF: Loss Function
- The loss function is:
L = LR + λ·LE
- where LR is the MSE loss of the image and LE is the enhancement loss for the divergence and the second derivative.
- where λ is tuned based on experiments.
- It is found that λ=0.1 gives the best result.
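A minimal sketch of this combined loss, assuming the enhancement term LE is an MSE on the two gradient-branch outputs (the paper's exact form of LE is not reproduced here):

```python
import numpy as np

def mgnlf_loss(pred, target, d_pred, d_target, l_pred, l_target, lam=0.1):
    """L = L_R + lambda * L_E, with lambda = 0.1 as found best in the paper.
    L_E as MSE over the divergence and second-derivative branches is an assumption."""
    mse = lambda a, b: np.mean((a - b) ** 2)
    loss_r = mse(pred, target)                              # L_R: image MSE
    loss_e = mse(d_pred, d_target) + mse(l_pred, l_target)  # L_E: branch enhancement
    return loss_r + lam * loss_e
```

With λ=0.1, the gradient branches act as auxiliary supervision that is weighted well below the main reconstruction loss.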
3. Some Training Details
- The 800 training images of DIV2K are used for training.
- VTM 3.0 under AI configuration is used to compress the image to generate image pairs, with QPs (22, 27, 32, 37) used.
- The filters DBF, SAO and ALF are disabled when compressing the training data.
- The images are cropped into small 64×64 patches, 120K blocks for each QP. Blocks whose PSNR is larger than 50 dB are removed, then 50,000 blocks are randomly selected for training and 1,000 blocks for validation.
- The model for QP = 37 is trained first, then it is used as the starting point for training the networks at smaller QPs.
- During testing, the model replaces DBF and SAO.
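The patch-filtering step above can be sketched as follows; the function names and the list-of-pairs data layout are illustrative, not from the paper:

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """PSNR in dB between a ground-truth patch and its compressed version."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def filter_patches(pairs, thresh=50.0):
    """Drop nearly lossless patch pairs (PSNR above the threshold), mirroring
    the paper's data preparation. `pairs` is a list of (ref, rec) arrays."""
    return [(ref, rec) for ref, rec in pairs if psnr(ref, rec) <= thresh]
```

Dropping near-lossless patches keeps the training set focused on blocks where compression artifacts actually exist, so the network does not waste capacity on identity mappings.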
4. Experimental Results
4.1. Prior Arts
4.2. Ablation Study
- The multi-gradient network is compared with single-gradient using the Sobel operator, single-gradient using the Laplace operator, and no-gradient variants.
- Multi-gradient shows the best restoration ability, which demonstrates that the multi-gradient design can capture finer details and lead to performance improvement.
4.3. SOTA Comparisons
- MGNLF obtains the largest BD-rate reduction compared with the other three approaches submitted to the standard.
4.4. RD Curves
- The proposed approach performs better at low bit rate conditions.
4.5. Subjective Quality
- From the enlarged region in the sequence BasketballDrill, it can be observed that the texture of the floor and the straight lines are still severely blurred when compressed by DRNLF. In contrast, they become much clearer after being enhanced by MGNLF.
This is the 5th story this month.
Reference
[2020 ICME] [MGNLF]
Multi-Gradient Convolutional Neural Network Based In-Loop Filter For VVC
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [Jia TIP’19] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19] [CNNLF]
VVC [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN] [DRCNN] [Zhang ICME’20] [MGNLF]