Reading: Residual-VRN — Residual-based Video Restoration Network (Codec Filtering)
Using DCReLU in ResNet Blocks and a Mixed Loss Function, Outperforms VRCNN and DCAD With Over 10% BD-Rate Reduction
In this story, Residual-based Video Restoration Network (Residual-VRN), by Peking University, is described. With the proposed Residual-VRN, the decoded image quality is improved significantly. I read this because I work on video coding research.
This work was first published at the 2018 BigMM conference and then extended in the 2019 IEEE MultiMedia magazine, which has a high impact factor of 3.556 (note: this is the magazine, not the Transactions on Multimedia, TMM). The whole magazine article is freely accessible. The hyperlink is provided at the end of the story.
Though both papers are read, I will mainly describe the one in the 2019 IEEE MultiMedia magazine here. (Sik-Ho Tsang @ Medium)
Outline
- Double-Channels ReLU Activation Function (DCReLU)
- Residual Blocks
- Mixed Loss Function
- Residual-VRN Network Architecture
- Experimental Results
1. Double-Channels ReLU Activation Function (DCReLU)
- It is a multi-threshold activation function, where β is a trainable scale parameter initialized to 0.5, and η1 and η2 are bias thresholds that are also trainable.
- Each input feature map produces two output feature maps.
- The former mainly focuses on the positive phase, while the latter focuses on the negative one.
- Thus, when DCReLU is used, a 1×1 convolutional layer follows in the residual blocks to reduce the number of channels back down.
- It is found that DCReLU obtains a slightly higher average PSNR.
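The description above can be sketched in a few lines of NumPy. This is a minimal, hedged formulation: the positive-phase/negative-phase split and the β scale follow the text, but the exact way η1 and η2 enter the thresholding is an assumption, and the paper's formula may differ in detail.

```python
import numpy as np

def dcrelu(x, beta=0.5, eta1=0.0, eta2=0.0):
    """Sketch of DCReLU: one input feature map yields two output maps,
    so the channel count doubles (hence the 1x1 conv that follows it).
    beta is the trainable scale (initialized to 0.5 in the paper);
    eta1/eta2 stand in for the trainable bias thresholds, with their
    exact role here being an assumption."""
    pos = np.maximum(x - eta1, 0.0)          # positive-phase channel
    neg = beta * np.minimum(x - eta2, 0.0)   # scaled negative-phase channel
    return np.stack([pos, neg], axis=0)      # one map in, two maps out
```

Note how a single feature map in produces two feature maps out, which is why the residual block needs a 1×1 convolution afterwards to bring the channel count back down.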
2. Residual Blocks
- Similar to ResNet, the residual block computes RB(x) = x + U(x) with U(x) = F2(DCReLU(F1(x))), where RB(x) is the output of the residual block, DCReLU denotes the DCReLU activation function, F1 and F2 denote the conv 3×3×64 and conv 1×1×64 layers, and U(x) is the residual to be learned.
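Structurally, the block is just an identity skip connection around the F1 → DCReLU → F2 branch. A minimal sketch, with the convolutions and activation passed in as plain callables (stand-ins for the conv 3×3×64, conv 1×1×64, and DCReLU layers, which are not implemented here):

```python
def residual_block(x, f1, f2, dcrelu):
    """RB(x) = x + U(x), with U(x) = F2(DCReLU(F1(x))).
    f1 stands in for the conv 3x3x64 layer, f2 for the conv 1x1x64
    layer that reduces the channels DCReLU doubled; any callables of
    matching shape will do for illustration."""
    u = f2(dcrelu(f1(x)))   # residual branch U(x)
    return x + u            # identity skip connection
```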
3. Mixed Loss Function
- In the first half of the training process, L1 and LMSSSIM are combined.
- LMSSSIM focuses on the contrast in high-frequency regions but is not particularly sensitive to uniform biases, while L1 weighs the luminance error equally regardless of the local structure, which makes up for the shortcoming of LMSSSIM.
- In the second half of the training process, the L2 norm is used as the loss function to maximize the coding efficiency since PSNR is measured during the experiments.
- It is found that Lmix has the largest BD-rate reduction.
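The two-phase schedule above can be sketched as follows. This is a hedged illustration: the MS-SSIM loss is passed in as a callable (a full MS-SSIM implementation is omitted), and the weighting `alpha` between the two first-half terms is an assumption, not a value from the paper.

```python
import numpy as np

def l1_loss(pred, target):
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    return np.mean((pred - target) ** 2)

def mixed_loss(pred, target, step, total_steps, ms_ssim_loss, alpha=0.84):
    """First half of training: combine MS-SSIM loss and L1.
    Second half: plain L2 (MSE), since PSNR is the measured metric.
    alpha is an assumed weighting for illustration only."""
    if step < total_steps // 2:
        return alpha * ms_ssim_loss(pred, target) + (1 - alpha) * l1_loss(pred, target)
    return l2_loss(pred, target)
```

Switching to L2 in the second half directly optimizes the quantity PSNR is computed from, which is why it maximizes the measured coding efficiency.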
4. Residual-VRN Network Architecture
4.1. Network Architecture
- The whole process can be expressed as follows:
- where F1 represents a 5×5×64 convolution layer, F2 represents a 3×3×64 convolution layer, F3 represents a 3×3×1 convolution layer.
- And Norm is Batch Normalization.
- The conventional networks, with their structures adjusted to be the same as the residual-VRNs, are named the improved-VRNs.
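Putting the pieces together, the overall pipeline is a head convolution, a stack of residual blocks, normalization, a tail convolution, and a global skip connection from the decoded frame. A minimal sketch with the layers passed in as callables; the exact placement of Norm and the global skip is an assumption based on the description above:

```python
def residual_vrn(x, f1, residual_blocks, norm, f3):
    """Hedged sketch of the Residual-VRN pipeline: F1 (5x5x64 conv),
    a stack of residual blocks, Norm (Batch Normalization), F3
    (3x3x1 conv), and a global skip so the network predicts a
    restoration residual that is added back to the decoded frame x."""
    h = f1(x)                    # feature extraction head
    for rb in residual_blocks:   # stacked residual blocks (16/32/64 in the deep variants)
        h = rb(h)
    return x + f3(norm(h))       # global residual learning
```

Deepening the network (DR-VRN-16/32/64) only changes the length of `residual_blocks`, which is what makes the stacking experiments in Section 4.2 straightforward.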
4.2. Deep Residual-VRN
- The depth of residual-VRN is increased by stacking the residual blocks to construct 16-residual-block, 32-residual-block, 64-residual-block networks. These three networks are called deep residual-VRN-16 (DR-VRN-16), DR-VRN-32, and DR-VRN-64.
- On the other hand, the same is done for the conventional VRNs to get deep improved-VRN-16 (DI-VRN-16), DI-VRN-32, and DI-VRN-64.
5. Experimental Results
5.1. Dataset
- The distribution of residual errors of intra frames and inter frames is quite different. Thus, separate networks are trained.
- For intra mode, MS-COCO is used as the dataset because it supplies a large number of images. 10,000 images are randomly selected and converted into YUV format. Each YUV file is compressed by HEVC intra coding with deblocking and SAO turned off at four different QPs: 22, 27, 32, and 37.
- For each QP, a separate network is trained. Only the luminance channel is used.
- For inter mode, 15 UHD sequences are chosen as the dataset. Each sequence is compressed by HEVC under three coding configurations: low-delay P (LP), low-delay B (LB), and random access (RA).
- For each sequence, 40 frames are extracted, one for every five frames, to avoid overfitting.
5.2. Comparing Residual-VRN and Improved-VRN
- As the number of layers increases, the performance of improved-VRN gets worse. This is because, without a residual path, the gradient vanishing problem occurs once the model is too deep.
5.3. Comparing Residual-VRN With DCAD
- Residual-VRN achieves better performance than DCAD.
5.4. Complexity Analysis
- With Residual-VRN, the processing time increases considerably due to the use of convolutions.
5.5. Comparisons With State-of-the-Art Approaches
During the days of coronavirus, let me have a challenge of writing 30 stories again for this month ..? Is it good? This is the 26th story in this month. 4 stories to go. Thanks for visiting my story..
References
[2018 BigMM] [Residual-VRN]
Residual-Based Video Restoration for HEVC Intra Coding
[2019 IEEE Multimedia] [Residual-VRN]
Residual-Based Post-Processing for HEVC
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [Double-Input CNN] [B-DRRN] [Residual-VRN] [Liu PCS’19] [QE-CNN] [EDCNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [Lu CVPRW’19] [Wang APSIPA ASC’19]