Reading: FRUC+DVRF+VECNN — Virtual Reference Frame Enhancement CNN (HEVC & FVC Inter)

VECNN Enhances Virtual Frame Quality, 6% Average BD-Rate Gain Over HM-16.6, 0.8% BD-Rate Gain Over JEM-7.1

May 16, 2020

In this story, Enhanced Motion-Compensated Video Coding With Deep Virtual Reference Frame Generation (FRUC+DVRF+VECNN), by Peking University, City University of Hong Kong, and University of Chinese Academy of Sciences, is only briefly described, since part of it was published in 2018 ICIP and I have already covered that part in FRUC+DVRF. (Please feel free to read the FRUC+DVRF story first before reading this one. I only found this paper recently; otherwise I would have described both of them together in one story.)

So I will only cover the remaining part, which is VECNN. This is a 2019 TIP paper, and TIP has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)

Outline

Overall Scheme
VECNN Network Architecture
Experimental Results

1. Overall Scheme

First, the virtual reference frame (VRF) is generated using a deep-learning-based video frame interpolation approach. (I believe it is SepConv, but this is not so clear in the paper.)
This VRF should have the same time instant as the current frame. For example, to encode B1 as above, the VRF ^B1 is generated.
Then, the VRF is enhanced by VECNN, which is described later.
After that, the VRF is placed at the end of the reference picture lists, list0 and list1, as shown above.
The reference frames in these lists are used for motion estimation and compensation.
With more highly correlated frames acting as reference frames, the bitrate is more likely to be reduced, or the video quality is more likely to be increased.
Also, a DVRF mode is introduced, in which only one flag is needed to indicate that the CTU is copied from the co-located CTU in the VRF using a zero motion vector (MV).
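
To make the steps above more concrete, here is a minimal Python sketch of the flow. It is only my own illustration under several assumptions, not the authors' implementation: `Frame`, `interpolate_stub`, `enhance_stub`, `build_reference_lists` and `dvrf_copy_ctu` are hypothetical names, the interpolation network is replaced by a plain average of two frames, and VECNN is replaced by an identity function.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    poc: int                 # picture order count (time instant of the frame)
    pixels: list             # 2-D list of luma samples
    is_virtual: bool = False


def interpolate_stub(prev, nxt, poc):
    """Stand-in for the learned interpolation network (assumption: plain average)."""
    avg = [[(a + b) // 2 for a, b in zip(row_p, row_n)]
           for row_p, row_n in zip(prev.pixels, nxt.pixels)]
    return Frame(poc=poc, pixels=avg, is_virtual=True)


def enhance_stub(vrf):
    """Stand-in for VECNN (assumption: returns the frame unchanged)."""
    return vrf


def build_reference_lists(list0, list1, vrf):
    """Put the VRF at the end of both lists without modifying the originals."""
    return list0 + [vrf], list1 + [vrf]


def dvrf_copy_ctu(vrf, x0, y0, size):
    """DVRF mode: copy the co-located block of the VRF, i.e. a zero motion vector."""
    return [row[x0:x0 + size] for row in vrf.pixels[y0:y0 + size]]


# Toy example: B1 lies between two already-coded frames at POC 0 and POC 2.
b0 = Frame(0, [[10] * 4 for _ in range(4)])
b2 = Frame(2, [[30] * 4 for _ in range(4)])
vrf = enhance_stub(interpolate_stub(b0, b2, poc=1))   # VRF at the time instant of B1
list0, list1 = build_reference_lists([b0], [b2], vrf)
ctu = dvrf_copy_ctu(vrf, 0, 0, 2)                     # 2x2 "CTU" for illustration
```

In a real codec the lists would hold decoded pictures and the CTU would be 64×64, but the ordering (interpolate, enhance, append to both lists, optionally copy with zero MV) is the part this sketch tries to show.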

2. VECNN Network Architecture

The core module in VECNN is the residual block, which originated in ResNet.
There are N residual blocks in the network.

It is found that N = 8 achieves the lowest MSE. Thus N is set to 8 in VECNN.
An individual model is trained for each QP.
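
The idea of stacking N residual blocks can be sketched in a few lines of plain Python. This is only an illustration of the skip connection y = x + F(x); the `transform` functions below are hypothetical placeholders for the convolutional layers inside each block, whose exact configuration is not reproduced here.

```python
def residual_block(x, transform):
    """One residual block: output = input + F(input), element-wise."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def residual_stack(x, transforms):
    """Apply the residual blocks one after another."""
    for f in transforms:
        x = residual_block(x, f)
    return x


N = 8  # number of residual blocks, as chosen for VECNN


def zero_transform(x):
    """Placeholder F(x) = 0: every block then reduces to the identity."""
    return [0.0] * len(x)


def small_transform(x):
    """Placeholder F(x) = 0.1 * x, just to see the blocks accumulate."""
    return [0.1 * v for v in x]


samples = [1.0, 2.0, 3.0]
identity_out = residual_stack(samples, [zero_transform] * N)
scaled_out = residual_stack(samples, [small_transform] * N)
```

With the zero placeholder, the stack returns its input unchanged, which is the property that makes residual blocks convenient for enhancement: the network only has to learn the correction to the VRF, not the whole frame.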

3. Experimental Results

With only VRF, 4.5% average BD-rate reduction is obtained compared to HM-16.6.
With only VRF, 0.7% average BD-rate reduction is obtained compared to JEM-7.1.

With VECNN to enhance the VRF, 6% average BD-rate reduction is obtained compared to HM-16.6.
(The authors did not provide JEM results for VECNN. Also, it is not clear whether DVRF mode is used for the tables above, but I guess DVRF mode is applied.)
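
As a reminder of what these percentages mean, the standard Bjøntegaard delta rate can be written as below. The symbols are my own notation: \hat{r}(D) is a cubic polynomial fitted to \log_{10} of the bitrate as a function of PSNR D, and [D_L, D_H] is the PSNR range shared by the two rate-distortion curves.

```latex
\Delta R_{\mathrm{BD}}
  = 10^{\,\frac{1}{D_H - D_L}\int_{D_L}^{D_H}
      \left[\hat{r}_{\mathrm{test}}(D) - \hat{r}_{\mathrm{anchor}}(D)\right]\,dD} - 1
```

A negative value means the tested codec needs less bitrate than the anchor at the same quality, so a "6% BD-rate reduction" corresponds to a value of about -0.06.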

VRF_INIT: Only VRF without VECNN.
VRF_VECNN: VRF with VECNN.
About 30% to 45% additional complexity is observed at the encoder side.
40× decoder complexity is observed for VRF_INIT, and the time cost increases to 70× when VECNN is further adopted in VRF_VECNN and DVRF.

During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 21st story this month. Thanks for visiting my story.

Reference

[2019 TIP] [FRUC+DVRF+VECNN]
Enhanced Motion-Compensated Video Coding With Deep Virtual Reference Frame Generation

Codec Inter Prediction

HEVC [Zhang VCIP’17] [NNIP] [Ibrahim ISM’18] [VI-CNN] [FRUC+DVRF] [FRUC+DVRF+VECNN] [Zhao ISCAS’18 & TCSVT’19] [Ma ISCAS’19] [ES]
FVC [FRUC+DVRF+VECNN]