Reading: FRUC+DVRF+VECNN — Virtual Reference Frame Enhancement CNN (HEVC & FVC Inter)

VECNN Enhances Virtual Frame Quality, 6% Average BD-Rate Gain Over HM-16.6, 0.8% BD-Rate Gain Over JEM-7.1


  1. Overall Scheme
  2. VECNN Network Architecture
  3. Experimental Results

1. Overall Scheme

Overall Scheme
  • First, the virtual reference frame (VRF) is generated using a deep-learning-based video frame interpolation approach. (I believe it is SepConv, but the paper is not entirely clear on this.)
  • This VRF has the same time instant as the current frame. For example, to encode B1 as above, the VRF ^B1 is generated.
  • Then, VRF is enhanced by VECNN which will be described later.
  • After that, the VRF is appended to the end of both reference picture lists, list0 and list1, as shown above.
  • These reference frames in the lists are used for motion estimation and compensation.
  • With more correlated frames acting as reference frames, the bitrate is more likely to be reduced, or the video quality is more likely to be increased.
  • Also, a DVRF mode is introduced, in which a single flag indicates that the CTU is copied from the co-located CTU using a zero motion vector (MV).
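
To make the list handling and the DVRF copy concrete, here is a minimal sketch in plain Python. It is not the HM or JEM implementation: `RefPicture`, `append_vrf` and `dvrf_copy_ctu` are hypothetical names, frames are toy 2-D lists of luma samples, and I assume the co-located CTU is taken from the VRF itself, which is my reading of the mode rather than something stated explicitly above.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[int]]  # one luma plane stored as rows of samples (toy representation)


@dataclass
class RefPicture:
    name: str
    samples: Frame


def append_vrf(list0: List[RefPicture], list1: List[RefPicture],
               vrf: RefPicture) -> Tuple[List[RefPicture], List[RefPicture]]:
    """Return copies of list0 and list1 with the VRF appended as the last entry."""
    return list0 + [vrf], list1 + [vrf]


def dvrf_copy_ctu(vrf: Frame, x0: int, y0: int, ctu_size: int) -> Frame:
    """Zero-MV prediction: copy the block at the same position out of the VRF.

    Assumption: the co-located CTU lives in the VRF. No motion vector or
    residual is needed here, only the position of the CTU.
    """
    return [row[x0:x0 + ctu_size] for row in vrf[y0:y0 + ctu_size]]


# Toy usage: an 8x8 "frame" and a 4x4 "CTU".
vrf = RefPicture("VRF", [[r * 8 + c for c in range(8)] for r in range(8)])
list0, list1 = append_vrf([RefPicture("past", [])], [RefPicture("future", [])], vrf)
predicted_ctu = dvrf_copy_ctu(vrf.samples, x0=4, y0=0, ctu_size=4)
```

Because the VRF sits at the end of each list, the existing reference pictures keep their indices, and the VRF simply becomes one more candidate for motion estimation.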

2. VECNN Network Architecture

VECNN Network Architecture
  • The core module in VECNN is the residual block, which originated in ResNet.
  • There are N residual blocks in the network.
MSE against N
  • It is found that N = 8 achieves the lowest MSE. Thus N is set to 8 in VECNN.
  • An individual model is trained for each QP.
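
As a rough illustration of what stacking N residual blocks means, here is a toy 1-D sketch in plain Python. It is not the VECNN network: the real model works on 2-D frames with learned multi-channel filters, and the conv, ReLU, conv layout inside each block is my assumption. All function names are hypothetical. The point is only the skip connection, y = x + F(x).

```python
from typing import List, Sequence, Tuple

Signal = List[float]


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> Signal:
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the kernel is centred on each sample.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def relu(x: Sequence[float]) -> Signal:
    return [max(0.0, v) for v in x]


def residual_block(x: Sequence[float], k1: Sequence[float], k2: Sequence[float]) -> Signal:
    """y = x + conv(relu(conv(x))): the block only has to learn a correction to x."""
    branch = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return [a + b for a, b in zip(x, branch)]


def enhance(x: Sequence[float],
            blocks: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Signal:
    """Run the signal through a stack of residual blocks, one (k1, k2) pair per block."""
    y = list(x)
    for k1, k2 in blocks:
        y = residual_block(y, k1, k2)
    return y


# N = 8 blocks with all-zero kernels: every branch outputs 0, so the stack is the identity.
identity_stack = [([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])] * 8
unchanged = enhance([1.0, -2.0, 3.0], identity_stack)
```

With zero weights the stack passes its input through untouched, which is why residual blocks suit an enhancement task: the network starts from the interpolated frame and only learns what to add to it.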

3. Experimental Results

Only VRF (No VECNN) in HM-16.6
Only VRF (No VECNN) in JEM-7.1
  • With only VRF, 4.5% average BD-rate reduction is obtained compared to HM-16.6.
  • With only VRF, 0.7% average BD-rate reduction is obtained compared to JEM-7.1.
VRF+VECNN in HM-16.6
  • With VECNN enhancing the VRF, a 4.5% average BD-rate reduction is obtained compared to HM-16.6.
  • (The authors did not provide JEM results for VECNN. It is also unclear whether DVRF mode is used for the tables above, but I guess it is applied.)
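
For readers unfamiliar with the metric, the sketch below shows one common way to compute a BD-rate figure from four rate-distortion points per codec: fit a cubic to log10(bitrate) as a function of PSNR, integrate both curves over the shared PSNR range, and convert the average gap back to a percentage. This is a simplified plain-Python version written for illustration, not the script used in the paper, and the function names are my own. A negative result means bitrate savings, so a "4.5% BD-rate reduction" corresponds to about -4.5 here.

```python
import math
from typing import Callable, Sequence


def _cubic_through(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Lagrange polynomial through the given points (a cubic for four points)."""
    def p(x: float) -> float:
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def _simpson(f: Callable[[float], float], a: float, b: float) -> float:
    """Simpson's rule on [a, b]; exact for polynomials up to degree three."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor_rates: Sequence[float], anchor_psnr: Sequence[float],
            test_rates: Sequence[float], test_psnr: Sequence[float]) -> float:
    """Average bitrate change (%) of test vs. anchor at equal PSNR, four points per curve."""
    fa = _cubic_through(anchor_psnr, [math.log10(r) for r in anchor_rates])
    ft = _cubic_through(test_psnr, [math.log10(r) for r in test_rates])
    lo = max(min(anchor_psnr), min(test_psnr))  # overlapping PSNR interval
    hi = min(max(anchor_psnr), max(test_psnr))
    avg_log_diff = (_simpson(ft, lo, hi) - _simpson(fa, lo, hi)) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Synthetic points, NOT from the paper: the test codec needs 10% fewer bits at each PSNR.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps
psnr = [32.0, 35.0, 38.0, 41.0]                  # dB
test_rates = [0.9 * r for r in anchor_rates]
saving = bd_rate(anchor_rates, psnr, test_rates, psnr)
```

In this synthetic case the result is about -10, because the test curve is a uniform 10% bitrate shift of the anchor curve.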
Computational Complexity
  • VRF_INIT: Only VRF without VECNN.
  • About 30% to 45% additional complexity is observed at the encoder side.
  • At the decoder side, 40× complexity is observed for VRF_INIT, and the time cost increases to 70× when VECNN is further adopted in VRF_VECNN and DVRF.



