Reading: FRUC+DVRF — Enhanced CTU-Level Inter Prediction with Deep Frame Rate Up-Conversion (HEVC Inter)

Using SepConv, About 3% Coding Gain on Average on the HEVC Test Sequences

In this story, Enhanced CTU-Level Inter Prediction with Deep Frame Rate Up-Conversion for High Efficiency Video Coding (FRUC+DVRF), by Peking University, City University of Hong Kong, and University of Southern California, is described. I read this because I work on video coding research.

In this paper, a CNN-based Frame Rate Up-Conversion (FRUC) approach is used to interpolate an extra reference frame that has the same time instant as the current frame. A Direct Virtual Reference Frame (DVRF) coding mode is then introduced at the CTU level, and by using DVRF, coding gain is achieved over conventional HEVC. This is a paper in 2018 ICIP. (Sik-Ho Tsang @ Medium)

Outline

  1. Hierarchical B Structure in HEVC
  2. Proposed FRUC and DVRF
  3. Experimental Results

1. Hierarchical B Structure in HEVC

Hierarchical B Structure in HEVC
  • When the Hierarchical B Structure is used in HEVC, i.e. the Random Access configuration, the coding order is based on the temporal level (TL): frames with a lower TL are coded first.
  • In this case, I0 and B8 are coded first (TL=0), then B4 (TL=1), then B2 and B6 (TL=2), and finally B1, B3, B5 and B7 (TL=3).
  • With this arrangement, frames with a lower TL can act as reference frames for frames with a higher TL, which enables efficient compression (see the sketch after this list).
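As a minimal sketch (not from the paper), the temporal levels and the level-by-level coding order described above can be derived for a GOP of 8 as follows; frame indices and TL assignments match the I0/B8, B4, B2/B6, B1–B7 example.

```python
# Sketch: temporal levels and coding order for one GOP of 8 in a
# hierarchical B structure (orders frames level by level, as in the
# description above; the actual HM order may interleave within a level).

def temporal_level(poc, gop_size=8):
    """TL = number of times the GOP step must be halved before poc falls on the grid."""
    if poc % gop_size == 0:          # key pictures (I0, B8, ...)
        return 0
    tl, step = 0, gop_size
    while poc % step != 0:
        step //= 2
        tl += 1
    return tl

gop = list(range(9))                 # POCs 0..8 of one GOP
# Lower TL first, so lower-TL frames can serve as references for higher-TL frames.
coding_order = sorted(gop, key=lambda poc: (temporal_level(poc), poc))
print([(poc, temporal_level(poc)) for poc in coding_order])
# [(0, 0), (8, 0), (4, 1), (2, 2), (6, 2), (1, 3), (3, 3), (5, 3), (7, 3)]
```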

2. Proposed FRUC and DVRF

2.1. Frame Rate Up-Conversion (FRUC) Using SepConv

Hierarchical B Structure with Virtual Reference Frame
  • In this paper, the authors propose to generate a high-quality virtual reference frame using a deep-learning-based frame rate up-conversion (FRUC) algorithm.
  • In particular, SepConv, a CNN-based video frame interpolation network, is used to generate this virtual reference frame. (If interested, please read my story about SepConv. That’s also why I read AdaConv and SepConv, lol.)
  • This virtual reference frame ^B1 has the same time instant as B1 (a sketch of how it could be generated follows this list).
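Below is a minimal PyTorch-style sketch of this step. The name `interp_net` is a stand-in for a pretrained SepConv-like interpolation model (an assumption, not the paper's released code); the two inputs are the reconstructed references at equal temporal distance before and after the current frame.

```python
import torch

# Sketch only: `interp_net` stands in for a pretrained SepConv-style frame
# interpolation CNN. ref_prev and ref_next are reconstructed reference frames
# at equal temporal distance before/after the current frame, shaped (1, 3, H, W)
# with values in [0, 1].

@torch.no_grad()
def make_virtual_reference(interp_net, ref_prev, ref_next):
    # SepConv predicts per-pixel separable kernels and synthesizes the midpoint
    # frame between its two inputs, i.e. the same time instant as the current
    # frame in the hierarchical B example (^B1 for B1).
    virtual_ref = interp_net(ref_prev, ref_next)
    return virtual_ref.clamp_(0.0, 1.0)

# The virtual frame is then used as an extra reference; with DVRF (Section 2.2),
# its co-located blocks can be copied directly per CTU.
```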

2.2. Direct Virtual Reference Frame (DVRF) coding mode in HEVC

The Proposed DVRF Coding Mode
  • After generating the virtual reference frame, a novel CTU-level coding mode, the Direct Virtual Reference Frame (DVRF) mode, is introduced.
  • For each 64×64 CTU in the current frame, a DVRF flag is signalled in the bitstream to indicate whether the DVRF mode is chosen.
  • In particular, when the DVRF flag is true, the co-located block in the virtual reference frame is directly treated as the reconstructed block.
  • Otherwise, the traditional HEVC encoding process is used to encode the current CTU (see the sketch after this list).
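A minimal sketch of this CTU-level decision is given below. All helper names (`hevc_encode_ctu`, `rd_cost`) are illustrative stand-ins, not functions from HM or the paper; the point is only the per-CTU choice between copying the co-located virtual-reference block and running the normal HEVC path.

```python
import numpy as np

CTU = 64  # CTU size at which the DVRF flag is signalled

def encode_frame_with_dvrf(cur, virtual_ref, hevc_encode_ctu, rd_cost):
    """Sketch of the CTU-level DVRF decision (helper names are illustrative).

    cur, virtual_ref : (H, W) luma arrays of the current and virtual frames
    hevc_encode_ctu  : stand-in for the normal HEVC CTU encoding path,
                       returning (reconstructed_block, rd_cost_value)
    rd_cost          : stand-in RD cost of the DVRF copy (distortion of the
                       co-located block plus the one-bit flag cost)
    """
    H, W = cur.shape
    recon = np.zeros_like(cur)
    flags = []
    for y in range(0, H, CTU):
        for x in range(0, W, CTU):
            cur_blk = cur[y:y+CTU, x:x+CTU]
            vrf_blk = virtual_ref[y:y+CTU, x:x+CTU]   # co-located block
            dvrf_cost = rd_cost(cur_blk, vrf_blk)
            hevc_blk, hevc_cost = hevc_encode_ctu(cur_blk)
            use_dvrf = dvrf_cost < hevc_cost          # signalled as a 1-bit flag
            flags.append(use_dvrf)
            recon[y:y+CTU, x:x+CTU] = vrf_blk if use_dvrf else hevc_blk
    return recon, flags
```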

3. Experimental Results

BD-Rate (%) against conventional HEVC with the proposed approach applied only at TL=3
  • YUV 4:0:0 (luma only) is considered here.
  • When the DVRF mode is applied only to TL=3 frames, the proposed approach provides a 2.3% BD-rate gain on average over the HEVC test sequences, and a gain of up to 5.4% is achieved on BQSquare.
  • For BQTerrace, the DVRF mode even has a negative effect, as SepConv may not handle the water-wave content well.
BD-Rate (%) against conventional HEVC with the proposed approach applied at TL=3 and TL=2
  • When the DVRF mode is applied to both TL=3 and TL=2 frames, a 3.2% coding gain is achieved, which demonstrates the robustness of the proposed method when the input frames of the FRUC algorithm have a longer temporal distance (a sketch of how BD-rate is computed follows).
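For readers unfamiliar with the BD-rate metric used in these tables, here is a minimal NumPy sketch of the standard Bjøntegaard delta-rate computation (the common textbook formulation, not code from the paper). Negative values mean the tested method needs fewer bits at equal quality.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) between two RD curves (typically 4 QP points each)."""
    lr_a, lr_t = np.log(np.asarray(rate_anchor)), np.log(np.asarray(rate_test))
    # Cubic fit of log-rate as a function of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the common PSNR interval and average the gap.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100   # % average bit-rate change at equal PSNR
```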

During these days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is that good? This is the 12th story this month. Thanks for visiting my story.

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
