Reading: FRUC+DVRF — Enhanced CTU-Level Inter Prediction with Deep Frame Rate Up-Conversion (HEVC Inter)

Using SepConv, About 3% Coding Gain on Average on the HEVC Test Sequences

In this story, Enhanced CTU-Level Inter Prediction with Deep Frame Rate Up-Conversion for High Efficiency Video Coding (FRUC+DVRF), by Peking University, City University of Hong Kong, and University of Southern California, is described. I read this because I work on video coding research.

In this paper, a CNN-based Frame Rate Up-Conversion (FRUC) approach is used to interpolate an extra reference frame that has the same time instant as the current frame. A Direct Virtual Reference Frame (DVRF) coding mode is then introduced at the CTU level, and by using DVRF, coding gain is achieved over conventional HEVC. This is a paper in 2018 ICIP. (Sik-Ho Tsang @ Medium)

Outline

  1. Hierarchical B Structure in HEVC
  2. Proposed FRUC and DVRF
  3. Experimental Results

1. Hierarchical B Structure in HEVC

Hierarchical B Structure in HEVC
  • When the Hierarchical B Structure is used in HEVC, i.e. the Random Access configuration, the coding order is based on the temporal level (TL): frames with a lower TL are coded first.
  • In this case, I0 and B8 are coded first (TL=0), then B4 (TL=1), then B2 and B6 (TL=2), and finally B1, B3, B5 and B7 (TL=3).
  • With this arrangement, frames with a lower TL can act as reference frames for frames with a higher TL, which enables efficient compression (see the sketch after this list).
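As a minimal sketch (not from the paper), the temporal levels and the level-by-level coding order described above can be derived for a GOP of 8 as follows; frame indices and TL assignments match the I0/B8, B4, B2/B6, B1–B7 example.

```python
# Sketch: temporal levels and coding order for one GOP of 8 in a
# hierarchical B structure (orders frames level by level, as in the
# description above; the actual HM order may interleave within a level).

def temporal_level(poc, gop_size=8):
    """TL = number of times the GOP step must be halved before poc falls on the grid."""
    if poc % gop_size == 0:          # key pictures (I0, B8, ...)
        return 0
    tl, step = 0, gop_size
    while poc % step != 0:
        step //= 2
        tl += 1
    return tl

gop = list(range(9))                 # POCs 0..8 of one GOP
# Lower TL first, so lower-TL frames can serve as references for higher-TL frames.
coding_order = sorted(gop, key=lambda poc: (temporal_level(poc), poc))
print([(poc, temporal_level(poc)) for poc in coding_order])
# [(0, 0), (8, 0), (4, 1), (2, 2), (6, 2), (1, 3), (3, 3), (5, 3), (7, 3)]
```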

2. Proposed FRUC and DVRF

2.1. Frame Rate Up-Conversion (FRUC) Using SepConv

Hierarchical B Structure with Virtual Reference Frame
  • In this paper, the authors propose to generate a high-quality virtual reference frame using a deep-learning-based frame rate up-conversion (FRUC) algorithm.
  • In particular, SepConv, a CNN-based video frame interpolation network, is used to generate this virtual reference frame. (If interested, please read my story about SepConv. That’s also why I read AdaConv and SepConv, lol.)
  • This virtual reference frame ^B1 has the same time instant as B1 (a sketch of how it could be generated follows this list).
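Below is a minimal PyTorch-style sketch of this step. The name `interp_net` is a stand-in for a pretrained SepConv-like interpolation model (an assumption, not the paper's released code); the two inputs are the reconstructed references at equal temporal distance before and after the current frame.

```python
import torch

# Sketch only: `interp_net` stands in for a pretrained SepConv-style frame
# interpolation CNN. ref_prev and ref_next are reconstructed reference frames
# at equal temporal distance before/after the current frame, shaped (1, 3, H, W)
# with values in [0, 1].

@torch.no_grad()
def make_virtual_reference(interp_net, ref_prev, ref_next):
    # SepConv predicts per-pixel separable kernels and synthesizes the midpoint
    # frame between its two inputs, i.e. the same time instant as the current
    # frame in the hierarchical B example (^B1 for B1).
    virtual_ref = interp_net(ref_prev, ref_next)
    return virtual_ref.clamp_(0.0, 1.0)

# The virtual frame is then used as an extra reference; with DVRF (Section 2.2),
# its co-located blocks can be copied directly per CTU.
```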

2.2. Direct Virtual Reference Frame (DVRF) coding mode in HEVC

The Proposed DVRF Coding Mode
  • After generating the virtual reference frame, a novel CTU-level coding mode, the Direct Virtual Reference Frame (DVRF) mode, is introduced.
  • For each 64×64 CTU in the current frame, a DVRF flag is signalled in the bitstream to indicate whether the DVRF mode is chosen.
  • In particular, when the DVRF flag is true, the co-located block in the virtual reference frame is directly treated as the reconstructed block.
  • Otherwise, the traditional HEVC encoding process is used to encode the current CTU (see the sketch after this list).
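A minimal sketch of this CTU-level decision is given below. All helper names (`hevc_encode_ctu`, `rd_cost`) are illustrative stand-ins, not functions from HM or the paper; the point is only the per-CTU choice between copying the co-located virtual-reference block and running the normal HEVC path.

```python
import numpy as np

CTU = 64  # CTU size at which the DVRF flag is signalled

def encode_frame_with_dvrf(cur, virtual_ref, hevc_encode_ctu, rd_cost):
    """Sketch of the CTU-level DVRF decision (helper names are illustrative).

    cur, virtual_ref : (H, W) luma arrays of the current and virtual frames
    hevc_encode_ctu  : stand-in for the normal HEVC CTU encoding path,
                       returning (reconstructed_block, rd_cost_value)
    rd_cost          : stand-in RD cost of the DVRF copy (distortion of the
                       co-located block plus the one-bit flag cost)
    """
    H, W = cur.shape
    recon = np.zeros_like(cur)
    flags = []
    for y in range(0, H, CTU):
        for x in range(0, W, CTU):
            cur_blk = cur[y:y+CTU, x:x+CTU]
            vrf_blk = virtual_ref[y:y+CTU, x:x+CTU]   # co-located block
            dvrf_cost = rd_cost(cur_blk, vrf_blk)
            hevc_blk, hevc_cost = hevc_encode_ctu(cur_blk)
            use_dvrf = dvrf_cost < hevc_cost          # signalled as a 1-bit flag
            flags.append(use_dvrf)
            recon[y:y+CTU, x:x+CTU] = vrf_blk if use_dvrf else hevc_blk
    return recon, flags
```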

3. Experimental Results

BD-Rate (%) against conventional HEVC with the proposed approach applied only at TL=3
  • YUV 4:0:0 (luma only) is considered here.
  • When the DVRF mode is applied only to TL=3 frames, the proposed approach provides a 2.3% BD-rate gain on average over the HEVC test sequences, and a gain of up to 5.4% is achieved on BQSquare.
  • For BQTerrace, the DVRF mode even has a negative effect, as SepConv may not handle the water-wave content well.
BD-Rate (%) against conventional HEVC with the proposed approach applied at TL=3 and TL=2
  • When the DVRF mode is applied to both TL=3 and TL=2 frames, a 3.2% coding gain is achieved, which demonstrates the robustness of the proposed method when the input frames of the FRUC algorithm have a longer temporal distance (a sketch of how BD-rate is computed follows).
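For readers unfamiliar with the BD-rate metric used in these tables, here is a minimal NumPy sketch of the standard Bjøntegaard delta-rate computation (the common textbook formulation, not code from the paper). Negative values mean the tested method needs fewer bits at equal quality.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) between two RD curves (typically 4 QP points each)."""
    lr_a, lr_t = np.log(np.asarray(rate_anchor)), np.log(np.asarray(rate_test))
    # Cubic fit of log-rate as a function of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the common PSNR interval and average the gap.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100   # % average bit-rate change at equal PSNR
```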

During these days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is that good? This is the 12th story this month. Thanks for visiting my story.

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
