Reading: Deep Learning-Based Technology in Responses to the Joint Call for Proposals on Video Compression With Capability Beyond HEVC (Liu TCSVT’20)

This is a summary paper of deep tools in video coding!

Sik-Ho Tsang
4 min read · Jun 27, 2020
(Figure: Video Coding Development Towards VVC, slides from RWTH Aachen University)

In this story, Deep Learning-Based Technology in Responses to the Joint Call for Proposals on Video Compression With Capability Beyond HEVC (Liu TCSVT’20), by University of Science and Technology of China, Wuhan University, and Tencent Media Lab, is presented.

  • Prior works can be divided into two categories.
  • Deep Schemes: new coding schemes built solely upon deep networks.
  • Deep Tools: deep-network-based coding tools embedded into traditional coding schemes.
  • This paper summarizes the ongoing efforts in the Joint Video Experts Team (JVET) regarding the proposed deep tools. (Deep schemes are not covered, since the paper only summarizes deep tools.)

This is a paper in 2020 TCSVT, where TCSVT has a high impact factor of 4.046. (Sik-Ho Tsang @ Medium)

Outline

(Figure: HEVC Encoder Block Diagram)
  1. Prediction Tools
  2. Transform Tools
  3. Entropy Coding Tools
  4. Post-Processing and In-Loop Filtering Tools
  5. Down- and Up-Sampling-Based Coding Tools
  6. Encoding Optimization Tools

1. Prediction Tools

  • For intra-picture prediction, both fully-connected networks (IPFCN [22]) and CNN-based methods (IPCNN [32]) have been proposed.
  • For inter-picture prediction, several works focus on fractional-pixel interpolation (FRCNN [23], CNNInvIF/InvIF [33], GVCNN [34]), while others deal with bi-directional motion compensation (FRUC+DVRF+VECNN [35]), motion compensation refinement (CNNMCR [36]), combined intra/inter prediction (NNIP [37]), or directly extrapolating a frame for reference (VC-LAPGAN [38]).
  • For cross-channel prediction, a multiple-hypothesis method is proposed to predict chroma components from luma components (Baig JVICU’17 [39]), and a hybrid-network-based method combines hints from the collocated luma and the neighboring chroma (HybridNN, Li ICIP’18 [24]).
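The fully-connected intra prediction idea above can be sketched as a forward pass that maps reconstructed neighboring pixels to a predicted block. This is a minimal illustration with made-up layer sizes and random weights, not the actual IPFCN architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fc_intra_predict(neighbors, weights, biases, block_size=8):
    """Map a vector of reconstructed reference pixels (the L-shaped
    context above and left of the block) to a predicted block."""
    h = neighbors
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)          # hidden layers with ReLU
    out = h @ weights[-1] + biases[-1]  # linear output layer
    return out.reshape(block_size, block_size)

# Toy usage: 33 context pixels -> 8x8 predicted block.
# Weights are random stand-ins for a trained network.
rng = np.random.default_rng(0)
ctx = rng.uniform(0, 255, size=33)
dims = [33, 128, 64]                 # hidden sizes (illustrative)
Ws = [rng.normal(0, 0.01, (a, b)) for a, b in zip(dims, dims[1:])]
Ws.append(rng.normal(0, 0.01, (64, 64)))
bs = [np.zeros(d) for d in dims[1:]] + [np.zeros(64)]
pred = fc_intra_predict(ctx, Ws, bs)
print(pred.shape)  # (8, 8)
```

In a real codec the trained predictor would compete with the conventional angular intra modes in rate-distortion terms.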

2. Transform Tools

  • In Liu MMM’18 [25], a CNN-based DCT-like transform is studied for image compression, where the network is virtually an auto-encoder.
  • In DeepCoder [40], an autoencoder is proposed for compressing the motion-predicted residue in video coding, and the quantized features are further compressed by Huffman coding. (I have not written about it yet since it involves optical flow, which I need to cover in a separate story line…)
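The autoencoder-for-residue idea can be illustrated with an encode → quantize → decode round trip. Here random linear maps stand in for the trained encoder/decoder networks, and the entropy coding step is omitted; this is an assumed simplification, not DeepCoder's actual structure:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 64, 16                              # 8x8 residue block -> 16 features
E = rng.normal(0, 1 / np.sqrt(N), (N, K))  # "encoder" (stand-in for a CNN)
D = np.linalg.pinv(E)                      # "decoder" (stand-in)

def encode(residue_block, step=4.0):
    feats = residue_block.reshape(-1) @ E
    return np.round(feats / step).astype(int)  # uniform quantization

def decode(q, step=4.0):
    return ((q * step) @ D).reshape(8, 8)

residue = rng.normal(0, 10, (8, 8))  # toy motion-predicted residue
q = encode(residue)                  # would go to Huffman coding in [40]
rec = decode(q)
print(q.size, rec.shape)  # 16 (8, 8)
```

The compression comes from the dimensionality reduction (64 → 16) plus quantization; a trained network would learn a transform far better matched to residue statistics than this random one.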

3. Entropy Coding Tools

4. Post-Processing and In-Loop Filtering Tools

  • These tools are inspired by the demonstrated success of deep learning-based compression artifact reduction (ARCNN [43]) and image denoising (DnCNN [5]).
  • Post-processing tools are applied solely at the decoder side to improve reconstruction quality, as studied in VRCNN [44], DCAD [45], QE-CNN [46], and MFQE [47]. (For MFQE, I have not written about it yet since it involves optical flow, which I need to cover in a separate story line…)
  • In-loop filtering tools are applied inside the coding loop, i.e., filtered frames are used as references for later frames, as studied in IFCNN [27], VRCNN-ext [48], Jia TIP’19 [49], and RHCNN [50].

5. Down- and Up-Sampling-Based Coding Tools

  • These tools are inspired by the success of deep learning-based super-resolution SRCNN [4].
  • Traditionally, down-sampling prior to encoding and up-sampling after decoding is known to outperform direct coding at very low bit-rates.
  • With trained deep networks for down-sampling and/or up-sampling, the performance of down- and up-sampling-based coding is further enhanced (JVET-J0031 [51]). (I have not read this one…)
  • In addition, block-adaptive-resolution coding (BARC) is proposed, where some blocks are down-sampled before encoding while others are encoded directly, so as to suit different local characteristics.
  • BARC with deep networks is studied for intra frames (Li TCSVT’18 [28], CNN-CR [52]), for inter frames (CNN-SR & CNN-UniSR & CNN-BiSR [53]), and for motion-predicted residues (RSR [54]), respectively.
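The down-/up-sampling pipeline above can be sketched end to end. Here fixed 2x average pooling and nearest-neighbor up-sampling stand in for the trained networks of JVET-J0031, and coarse quantization stands in for an HEVC encoder at low bit-rate (all assumptions for illustration):

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling (stand-in for a learned down-sampler)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(img):
    """Nearest-neighbor up-sampling (stand-in for a learned up-sampler)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def codec(img, step=16.0):
    """Coarse uniform quantization, standing in for low bit-rate coding."""
    return np.round(img / step) * step

rng = np.random.default_rng(3)
frame = rng.uniform(0, 255, (16, 16))
rec = upsample2x(codec(downsample2x(frame)))  # encode at 1/4 the pixels
print(rec.shape)  # (16, 16)
```

At low rates the bits saved by coding a quarter of the pixels outweigh the resolution loss; BARC makes this trade-off per block rather than per frame.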

6. Encoding Optimization Tools

  • For fast mode decision, deep learning-based methods have shown remarkable success (Liu TIP’16 [29], ETH-CNN & ETH-LSTM [55]).
  • Rate control is highly dependent on rate modeling. In the case of the rate-lambda (R-λ) model, a CNN-based method is proposed to estimate the model parameters for different image blocks (Li VCIP’17 [56]).
  • In MS-ROI [57], a CNN-based method is adopted to detect the salient regions of an image, and rate allocation is then adjusted to improve the quality of those regions.
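For context on the R-λ model mentioned above: HEVC rate control derives the Lagrange multiplier from the target rate as λ = α · bpp^β, with (α, β) adapted per block; the CNN in Li VCIP’17 predicts these parameters from block content. A minimal sketch, using the common HEVC initial values α ≈ 3.2, β ≈ −1.367 in place of the CNN output:

```python
def lambda_from_rate(bpp, alpha=3.2, beta=-1.367):
    """R-lambda model: map a target rate (bits per pixel) to the
    Lagrange multiplier used in rate-distortion optimization.
    alpha/beta would come from a per-block CNN prediction in [56]."""
    return alpha * bpp ** beta

# A tighter bit budget yields a larger lambda, i.e. distortion is
# traded away more aggressively to save bits.
lam_low = lambda_from_rate(0.05)   # very low rate
lam_high = lambda_from_rate(0.5)   # higher rate
print(lam_low > lam_high)  # True
```

Since β is negative, λ grows as the bit budget shrinks, which is the behavior rate control needs.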

There are also deep tools in the responses to the Call for Proposals (CfP) for the development of VVC, described in this paper mainly through JVET documents. Some of them have since appeared as papers, such as CNNF [60] and DRN [70–71], but the mapping is not one-to-one, so it is difficult to summarize them here.

After reviewing the above papers, I will slow down a bit on video coding paper reviews, since most of the deep learning-related ones have now been covered. (There is still a large number of non-deep-learning-based video coding papers…)

(Besides the papers above, I have also reviewed other deep learning-based video coding papers which are not described in this paper. If interested, please find them in my summary article.)

This is the 36th story this month!
