Reading: Jin ACCESS’18— Fast QTBT Partition Algorithm for Intra Frame Coding through Convolutional Neural Network (Fast VVC Prediction)

“RD Maintaining” Setting: 42.33% Complexity Reduction With Only 0.69% BD-Rate Increase. “Low Complexity” Setting: 62.08% Complexity Reduction With 2.04% BD-Rate Increase.

In this story, Fast QTBT Partition Algorithm for Intra Frame Coding through Convolutional Neural Network (Jin ACCESS’18), by Shanghai University, Jiaxing Vocational and Technical College, and University of Southern California, is briefly presented. It is the enhanced version of Jin ICIP’17 and Jin PCM’17. Since I’ve only just found this paper, this story mainly describes the main difference from Jin PCM’17. (Please feel free to read Jin ICIP’17 and Jin PCM’17 first for the details of QTBT in VVC, and also the proposed CNN approach.)

  • The main difference is the loss function: a misclassification penalty term is combined with the L2 Hinge Loss to train the network.

This is a paper in 2018 IEEE ACCESS where ACCESS is an open-access journal with high impact factor of 3.745. (Sik-Ho Tsang @ Medium)

Outline

  1. Network Architecture
  2. Loss Function
  3. Experimental Results

1. Network Architecture

Network Architecture
  • The network is exactly the same as in Jin PCM’17, where the network classifies a 32×32 block into one of the five classes as above. So, I don’t go into details about it here.
  • According to the predicted class, a different depth range is assigned to each CU.
Examples of CUs with Class Depth from 4 to 10
  • As seen above, the smaller the class depth, the simpler the texture of the CU, and vice versa.
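To make the idea concrete, here is a minimal sketch of how a predicted class could restrict the encoder’s depth search so that RDO skips depths outside the range. The concrete class-to-range table below is an illustrative assumption, not the paper’s exact mapping:

```python
# Hypothetical mapping from the CNN's 5 output classes to the QTBT
# depth range the encoder is allowed to search for a 32x32 CU.
# The ranges below are illustrative assumptions, not the paper's table:
# low class -> shallow depths (simple texture), high class -> deep splits.
DEPTH_RANGES = {0: (0, 1), 1: (0, 2), 2: (1, 3), 3: (2, 4), 4: (3, 4)}

def allowed_depths(predicted_class):
    """Return the partition depths the encoder should evaluate."""
    lo, hi = DEPTH_RANGES[predicted_class]
    return list(range(lo, hi + 1))
```

The encoder would then run RDO only over `allowed_depths(c)` instead of the full depth range, which is where the complexity saving comes from.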

2. Loss Function

If the true class_depth “10” is falsely predicted as “4”, the RD performance of coding degrades, since class_depth “10” means the current 32×32 CU should be partitioned into smaller CUs.

However, if the true class_depth “10” is falsely predicted as “9”, although this still inevitably causes RD performance degradation, the magnitude of the decline is much smaller.

  • The loss function combines a misclassification penalty term with the L2 Hinge Loss:
  • P is the misclassification penalty term, which is driven by the distance between the ground-truth class label and the predicted class label.
  • Hₙ = max(1 − yₙtₙ, 0) is the Hinge Loss when a sample is classified into the various classes, with target tₙ = +1 for the true class and −1 otherwise.
Prediction Accuracy
  • And it is found that the accuracy is improved when using the above loss function.
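A minimal sketch of a penalty-weighted L2 hinge loss in this spirit: the hinge term follows the standard one-vs-all convention (tₙ = +1 for the true class, −1 otherwise), and the penalty grows with the label distance. The exact form of P used here (1 plus the absolute label distance) is an illustrative assumption, not necessarily the paper’s formula:

```python
import numpy as np

def penalized_l2_hinge_loss(scores, true_class):
    """Penalty-weighted L2 hinge loss for one sample (illustrative sketch).

    scores: array of per-class network outputs y_n (one per class_depth class).
    true_class: index of the ground-truth class.
    """
    n_classes = len(scores)
    t = -np.ones(n_classes)          # targets: -1 for wrong classes...
    t[true_class] = 1.0              # ...+1 for the true class
    hinge = np.maximum(1.0 - np.asarray(scores) * t, 0.0)   # H_n
    # Assumed penalty: larger when the candidate label is farther from
    # the ground truth, so distant mistakes cost more than near ones.
    penalty = 1.0 + np.abs(np.arange(n_classes) - true_class)
    return float(np.sum(penalty * hinge ** 2))
```

With this weighting, predicting class 0 when the truth is class 4 is penalized more heavily than predicting class 3, matching the intuition that a near-miss in class_depth degrades RD performance far less than a distant one.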

3. Experimental Results

3.1. BD-Rate (%)

BD-Rate (%) on VVC Test Sequences
  • “RD Maintaining” Setting: 42.33% Complexity Reduction With Only 0.69% BD-Rate Increase.
  • “Low Complexity” Setting: 62.08% Complexity Reduction With 2.04% BD-Rate Increase.
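For readers unfamiliar with the metric: BD-rate numbers like these come from the Bjøntegaard delta computation, which fits a cubic polynomial to each encoder’s (PSNR, log-rate) curve and averages the rate gap over the overlapping quality range. A minimal sketch:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of the test encoder vs. the anchor.

    Positive means the test encoder needs more bitrate for equal PSNR.
    Each input is a list of 4 RD points (one per QP, typically).
    """
    # Fit log10(rate) as a cubic polynomial of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    # Integrate over the overlapping PSNR interval only.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0
```

For example, a test curve whose bitrates are uniformly 10% higher at the same PSNR yields a BD-rate of about +10%.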

3.2. RD Curves

RD Curves
  • The RD curves of the proposed approach are very close to those of the original encoder, regardless of the high/low bitrate condition.

3.3. Encoding Time Saving Against QPs

Encoding Time Saving Against QPs
  • The proposed algorithm is able to achieve consistent time saving under different QPs.

3.4. SOTA Comparison

BD-Rate (%) on VVC Test Sequences
  • The proposed approach outperforms the prior arts [21], [15], and [14] above.
  • And it also outperforms their previous approach Jin PCM’17 [8] a bit.

This is the 11th story in this month.
