Reading: Jin ACCESS’18— Fast QTBT Partition Algorithm for Intra Frame Coding through Convolutional Neural Network (Fast VVC Prediction)
“RD Maintaining” Setting: 42.33% Complexity Reduction With Only 0.69% BD-Rate Increase. “Low Complexity” Setting: 62.08% Complexity Reduction With 2.04% BD-Rate Increase.
In this story, Fast QTBT Partition Algorithm for Intra Frame Coding through Convolutional Neural Network (Jin ACCESS’18), by Shanghai University, Jiaxing Vocational and Technical College, and University of Southern California, is briefly presented. It is the enhanced version of Jin VCIP’17 and Jin PCM’17. Since I have only just found this paper, this story mainly describes the main difference from Jin PCM’17. (Please feel free to read Jin VCIP’17 and Jin PCM’17 first for the details of QTBT in VVC and of the proposed CNN approach.)
- The main difference is the loss function: a misclassification penalty term is combined with the L2 hinge loss to train the network.
This is a paper in 2018 IEEE ACCESS, where ACCESS is an open-access journal with a high impact factor of 3.745. (Sik-Ho Tsang @ Medium)
- Network Architecture
- Loss Function
- Experimental Results
1. Network Architecture
- The network is exactly the same as the Jin PCM’17 one where the network classifies a 32×32 block into one of the five classes as above. So, I don’t go into details about it here.
- According to the class, different depth ranges are assigned for CUs.
- As seen above, the smaller the class depth, the simpler the texture of the CU, and vice versa.
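The idea of restricting the encoder's partition search by predicted class can be sketched as below. The exact depth ranges come from the paper's table, which is not reproduced here, so the values in this mapping are illustrative assumptions only.

```python
# Hypothetical mapping from the CNN's predicted class to the QTBT
# depth range the encoder actually searches. A simpler texture
# (smaller class) keeps only shallow depths; a complex texture
# (larger class) keeps only deep depths. Values are illustrative.
CLASS_TO_DEPTH_RANGE = {
    0: (0, 2),   # simplest texture: only shallow partitions tried
    1: (1, 3),
    2: (2, 4),
    3: (3, 5),
    4: (4, 6),   # most complex texture: only deep partitions tried
}

def should_search_depth(predicted_class: int, depth: int) -> bool:
    """Skip RD-cost checks for partition depths outside the predicted range."""
    lo, hi = CLASS_TO_DEPTH_RANGE[predicted_class]
    return lo <= depth <= hi
```

By skipping RD-cost evaluation for depths outside the predicted range, the encoder saves time; the trade-off between the two settings ("RD Maintaining" vs. "Low Complexity") amounts to how aggressively the ranges are narrowed.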
2. Loss Function
If the true class_depth “10” is falsely predicted as “4”, the RD performance of coding degrades, since class_depth “10” means the current 32×32 CU should be partitioned into smaller CUs.
However, if the true class_depth “10” is falsely predicted as “9”, although this still inevitably causes RD performance degradation, the magnitude of the decline is much smaller.
- The loss function combines a misclassification penalty term with the L2 hinge loss:
- P is the misclassification penalty term, driven by the distance between the ground-truth class label and the predicted class label.
- Hn = max(1 + yn·tn, 0) represents the hinge loss when a sample is classified into the various classes.
- It is found that the classification accuracy is improved when using this loss function.
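A minimal NumPy sketch of such a penalty-weighted L2 hinge loss is given below. The sign convention for tn (−1 for the ground-truth class, +1 otherwise) and the exact form of the penalty P are assumptions for illustration; the paper defines P only as a function of the distance between the true and predicted labels.

```python
import numpy as np

def penalty_weighted_l2_hinge(scores, true_class, num_classes=5):
    """Sketch of an L2 hinge loss weighted by a distance-based penalty.

    scores: per-class outputs y_n of the network for one sample.
    Assumed convention: t_n = -1 for the ground-truth class and +1
    otherwise, so H_n = max(1 + y_n * t_n, 0) is minimized when the
    true-class score exceeds +1 and the other scores fall below -1.
    """
    t = np.ones(num_classes)
    t[true_class] = -1.0
    h = np.maximum(1.0 + np.asarray(scores, dtype=float) * t, 0.0)
    # Hypothetical penalty: grows with the index distance to the true
    # class, so a "far" mistake (e.g. class_depth 10 predicted as 4)
    # costs more than a "near" one (10 predicted as 9).
    p = 1.0 + np.abs(np.arange(num_classes) - true_class)
    return float(np.sum(p * h ** 2))   # L2 hinge: squared margins
```

With this weighting, confidently scoring a distant wrong class produces a larger loss than scoring an adjacent wrong class equally high, which is exactly the asymmetry motivated above.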
3. Experimental Results
3.1. BD-Rate (%)
- “RD Maintaining” Setting: 42.33% Complexity Reduction With Only 0.69% BD-Rate Increase.
- “Low Complexity” Setting: 62.08% Complexity Reduction With 2.04% BD-Rate Increase.
3.2. RD Curves
- The RD curves of the proposed approach are very close to the original ones, regardless of the high/low bitrate condition.
3.3. Encoding Time Saving Against QPs
- The proposed algorithm is able to achieve consistent time saving under different QPs.
3.4. SOTA Comparison
- The proposed approach outperforms the above prior arts.
- And it also outperforms their previous approach Jin PCM’17 by a small margin.
This is the 11th story in this month.
[2018 ACCESS] [Jin ACCESS’18]
Fast QTBT Partition Algorithm for Intra Frame Coding through Convolutional Neural Network
Codec Fast Prediction
H.264 to HEVC [Wei VCIP’17] [H-LSTM]
HEVC [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Li ICME’17] [Katayama ICICT’18] [Chang DCC’18] [ETH-CNN & ETH-LSTM] [Zhang RCAR’19] [Kim TCVST’19] [LFHI & LFSD & LFMD Using AK-CNN] [Yang AICAS’20]
VVC [Jin VCIP’17] [Jin PCM’17] [Jin ACCESS’18] [Wang ICIP’18] [Galpin DCC’19] [Pooling-Variable CNN] [DeepQTMT]