Reading: LFHI & LFSD & LFMD Using AK-CNN — Asymmetric-Kernel CNN (Fast HEVC Prediction)

Outperforms Liu TIP’16 and ETH-CNN & ETH-LSTM, Smaller Model Size Than Li ICME’17, 75.2% Time Reduction Under All-Intra Configuration

This paper, by the University of Science and Technology of China, presents the Asymmetric-Kernel CNN (AK-CNN). In this paper:

  • Asymmetric horizontal and vertical convolution kernels are designed to precisely extract the texture features of each block with much lower complexity.
  • The confidence threshold decision scheme is designed in the PU partition part to achieve the best trade-off between the coding performance and complexity reduction.
  • This AK-CNN is used for fast size decision as well as fast mode decision, namely Learned Fast Size Decision (LFSD) and Learned Fast Mode Decision (LFMD), respectively. With both used, it becomes a Learned Fast HEVC Intra (LFHI) Coding Scheme.

LFSD was first published in 2019 ISCAS. It was then combined with LFMD to form LFHI, which was published in 2020 TIP, a journal with a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)

Outline

  1. AK-CNN: Network Architecture for Learned Fast Size Decision (LFSD)
  2. Confidence Threshold Decision Scheme in PU Size Decision
  3. Learned Fast Mode Decision (LFMD)
  4. Some Training Details
  5. Experimental Results for LFHI & LFSD & LFMD

1. AK-CNN: Network Architecture for Learned Fast Size Decision (LFSD)

AK-CNN: Network Architecture for 64×64
  • In HEVC, CUs range from 64×64 down to 8×8, and an 8×8 intra CU can be further split into four 4×4 PUs. There are 4 AK-CNN classifiers, one for each split decision between adjacent sizes (64×64→32×32, 32×32→16×16, 16×16→8×8 and 8×8→4×4).
  • The above shows AK-CNN Network Architecture for 64×64.
  • Input: luminance of the current CU.
  • Output: The splitting decision, 0 or 1.
  • The network extends the first convolutional layer to three branches:
  • The first branch is the traditional one, with normal square convolutional kernels.
  • The remaining two branches have asymmetric kernels that target near-horizontal and near-vertical textures.
  • These parallel branches are similar to the multi-branch idea in GoogLeNet (Inception), whose later versions also use non-square kernels.
  • All three branches output feature maps of the same size, which are then concatenated. Combining these features helps the network better understand the block content.
  • Next, the features flow through three fully-connected layers: two hidden layers and one output layer.
  • LeakyReLU is used with α = 1, and softmax is used at the output layer.
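
To make the three-branch first layer more concrete, below is a minimal pure-Python sketch. It is not the paper's implementation: the kernel shapes (3×3 square, 1×5 horizontal, 5×1 vertical), the zero "same" padding used to keep all three maps the same size, and the fixed averaging weights are hypothetical choices for illustration (the real kernels are learned, and their sizes are given in the architecture table). Concatenation is modeled simply as stacking the maps as channels.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(img: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1 2-D cross-correlation with zero padding.

    With odd kernel sizes the output has the same height and width as the input,
    so branches with different kernel shapes can be concatenated directly.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the block
                        acc += img[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def leaky_relu(m: Matrix, alpha: float) -> Matrix:
    """Element-wise LeakyReLU: x if x > 0 else alpha * x."""
    return [[v if v > 0 else alpha * v for v in row] for row in m]


def first_layer(block: Matrix, square: Matrix, horizontal: Matrix,
                vertical: Matrix, alpha: float) -> List[Matrix]:
    """Apply the three parallel branches and stack their maps as channels (concatenation)."""
    return [leaky_relu(conv2d_same(block, k), alpha) for k in (square, horizontal, vertical)]


if __name__ == "__main__":
    # Hypothetical 8x8 luma block with a single horizontal stripe on row 4.
    block = [[255.0 if r == 4 else 0.0 for _ in range(8)] for r in range(8)]
    sq = [[1 / 9] * 3 for _ in range(3)]   # 3x3 square kernel: 9 weights
    hk = [[1 / 5] * 5]                     # 1x5 horizontal kernel: 5 weights
    vk = [[1 / 5] for _ in range(5)]       # 5x1 vertical kernel: 5 weights
    maps = first_layer(block, sq, hk, vk, alpha=1.0)  # alpha as stated above
    print(len(maps), len(maps[0]), len(maps[0][0]))   # channels, height, width
    print(maps[0][4][4], maps[1][4][4], maps[2][4][4])
```

With a horizontal stripe as input, the 1×5 kernel aligned with the stripe keeps the full response (255) at the stripe center, while the 5×1 kernel dilutes it to 51. This is the intuition behind direction-specific branches; note also that each asymmetric kernel here has 5 weights versus 9 for the 3×3 one.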
Network Architecture Details
  • Network A is used for 32×32 blocks, while Network B is used for 16×16 and 8×8 blocks.
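
The four classifiers are applied level by level on the CU quadtree. Below is a minimal top-down sketch, assuming each classifier returns the softmax probability of "split" and using a plain 0.5 cut-off (in the paper, the 8×8 to 4×4 step instead uses the confidence threshold scheme of Section 2). The classifier callables and the `dummy` stubs are hypothetical stand-ins for the trained networks.

```python
from typing import Callable, Dict, List, Tuple

Block = List[List[float]]
# Hypothetical classifier interface: returns the softmax probability of "split".
Classifier = Callable[[Block], float]


def quadrants(block: Block) -> List[Block]:
    """Split an N x N block into four N/2 x N/2 sub-blocks in Z order."""
    n = len(block) // 2
    return [[row[c:c + n] for row in block[r:r + n]] for r in (0, n) for c in (0, n)]


def partition(block: Block, x: int, y: int, nets: Dict[int, Classifier],
              min_size: int = 4) -> List[Tuple[int, int, int]]:
    """Top-down split decisions, one classifier per block size.

    Returns the leaf blocks as (x, y, size) tuples.
    """
    size = len(block)
    if size <= min_size or nets[size](block) < 0.5:
        return [(x, y, size)]
    half = size // 2
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]  # (dx, dy) in Z order
    leaves: List[Tuple[int, int, int]] = []
    for (dx, dy), sub in zip(offsets, quadrants(block)):
        leaves += partition(sub, x + dx, y + dy, nets, min_size)
    return leaves


def dummy(p: float) -> Classifier:
    """Hypothetical stub that always returns the same split probability."""
    return lambda _block: p


if __name__ == "__main__":
    ctu = [[0.0] * 64 for _ in range(64)]
    # Wiring as in the text: a 64x64 network, Network A for 32x32,
    # Network B for 16x16 and 8x8 (all replaced by stubs here).
    nets = {64: dummy(0.9), 32: dummy(0.1), 16: dummy(0.1), 8: dummy(0.1)}
    print(partition(ctu, 0, 0, nets))
    # [(0, 0, 32), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```

Because a block that is predicted "no split" is never descended into, the deeper networks are only evaluated where needed, which is where the time saving over exhaustive RDO comes from.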

2. Confidence Threshold Decision Scheme in PU Size Decision

2.1. Conference Version

Relationship between threshold and accuracy/ratios
  • For PU size decision, i.e. 8×8 to 4×4, a confidence threshold decision scheme is used.
  • The figure above shows how the prediction accuracy and the ratio of the selected subset change with the threshold, for 4 QPs on the validation set.
  • The softmax thresholds are chosen at the points where the accuracy reaches 90% on the validation set.
  • The thresholds, TH, are set according to these 90%-accuracy points.
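
The threshold selection and its use can be sketched as follows. This is a hypothetical reading of the scheme: `pick_threshold` returns the smallest softmax threshold whose confident subset reaches 90% accuracy on validation data, and `pu_decision` assumes that blocks below the threshold fall back to the encoder's normal RDO check. All function names are made up for illustration.

```python
from typing import Optional, Sequence


def subset_accuracy(conf: Sequence[float], correct: Sequence[bool],
                    th: float) -> Optional[float]:
    """Accuracy on the samples whose max softmax output is >= th (None if none qualify)."""
    picked = [ok for p, ok in zip(conf, correct) if p >= th]
    return sum(picked) / len(picked) if picked else None


def pick_threshold(conf: Sequence[float], correct: Sequence[bool],
                   target: float = 0.9) -> Optional[float]:
    """Smallest threshold whose confident subset reaches the target accuracy.

    The smallest such threshold keeps the largest share of blocks on the fast path.
    """
    for th in sorted(set(conf)):
        acc = subset_accuracy(conf, correct, th)
        if acc is not None and acc >= target:
            return th
    return None


def pu_decision(p_split: float, th: float) -> str:
    """Use the CNN only when it is confident; otherwise fall back to the encoder's RDO check."""
    if p_split >= th:
        return "split"
    if 1.0 - p_split >= th:
        return "no_split"
    return "rdo"
```

For example, with confidences [0.55, 0.6, 0.7, 0.8, 0.9, 0.95] where only the first two predictions are wrong, the subset accuracy is 4/6 at 0.55, 4/5 at 0.6 and 4/4 at 0.7, so 0.7 is chosen. A higher threshold raises accuracy but sends more blocks back to RDO, which is the trade-off shown in the figure above.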

2.2. Transactions Version

  • (A more advanced scheme based on evolutionary theory is used, which I am not familiar with. If interested, please feel free to read the paper.)

3. Learned Fast Mode Decision (LFMD)

Minimum Number of RDO Candidates (MNRC)
  • In conventional HEVC intra coding, 3 to 8 candidate intra prediction directions go through rate-distortion optimization (RDO) to find the best one.
  • In LFMD, AK-CNN speeds up this process by redefining the MNRC according to the class predicted by AK-CNN, as shown above.
  • By adjusting p and q, different trade-offs between complexity reduction and coding performance can be obtained.
  • Two AK-CNN models are trained: one for a conservative setting and one for an aggressive setting.
  • This AK-CNN is similar to the one above, except that its output layer has 3 outputs to classify a block into Class 1 to Class 3. For 4×4 blocks, a shallower network is used instead of AK-CNN.
  • A standard training loss is used.
  • (There are not many details about this part in the paper.)
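
Mechanically, LFMD shrinks the list of candidate modes that go through full RDO according to the predicted class. The sketch below assumes the candidates arrive sorted best-first (as from HM's rough mode decision), uses made-up MNRC values in `MNRC_EXAMPLE`, and assumes the "standard training loss" is the usual softmax cross-entropy. The actual MNRC per class and the role of p and q are defined in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical MNRC per predicted class (placeholders, not the paper's values).
MNRC_EXAMPLE: Dict[int, int] = {1: 1, 2: 2, 3: 3}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rdo_candidates(rmd_modes: Sequence[int], logits: Sequence[float],
                   mnrc: Dict[int, int] = MNRC_EXAMPLE) -> Tuple[int, List[int]]:
    """Predict the class (1-based) and keep only the first mnrc[class] candidate modes."""
    probs = softmax(logits)
    cls = 1 + max(range(len(probs)), key=lambda i: probs[i])
    return cls, list(rmd_modes[:mnrc[cls]])


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a 1-based class label."""
    return -math.log(softmax(logits)[label - 1])


if __name__ == "__main__":
    # Made-up candidate list, best first, as a rough mode decision might produce it.
    modes = [26, 10, 0, 1, 18, 2, 34, 5]
    print(rdo_candidates(modes, [0.1, 2.0, 0.3]))  # (2, [26, 10])
```

A conservative model would map classes to larger MNRC values (fewer RDO checks skipped), and an aggressive one to smaller values, which matches the two trained settings described above.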

4. Some Training Details

4.1. New Dataset

  • The CPH dataset used in ETH-CNN & ETH-LSTM does not contain information about PU size decisions.
  • Therefore, a complete dataset for extended CTU partition, namely the ECP dataset, is established.

4.2. Loss Function for AK-CNN in LFSD

  • RD-cost is also collected for each block during encoding.
  • The coding loss, LossRD, is defined as the relative difference between the RD-cost values before and after splitting, where the RD-cost after splitting is that of the optimal combination of sub-blocks.
  • Similar in spirit to the focal loss in RetinaNet, the training loss of each block whose LossRD is lower than the threshold th of its depth is assigned a fixed small weight w.
  • (This th is not the same as TH in the previous section.)
  • As a result, the networks pay more attention to blocks with a large LossRD, where a wrong split decision is more costly.
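
A minimal sketch of this re-weighted loss. The exact equations are in the paper; here LossRD is assumed to be the absolute RD-cost difference normalized by the smaller of the two costs, blocks at or above th are assumed to get weight 1, and the values of th and w are hypothetical. Unlike the focal loss in RetinaNet, which down-weights easy examples through a (1 − p_t)^γ factor, this is a hard, RD-based re-weighting.

```python
import math
from typing import Sequence


def loss_rd(j_no_split: float, j_split: float) -> float:
    """Relative RD-cost difference between not splitting and the best split (assumed form)."""
    return abs(j_no_split - j_split) / min(j_no_split, j_split)


def sample_weight(lrd: float, th: float, w: float) -> float:
    """Blocks whose decision barely matters (LossRD < th) get the small fixed weight w."""
    return w if lrd < th else 1.0


def weighted_bce(p_split: Sequence[float], labels: Sequence[int],
                 lrds: Sequence[float], th: float, w: float,
                 eps: float = 1e-12) -> float:
    """Mean weighted binary cross-entropy over a batch (labels: 1 = split, 0 = no split)."""
    total = 0.0
    for p, y, lrd in zip(p_split, labels, lrds):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        ce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += sample_weight(lrd, th, w) * ce
    return total / len(labels)
```

For instance, a block with RD-costs 110 (no split) and 100 (best split) has LossRD = 0.1, so with a hypothetical th = 0.05 it keeps full weight, while a near-tie block contributes only w times its cross-entropy.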

5. Experimental Results for LFHI & LFSD & LFMD

5.1. BD-Rate in 2019 ISCAS

BD-Rate (BR) (%) & Time Saving (TS) (%) on HEVC Test Sequences
  • HM-16.9 is used under the All-Intra configuration.
  • On average, 9.7% and 8.0% more encoding time is saved compared with Liu TIP’16 [14] and ETH-CNN & ETH-LSTM [17], respectively.
  • The fast PU partition part saves a further 6.9% of encoding time.

5.2. BD-Rate in 2020 TIP

BD-Rate (BD-BR) (%) & Time Saving (TS) (%) on HEVC Test Sequences
  • LFSD-1 is the most conservative setting, i.e. it has the lowest BD-rate loss, and it outperforms [13], while LFSD-3 has the largest time saving.
  • LFSD-2 outperforms Liu TIP’16 [20] and ETH-CNN & ETH-LSTM [23] in terms of both BD-rate and time reduction.
BD-Rate (BD-BR) (%) & Time Saving (TS) (%) on HEVC Test Sequences
  • LFMD outperforms [13] and [34].
  • Combining LFSD and LFMD to form LFHI yields a 75.2% time reduction with only a 2.09% BD-rate increase.
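
To put 75.2% into perspective: time saving is commonly computed relative to the anchor encoder, TS = (T_HM − T_test) / T_HM × 100% (assumed here; the paper gives its exact definition). The small sketch below also converts a time saving into an equivalent speed-up factor.

```python
def time_saving(t_anchor: float, t_test: float) -> float:
    """Encoding-time saving in percent relative to the anchor encoder (e.g. HM)."""
    return (t_anchor - t_test) / t_anchor * 100.0


def speedup(ts_percent: float) -> float:
    """Equivalent speed-up factor for a given time saving in percent."""
    return 1.0 / (1.0 - ts_percent / 100.0)
```

A 75.2% saving means the encoder runs in about a quarter of HM's time, i.e. roughly a 4× speed-up (1 / 0.248 ≈ 4.03).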

5.3. Comparison with Square Kernels

Comparison with Square Kernels
  • The authors claim that AK-CNNs with asymmetric kernels have higher prediction accuracy than their counterparts using square kernels only.
  • (But I can only see an accuracy improvement of less than 1%. BD-rate should also be measured.)

5.4. Model Complexity

Model Sizes at Different CU Depth
  • The number of parameters of the networks ranges from 43,346 to 43,98, which is several orders of magnitude fewer than AlexNet (60,965,128) and VGG-16 (138,357,544).
  • The number of parameters is also much smaller than that of Li ICME’17 [21].
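
Parameter counts like these follow from simple per-layer formulas. The sketch below, with hypothetical layer sizes, shows why a 1×5 kernel is cheaper than a 5×5 one, and checks the "orders of magnitude" claim using the numbers quoted above (43,346 parameters versus AlexNet and VGG-16).

```python
import math


def conv_params(kh: int, kw: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weights of a conv layer: kh*kw*c_in per filter, plus one bias per filter."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


def fc_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights of a fully-connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def orders_of_magnitude(big: int, small: int) -> float:
    """How many powers of ten separate two parameter counts."""
    return math.log10(big / small)


if __name__ == "__main__":
    # Hypothetical: 16 filters on a single luma channel.
    print(conv_params(5, 5, 1, 16), conv_params(1, 5, 1, 16))  # 416 96
    print(orders_of_magnitude(60_965_128, 43_346))   # AlexNet vs smallest AK-CNN
    print(orders_of_magnitude(138_357_544, 43_346))  # VGG-16 vs smallest AK-CNN
```

The ratios come out to about 1,400× (AlexNet) and 3,200× (VGG-16), i.e. a little over three orders of magnitude.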

There are still many things not shown here. Please feel free to read the papers.

This is the 8th story this month!

Written by

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
