Reading: LFHI & LFSD & LFMD Using AK-CNN — Asymmetric-Kernel CNN (Fast HEVC Prediction)

Outperforms Liu TIP’16 and ETH-CNN & ETH-LSTM, Smaller Model Size Than Li ICME’17, 75.2% Time Reduction Under All-Intra Configuration

Sik-Ho Tsang
6 min read · Jun 7, 2020

In this story, Asymmetric-Kernel CNN (AK-CNN), by University of Science and Technology of China, is presented. In this paper:

  • Asymmetric horizontal and vertical convolution kernels are designed to precisely extract the texture features of each block with much lower complexity.
  • The confidence threshold decision scheme is designed in the PU partition part to achieve the best trade-off between the coding performance and complexity reduction.
  • This AK-CNN is used for fast size decision as well as fast mode decision, namely Learned Fast Size Decision (LFSD) and Learned Fast Mode Decision (LFMD), respectively. With both used, it becomes a Learned Fast HEVC Intra (LFHI) Coding Scheme.

LFSD was first published in 2019 ISCAS, and was then combined with LFMD to form LFHI, which is published in 2020 TIP, where TIP has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)

Outline

  1. AK-CNN: Network Architecture for Learned Fast Size Decision (LFSD)
  2. Confidence Threshold Decision Scheme in PU Size Decision
  3. Learned Fast Mode Decision (LFMD)
  4. Some Training Details
  5. Experimental Results for LFHI & LFSD & LFMD

1. AK-CNN: Network Architecture for Learned Fast Size Decision (LFSD)

AK-CNN: Network Architecture for 64×64
  • In HEVC, a coding block can have a size of 64×64, 32×32, 16×16, 8×8 or 4×4. There are 4 AK-CNN classifiers for the fast splitting decision between adjacent sizes.
  • The above shows AK-CNN Network Architecture for 64×64.
  • Input: luminance of the current CU.
  • Output: The splitting decision, 0 or 1.
  • The network extends the first convolutional layer to three branches:
  • The first branch is the traditional one, with normal square convolutional kernels.
  • The remaining branches have kernels of asymmetric shape, which target near-horizontal or near-vertical textures.
  • This multi-branch design is similar to the idea in GoogLeNet of using non-square kernels.
  • All three convolutional branches output feature maps of the same size, and they are concatenated. The combination of these features helps to better understand the block content.
  • Next, the features will flow through three fully-connected layers, including two hidden layers and one output layer.
  • LeakyReLU is used with α = 1. Softmax is used at the output layer.
Network Architecture Details
  • Network A is used for 32×32 blocks, while Network B is used for 16×16 & 8×8 blocks. A minimal sketch of the three-branch design is given below.
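
To make the three-branch idea concrete, below is a minimal PyTorch sketch of such a network for a 64×64 block. The kernel shapes, channel counts, fully-connected widths and the LeakyReLU slope are my own illustrative assumptions, not the exact configuration from the paper.

```python
import torch
import torch.nn as nn

class AKCNNSketch(nn.Module):
    def __init__(self, block_size=64):
        super().__init__()
        # Branch 1: conventional square kernels.
        self.square = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=4, stride=4),
            nn.LeakyReLU(0.1),
        )
        # Branch 2: asymmetric (wide, then tall) kernels aimed at near-horizontal textures.
        self.horizontal = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 4), stride=(1, 4)),
            nn.Conv2d(8, 8, kernel_size=(4, 1), stride=(4, 1)),
            nn.LeakyReLU(0.1),
        )
        # Branch 3: asymmetric (tall, then wide) kernels aimed at near-vertical textures.
        self.vertical = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(4, 1), stride=(4, 1)),
            nn.Conv2d(8, 8, kernel_size=(1, 4), stride=(1, 4)),
            nn.LeakyReLU(0.1),
        )
        # The three branches give feature maps of the same spatial size,
        # so they can be concatenated along the channel dimension.
        feat = (block_size // 4) ** 2 * 24
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat, 64), nn.LeakyReLU(0.1),
            nn.Linear(64, 32), nn.LeakyReLU(0.1),
            nn.Linear(32, 2),           # 2 outputs: non-split / split
        )

    def forward(self, x):               # x: (N, 1, 64, 64) luma block
        f = torch.cat([self.square(x), self.horizontal(x), self.vertical(x)], dim=1)
        return torch.softmax(self.fc(f), dim=1)

# Usage: probs[:, 1] is the predicted probability of splitting the 64x64 CU.
probs = AKCNNSketch()(torch.rand(1, 1, 64, 64))
```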

2. Confidence Threshold Decision Scheme in PU Size Decision

2.1. Conference Version

Relationship between threshold and accuracy/ratios
  • For PU size decision, i.e. 8×8 to 4×4, a confidence threshold decision scheme is used.
  • The prediction accuracy and the ratio of samples whose confidence exceeds the threshold, as the threshold varies over 4 QPs on the validation set, are shown above.
  • The softmax confidence thresholds are chosen at the 90% accuracy points on the validation sets.
  • Accordingly, the thresholds TH are set as above. A small sketch of this decision rule is given below.
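
Below is a minimal sketch of this decision rule, assuming hypothetical per-QP threshold values; the actual thresholds come from the validation-set accuracy curves.

```python
# Sketch of the confidence-threshold decision for the 8x8 -> 4x4 PU split.
# The per-QP thresholds below are hypothetical placeholders; the paper picks
# them from the 90% accuracy points on the validation set.
TH = {22: 0.90, 27: 0.88, 32: 0.86, 37: 0.84}

def pu_size_decision(split_prob, qp):
    """split_prob: softmax output P(split 8x8 into 4x4) from the AK-CNN."""
    confidence = max(split_prob, 1.0 - split_prob)
    if confidence >= TH[qp]:
        # Confident prediction: skip RDO for the rejected PU size.
        return "split" if split_prob > 0.5 else "non-split"
    # Low confidence: keep the original encoder behaviour and test both sizes.
    return "full RDO"

print(pu_size_decision(0.97, 22))   # -> split
print(pu_size_decision(0.60, 22))   # -> full RDO
```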

2.2. Transaction Version

  • (A more advanced evolution-based scheme is used here, which I am not familiar with. Thus, if interested, please feel free to read the paper.)

3. Learned Fast Mode Decision (LFMD)

Minimum Number of RDO Candidates (MNRC)
  • In conventional HEVC intra coding, 3 to 8 intra prediction modes are passed to rate-distortion optimization (RDO) to find the best one.
  • In LFMD, AK-CNN is used to speed up the process by redefining the MNRC based on different classes predicted by AK-CNN, as shown above.
  • By adjusting p & q, different tradeoff can be obtained for the tradeoff between complexity reduction and coding performance.
  • And 2 types of AK-CNN is trained. One for conservative setting, one for agressive setting.
  • The AK-CNN is similar to the above one but with the output layer to have 3 outputs to classify Class 1 to Class 3. For 4×4, a more shallower network is used instead of AK-CNN.
  • A standard training loss is used:
  • (There is not much details for this part in this paper.)
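
Below is a minimal sketch of how the predicted class could set the MNRC and truncate the rough-mode-decision list; the class-to-MNRC mapping and the mode lists are hypothetical.

```python
# Sketch of the LFMD idea: the AK-CNN class prediction defines the minimum
# number of RDO candidates (MNRC), and the rough-mode-decision (RMD) list is
# truncated to that length before full RDO. The class-to-MNRC mapping below
# (i.e. the values of p & q) is hypothetical.
MNRC = {1: 1, 2: 2, 3: 3}

def select_rdo_candidates(rmd_modes, predicted_class):
    """rmd_modes: intra modes sorted by rough-mode-decision cost, best first."""
    return rmd_modes[:MNRC[predicted_class]]   # only these go through full RDO

# Usage: HM keeps 8 RMD candidates for small PUs; Class 1 cuts this down to 1.
rmd_modes = [26, 10, 1, 0, 34, 18, 2, 6]
print(select_rdo_candidates(rmd_modes, predicted_class=1))   # -> [26]
```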

4. Some Training Details

4.1. New Dataset

  • The CPH dataset in ETH-CNN & ETH-LSTM does not contain information about the PU size decision.
  • For this reason, a complete dataset for extended CTU partition, namely ECP dataset, is established.

4.2. Loss Function for AK-CNN in LFSD

  • RD-cost is also collected for each block during encoding.
  • The coding loss LossRD is defined as the relative difference between the RD-cost values before and after splitting (the RD-cost after splitting is that of the optimal sub-block combination).
  • Similar to the focal loss in RetinaNet, the training loss of a block whose LossRD is lower than the threshold of its depth is given a fixed small weight w.
  • (This th is not the same as TH in the previous section.)
  • The neural networks thus pay more attention to blocks with large LossRD. A sketch of this weighting is given below.
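
Below is a minimal sketch of this weighting, assuming a cross-entropy base loss and placeholder values for w and th.

```python
# Sketch of the RD-aware weighting of the LFSD training loss. Blocks whose
# LossRD is below the per-depth threshold `th` receive a fixed small weight
# `w`, so training focuses on blocks where a wrong split decision is costly.
# The cross-entropy base loss and the values of w and th are my assumptions.
import torch
import torch.nn.functional as F

def weighted_split_loss(logits, labels, loss_rd, th=0.02, w=0.1):
    """logits: (N, 2) network outputs; labels: (N,) split/non-split targets;
    loss_rd: (N,) relative RD-cost difference before vs. after splitting."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    weights = torch.where(loss_rd < th,
                          torch.full_like(loss_rd, w),
                          torch.ones_like(loss_rd))
    return (weights * per_sample).mean()

logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
loss_rd = torch.tensor([0.001, 0.30, 0.05, 0.01])
print(weighted_split_loss(logits, labels, loss_rd))
```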

5. Experimental Results

5.1. BD-Rate in 2019 ISCAS

BD-Rate (BR) (%) & Time Saving (TS) (%) on HEVC Test Sequences
  • HM-16.9 is used under All-Intra configuration.
  • On average, 9.7% and 8.0% more encoding time is saved compared with Liu TIP’16 [14] and ETH-CNN & ETH-LSTM [17], respectively.
  • The fast PU partition part further saves 6.9% of the encoding time.

5.2. BD-Rate in 2020 TIP

BD-Rate (BD-BR) (%) & Time Saving (TS) (%) on HEVC Test Sequences
  • LFSD-1 is the most conservative setting, i.e. it has the lowest BD-rate loss and outperforms [13], while LFSD-3 has the largest time saving.
  • LFSD-2 outperforms Liu TIP’16 [20] and ETH-CNN & ETH-LSTM [23] in terms of both BD-rate and time reduction.
BD-Rate (BD-BR) (%) & Time Saving (TS) (%) on HEVC Test Sequences
  • LFMD outperforms [13] and [34].
  • By combining LFSD and LFMD to form LFHI, LFHI obtains 75.2% time reduction with only 2.09% increase in BD-rate.

5.3. Comparison with Square Kernels

Comparison with Square Kernels
  • Authors claim that the AK-CNNs with asymmetric kernels have higher prediction accuracy than the variants using only square kernels.
  • (But I can only see less than 1% accuracy improvement… BD-rate should be measured…)

5.4. Model Complexity

Model Sizes at Different CU Depth
  • The number of parameters of the networks ranges from 43,346 to 43,98, which is several orders of magnitude fewer than AlexNet (60,965,128) and VGG-16 (138,357,544).
  • The number of parameters is also much smaller than that of Li ICME’17 [21].

There are still many details that are not covered here. Please feel free to read the papers.

This is the 8th story in this month!


Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.