Reading: Li ICME’17 — Three-Branch Deep CNN for Complexity Reduction on Intra-Mode HEVC (Fast HEVC Prediction)
62.25% and 69.06% Average Time Reduction With Negligible BD-Rate Increases of 2.12% and 1.38%, Outperforming Liu ISCAS’16
In this story, “A deep convolutional neural network approach for complexity reduction on intra-mode HEVC” (Li ICME’17), by Beihang University and Imperial College London, is presented. I read this paper because I work on video coding research. In this paper:
- Firstly, a large-scale database with diversiform patterns of CTU partition is established.
- Secondly, the partition problem is modelled as a three-level classification problem.
- Lastly, a deep CNN structure with various sizes of convolutional kernels is developed.
This is a paper in 2017 ICME. (Sik-Ho Tsang @ Medium)
Outline
- CTU Partition of Intra-mode HEVC (CPIH) Database
- Network Architecture
- Experimental Results
1. CTU Partition of Intra-mode HEVC (CPIH) Database
- To the best of the authors’ knowledge, this is the first database on CTU partition patterns. (https://github.com/HEVC-Projects/CPIH)
- First, 2000 images at a resolution of 4928×3264 are selected from the Raw Images Dataset (RAISE).
- These 2000 images are randomly divided into training (1700 images), validation (100 images) and test (200 images) sets.
- Furthermore, each set is equally divided into four subsets: one subset keeps the original resolution, and the other three are down-sampled to 2880×1920, 1536×1024 and 768×512, to support different resolutions.
- (For background knowledge of video coding and HEVC, please feel free to read Sections 1 & 2 of IPCNN.)
- All images are encoded by the HEVC reference software HM using four Quantization Parameters (QPs) of {22, 27, 32, 37}.
- After encoding, the binary labels indicating splitting (=1) and non-splitting (=0) are obtained for all CUs.
- Finally, 12 sub-databases are established according to QP and CU size, since 4 QPs are applied and CUs come in 3 different sizes (64×64, 32×32 and 16×16); a grouping sketch follows this list.
- The above table shows the details. In total, 110,405,784 samples are gathered, ensuring the sufficiency of training data; the percentages of splitting and non-splitting CUs are 49.2% and 50.8%, respectively.
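To picture the grouping into 12 sub-databases concretely, below is a minimal Python sketch. The (qp, cu_size, luma, label) record layout and the build_sub_databases helper are hypothetical illustrations, not the actual format of the CPIH repository.

```python
from collections import defaultdict

QPS = (22, 27, 32, 37)    # the four encoding QPs
CU_SIZES = (64, 32, 16)   # the three CU sizes, giving 4 x 3 = 12 sub-databases

def build_sub_databases(samples):
    """Group (luma, label) pairs into 12 sub-databases keyed by (QP, CU size).

    `samples` is assumed to be an iterable of (qp, cu_size, luma, label)
    records extracted from the HM encodings, with label 1 = split, 0 = non-split.
    """
    subs = defaultdict(list)
    for qp, cu_size, luma, label in samples:
        assert qp in QPS and cu_size in CU_SIZES
        subs[(qp, cu_size)].append((luma, label))
    return subs
```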
2. Network Architecture
- Three classifiers Sₗ (l = 1, 2, 3) are trained for the different CU sizes of 64×64 (U), 32×32 (Uᵢ) and 16×16 (Uᵢ,ⱼ).
- The only difference among the three separate CNN models is the kernel sizes of the first convolutional layer, which are pertinent to the different CU sizes.
- Input layer: The input to each CNN model is a wₗ×wₗ matrix, where wₗ ∈ {64, 32, 16}.
- Convolutional layers: In the 1st convolutional layer, three branches of filters C₁₋₁, C₁₋₂ and C₁₋₃, with kernel sizes of wₗ/8×wₗ/8, wₗ/4×wₗ/4 and wₗ/2×wₗ/2, are applied in parallel to extract low-level features of CU splitting. The stride equals the kernel size, making these non-overlapping convolutions (see the sketch after this list).
- Following the 1st convolutional layer, the feature maps are halved in each dimension by convolving with non-overlapping 2×2 kernels (stride 2), until the final feature maps reach a size of 2×2.
- Other layers: All feature maps yielded from the last convolutional layer are concatenated and flattened into a vector in the concatenation layer.
- This vector then goes through the fully-connected layers, consisting of two hidden layers and one output layer, with 50% dropout applied.
- ReLU is used for all layers except the output layer, where a Sigmoid is applied since the label is binary.
- The details of the three classifiers are as shown above.
- It is mentioned that Liu ISCAS’16 has only 1,224 trainable parameters, which might cause underfitting, while the proposed networks greatly increase the number of trainable parameters, as shown above.
- Standard cross-entropy loss is used, where R is the mini-batch size, yᵣ is the ground-truth split label and ŷᵣ is the predicted splitting probability:
L = −(1/R) · Σᵣ₌₁..ᴿ [ yᵣ log ŷᵣ + (1 − yᵣ) log(1 − ŷᵣ) ]
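To make the three-branch structure concrete, here is a minimal PyTorch sketch of one classifier. The channel counts (16 filters per branch) and hidden-layer widths (64 neurons) are illustrative assumptions; the paper’s exact filter and neuron numbers are those in its table, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class ThreeBranchCNN(nn.Module):
    """One classifier of the three-branch CNN, for CU size w in {64, 32, 16}.

    NOTE: channel counts and hidden sizes below are illustrative
    assumptions, not the paper's exact configuration.
    """

    def __init__(self, w: int, channels: int = 16, hidden: int = 64):
        super().__init__()
        branches = []
        for k in (w // 8, w // 4, w // 2):   # C1-1, C1-2, C1-3 kernel sizes
            # stride == kernel size -> non-overlapping convolution
            layers = [nn.Conv2d(1, channels, kernel_size=k, stride=k), nn.ReLU()]
            size = w // k                     # spatial size after the 1st conv
            while size > 2:                   # halve with non-overlapping 2x2 convs
                layers += [nn.Conv2d(channels, channels, 2, stride=2), nn.ReLU()]
                size //= 2
            branches.append(nn.Sequential(*layers))
        self.branches = nn.ModuleList(branches)
        feat = 3 * channels * 2 * 2           # every branch ends at 2x2 maps
        self.fc = nn.Sequential(              # two hidden layers + output, 50% dropout
            nn.Linear(feat, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # binary split / non-split probability
        )

    def forward(self, x):                     # x: (N, 1, w, w) luminance CU
        feats = [b(x).flatten(1) for b in self.branches]  # concatenation layer
        return self.fc(torch.cat(feats, dim=1))
```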
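And a matching training step, reusing the ThreeBranchCNN sketch above. PyTorch’s built-in BCELoss averages over the mini-batch, supplying the 1/R factor of the loss above; the batch size, optimizer, and random stand-in data are arbitrary choices for illustration.

```python
import torch

model = ThreeBranchCNN(w=64)                  # classifier for 64x64 CUs
criterion = torch.nn.BCELoss()                # mean over the mini-batch (1/R)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 1, 64, 64)                 # R = 8 dummy luminance CUs
y = torch.randint(0, 2, (8, 1)).float()       # 1 = split, 0 = non-split
loss = criterion(model(x), y)                 # cross-entropy as defined above
optimizer.zero_grad()
loss.backward()
optimizer.step()
```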
3. Experimental Results
- Test sets: all 200 images of the test set of the CPIH database, and all 18 video sequences of the Joint Collaborative Team on Video Coding (JCT-VC) standard test set.
- A 60.91% to 67.20% average time reduction is achieved with only a 2.12% BD-rate increase, which outperforms the SVM approach [13] and Liu ISCAS’16 [21].
- Similar results hold on the CPIH image test set: a 64.86% to 73.10% average time reduction is achieved with only a 1.38% BD-rate increase, again outperforming the SVM approach [13] and Liu ISCAS’16 [21].
During the days of coronavirus, the challenge of writing 30/35/40 stories again for this month has been accomplished. Let me challenge 45 stories!! This is the 42nd story in this month. Thanks for visiting my story..
Reference
[2017 ICME] [Li ICME’17]
A deep convolutional neural network approach for complexity reduction on intra-mode HEVC
Codec Fast Prediction
H.264 to HEVC [Wei VCIP’17] [H-LSTM]
HEVC [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Li ICME’17] [Katayama ICICT’18] [Chang DCC’18] [Zhang RCAR’19]
VVC [Jin VCIP’17] [Jin PCM’17] [Wang ICIP’18] [Pooling-Variable CNN]