Reading: Li ICME’17 — Three-Branch Deep CNN for Complexity Reduction on Intra-Mode HEVC (Fast HEVC Prediction)
62.25% and 69.06% Average Time Reduction With Negligible BD-Rate Increases of 2.12% and 1.38%, Outperforming Liu ISCAS’16
In this story, “A deep convolutional neural network approach for complexity reduction on intra-mode HEVC” (Li ICME’17), by Beihang University and Imperial College London, is presented. I read this paper because I work on video coding research. In this paper:
- Firstly, a large-scale database with diversiform patterns of CTU partition is established.
- Secondly, the partition problem is modelled as a three-level classification problem.
- Lastly, a deep CNN structure with various sizes of convolutional kernels is developed.
This is a paper in 2017 ICME. (Sik-Ho Tsang @ Medium)
Outline
- CTU Partition of Intra-mode HEVC (CPIH) Database
- Network Architecture
- Experimental Results
1. CTU Partition of Intra-mode HEVC (CPIH) Database
- To the best of the authors’ knowledge, this is the first database on CTU partition patterns. (https://github.com/HEVC-Projects/CPIH)
- First, 2000 images at a resolution of 4928×3264 are selected from the Raw Images Dataset (RAISE).
- These 2000 images are randomly divided into training (1700 images), validation (100 images) and test (200 images) sets.
- Furthermore, each set is equally divided into four subsets: one subset keeps the original resolution, and the other three are down-sampled to 2880×1920, 1536×1024 and 768×512, to support different resolutions.
- (For background knowledge of video coding and HEVC, please feel free to read Sections 1 & 2 of IPCNN.)
- All images are encoded by the HEVC reference software HM using four Quantization Parameters (QPs) of {22, 27, 32, 37}.
- After encoding, the binary labels indicating splitting (=1) and non-splitting (=0) are obtained for all CUs.
- Finally, 12 sub-databases are established according to QP and CU size, since 4 QPs are applied and CUs come in 3 different sizes (64×64, 32×32 and 16×16); a grouping sketch follows this list.
- The above table shows the details. In total, 110,405,784 samples are gathered, ensuring the sufficiency of training data; the percentages of splitting and non-splitting CUs are 49.2% and 50.8%, respectively.
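To picture the grouping into 12 sub-databases concretely, below is a minimal Python sketch. The (qp, cu_size, luma, label) record layout and the build_sub_databases helper are hypothetical illustrations, not the actual format of the CPIH repository.

```python
from collections import defaultdict

QPS = (22, 27, 32, 37)    # the four encoding QPs
CU_SIZES = (64, 32, 16)   # the three CU sizes, giving 4 x 3 = 12 sub-databases

def build_sub_databases(samples):
    """Group (luma, label) pairs into 12 sub-databases keyed by (QP, CU size).

    `samples` is assumed to be an iterable of (qp, cu_size, luma, label)
    records extracted from the HM encodings, with label 1 = split, 0 = non-split.
    """
    subs = defaultdict(list)
    for qp, cu_size, luma, label in samples:
        assert qp in QPS and cu_size in CU_SIZES
        subs[(qp, cu_size)].append((luma, label))
    return subs
```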
2. Network Architecture
- Three classifiers Sₗ (l = 1, 2, 3) are trained for the different CU sizes of 64×64 (U), 32×32 (Uᵢ) and 16×16 (Uᵢ,ⱼ).
- The only difference among the three separate CNN models is the kernel sizes of the first convolutional layer, which are pertinent to the different CU sizes.
- Input layer: The input to each CNN model is a wₗ×wₗ matrix, where wₗ ∈ {64, 32, 16}.
- Convolutional layers: In the 1st convolutional layer, three branches of filters C₁₋₁, C₁₋₂ and C₁₋₃, with kernel sizes of wₗ/8×wₗ/8, wₗ/4×wₗ/4 and wₗ/2×wₗ/2, are applied in parallel to extract low-level features of CU splitting. The stride equals the kernel size, making these non-overlapping convolutions (see the sketch after this list).
- Following the 1st convolutional layer, the feature maps are halved in each dimension by convolving with non-overlapping 2×2 kernels (stride 2), until the final feature maps reach a size of 2×2.
- Other layers: All feature maps yielded from the last convolutional layer are concatenated and flattened into a vector in the concatenation layer.
- This vector then goes through the fully-connected layers, consisting of two hidden layers and one output layer, with 50% dropout applied.
- ReLU is used for all layers except the output layer, where a Sigmoid is applied since the label is binary.
- The details of the three classifiers are as shown above.
- It is mentioned that Liu ISCAS’16 has only 1,224 trainable parameters, which might cause underfitting, while the proposed networks greatly increase the number of trainable parameters, as shown above.
- Standard cross-entropy loss is used, where R is the mini-batch size, yᵣ is the ground-truth split label and ŷᵣ is the predicted splitting probability:
L = −(1/R) · Σᵣ₌₁..ᴿ [ yᵣ log ŷᵣ + (1 − yᵣ) log(1 − ŷᵣ) ]
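To make the three-branch structure concrete, here is a minimal PyTorch sketch of one classifier. The channel counts (16 filters per branch) and hidden-layer widths (64 neurons) are illustrative assumptions; the paper’s exact filter and neuron numbers are those in its table, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class ThreeBranchCNN(nn.Module):
    """One classifier of the three-branch CNN, for CU size w in {64, 32, 16}.

    NOTE: channel counts and hidden sizes below are illustrative
    assumptions, not the paper's exact configuration.
    """

    def __init__(self, w: int, channels: int = 16, hidden: int = 64):
        super().__init__()
        branches = []
        for k in (w // 8, w // 4, w // 2):   # C1-1, C1-2, C1-3 kernel sizes
            # stride == kernel size -> non-overlapping convolution
            layers = [nn.Conv2d(1, channels, kernel_size=k, stride=k), nn.ReLU()]
            size = w // k                     # spatial size after the 1st conv
            while size > 2:                   # halve with non-overlapping 2x2 convs
                layers += [nn.Conv2d(channels, channels, 2, stride=2), nn.ReLU()]
                size //= 2
            branches.append(nn.Sequential(*layers))
        self.branches = nn.ModuleList(branches)
        feat = 3 * channels * 2 * 2           # every branch ends at 2x2 maps
        self.fc = nn.Sequential(              # two hidden layers + output, 50% dropout
            nn.Linear(feat, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # binary split / non-split probability
        )

    def forward(self, x):                     # x: (N, 1, w, w) luminance CU
        feats = [b(x).flatten(1) for b in self.branches]  # concatenation layer
        return self.fc(torch.cat(feats, dim=1))
```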
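And a matching training step, reusing the ThreeBranchCNN sketch above. PyTorch’s built-in BCELoss averages over the mini-batch, supplying the 1/R factor of the loss above; the batch size, optimizer, and random stand-in data are arbitrary choices for illustration.

```python
import torch

model = ThreeBranchCNN(w=64)                  # classifier for 64x64 CUs
criterion = torch.nn.BCELoss()                # mean over the mini-batch (1/R)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 1, 64, 64)                 # R = 8 dummy luminance CUs
y = torch.randint(0, 2, (8, 1)).float()       # 1 = split, 0 = non-split
loss = criterion(model(x), y)                 # cross-entropy as defined above
optimizer.zero_grad()
loss.backward()
optimizer.step()
```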
3. Experimental Results
- Test sets: all 200 images of the test set of the CPIH database, and all 18 video sequences of the Joint Collaborative Team on Video Coding (JCT-VC) standard test set.
- A 60.91% to 67.20% average time reduction is achieved with only a 2.12% BD-rate increase, which outperforms the SVM approach [13] and Liu ISCAS’16 [21].
- Similar results hold on the CPIH image test set: a 64.86% to 73.10% average time reduction is achieved with only a 1.38% BD-rate increase, again outperforming the SVM approach [13] and Liu ISCAS’16 [21].
During the days of coronavirus, the challenge of writing 30/35/40 stories again for this month has been accomplished. Let me challenge 45 stories!! This is the 42nd story in this month. Thanks for visiting my story..
Reference
[2017 ICME] [Li ICME’17]
A deep convolutional neural network approach for complexity reduction on intra-mode HEVC
Codec Fast Prediction
H.264 to HEVC [Wei VCIP’17] [H-LSTM]
HEVC [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Li ICME’17] [Katayama ICICT’18] [Chang DCC’18] [Zhang RCAR’19]
VVC [Jin VCIP’17] [Jin PCM’17] [Wang ICIP’18] [Pooling-Variable CNN]