Reading: Li ICME’17 — Three-Branch Deep CNN for Complexity Reduction on Intra-Mode HEVC (Fast HEVC Prediction)

62.25% and 69.06% Average Time Reduction With Negligible BD-Rate Increases of 2.12% and 1.38%, Outperforming Liu ISCAS’17

Sik-Ho Tsang
May 29, 2020

In this story, “A deep convolutional neural network approach for complexity reduction on intra-mode HEVC” (Li ICME’17), by Beihang University and Imperial College London, is presented. I read this because I work on video coding research. In this paper:

  • Firstly, a large-scale database with diversiform patterns of CTU partition is established.
  • Secondly, the partition problem is modelled as a three-level classification problem.
  • Lastly, a deep CNN structure with various sizes of convolutional kernels is developed.

This is a paper in 2017 ICME. (Sik-Ho Tsang @ Medium)

Outline

  1. CTU Partition of Intra-mode HEVC (CPIH) Database
  2. Network Architecture
  3. Experimental Results

1. CTU Partition of Intra-mode HEVC (CPIH) Database

  • To the best of the authors’ knowledge, this database is the first one on CTU partition patterns. (https://github.com/HEVC-Projects/CPIH)
  • First, 2000 images at resolution 4928×3264 are selected from the Raw Images Dataset (RAISE).
  • These 2000 images are randomly divided into training (1700 images), validation (100 images) and test (200 images) sets.
  • Furthermore, each set is equally divided into four subsets: one subset keeps the original resolution, and the other three are down-sampled to 2880×1920, 1536×1024 and 768×512 to cover different resolutions.
  • (For knowledge of video coding and HEVC, please feel free to read Sections 1 & 2 of IPCNN.)
  • All images are encoded by the HEVC reference software HM using four Quantization Parameters (QPs) of {22, 27, 32, 37}.
  • After encoding, the binary labels indicating splitting (=1) and non-splitting (=0) are obtained for all CUs.
  • Finally, 12 sub-databases are established according to QP and CU size, since 4 QPs are applied and CUs have 3 different sizes (64×64, 32×32 and 16×16); see the sketch after this list.
  • The above table shows the details. In total, 110,405,784 samples are gathered, ensuring the sufficiency of training data, and the percentages of splitting and non-splitting CUs are 49.2% and 50.8%, respectively.
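As a toy illustration of this bucketing (not the authors’ actual data pipeline; the names and structures here are hypothetical), the 12 sub-databases can be thought of as one sample list per (QP, CU size) pair:

```python
# Hypothetical sketch: organize (luma block, split flag) samples into the
# 12 sub-databases, one per (QP, CU size) combination.
QPS = [22, 27, 32, 37]
CU_SIZES = [64, 32, 16]

sub_databases = {(qp, s): [] for qp in QPS for s in CU_SIZES}  # 4 x 3 = 12

def add_sample(qp, cu_size, luma_block, split_flag):
    """split_flag is 1 if HM split this CU, 0 if it kept the CU whole."""
    sub_databases[(qp, cu_size)].append((luma_block, split_flag))
```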

2. Network Architecture

  • Three classifiers Sl are trained for the three CU sizes: 64×64 (U), 32×32 (Ui) and 16×16 (Ui,j).
  • The only difference among the three separate CNN models is the kernel sizes of the first convolutional layer, which depend on the CU size.
  • Input layer: The input to each CNN model is a wl×wl matrix, where wl ∈ {64, 32, 16}.
  • Convolutional layers: In the 1st convolutional layer, three branches of filters C1−1, C1−2 and C1−3 with kernel sizes of wl/8×wl/8, wl/4×wl/4 and wl/2×wl/2 are applied in parallel to extract low-level features of CU splitting. The stride equals the kernel size, making these non-overlapping convolutions.
  • Following the 1st convolutional layer, the feature maps are halved in size by convolving with non-overlapping 2×2 kernels, until the final feature maps reach 2×2.
  • Other layers: All feature maps yielded by the last convolutional layer are concatenated and then converted into a vector through the concatenation layer.
  • This vector then goes through the fully-connected layers, comprising two hidden layers and one output layer, with 50% dropout applied.
  • ReLU is used in all layers except the output layer, where Sigmoid is used since the label is binary.
  • The details of the three classifiers are as shown above.
  • It is mentioned that Liu ISCAS’16 has only 1,224 trainable parameters, which might cause underfitting, whereas the proposed networks greatly increase the number of trainable parameters, as shown above.
  • Standard cross-entropy loss is used for training, where R is the mini-batch size:
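The loss equation itself is an image in the original post; with binary label yr and Sigmoid output ŷr for the r-th sample in the mini-batch, the standard binary cross-entropy it describes takes the form:

```latex
L = -\frac{1}{R} \sum_{r=1}^{R} \left[ y_r \log \hat{y}_r + (1 - y_r) \log\left(1 - \hat{y}_r\right) \right]
```

To make the three-branch layout concrete, here is a minimal Keras sketch of one classifier. The branch kernel sizes (wl/8, wl/4, wl/2), the non-overlapping strides, the halving 2×2 convolutions, the concatenation, the two hidden layers with 50% dropout, and the Sigmoid output follow the description above; the channel counts and hidden-layer widths are assumptions, not the paper’s exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(w):  # w in {64, 32, 16}, one model per CU size
    inp = layers.Input(shape=(w, w, 1))
    branches = []
    for k in (w // 8, w // 4, w // 2):
        # Non-overlapping convolution: stride equals kernel size.
        x = layers.Conv2D(16, k, strides=k, activation="relu")(inp)
        # Halve the feature maps with non-overlapping 2x2 kernels until 2x2.
        while x.shape[1] > 2:
            x = layers.Conv2D(32, 2, strides=2, activation="relu")(x)
        branches.append(layers.Flatten()(x))
    x = layers.Concatenate()(branches)           # concatenation layer
    x = layers.Dense(64, activation="relu")(x)   # hidden layer 1 (width assumed)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(48, activation="relu")(x)   # hidden layer 2 (width assumed)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # split (1) vs. non-split (0)
    return tf.keras.Model(inp, out)

# Binary cross-entropy matches the loss above, averaged over the mini-batch.
model = build_classifier(64)
model.compile(optimizer="adam", loss="binary_crossentropy")
```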

3. Experimental Results

  • Test Sets: All 200 images of the test set of the CPIH database, and all 18 video sequences of the Joint Collaborative Team on Video Coding (JCT-VC) standard test set.
Video Test Set
  • A 60.91% to 67.20% average time reduction is achieved with a 2.12% BD-rate increase, which outperforms the SVM approach [13] and Liu ISCAS’16 [21].
CPIH Image Test Set
  • Results on the CPIH image test set are similar.
  • A 64.86% to 73.10% average time reduction is achieved with a 1.38% BD-rate increase, again outperforming the SVM approach [13] and Liu ISCAS’16 [21].
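Since both tables are summarized by BD-rate, it may help to recall how that number is computed. Below is a minimal sketch of the standard Bjøntegaard calculation (not the authors’ script): fit each rate-distortion curve with a cubic polynomial in the log-rate domain, integrate over the overlapping PSNR range, and convert the average log-rate gap into a percentage.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of 'test' vs. 'anchor' at equal PSNR,
    from four rate/PSNR points per curve (e.g. QPs 22, 27, 32, 37)."""
    # Fit PSNR -> log(rate) with a 3rd-order polynomial for each curve.
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100  # e.g. +2.12 means 2.12% more bits
```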

During the days of coronavirus, the challenge of writing 30/35/40 stories for this month has been accomplished. Let me challenge 45 stories!! This is the 42nd story in this month. Thanks for visiting my story.
