Review: Katayama ICICT’18 — Low Complexity Intra Coding Algorithm Based on CNN (Fast HEVC Prediction)

67.3% Time Reduction with Only 1.8% Increase in BD-Rate, Outperforms Liu TIP’16

Sik-Ho Tsang
May 3, 2020

In this story, Low Complexity Intra Coding Algorithm Based on Convolutional Neural Network for HEVC (Katayama ICICT’18), by Tokushima University, is reviewed. I read this because I work on video coding research. This is a paper in ICICT 2018. (Sik-Ho Tsang @ Medium)


  1. HEVC and Quad-Tree Coding
  2. Network Architecture
  3. Experimental Results

1. HEVC and Quad-Tree Coding

Quad-Tree Coding
  • (To know what video coding is and what it is used for, please feel free to read Sections 1 & 2 in IPCNN.)
  • In HEVC, a frame is divided into non-overlapping blocks called Coding Tree Units (CTUs). Here each CTU is 64×64, which is the maximum CTU size in HEVC and the default in the HM reference software. CTUs are encoded from top left to bottom right in raster scan order.
  • For each CTU, quad-tree coding is applied: a CU can be recursively split into 4 smaller square coding units (CUs), going from 64×64 to 32×32, 16×16, and down to 8×8. By comparing the rate-distortion (RD) cost of the CUs at each CU level, the encoder chooses which CU sizes are used to encode each CTU.
  • (An 8×8 CU can be further divided into four 4×4 Prediction Units (PUs), but this is not the focus of this story.)
  • Each CU is encoded using either intra prediction or inter prediction.
  • In this paper, the authors focus on intra prediction only.
  • (Inter prediction finds similar blocks in other frames so that the current frame can be compressed more efficiently. This search is also called motion estimation. It is not discussed here because this paper is not related to inter prediction.)
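The exhaustive quad-tree decision described above is the expensive search that a fast CU-decision method like this one tries to shortcut. It can be sketched as follows. This is a minimal sketch that assumes a caller-supplied cost function. `ctu_origins`, `best_partition`, `flat_cost` and `detail_cost` are hypothetical names, and the two toy costs only stand in for the real RD cost J = D + λR, which HM obtains by actually encoding each candidate CU. Partial CTUs at frame borders (for example, the bottom row of a 1080-line frame) are listed by `ctu_origins` but are not clipped by `best_partition`.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) of a coding unit in luma samples
CostFn = Callable[[int, int, int], float]


def ctu_origins(width: int, height: int, ctu: int = 64) -> List[Tuple[int, int]]:
    """Top-left corners of the CTUs in raster-scan order (left to right, top to bottom)."""
    return [(x, y) for y in range(0, height, ctu) for x in range(0, width, ctu)]


def best_partition(x: int, y: int, size: int, cost: CostFn,
                   min_size: int = 8) -> Tuple[float, List[CU]]:
    """Exhaustive quad-tree search: keep the CU whole or split it into four, whichever costs less."""
    whole = cost(x, y, size)
    if size <= min_size:
        return whole, [(x, y, size)]
    half = size // 2
    split_cost, leaves = 0.0, []
    for dy in (0, half):          # visit the four sub-CUs in Z order
        for dx in (0, half):
            c, sub = best_partition(x + dx, y + dy, half, cost, min_size)
            split_cost += c
            leaves.extend(sub)
    if split_cost < whole:        # ties keep the larger CU
        return split_cost, leaves
    return whole, [(x, y, size)]


# Hypothetical toy costs standing in for the real RD cost J = D + lambda * R.
flat_cost: CostFn = lambda x, y, s: float(s * s)     # splitting never helps
detail_cost: CostFn = lambda x, y, s: float(s ** 3)  # splitting always helps

print(best_partition(0, 0, 64, flat_cost))             # (4096.0, [(0, 0, 64)])
print(len(best_partition(0, 0, 64, detail_cost)[1]))   # 64
```

Every CTU visits all 1 + 4 + 16 + 64 = 85 candidate CUs in this brute-force search. That is why predicting the partition directly, as the CNN below does, can save so much encoding time.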

2. Network Architecture

Network Architecture
  • As shown above, the CNN takes as input the block partition patterns from 64×64 down to 8×8, each converted to a 16×16 block.
  • The first layer is a 3×3 convolutional layer with 10 kernels. The size of the feature map is 8×8, and the convolution is performed with zero padding. The kernels in this layer act as feature extractors.
  • The second layer performs max pooling.
  • Similarly, the third and fourth layers perform convolution and max pooling, respectively.
  • The fifth and sixth layers are fully connected layers (FCLs) with 256 and 64 units, respectively, for each input.
  • The seventh layer concatenates the FCL outputs of all inputs into a 256-unit vector.
  • The eighth layer is an FCL with 64 units.
  • The output layer uses softmax units.
  • This CNN classifier outputs the optimal CTU partition (division) information.
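The layer list above can be sketched as a NumPy forward pass. This is a minimal, untrained sketch, not the authors' implementation. The review does not give the kernel count of the third layer, the activation function, whether branches share weights, how the 64×64 to 8×8 patterns are converted to 16×16, or the number of output classes. So `CONV2_KERNELS`, `NUM_CLASSES`, ReLU, separate per-branch weights and random stand-in inputs are all assumptions. In this sketch the 8×8 feature map appears after the first pooling layer, which is one possible reading of the reported size. All function names are hypothetical.

```python
import numpy as np

CONV2_KERNELS = 20  # hypothetical: kernel count of layer 3 is not stated in the review
NUM_CLASSES = 2     # hypothetical: e.g. "split" vs "no split"


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def conv3x3_same(x, w, b):
    """3x3 convolution (cross-correlation) with zero padding; x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps the H x W size
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # accumulate kernel tap (i, j) over all input channels
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def init_params(rng, n_inputs=4, scale=0.1):
    """Random weights: one branch per input (64x64, 32x32, 16x16, 8x8 patterns) plus a shared head."""
    def branch():
        return {
            "w1": rng.normal(0, scale, (10, 1, 3, 3)), "b1": np.zeros(10),
            "w2": rng.normal(0, scale, (CONV2_KERNELS, 10, 3, 3)), "b2": np.zeros(CONV2_KERNELS),
            "f1": rng.normal(0, scale, (256, CONV2_KERNELS * 4 * 4)), "g1": np.zeros(256),
            "f2": rng.normal(0, scale, (64, 256)), "g2": np.zeros(64),
        }
    return {
        "branches": [branch() for _ in range(n_inputs)],
        "f3": rng.normal(0, scale, (64, 64 * n_inputs)), "g3": np.zeros(64),
        "fo": rng.normal(0, scale, (NUM_CLASSES, 64)), "go": np.zeros(NUM_CLASSES),
    }


def branch_forward(p, block):
    x = block[None, :, :]                                      # (1, 16, 16)
    x = max_pool2x2(relu(conv3x3_same(x, p["w1"], p["b1"])))  # layers 1-2: (10, 8, 8)
    x = max_pool2x2(relu(conv3x3_same(x, p["w2"], p["b2"])))  # layers 3-4: (C2, 4, 4)
    x = relu(p["f1"] @ x.ravel() + p["g1"])                    # layer 5: (256,)
    return relu(p["f2"] @ x + p["g2"])                         # layer 6: (64,)


def forward(params, blocks):
    feats = np.concatenate([branch_forward(p, b)
                            for p, b in zip(params["branches"], blocks)])  # layer 7: (256,)
    h = relu(params["f3"] @ feats + params["g3"])              # layer 8: (64,)
    return softmax(params["fo"] @ h + params["go"])            # output: class probabilities


rng = np.random.default_rng(0)
params = init_params(rng)
blocks = [rng.random((16, 16)) for _ in range(4)]  # stand-ins for the four 16x16 inputs
probs = forward(params, blocks)
print(probs.shape)  # (2,)
```

The concatenation step shows why the seventh layer has 256 units: four branches each contribute a 64-unit vector.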

3. Experimental Results

  • Several test sequences (100 frames) with resolutions ranging from Class 4K down to Class B are used.
  • Fifty frames from Class B sequences are used as training samples.
  • HM-16.7 with the all-intra configuration is used.
CU Complexity Reduction (CR) (%), Time Saving (TS) (%), and BD-rate (%) Against HM-16.9
  • The proposed algorithm obtains a 67.3% time saving with only a 1.8% increase in BD-rate for the luma component.
  • Compared with Liu TIP’16, the proposed algorithm achieves about 1.2% more time saving. It also reduces BD-rate by 2.58% (a BD-rate of -2.58%) and improves BD-PSNR by 0.08 dB.
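The metrics above can be reproduced roughly as shown in this sketch. It assumes the common definitions: time saving relative to the anchor encoder, and the Bjøntegaard delta rate computed by fitting log-rate as a cubic in PSNR and integrating over the overlapping PSNR range. The review does not show the paper's exact formulas. `time_saving` and `bd_rate` are hypothetical names, and the rates, PSNRs and times in the usage lines are made-up illustrative numbers, not results from the paper.

```python
import math

import numpy as np


def time_saving(t_anchor: float, t_test: float) -> float:
    """Encoding time saving in percent relative to the anchor encoder (e.g. unmodified HM)."""
    return (t_anchor - t_test) / t_anchor * 100.0


def bd_rate(rate_a, psnr_a, rate_t, psnr_t) -> float:
    """Bjontegaard delta rate (%) of the test curve vs. the anchor; negative means bit savings."""
    log_a, log_t = np.log(rate_a), np.log(rate_t)
    # Fit log-rate as a cubic polynomial of PSNR for each RD curve (typically 4 QPs).
    p_a = np.polyfit(psnr_a, log_a, 3)
    p_t = np.polyfit(psnr_t, log_t, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_a), min(psnr_t))
    hi = min(max(psnr_a), max(psnr_t))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    # Convert the average log-rate difference back to a percentage.
    return (math.exp(avg_t - avg_a) - 1.0) * 100.0


# Made-up example: the test encoder needs 2% more bits at every PSNR point.
psnr = [32.0, 35.0, 38.0, 41.0]
rates_hm = [1000.0, 2000.0, 4000.0, 8000.0]
rates_fast = [r * 1.02 for r in rates_hm]
print(round(bd_rate(rates_hm, psnr, rates_fast, psnr), 2))  # 2.0
print(round(time_saving(100.0, 32.7), 1))                   # 67.3
```

BD-PSNR uses the same procedure with the axes swapped: PSNR is fitted as a function of log-rate and the average PSNR gap is reported in dB.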

During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 4th story this month. Thanks for visiting my story.


