Review: Katayama ICICT’18 — Low Complexity Intra Coding Algorithm Based on CNN (Fast HEVC Prediction)

67.3% Time Reduction with Only 1.8% Increase in BD-Rate, Outperforms Liu TIP’16

3 min read · May 3, 2020

In this story, Low Complexity Intra Coding Algorithm Based on Convolutional Neural Network for HEVC (Katayama ICICT’18), by Tokushima University, is reviewed. I read it because I work on video coding research. The paper was published in 2018 ICICT. (Sik-Ho Tsang @ Medium)

Outline

HEVC and Quad-Tree Coding
Network Architecture
Experimental Results

1. HEVC and Quad-Tree Coding

(To know what video coding is and what it is used for, please feel free to read Sections 1 & 2 in IPCNN.)
In HEVC, a frame is divided into non-overlapping blocks called Coding Tree Units (CTUs). Each CTU is typically 64×64 in size. CTUs are encoded in raster scan order, from top left to bottom right.
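To make the raster scan order concrete, here is a minimal Python sketch that lists the top-left corners of the CTUs in a frame. The function name `ctu_positions` and the 1920×1080 frame size are my own choices for illustration and are not taken from the paper.

```python
import math

def ctu_positions(width, height, ctu_size=64):
    """Yield the (x, y) top-left corner of each CTU in raster scan order."""
    cols = math.ceil(width / ctu_size)   # CTUs per row
    rows = math.ceil(height / ctu_size)  # number of CTU rows
    for row in range(rows):              # top to bottom
        for col in range(cols):          # left to right within a row
            yield col * ctu_size, row * ctu_size

# Hypothetical 1920x1080 frame: 30 columns x 17 rows of 64x64 CTUs.
# The last row extends past the picture boundary because 17 * 64 = 1088.
positions = list(ctu_positions(1920, 1080))
print(len(positions))   # 510
print(positions[:3])    # [(0, 0), (64, 0), (128, 0)]
```

The 31st CTU, `positions[30]`, is `(0, 64)`, i.e. the scan wraps to the start of the second CTU row.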
For each CTU, quad-tree coding recursively splits a block into 4 smaller square Coding Units (CUs), going from 64×64 through 32×32 and 16×16 down to 8×8. By comparing the costs of the CUs at each CU level, the encoder chooses the combination of CU sizes used to encode each CTU.
(An 8×8 CU can be further divided into four 4×4 Prediction Units (PUs), but this is not the focus of this story.)
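The cost comparison across CU levels can be sketched as a small recursion in Python. This is only an illustration of the idea, not the HM implementation: `cost_fn` is a hypothetical stand-in for the encoder's rate-distortion cost, and `toy_cost` is an invented cost that makes blocks touching the top-left corner expensive so that some splitting happens.

```python
def best_partition(x, y, size, cost_fn, min_size=8):
    """Return (cost, leaves) for the cheapest quad-tree partition of a block.

    cost_fn(x, y, size) is a hypothetical stand-in for the cost of coding
    the block at (x, y) as one CU of the given size.
    """
    unsplit_cost = cost_fn(x, y, size)
    if size == min_size:                 # 8x8 is the smallest CU here
        return unsplit_cost, [(x, y, size)]

    half = size // 2
    split_cost, split_leaves = 0.0, []
    for dy in (0, half):                 # visit the four sub-blocks
        for dx in (0, half):
            cost, leaves = best_partition(x + dx, y + dy, half, cost_fn, min_size)
            split_cost += cost
            split_leaves += leaves

    # Keep the split only if it is strictly cheaper than the single CU.
    if split_cost < unsplit_cost:
        return split_cost, split_leaves
    return unsplit_cost, [(x, y, size)]

def toy_cost(x, y, size):
    """Invented cost: large blocks at the top-left corner cost twice as much."""
    area = size * size
    return 2 * area if (x == 0 and y == 0 and size > 8) else area

cost, leaves = best_partition(0, 0, 64, toy_cost)
print(cost, len(leaves))   # 4096.0 10
```

With this toy cost, the CTU ends up as three 32×32 CUs, three 16×16 CUs and four 8×8 CUs. An exhaustive search like this visits every CU size at every position, which is the kind of work a fast algorithm such as the CNN classifier reviewed here tries to avoid.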
Each CU is encoded by different approaches, such as intra prediction and inter prediction.
In this paper, the authors focus on intra prediction only.
(In inter prediction, similar blocks are searched for among frames so that the frame can be compressed more efficiently; this search is also called motion estimation. It is not discussed here since this paper is not related to inter prediction.)

2. Network Architecture

As shown above, the input of the CNN is the block partition patterns from 64×64 to 8×8, which are converted to 16×16 blocks.
The first layer is a 3×3 convolutional layer with 10 kernels. The size of the feature map is 8×8, and the convolution is performed in zero-padding mode. The kernels in this layer act as feature extractors.
The second layer performs the max pooling.
Similarly, the third and the fourth layers perform the convolution and max pooling.
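To get a feeling for how the spatial size changes through these four layers, here is a small Python sketch that applies the usual output-size formulas for convolution and pooling. The stride of 1 and padding of 1 for the convolution, and the 2×2 window with stride 2 for the max pooling, are assumptions of mine, since they are not stated in this story.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (assumed stride 1, padding 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Spatial output size of max pooling (assumed 2x2 window, stride 2)."""
    return (size - window) // stride + 1

size = 16                     # 16x16 input block
sizes = [size]
for layer in ("conv", "pool", "conv", "pool"):
    size = conv_out(size) if layer == "conv" else pool_out(size)
    sizes.append(size)

print(sizes)   # [16, 16, 8, 8, 4]
```

Under these assumptions, the zero-padded 3×3 convolution keeps the spatial size and each pooling halves it, so an 8×8 map appears after the first pooling.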
The fifth and sixth layers are Fully Connected Layers (FCLs) with 256 and 64 parameters, respectively, for each input.
The seventh layer concatenates the FCL outputs of each input, with 256 parameters.
The eighth layer is an FCL with 64 parameters.
The output layer uses softmax units.
This CNN classifier outputs the optimal CTU division information.
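As a reminder of what a softmax output layer does, the Python sketch below turns raw scores into class probabilities and picks the most likely class. The scores and the class names are hypothetical, since the exact output classes of the paper are not listed in this story.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                          # subtract max for stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three hypothetical partition classes.
classes = ["pattern_a", "pattern_b", "pattern_c"]
logits = [0.2, 2.1, -0.5]

probs = softmax(logits)
predicted = classes[probs.index(max(probs))]
print(predicted)   # pattern_b
```

The class with the largest probability is taken as the prediction, so the encoder can use it instead of testing every possible division.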

3. Experimental Results

Several test sequences (100 frames), with picture sizes ranging from Class 4K to Class B, are used.
Fifty frames in Class B are selected as the training samples.
HM-16.7 with all-intra configuration is used.

**CU Complexity Reduction (CR) (%), Time Saving (TS) (%), and BD-rate (%) Against HM-16.9**

The proposed algorithm obtains 67.3% time saving with only a 1.8% increase in BD-rate for the luma component.
Compared with Liu TIP’16, the proposed algorithm achieves about 1.2% higher TS. Additionally, the proposed algorithm improves BD-rate by 2.58% and BD-PSNR by 0.08 dB.
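For readers new to these metrics, time saving is commonly computed as the relative reduction in encoding time against the anchor encoder. The Python sketch below assumes this common definition; the encoding times are made-up numbers chosen only so that the result matches the 67.3% reported above, and they are not measurements from the paper.

```python
def time_saving(t_anchor, t_proposed):
    """Time saving in percent, relative to the anchor encoder."""
    return (t_anchor - t_proposed) / t_anchor * 100.0

# Made-up encoding times in seconds, for illustration only.
ts = time_saving(t_anchor=1000.0, t_proposed=327.0)
print(f"{ts:.1f}%")   # 67.3%
```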

During the days of coronavirus, let me challenge myself to write 30 stories again this month. Is it good? This is the 4th story this month. Thanks for visiting my story.

Reference

[2018 ICICT] [Katayama ICICT’18]
Low-Complexity Intra Coding Algorithm Based on Convolutional Neural Network for HEVC

Codec Fast Prediction

HEVC: [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Katayama ICICT’18]