Review: Katayama ICICT’18 — Low Complexity Intra Coding Algorithm Based on CNN (Fast HEVC Prediction)
67.3% Time Reduction with Only 1.8% Increase in BD-Rate, Outperforms Liu TIP’16
In this story, Low Complexity Intra Coding Algorithm Based on Convolutional Neural Network for HEVC (Katayama ICICT’18), by Tokushima University, is reviewed. I read this because I work on video coding research. This is a paper in 2018 ICICT. (Sik-Ho Tsang @ Medium)
- HEVC and Quad-Tree Coding
- Network Architecture
- Experimental Results
1. HEVC and Quad-Tree Coding
- (To know what video coding is and what it is used for, please feel free to read Sections 1 & 2 in IPCNN.)
- In HEVC, a frame is divided into non-overlapping blocks, called Coding Tree Units (CTUs). Each CTU has the size of 64×64. CTUs are encoded from top left to bottom right using raster scan order.
- For each CTU, quad-tree coding recursively splits a block into 4 smaller square coding units (CUs), from 64×64 through 32×32 and 16×16 down to 8×8. By comparing the cost of the CUs at each CU level, the encoder chooses a mix of CU sizes to encode each CTU.
- (An 8×8 CU can be further divided into four 4×4 Prediction Units (PUs), but this is not the focus of this story.)
- Each CU is encoded by different approaches, such as intra prediction and inter prediction.
- In this paper, the authors focus on intra prediction only.
- (In inter prediction, similar blocks are searched for among frames so that a frame can be compressed more efficiently; this search is called motion estimation. It is not discussed here since this paper is not related to inter prediction.)
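The cost comparison across CU levels described above can be sketched as a small recursion. This is only an illustration, not the HM encoder's actual search: `partition` and `toy_cost` are hypothetical names, and the cost function is a made-up stand-in for the encoder's rate-distortion cost, chosen so that one "detailed" corner of the CTU prefers small CUs.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of one chosen CU


def partition(x: int, y: int, size: int,
              cost: Callable[[int, int, int], float],
              min_size: int = 8) -> Tuple[float, List[Block]]:
    """Return (best cost, chosen CUs) for the square block at (x, y)."""
    whole = cost(x, y, size)
    if size <= min_size:
        return whole, [(x, y, size)]  # 8x8 is the smallest CU

    half = size // 2
    split_cost = 0.0
    split_blocks: List[Block] = []
    # Visit the four sub-blocks in raster order and solve each recursively.
    for dy in (0, half):
        for dx in (0, half):
            c, blocks = partition(x + dx, y + dy, half, cost, min_size)
            split_cost += c
            split_blocks.extend(blocks)

    # Keep the split only if it is strictly cheaper than the whole block.
    if split_cost < whole:
        return split_cost, split_blocks
    return whole, [(x, y, size)]


def toy_cost(x: int, y: int, size: int) -> float:
    """Hypothetical cost: blocks covering the top-left 16x16 area are 'detailed'
    and get expensive quickly as they grow; all other blocks are flat and cheap."""
    if x < 16 and y < 16:
        return size ** 3 / 8
    return 10.0


best_cost, cus = partition(0, 0, 64, toy_cost)
print(best_cost, len(cus))
for cu in cus:
    print(cu)
```

With this toy cost, the detailed corner ends up in 8×8 CUs while the flat areas stay as large CUs. The exhaustive recursion is also what makes the search slow, which is why the paper replaces it with a CNN prediction.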
2. Network Architecture
- The input of the CNN is the block partition patterns from 64×64 to 8×8, which are converted to 16×16 blocks.
- The first layer is a 3×3 convolutional layer with 10 kernels. The size of the feature map is 8×8 and the convolution is performed in zero-padding mode. The kernels in this layer act as feature extractors.
- The second layer performs the max pooling.
- Similarly, the third and the fourth layers perform the convolution and max pooling.
- The fifth and sixth layers are Fully Connected Layers (FCLs) with parameters of 256 and 64, respectively, applied to each input.
- The seventh layer concatenates the FCL outputs of each input, with a parameter of 256.
- The eighth layer is an FCL with a parameter of 64.
- The output layer uses the Softmax units.
- This CNN classifier outputs the optimal CTU division information.
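To make the convolution, pooling and softmax steps concrete, here is a minimal pure-Python sketch of the three operations on a single 16×16 input. It is not the authors' network: the 2×2 pooling window with stride 2 is my assumption (the story does not state it), the all-ones kernel is a placeholder rather than a learned one, and `conv2d_same`, `max_pool` and `softmax` are hypothetical helper names. Under that pooling assumption, a zero-padded 3×3 convolution keeps the map at 16×16 and the pooling halves it to 8×8.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Zero-padded convolution (cross-correlation, as in most CNN frameworks);
    the output has the same height and width as the input."""
    h, w, k = len(image), len(image[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - pad, j + dj - pad
                    if 0 <= ii < h and 0 <= jj < w:  # outside pixels count as zero
                        acc += image[ii][jj] * kernel[di][dj]
            out[i][j] = acc
    return out


def max_pool(fm: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling (window = stride = size; an assumption here)."""
    h, w = len(fm), len(fm[0])
    return [[max(fm[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, w - size + 1, size)]
            for i in range(0, h - size + 1, size)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


image = [[1.0] * 16 for _ in range(16)]   # placeholder 16x16 input
kernel = [[1.0] * 3 for _ in range(3)]    # placeholder 3x3 kernel
conved = conv2d_same(image, kernel)
pooled = max_pool(conved)
probs = softmax([1.0, 2.0, 3.0])          # placeholder class scores
print(len(conved), len(conved[0]), len(pooled), len(pooled[0]))
```

In the real network, each of the 10 kernels would produce its own feature map, and the class with the largest softmax probability would be taken as the predicted division.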
3. Experimental Results
- Several test sequences (100 frames) with picture size from Class 4K to Class B are used.
- Fifty frames in Class B are selected as the training samples.
- HM-16.7 with all-intra configuration is used.
- The proposed algorithm obtains 67.3% time saving with only a 1.8% increase in BD-rate for the luminance component.
- Compared with Liu TIP’16, the proposed algorithm achieves about 1.2% higher time saving (TS). Additionally, the proposed algorithm is better by 2.58% in BD-rate and 0.08 dB in BD-PSNR.
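For readers unfamiliar with the time saving (TS) figure, it is commonly computed as the relative reduction in encoding time against the reference encoder. The sketch below assumes that common definition; the story does not spell out the paper's exact formula, `time_saving_percent` is a hypothetical helper name, and the timings are made-up numbers chosen only to reproduce a 67.3% saving.

```python
def time_saving_percent(t_reference: float, t_proposed: float) -> float:
    """Relative encoding-time reduction in percent, against the reference encoder."""
    if t_reference <= 0:
        raise ValueError("reference encoding time must be positive")
    return (t_reference - t_proposed) / t_reference * 100.0


# Hypothetical timings in seconds, not measurements from the paper.
ts = time_saving_percent(100.0, 32.7)
print(round(ts, 1))
```

BD-rate and BD-PSNR are different kinds of metric: they compare whole rate-distortion curves between two encoders, so they cannot be reduced to a one-line formula like this.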
During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 4th story this month. Thanks for visiting my story.
[2018 ICICT] [Katayama ICICT’18]
Low-Complexity Intra Coding Algorithm Based on Convolutional Neural Network for HEVC