Review: AP-CNN — Angular Prediction CNN (HEVC Prediction)

U-Net Network Structure, 0.85% Bitrate Saving Compared to Conventional Lossless HEVC

Sik-Ho Tsang
3 min readApr 24, 2020

In this story, Angular intra-Prediction Convolutional Neural Network (AP-CNN), by Vrije Universiteit Brussel, is briefly reviewed. I read this because I work on video coding research. This paper is short, only one page long. This is a paper in 2019 DCC. (Sik-Ho Tsang @ Medium)

Outline

  1. Proposed AP-CNN
  2. Experimental Results

1. Proposed AP-CNN

1.1. CNN Replaces Conventional Intra

(Figure: 35 Conventional Intra Modes)
  • The proposed prediction scheme replaces a set of 9 angular intra-prediction modes with an improved CNN-based prediction.
  • Those 9 prediction mode indices are 2, 6, 10, 14, 18, 22, 26, 30, and 34, as shown above. (A minimal per-mode model mapping is sketched after this list.)
  • (To know more about video coding and intra prediction, please feel free to read Sections 1 & 2 in IPCNN.)
  • (In this paper, lossless coding is used, meaning that the reconstructed block at the decoder is exactly the same as the original raw video, i.e. there is no quantization that sacrifices video quality to reduce the coding bits.)
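To make the mode replacement concrete, here is a minimal Python sketch of the per-mode model mapping. The mode indices come from the paper, but `make_model` is a hypothetical factory (e.g. the AP-CNN sketch in Section 1.2), not something named in the paper.

```python
# The 9 HEVC angular intra modes replaced by CNN prediction (from the paper).
REPLACED_MODES = (2, 6, 10, 14, 18, 22, 26, 30, 34)

def load_mode_models(make_model):
    """Build one dedicated prediction model per replaced angular mode.

    `make_model` is a hypothetical factory returning a fresh, trainable
    model; the paper trains a separate AP-CNN for each of the 9 modes.
    """
    return {mode: make_model() for mode in REPLACED_MODES}
```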

1.2. AP-CNN

  • A causal neighborhood, selected as a 16×16 block around the currently predicted block, is used as the input.
  • AP-CNN is designed based on the U-Net architecture and operates on three resolutions (16×16, 8×8, 4×4).
  • AP-CNN contains the following U-Net structure:
  • 10 convolutional layers (2+2, 2+2, 2) with (32, 64, 128) filters. (The paper is short; I guess the numbers in the brackets stand for the numbers of convolutional layers on the contraction path, the expansion path, and the convolutions before the output, respectively.)
  • 2 deconvolution layers with 32 and 64 filters.
  • 2 filter concatenation layers.
  • A final convolutional layer with one filter is used to compute a 16×16 block, which is further cropped at the bottom-right corner to obtain the 4×4 output predicted block.
  • (As mentioned, 16×16 and 8×8 blocks are also supported. I guess cropping is also done to obtain the 8×8 output predicted block, while for a 16×16 block, no cropping is needed.) A sketch of this architecture follows this list.
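Since the one-page paper gives only layer counts, here is a minimal PyTorch sketch matching them (10 convolutions as 2+2, 2+2, 2; two deconvolutions with 64 and 32 filters; two concatenations; a final one-filter convolution with a bottom-right crop). The 3×3 kernels, ReLU activations, max pooling, and exact placement of the (32, 64, 128) filters are all my assumptions.

```python
import torch
import torch.nn as nn

class APCNN(nn.Module):
    """Sketch matching the paper's stated layer counts; everything beyond
    those counts (kernels, activations, filter placement) is a guess."""

    def __init__(self):
        super().__init__()
        # Contraction path: 2 convs at 16x16, 2 convs at 8x8.
        self.enc1 = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)  # halves resolution (16->8, 8->4)
        # Expansion path: deconv, concatenate skip, 2 convs -- twice.
        self.up1 = nn.ConvTranspose2d(64, 64, 2, stride=2)  # 4x4 -> 8x8
        self.dec1 = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)  # 8x8 -> 16x16
        self.dec2 = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        # 2 convs before the output, then the final one-filter conv.
        self.head = nn.Sequential(
            nn.Conv2d(32, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 1, 3, padding=1))

    def forward(self, x, block=4):        # x: (N, 1, 16, 16) causal context
        s1 = self.enc1(x)                 # skip at 16x16 (32 channels)
        s2 = self.enc2(self.pool(s1))     # skip at 8x8 (64 channels)
        b = self.pool(s2)                 # 4x4 bottleneck
        d1 = self.dec1(torch.cat([self.up1(b), s2], dim=1))   # concat #1
        d2 = self.dec2(torch.cat([self.up2(d1), s1], dim=1))  # concat #2
        y = self.head(d2)                 # full 16x16 prediction
        # Crop the bottom-right corner for 4x4 (or 8x8) blocks; a 16x16
        # block would need no crop (my guess, as noted above).
        return y[:, :, -block:, -block:]
```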

1.3. Training

  • MSE loss function is used for training.
  • A model is trained for each of the 9 modes, using a corresponding training set generated based on HEVC’s optimal mode segmentation, applied to 15 HD sequences from Xiph.org and a collection of RGB images from NYC Library.
  • The size of each of the training sets varies between 6700 and 37300 batches, where one batch contains 500 samples (input blocks).
  • Each AP-CNN model was trained for 20 epochs, using a 90%/10% split of the data into training and validation sets. (A minimal training-loop sketch follows this list.)
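A minimal training-loop sketch under the stated recipe (MSE loss, 20 epochs, batches of 500 samples, 90%/10% train/validation split). The optimizer and learning rate are not given in the paper, so Adam at 1e-4 here is an assumption.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def train_mode_model(model, contexts, targets, epochs=20, batch_size=500):
    """Train one AP-CNN for a single angular mode.

    contexts: (N, 1, 16, 16) causal-neighborhood inputs;
    targets:  (N, 1, 4, 4) original blocks (the prediction targets).
    """
    dataset = TensorDataset(contexts, targets)
    n_train = int(0.9 * len(dataset))                  # 90%/10% split
    train_set, val_set = random_split(
        dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed
    loss_fn = torch.nn.MSELoss()                       # stated in the paper
    for epoch in range(epochs):                        # 20 epochs, as stated
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        print(f"epoch {epoch + 1}: val MSE {val / max(len(val_loader), 1):.6f}")
    return model
```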

2. Experimental Results

  • The experimental assessment is carried out on the Y channel of two datasets: 15 HEVC Test Sequences at 8-bit depth, and 7 TUT Sequences from the Ultra Video Group (TUT-vSEQ).
(Table: Bits per pixel (Bpp) and Bitrate Change (%))
  • Using lossless HEVC intra coding, an average of 3.448 bits per pixel (bpp) is obtained, while using AP-CNN, 3.416 bpp is obtained.
  • AP-CNN outperforms lossless HEVC with an average bitrate improvement of around 0.85%. (A quick sanity check on these numbers follows this list.)
  • Increased performance is obtained at resolutions of 1920×1080 (1080p) and above.
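As a quick sanity check, converting the pooled average bpp figures directly gives roughly 0.93%, close to the reported 0.85%; I assume the paper's figure averages per-sequence bitrate changes rather than pooled bpp.

```python
# Percentage bitrate change implied by the reported average bpp figures.
hevc_bpp = 3.448    # lossless HEVC intra, average bits per pixel
apcnn_bpp = 3.416   # AP-CNN, average bits per pixel

saving = 100.0 * (hevc_bpp - apcnn_bpp) / hevc_bpp
print(f"Bitrate saving: {saving:.2f}%")  # ~0.93% on pooled averages
```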

During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. This is the 22nd story this month. Thanks for visiting my story…
