Review: Laude PCS’16 — Deep Learning-Based Intra Prediction Mode Decision for HEVC (Fast HEVC Prediction)

Only 0.52% Increase in BD-Rate Compared with HM-16.6+SCM-5.2

Sik-Ho Tsang
Apr 13, 2020

In this story, a deep Convolutional Neural Network (CNN) classifier for fast HEVC intra coding is briefly reviewed. I read this paper because I work on video coding research. It was published in PCS 2016. (Sik-Ho Tsang @ Medium)

Outline

  1. Conventional Intra Coding
  2. Network Architecture
  3. Experimental Results

1. Conventional Intra Coding

Quad-Tree Coding
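  • A frame is divided into non-overlapping blocks of different sizes for encoding/compression.
  • These blocks are called Coding Units (CUs). Their sizes range from 64×64 through 32×32 and 16×16 down to 8×8: each CU is either coded as it is or split into four equal sub-CUs, which forms a quad-tree.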
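The recursive split can be sketched as follows. This is a minimal illustration, not HM code: `quadtree_cus` and the `should_split` callback are hypothetical names, and in a real encoder the split decision comes from RDO rather than from a fixed rule.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) of a coding unit in luma samples

def quadtree_cus(x: int, y: int, size: int,
                 should_split: Callable[[int, int, int], bool],
                 min_size: int = 8) -> List[CU]:
    """Return the leaf CUs of a quad-tree rooted at (x, y) with the given size.

    `should_split` is a hypothetical stand-in for the encoder's RDO split decision.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[CU] = []
        # Visit the four sub-CUs in z-order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_cus(x + dx, y + dy, half, should_split, min_size)
        return leaves
    return [(x, y, size)]

# Example: split the 64x64 CTU once, then split only its top-left 32x32 CU again.
leaves = quadtree_cus(0, 0, 64, lambda x, y, s: s == 64 or (x == 0 and y == 0 and s == 32))
print(leaves)
# [(0, 0, 16), (16, 0, 16), (0, 16, 16), (16, 16, 16), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```

An exhaustive search over all depths evaluates 1 + 4 + 16 + 64 = 85 CUs per 64×64 block, each needing its own intra mode decision, which is why that decision is worth accelerating.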
35 Intra Predictions in HEVC (Left), Some Examples (Right)
  • For each CU, intra prediction offers 35 prediction modes, as shown in the figure above.
  • Neighboring reference samples are used to predict the current CU.
  • Mode 0: Planar, which predicts smooth, gradual changes within the CU.
  • Mode 1: DC, which fills the CU with the average value of the reference samples.
  • Modes 2–34: Angular, which predict the current CU along 33 different directions.
  • Some examples are shown at the right of the figure.
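As a rough illustration of the two non-angular modes, the sketch below follows the planar and DC averaging formulas of the HEVC specification for an N×N block. It is simplified under stated assumptions: reference-sample substitution and smoothing, and the boundary filter HEVC applies after DC prediction for small luma blocks, are omitted, and the function names and argument layout are my own.

```python
from typing import List

Block = List[List[int]]

def dc_predict(top: List[int], left: List[int]) -> Block:
    """DC mode: fill an N x N block with the rounded mean of N top and N left samples."""
    n = len(top)
    log2n = n.bit_length() - 1          # N is assumed to be a power of two (4..32)
    dc = (sum(top) + sum(left) + n) >> (log2n + 1)
    return [[dc] * n for _ in range(n)]

def planar_predict(top: List[int], left: List[int]) -> Block:
    """Planar mode: average of a horizontal and a vertical linear interpolation.

    top:  N + 1 samples above the block; top[N] is the top-right sample.
    left: N + 1 samples left of the block; left[N] is the bottom-left sample.
    """
    n = len(top) - 1
    shift = (n.bit_length() - 1) + 1    # log2(N) + 1
    return [[((n - 1 - x) * left[y] + (x + 1) * top[n]
              + (n - 1 - y) * top[x] + (y + 1) * left[n] + n) >> shift
             for x in range(n)]
            for y in range(n)]

print(dc_predict([100] * 4, [50] * 4)[0])       # [75, 75, 75, 75]
print(planar_predict([100] * 5, [100] * 5)[0])  # [100, 100, 100, 100]
```

With flat neighbors both modes reproduce the flat value; planar differs from DC when the top-right and bottom-left samples pull the prediction into a gradient.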

However, finding the best prediction is time-consuming, because the encoder must estimate the cost of every prediction mode, and that cost involves both the coding rate (bitrate) and the distortion (reflected in PSNR) of each prediction. This complicated process is called Rate-Distortion Optimization (RDO).
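In HM-style RDO, the cost being compared is the Lagrangian J = D + λ·R, where D is the distortion (for example, the sum of squared differences) and R is the number of bits. The sketch below picks the mode with the lowest J. The candidate predictions and bit counts are hypothetical placeholders: in a real encoder R comes from actually transforming, quantizing, and entropy-coding the residual, which is exactly what makes RDO expensive.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]

def ssd(orig: Block, pred: Block) -> int:
    """Sum of squared differences between the original and predicted blocks."""
    return sum((o - p) ** 2 for ro, rp in zip(orig, pred) for o, p in zip(ro, rp))

def rdo_mode_decision(orig: Block,
                      candidates: Dict[int, Tuple[Block, int]],
                      lam: float) -> Tuple[int, Dict[int, float]]:
    """Return the mode minimizing J = D + lambda * R, plus every mode's cost.

    `candidates` maps an intra mode index to (prediction, estimated bits);
    the bit counts here are hypothetical inputs, not real entropy-coder rates.
    """
    costs = {mode: ssd(orig, pred) + lam * bits
             for mode, (pred, bits) in candidates.items()}
    return min(costs, key=costs.get), costs

orig = [[10, 10], [10, 10]]
candidates = {0: ([[10, 10], [10, 10]], 20),   # exact prediction, expensive to signal
              1: ([[9, 9], [9, 9]], 5)}        # slightly off, cheap to signal
print(rdo_mode_decision(orig, candidates, lam=1.0)[0])  # 1
print(rdo_mode_decision(orig, candidates, lam=0.1)[0])  # 0
```

The choice flips with λ: a larger λ weights bits more heavily, so the cheaper but less accurate prediction wins.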

2. Network Architecture

  • To reduce the complexity, a deep convolutional neural network (CNN) classifier is used to replace the conventional RDO-based decision of the intra prediction mode.
CNN model for the classification of 32×32 blocks
  • The figure above shows the model for 32×32 blocks.
  • The architecture is similar to AlexNet.
  • Each block is fed through two convolutional layers, one max-pooling layer, and two fully-connected layers. The final layer classifies the block into one of 35 classes (i.e., the 35 intra prediction modes).
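A forward pass with this layer sequence can be sketched in NumPy. The review does not give kernel sizes, channel counts, activations, or input normalization, so the values below (3×3 kernels, 16 and 32 channels, ReLU, 128 hidden units, one luma channel scaled to [0, 1]) are assumptions, and the weights are random rather than trained. Only the layer order and the 35-way output follow the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv2d(x, w, b, pad):
    """x: (C_in, H, W); w: (C_out, C_in, k, k); zero padding keeps H and W for pad = k // 2."""
    k = w.shape[-1]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    win = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", win, w) + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: the review does not specify them.
C1, C2, K, HIDDEN, N_MODES = 16, 32, 3, 128, 35

params = {
    "w1": rng.normal(0, 0.1, (C1, 1, K, K)), "b1": np.zeros(C1),
    "w2": rng.normal(0, 0.1, (C2, C1, K, K)), "b2": np.zeros(C2),
    "w3": rng.normal(0, 0.01, (HIDDEN, C2 * 16 * 16)), "b3": np.zeros(HIDDEN),
    "w4": rng.normal(0, 0.1, (N_MODES, HIDDEN)), "b4": np.zeros(N_MODES),
}

def classify_block(block, p=params):
    """block: 32x32 luma samples -> probability for each of the 35 intra modes."""
    x = block[None, :, :] / 255.0                       # add channel axis, scale to [0, 1]
    x = relu(conv2d(x, p["w1"], p["b1"], pad=K // 2))   # convolutional layer 1
    x = relu(conv2d(x, p["w2"], p["b2"], pad=K // 2))   # convolutional layer 2
    x = max_pool2(x)                                    # 32x32 -> 16x16
    x = relu(p["w3"] @ x.reshape(-1) + p["b3"])         # fully-connected layer 1
    return softmax(p["w4"] @ x + p["b4"])               # fully-connected layer 2 -> 35 classes

block = rng.integers(0, 256, size=(32, 32)).astype(np.float64)  # stand-in for real luma samples
probs = classify_block(block)
mode = int(np.argmax(probs))
print(probs.shape, mode)
```

Used in place of RDO, a natural reading is that the encoder takes the most probable class as the intra mode, so no per-mode rate and distortion need to be measured for that block.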

3. Experimental Results

BD-Rate (%) for Each Sequence
  • The HM-16.6+SCM-5.2 reference software is used.
  • The BD-rate increases by only 0.52%.
  • However, no reduction in encoding time is reported. The goal of this paper is to replace RDO with a CNN.
  • This is one of the early papers applying CNNs to video coding.
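BD-rate is the average bitrate difference between two rate-distortion curves at equal PSNR. A common way to compute it (Bjøntegaard's method) fits log-bitrate as a cubic polynomial of PSNR for each encoder, typically from four QP points, and integrates the gap over the PSNR range both curves cover. The sketch below follows that cubic-fit variant; the review does not say which BD-rate tool the paper used, and the sample rate/PSNR values are made up. A positive result means the tested encoder needs more bits for the same quality, as with the 0.52% above.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of the test curve relative to the reference curve."""
    log_ref, log_test = np.log(rate_ref), np.log(rate_test)
    # Fit log-rate as a cubic polynomial of PSNR for each curve.
    fit_ref = np.polyfit(psnr_ref, log_ref, 3)
    fit_test = np.polyfit(psnr_test, log_test, 3)
    # Integrate only over the PSNR interval covered by both curves.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref, int_test = np.polyint(fit_ref), np.polyint(fit_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    avg_log_diff = (area_test - area_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical 4-QP points (kbps, dB): the test encoder spends 5% more bits everywhere.
psnr = [32.0, 35.0, 38.0, 41.0]
rate_ref = [1000.0, 2000.0, 4000.0, 8000.0]
rate_test = [r * 1.05 for r in rate_ref]
print(round(bd_rate(rate_ref, psnr, rate_test, psnr), 2))  # 5.0
```

Working in the log domain is what makes the result a relative (percentage) bitrate change rather than an absolute one.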

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.
