Reading: Li VCIP’17 — CNN-Based Rate Control (HEVC Intra Prediction)

Outperforms the Conventional SATD-Based Approach

Jun 12, 2020

In this story, “A Convolutional Neural Network-Based Approach to Rate Control in HEVC Intra Coding” (Li VCIP’17), by the University of Science and Technology of China and Microsoft Research Asia (MSRA), is briefly presented. I read this because I work on video coding research.

Rate Control: an algorithm that controls bit allocation during encoding so that the coding bitrate stays close to the target bitrate. This is essential because the bitrate varies greatly with the video content. If the bitrate bursts, real-time streaming may not be feasible.

In this paper, a CNN-based rate control approach is proposed. This is a paper in 2017 VCIP. (Sik-Ho Tsang @ Medium)

Outline

Conventional Rate Control (RC)
Proposed CNN-Based Rate Control
Experimental Results

1. Conventional Rate Control (RC)

1.1. Basic RC

The basic R-λ model:

λ = α · R^β

where α and β are content-dependent model parameters. The Lagrange multiplier λ is the most critical factor in determining the rate R.
The model parameters of inter frames can be accurately estimated from those of previously coded frames.
But the model struggles with intra frames, because their parameters are not easy to estimate.
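
To make the model concrete, here is a minimal Python sketch. It assumes the commonly used form of the R-λ model, λ = α · R^β, with R measured in bits per pixel. The function names and the parameter values are illustrative and are not taken from the paper.

```python
def lambda_from_rate(alpha, beta, bpp):
    """Assumed R-lambda form: lambda = alpha * bpp ** beta."""
    return alpha * bpp ** beta


def rate_from_lambda(alpha, beta, lam):
    """Inverse of the model above: bpp = (lambda / alpha) ** (1 / beta)."""
    return (lam / alpha) ** (1.0 / beta)


# Illustrative parameters only (not from the paper).
alpha, beta = 3.2, -1.4
lam = lambda_from_rate(alpha, beta, bpp=0.1)
bpp = rate_from_lambda(alpha, beta, lam)  # round trip back to roughly 0.1
```

Because β is negative, a smaller bit budget maps to a larger λ, which is why accurate α and β matter so much for hitting the target rate.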

1.2. SATD Based RC

A content-dependent parameter C is introduced:

Specifically, C is estimated from the sum of absolute transformed differences (SATD) of pixel values, since SATD roughly reflects the content complexity.
Higher content complexity usually gives a higher SATD, and vice versa.
This model is currently used in the HEVC reference software.

2. Proposed CNN-Based Rate Control

2.1. Network Architecture

The network has four convolutional layers, each followed by a rectified linear unit (ReLU).
Two max-pooling layers and three fully connected (fc) layers are also used.
The final fc layer outputs a predicted value for one model parameter (α or β). The same structure is used for both α and β, but a separate network is trained for each.

2.2. Training

The UCID and RAISE datasets are used for training.
First, the natural images are converted into YUV420.
Then, they are compressed with the HEVC reference software under 11 different quantization parameters (QPs), ranging from 20 to 40 with an interval of 2.
The coding rate and the Lagrange multiplier value at each QP are collected for each CTU.
Then curve fitting is performed for each CTU using the 11 pairs of (R, λ) to obtain α and β. Outlier CTUs are then removed, where the inliers are defined as α ∈ [0.05, 200] and β ∈ [−3, 0].
The original pixel values of the luma component of each CTU are the input to the CNN, and the corresponding α or β is used as the label for training.
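
The fitting step can be sketched as follows. The story does not say which fitting method the paper uses, so this sketch assumes the model λ = α · R^β and a least-squares line fit in the log domain. The function names and the synthetic (R, λ) pairs are hypothetical. Only the QP range and the inlier ranges come from the text above.

```python
import math


def fit_alpha_beta(rates, lambdas):
    """Least-squares line fit in the log domain: ln(lam) = ln(alpha) + beta * ln(R)."""
    xs = [math.log(r) for r in rates]
    ys = [math.log(lam) for lam in lambdas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def is_inlier(alpha, beta):
    """Inlier ranges quoted in the text: alpha in [0.05, 200], beta in [-3, 0]."""
    return 0.05 <= alpha <= 200 and -3 <= beta <= 0


qps = list(range(20, 41, 2))  # the 11 QPs: 20, 22, ..., 40

# Synthetic (R, lambda) pairs generated from known parameters, for illustration only.
true_alpha, true_beta = 5.0, -1.2
rates = [0.05 * (i + 1) for i in range(len(qps))]
lambdas = [true_alpha * r ** true_beta for r in rates]

alpha, beta = fit_alpha_beta(rates, lambdas)
keep_ctu = is_inlier(alpha, beta)
```

On noise-free synthetic data the fit recovers the generating parameters. With real encoder statistics the fit is only approximate, which is presumably why the outlier filter is needed.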
There are 180,000 CTUs used for training, and another 16,000 CTUs used for validation.
A standard loss function is used:

2.3. Testing

The original pixel values of the luma component of each CTU are fed to the CNN to predict its model parameters.
It is worth noting that CTUs can be processed in parallel in the prediction process.
For boundary CUs that are smaller than the CTU size, a rectification factor scales α proportionally. (But I don’t focus on this.)
Given the target bitrate Rf for a frame, basic unit (BU) bit allocation [4] assigns different λ values to different CTUs so that the resulting bitrate is close to the target bitrate, and ideally equal to it:

3. Experimental Results

HM-16.9 is used.

Rate control error is the percentage error between the actual bitrate and the target bitrate.
The rate control error of the CNN-based approach is smaller than that of HM-16.9.
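
As a small illustration of the metric, the sketch below computes the rate control error in Python. Taking the absolute value is an assumption on my part, and the bitrates are hypothetical numbers, not results from the paper.

```python
def rate_control_error(actual_bitrate, target_bitrate):
    """Absolute percentage error between the actual and the target bitrate."""
    return abs(actual_bitrate - target_bitrate) / target_bitrate * 100.0


# Hypothetical bitrates in kbps, for illustration only.
err = rate_control_error(actual_bitrate=1030.0, target_bitrate=1000.0)
```

A smaller value means the encoder landed closer to the requested bitrate.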