Review: Song VCIP’17 — Neural Network Based Arithmetic Coding (HEVC Intra Prediction)
LeNet-Like Network, 9.9% Bit Saving for Intra Prediction Mode Compared with the Conventional HEVC CABAC
In this story, Neural Network-Based Arithmetic Coding of Intra Prediction Modes in HEVC (Song VCIP’17), is reviewed. I read this because I work on video coding research. This is a paper in 2017 VCIP. (Sik-Ho Tsang @ Medium)
Outline
- Introduction of Conventional CABAC in HEVC Video Coding
- Network Architecture
- Experimental Results
1. Introduction of Conventional CABAC in HEVC Video Coding
- Context-adaptive binary arithmetic coding (CABAC) is used in HEVC to encode the syntax elements (intermediate symbols) into 0s and 1s, i.e. the bitstream that is stored or transmitted.
- The encoding process of CABAC consists of three steps: binarization, context modeling, and binary arithmetic coding.
- If the syntax element is not binary, the encoder will first map the element to a binary sequence. There are two coding modes: regular and bypass.
- Regular mode: The probability model of the bin to be encoded is selected by the context, which is derived from previously encoded syntax elements. The bin and the selected context model are then passed to the arithmetic coding engine, which not only encodes the bin but also updates the corresponding probability distribution of the context model.
- Bypass mode: All bins are encoded with the probability equal to 0.5.
- However, both the binarization and the context models are manually designed (a toy sketch of this adaptive coding process is given after this list).
- In this paper, a CNN is used to predict the probability distribution of the syntax elements (here, the intra prediction modes) directly.
- (For those who want to know what video coding is, please feel free to read Sections 1 & 2 in IPCNN.)
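To make the regular/bypass distinction concrete, below is a toy Python sketch (not HEVC's actual probability state machine, and with a made-up adaptation rate and bin sequence) in which a simple adaptive context model tracks P(bin = 1) and the ideal cost of coding each bin is −log2(p):

```python
import math

class AdaptiveBinModel:
    """Toy context model: keeps a running estimate of P(bin = 1)."""
    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one
        self.rate = rate  # adaptation speed (hypothetical value)

    def cost(self, bin_val):
        p = self.p_one if bin_val == 1 else 1.0 - self.p_one
        return -math.log2(p)  # ideal number of bits for this bin

    def update(self, bin_val):
        target = 1.0 if bin_val == 1 else 0.0
        self.p_one += self.rate * (target - self.p_one)

bins = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # made-up bin sequence
ctx = AdaptiveBinModel()
regular_bits = 0.0
for b in bins:
    regular_bits += ctx.cost(b)  # regular mode: context-modelled probability
    ctx.update(b)                # the coder updates the context model after each bin
bypass_bits = len(bins) * 1.0    # bypass mode: p = 0.5, i.e. exactly 1 bit per bin
print(f"regular ideal bits: {regular_bits:.2f}, bypass bits: {bypass_bits:.1f}")
```

Because the context model adapts towards the skewed bin statistics, the regular-mode cost drops below the 1-bit-per-bin bypass cost.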
2. Network Architecture
2.1. Input
- Since the current CU has not been coded, we cannot utilize the information from the current CU.
- Instead, the three neighbor reconstructed areas (Rec), which are the same size as the current CU, as shown above, are available to both encoder and decoder.
- Thus, these three neighbor reconstructed areas are concatenated and input into the network.
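A minimal sketch of how such an input could be assembled, assuming the three same-size reconstructed neighbor areas are the above-left, above, and left blocks and that they are stacked as input channels (the paper's exact concatenation layout may differ):

```python
import numpy as np

cu_size = 8                                  # e.g. an 8x8 CU
recon = np.random.randint(0, 256, (64, 64))  # stand-in reconstructed frame
y, x = 32, 32                                # top-left corner of the current CU

above_left = recon[y - cu_size:y, x - cu_size:x]
above      = recon[y - cu_size:y, x:x + cu_size]
left       = recon[y:y + cu_size, x - cu_size:x]

# Network input: (channels, height, width) = (3, 8, 8)
net_input = np.stack([above_left, above, left]).astype(np.float32) / 255.0
print(net_input.shape)  # (3, 8, 8)
```

Both the encoder and the decoder can build exactly the same input, since only already-reconstructed pixels are used.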
2.2. Network Architecture
- A LeNet-like network architecture is used (a PyTorch sketch is given after this list).
- There are two convolutional layers in the network to process the input reconstructed blocks, each followed by a max-pooling layer for down-sampling.
- The first conv uses 4×4 kernels with 32 channels, while the second conv uses 4×4 kernels with 64 channels.
- The two max-pooling layers both use kernel size 2×2 with stride of 2.
- The features after the second max-pooling layer are flattened and then mapped into a vector F, which is concatenated with three input one-hot vectors that represent the MPMs (Most Probable Modes).
- These three MPMs are selected from the 35 intra modes by predefined rules in HEVC and are supposed to have a high probability of being used in the current CU.
- One fully connected layer then gives out the final prediction.
- F5 is 1024-dim. Since the three MPM one-hot vectors take 35×3 = 105 dimensions, F is 1024 − 105 = 919-dim.
- The loss function is the standard cross-entropy, L = −log(p_T), where T is the ground-truth intra prediction mode and p_T is the CNN's predicted probability of mode T.
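Putting the pieces together, here is a minimal PyTorch sketch of the network and loss described above. The 8×8 CU size, the channel-stacked input, the "same" convolution padding, and the ReLU activations are assumptions not stated in the review:

```python
import torch
import torch.nn as nn

class IntraModeNet(nn.Module):
    def __init__(self, num_modes=35):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, padding="same"), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),               # 8x8 -> 4x4
            nn.Conv2d(32, 64, kernel_size=4, padding="same"), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),               # 4x4 -> 2x2
        )
        self.fc_f = nn.Linear(64 * 2 * 2, 919)                   # F: 919-dim
        self.fc_out = nn.Linear(919 + 3 * num_modes, num_modes)  # F5: 1024-dim -> 35 modes

    def forward(self, rec_blocks, mpm_onehots):
        # rec_blocks: (N, 3, 8, 8); mpm_onehots: (N, 3*35) concatenated MPM one-hots
        f = self.fc_f(self.features(rec_blocks).flatten(1))      # F
        f5 = torch.cat([f, mpm_onehots], dim=1)                  # F5 = [F, MPM one-hots]
        return self.fc_out(f5)                                   # logits over the 35 modes

net = IntraModeNet()
logits = net(torch.randn(4, 3, 8, 8), torch.zeros(4, 105))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 35, (4,)))  # -log(p_T)
print(logits.shape, loss.item())
```

A softmax over the 35 logits gives the probability distribution that is fed to the arithmetic coder in the next subsection.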
2.3. CNN-based Arithmetic Coding
- With the probability distribution output by the CNN, the arithmetic coding module can encode the syntax element directly using these CNN-provided probabilities. (Arithmetic coding is an entropy coding method that encodes an element according to its probability.)
- Multi-level arithmetic coding is used to avoid the binarization step.
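As a rough illustration of how the multi-level coder consumes the CNN probabilities, the toy sketch below narrows an interval by each symbol's predicted probability; the code length is then about ceil(−log2(final interval width)) plus a small termination overhead. The probability distributions and mode sequence are made up:

```python
import math
from fractions import Fraction

def arithmetic_code_length(symbols, prob_fn):
    """Exact interval narrowing with Fractions; returns an estimated bit count."""
    low, high = Fraction(0), Fraction(1)
    for i, s in enumerate(symbols):
        probs = prob_fn(i)                    # CNN distribution over the 35 modes
        width = high - low
        cum = sum(probs[:s], Fraction(0))     # cumulative probability below symbol s
        low, high = low + width * cum, low + width * (cum + probs[s])
    return math.ceil(-math.log2(float(high - low))) + 1  # + termination overhead

# A hypothetical CNN that always outputs a near-uniform distribution
# slightly favouring mode 0 (Planar).
def fake_cnn_probs(_i):
    p = [Fraction(1, 50)] * 35
    p[0] += Fraction(1) - sum(p)              # give the leftover mass to mode 0
    return p

modes = [0, 0, 26, 10, 0]                     # made-up intra modes of five CUs
print(arithmetic_code_length(modes, fake_cnn_probs), "bits")
```

The sharper (and more often correct) the CNN's predictions are, the wider the final interval stays and the fewer bits are spent, which is exactly where the reported bit saving comes from.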
3. Experimental Results
- The proposed approach is applied to the intra prediction mode coding of 8×8 and 16×16 CUs separately. The search for the best mode is the same for both the proposed approach and the conventional HEVC, so the results reflect the contribution of the CNN-based arithmetic coding only.
- The proposed CNN helps to save up to 9.9% and 9.5% of the intra prediction mode bits for 8×8 and 16×16 CUs respectively.
- However, there are many other overheads in the HEVC video bitstream.
- If the overall bits are counted, only up to 1.6% and 0.5% of the bits are saved, but this result is already impressive.
During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. This is the 18th story of this month. Thanks for visiting my story…
Reference
[2017 VCIP] [Song VCIP’17]
Neural Network-Based Arithmetic Coding of Intra Prediction Modes in HEVC
Codec Prediction
[CNNIF] [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [NNIP] [Li TCSVT’18]