Reading: Puri EUSIPCO’17 — CNN-Based Transform Index Prediction (HEVC Intra Prediction)
LeNet-Like Network, 0.2% Average BD-Rate Gain & Up to 0.59% BD-Rate Gain
In this story, “CNN-Based Transform Index Prediction in Multiple Transforms Framework to Assist Entropy Coding” (Puri EUSIPCO’17), by Technicolor, Université de Nantes, and IRCCyN, is briefly presented. I read this because I work on video coding research.
- Since HEVC with a multiple-transform framework uses several transforms, a transform index needs to be encoded to indicate which transform is used.
- In this paper, a CNN-based approach is proposed to predict the most probable transform so as to reduce the coding bits for the transform index.
This is a paper in 2017 EUSIPCO. (Sik-Ho Tsang @ Medium)
Outline
- Conventional Transform Index Coding
- Proposed CNN-Based Transform Index Coding
- Experimental Results
1. Conventional Transform Index Coding
1.1. MDTC [5]
- In [5], a multiple transform competition scheme (MDTC) is proposed.
- A transform is selected among all transforms based on the minimum rate-distortion (RD) cost during RD optimization.
- A transform index is coded to indicate the choice amongst N+1 transforms to the decoder for proper reconstruction of the block.
- This is done by first coding a flag that indicates whether the DCT/DST transform is used or not.
- If the flag indicates it is not, one of the offline-learned transforms is used, and its index is signaled with fixed-length coding.
- This scheme clearly favors DCT/DST as it requires fewer bits to encode.
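The flag-then-index signaling above can be sketched as a simple bit-cost function. This is only a sketch of the signaling cost, not the actual CABAC coding, and the ⌈log₂ N⌉ length for the learned-transform index is my assumption:

```python
import math

def mdtc_index_bits(i, n_learned):
    """Bits to signal the transform choice in the MDTC scheme [5]:
    a 1-bit flag for DCT/DST (index 0), otherwise the flag plus a
    fixed-length index over the n_learned offline-learned transforms.
    Sketch of the bit budget only, not the actual entropy coding."""
    if i == 0:                           # DCT/DST chosen: flag only
        return 1
    b = math.ceil(math.log2(n_learned))  # fixed-length code length
    return 1 + b

# DCT/DST costs 1 bit; a learned transform costs 1 + ceil(log2 N) bits
assert mdtc_index_bits(0, 4) == 1
assert mdtc_index_bits(2, 4) == 3
```

This makes the bias explicit: DCT/DST is always the cheapest choice to signal.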
1.2. Fixed Length Coding
- An alternative way of signaling the transform choice is to directly binarize the transform index using fixed-length coding, indicating the N+1 transform candidates on b bits, where b = ⌈log₂(N+1)⌉.
- These bits are entropy coded using CABAC.
- No favor towards DCT/DST.
- It is used as the baseline for comparison in the experimental results section below.
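This baseline binarization can be sketched as follows (a minimal illustration; `flc_binarize` is a hypothetical helper name, and the bins would then be entropy coded with CABAC):

```python
import math

def flc_binarize(i, n_candidates):
    """Fixed-length binarization of transform index i among
    n_candidates = N+1 transforms on b = ceil(log2(N+1)) bits."""
    b = math.ceil(math.log2(n_candidates))
    return format(i, f'0{b}b')

# 4 transforms -> b = 2 bits, every index costs the same
assert flc_binarize(2, 4) == '10'
assert flc_binarize(0, 4) == '00'
```

Unlike MDTC, every transform index here costs exactly b bits before entropy coding.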
2. Proposed CNN-Based Transform Index Coding
2.1. Overall Scheme
- A 4×4 luma residual block X is input.
- There are multiple transforms to select from, T0 to TN.
- T0 is DST while the others (T1 to TN) are offline-learned transforms.
- The block is transformed by each Ti and then quantized (Q).
- The quantized transform coefficients are input into the CNN.
- The CNN outputs a vector p of probabilities, one for each transform index i.
- The vector p is utilized to construct a truncated unary code: the probabilities in p are rearranged in decreasing order, so that the minimum number of bits (1 bit) is used for the transform index predicted with the highest probability and the maximum number of bits (N bits) for the least probable transform index.
- For example, when N=3, there are 4 transforms. Suppose T2 is selected, and the CNN output is [0.15, 0.1, 0.45, 0.30].
- By using the truncated entropy coding, 1 bit of ‘0’ is coded for the transform index.
- In another example, suppose T0 is selected, and the CNN output is [0.30, 0.1, 0.45, 0.15].
- Then, 2 bits of ‘10’ are coded for the transform index.
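The rank-then-truncated-unary construction above, including both worked examples, can be sketched as:

```python
def truncated_unary_code(p, i):
    """Truncated unary codeword for transform index i given the CNN
    probability vector p: indices are sorted by decreasing probability;
    rank 0 -> '0', rank 1 -> '10', ..., and the least probable index
    gets all ones (N bits, no terminating zero)."""
    order = sorted(range(len(p)), key=lambda k: p[k], reverse=True)
    rank = order.index(i)               # position of i in the sorted list
    n = len(p) - 1                      # N, the maximum code length
    if rank < n:
        return '1' * rank + '0'
    return '1' * n                      # last rank: truncated, no '0'

# Examples from the text (N = 3, four transforms)
assert truncated_unary_code([0.15, 0.1, 0.45, 0.30], 2) == '0'   # T2, most probable
assert truncated_unary_code([0.30, 0.1, 0.45, 0.15], 0) == '10'  # T0, second rank
```

The better the CNN ranks the actually selected transform, the fewer bits the index costs.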
2.2. Network Architecture
- The first convolutional layer takes the 4×4 coefficient block as input and passes it through 32 filters of size 2×2 with a stride of one.
- The second convolutional layer operates on the output of the first layer, using 64 filters of size 2×2 and a stride of one.
- A max-pooling layer is used to reduce the size to 2×2×64.
- This is then fed to a fully connected layer with 36 perceptrons.
- The final softmax layer outputs the probabilities.
- Keras is used.
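The paper does not state the convolution padding; the shape trace below assumes 'same' padding, which is what makes the 2×2 max-pooling land on the stated 2×2×64 size (this padding choice is my assumption):

```python
def trace_shapes(n_transforms):
    """Layer-by-layer output shapes of the LeNet-like network
    described above, assuming 'same' padding for both convolutions
    (not stated in the paper)."""
    return [
        ("input coefficient block",                  (4, 4, 1)),
        ("conv 2x2, 32 filters, stride 1, 'same'",   (4, 4, 32)),
        ("conv 2x2, 64 filters, stride 1, 'same'",   (4, 4, 64)),
        ("max-pool 2x2",                             (2, 2, 64)),
        ("flatten",                                  (2 * 2 * 64,)),
        ("fully connected, 36 units",                (36,)),
        ("softmax over transform indices",           (n_transforms,)),
    ]

# The pooled feature map matches the 2x2x64 size stated in the paper
assert trace_shapes(4)[3][1] == (2, 2, 64)
```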
3. Experimental Results
3.1. Training
- HM-15.0 is used with All-Intra configuration.
- Training Set: Zurich Building dataset [16], which contains over 1000 images in PNG format that are converted to YUV format at a resolution of 640×480.
- Only coefficient blocks with at least 3 non-zero coefficients are considered.
- Coefficient blocks whose above and left samples are not available are not taken into account.
- Imbalanced classes are avoided by manually balancing the number of coefficient blocks in each class.
- Four CNN-models are trained on the four major intra-prediction modes (IPM), namely DC, Planar, Vertical and Horizontal.
- A batch size of 32 is used and the number of iterations over the data set (epochs) is set to 20.
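The paper balances the classes manually without detailing how. One simple way, sketched here purely as an assumption, is to downsample every class to the size of the smallest one:

```python
import random

def balance_classes(samples, labels, seed=0):
    """Downsample every class to the size of the smallest class.
    A common balancing strategy, shown as a sketch; the paper only
    states that the classes are balanced manually."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    m = min(len(xs) for xs in by_class.values())   # smallest class size
    rng = random.Random(seed)
    out = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, m):                # keep m samples per class
            out.append((x, y))
    rng.shuffle(out)
    return out

balanced = balance_classes(['a', 'b', 'c', 'd', 'e'], [0, 0, 0, 1, 1])
# each class is reduced to the smallest class size (2 here)
assert sorted(y for _, y in balanced) == [0, 0, 1, 1]
```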
3.2. Loss Curves
- Both training and validation loss are reduced during training.
3.3. BD-Rate
- Only the first frame is encoded.
- EP: encodes the b bits equiprobably (bypass mode).
- CTXT: utilizes entropy coding with CABAC context (regular mode) when coding the bits.
- NoIndex: the index is not coded, to show the upper bound.
- CNN: obtains the largest coding gain, of 1.76%.
- Similarly for N=3, CNN outperforms EP and CTXT.
- An average gain of around 0.2% and a maximum gain up to 0.59% are achieved.
This is the 15th story in this month!
Reference
[2017 EUSIPCO] [Puri EUSIPCO’17]
CNN-Based Transform Index Prediction in Multiple Transforms Framework to Assist Entropy Coding
Codec Intra Prediction
JPEG [MS-ROI]
HEVC [Xu VCIP’17] [Song VCIP’17] [Li VCIP’17] [Puri EUSIPCO’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [Liu MMM’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNMC Yokoyama ICCE’20] [PNNS]
VVC [CNNIF & CNNMC] [Brand PCS’19]