Reading: HybridNN Li ICIP’18 — Chroma Intra Prediction (HEVC Intra)
3.1% and 2.0% BD-Rate Reduction on U and V, Respectively
In this story, A Hybrid Neural Network for Chroma Intra Prediction (HybridNN, Li ICIP’18), by the University of Science and Technology of China, is briefly presented. The network is not given a name in the paper, so I simply call it HybridNN. I read this because I work on video coding research. In this paper:
- A convolutional neural network is used to extract features from the reconstructed luma samples of the current block.
- A fully connected network is used to extract features from the neighboring reconstructed luma and chroma samples.
That’s why it is called a hybrid neural network. This is a paper in 2018 ICIP. (Sik-Ho Tsang @ Medium)
Outline
- HybridNN: Network Architecture
- Experimental Results
1. HybridNN: Network Architecture
1.1. Convolutional Layers
- Take a 32×32 YUV 4:2:0 block as an example.
- The luma samples are down-sampled by half to have the same resolution as the chroma samples, and then fed into the CNN, as shown in the bottom branch of the figure above.
- The block first goes through the first convolution C1.
- The second convolution C2 consists of two grouped convolutions, C21 and C22, with different kernel sizes in order to effectively aggregate multi-scale information.
- The third convolution C3 is similar to C2 but with different multi-scale kernel sizes.
- Finally, the fourth convolution C4 outputs the predicted chroma samples (see the sketch after this list).
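To make the structure concrete, below is a minimal PyTorch sketch of the CNN branch. The layer names follow the description above, but the channel counts and kernel sizes are my own assumptions for illustration, not the hyperparameters reported in the paper; C4 is deferred to the fusion sketch in Section 1.3, since the prediction is produced only after fusing with the fully connected branch.

```python
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    """Sketch of the CNN branch on the down-sampled luma block.

    Channel counts and kernel sizes are assumptions, not the paper's values.
    """
    def __init__(self):
        super().__init__()
        # C1: plain convolution on the 16x16 down-sampled luma block
        self.c1 = nn.Conv2d(1, 64, kernel_size=3, padding=1)
        # C2: two parallel convolutions C21 / C22 with different kernel sizes,
        # concatenated to aggregate multi-scale information
        self.c21 = nn.Conv2d(64, 32, kernel_size=3, padding=1)
        self.c22 = nn.Conv2d(64, 32, kernel_size=5, padding=2)
        # C3: same multi-scale idea as C2, but with other (assumed) kernel sizes
        self.c31 = nn.Conv2d(64, 64, kernel_size=1)
        self.c32 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, luma_ds):                              # (N, 1, 16, 16)
        f1 = self.relu(self.c1(luma_ds))
        f2 = self.relu(torch.cat([self.c21(f1), self.c22(f1)], dim=1))
        f3 = self.relu(torch.cat([self.c31(f2), self.c32(f2)], dim=1))
        return f3                                            # (N, 128, 16, 16)

features = CNNBranch()(torch.randn(1, 1, 16, 16))  # down-sampled luma of a 32x32 block
```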
1.2. Fully Connected Layers
- 3 successive FC layers are used.
- The neighboring reconstructed luma samples at the upper and left boundaries are also down-sampled by a factor of 2; in total, 33 samples are used as input.
- Similarly, the neighboring reconstructed chroma samples at the upper and left boundaries, 33×2 samples in total, are also used as input. Thus, the input consists of 99 samples.
- The output of the last fully connected layer is a 128-dimensional feature vector F3 (see the sketch after this list).
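A corresponding sketch of the fully connected branch, assuming the hidden-layer widths; only the 99-sample input and the 128-dimensional output F3 come from the description above.

```python
import torch
import torch.nn as nn

class FCBranch(nn.Module):
    """Sketch of the fully connected branch on the neighbouring samples.

    Hidden-layer widths are assumptions; the 99-d input (33 down-sampled luma
    + 2 x 33 chroma boundary samples) and the 128-d output F3 follow the text.
    """
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(99, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )

    def forward(self, neighbours):        # (N, 99)
        return self.fc(neighbours)        # (N, 128): the feature vector F3

f3 = FCBranch()(torch.randn(1, 99))
```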
1.3. Fusion Layer
- The fusion layer integrates the neighboring information.
- First, the 128-dimensional vector F3 is tiled into a matrix so that it matches the size of the CNN feature maps.
- Then, the fusion is done by element-wise product (see the sketch after this list).
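A hedged sketch of the fusion step, assuming the CNN features have 128 channels at the 16×16 chroma resolution and that the final convolution C4 operates on the fused features to produce the two chroma channels.

```python
import torch
import torch.nn as nn

cnn_features = torch.randn(1, 128, 16, 16)   # output of the CNN branch (assumed shape)
f3 = torch.randn(1, 128)                     # F3 from the fully connected branch

# Tile the 128-d vector F3 over the spatial dimensions of the CNN feature map
f3_tiled = f3.view(1, 128, 1, 1).expand_as(cnn_features)

# Fuse the two branches by element-wise product
fused = cnn_features * f3_tiled

# C4: final convolution predicting the chroma block (2 channels for U and V; kernel size assumed)
c4 = nn.Conv2d(128, 2, kernel_size=3, padding=1)
pred_chroma = c4(fused)                      # (1, 2, 16, 16)
```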
1.4. Summary
- The above table summarizes the network in detail.
- Networks are also trained for 4×4 and 8×8 chroma blocks, with different hyperparameters.
- This new chroma mode competes with the linear model (LM) mode in HEVC, and the one with the minimum rate-distortion cost is chosen (see the sketch below).
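As a rough illustration (not HM code), the mode decision can be thought of as comparing rate-distortion costs J = D + λ·R and keeping the cheaper mode; the numbers below are made up.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def choose_chroma_mode(cost_hybridnn: float, cost_lm: float) -> str:
    """Pick whichever chroma prediction mode has the lower RD cost."""
    return "HybridNN" if cost_hybridnn < cost_lm else "LM"

j_nn = rd_cost(distortion=1200.0, rate_bits=30.0, lam=20.0)   # placeholder values
j_lm = rd_cost(distortion=1500.0, rate_bits=20.0, lam=20.0)
print(choose_chroma_mode(j_nn, j_lm))                          # -> "HybridNN"
```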
2. Experimental Results
- DIV2K dataset is used for training.
- No QP-specific models are trained.
- HM-12.0 is used.
- On average, 0.2%, 3.1%, and 2.0% BD-rate reductions are achieved on Y, U, and V, respectively, as shown above.
- The improvement is concentrated on the chroma components, which is expected since the approach specifically targets chroma coding efficiency.
- It is worth noting that the proposed method performs especially well on Classes A and B, which the authors conjecture is because the resolutions of Classes A and B are similar to those of the training images.
- Compared with [7], published in 2016 ICME, HybridNN achieves much larger improvements on chroma.
- HybridNN is selected mostly for regions with rich textures or structures.
- Also, HybridNN can be selected for quite large blocks, while LM is mostly used for smaller blocks.
I read this during office hours and wrote it after work. I will write one more tonight, lol.
This is the 10th story this month!
Reference
[2018 ICIP] [HybridNN, Li ICIP’18]
A Hybrid Neural Network for Chroma Intra Prediction
Codec Intra Prediction
JPEG [MS-ROI]
HEVC [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNMC Yokoyama ICCE’20] [PNNS]
VVC [CNNIF & CNNMC] [Brand PCS’19]