Reading: HybridNN Li ICIP’18 — Chroma Intra Prediction (HEVC Intra)
3.1% and 2.0% BD-Rate Reduction on U and V, Respectively
In this story, A Hybrid Neural Network for Chroma Intra Prediction (HybridNN, Li ICIP’18), by the University of Science and Technology of China, is briefly presented. The network is not given a name in the paper, so I simply call it HybridNN. I read this because I work on video coding research. In this paper:
- A convolutional neural network is used to extract features from the reconstructed luma samples of the current block.
- A fully connected network is used to extract features from the neighboring reconstructed luma and chroma samples.
That’s why it is called a hybrid neural network. This is a paper in 2018 ICIP. (Sik-Ho Tsang @ Medium)
Outline
- HybridNN: Network Architecture
- Experimental Results
1. HybridNN: Network Architecture
1.1. Convolutional Layers
- Take a 32×32 YUV 4:2:0 block as an example.
- The luma samples are down-sampled by half to have the same resolution as the chroma samples, and then fed into the CNN, as shown in the bottom branch of the figure above.
- The block first goes through the first convolution C1.
- The second convolution C2 consists of two grouped convolutions, C21 and C22, with different kernel sizes in order to effectively aggregate multi-scale information.
- The third convolution C3 is similar to C2 but with different multi-scale kernel sizes.
- Finally, the fourth convolution C4 outputs the predicted chroma samples (see the sketch after this list).
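To make the structure concrete, below is a minimal PyTorch sketch of the CNN branch. The layer names follow the description above, but the channel counts and kernel sizes are my own assumptions for illustration, not the hyperparameters reported in the paper; C4 is deferred to the fusion sketch in Section 1.3, since the prediction is produced only after fusing with the fully connected branch.

```python
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    """Sketch of the CNN branch on the down-sampled luma block.

    Channel counts and kernel sizes are assumptions, not the paper's values.
    """
    def __init__(self):
        super().__init__()
        # C1: plain convolution on the 16x16 down-sampled luma block
        self.c1 = nn.Conv2d(1, 64, kernel_size=3, padding=1)
        # C2: two parallel convolutions C21 / C22 with different kernel sizes,
        # concatenated to aggregate multi-scale information
        self.c21 = nn.Conv2d(64, 32, kernel_size=3, padding=1)
        self.c22 = nn.Conv2d(64, 32, kernel_size=5, padding=2)
        # C3: same multi-scale idea as C2, but with other (assumed) kernel sizes
        self.c31 = nn.Conv2d(64, 64, kernel_size=1)
        self.c32 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, luma_ds):                              # (N, 1, 16, 16)
        f1 = self.relu(self.c1(luma_ds))
        f2 = self.relu(torch.cat([self.c21(f1), self.c22(f1)], dim=1))
        f3 = self.relu(torch.cat([self.c31(f2), self.c32(f2)], dim=1))
        return f3                                            # (N, 128, 16, 16)

features = CNNBranch()(torch.randn(1, 1, 16, 16))  # down-sampled luma of a 32x32 block
```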
1.2. Fully Connected Layers
- 3 successive FC layers are used.
- The neighboring reconstructed luma samples at the upper and left boundaries are also down-sampled by a factor of 2; in total, 33 samples are used as input.
- Similarly, the neighboring reconstructed chroma samples at the upper and left boundaries, 33×2 samples in total, are also used as input. Thus, the input consists of 99 samples.
- The output of the last fully connected layer is a 128-dimensional feature vector F3 (see the sketch after this list).
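A corresponding sketch of the fully connected branch, assuming the hidden-layer widths; only the 99-sample input and the 128-dimensional output F3 come from the description above.

```python
import torch
import torch.nn as nn

class FCBranch(nn.Module):
    """Sketch of the fully connected branch on the neighbouring samples.

    Hidden-layer widths are assumptions; the 99-d input (33 down-sampled luma
    + 2 x 33 chroma boundary samples) and the 128-d output F3 follow the text.
    """
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(99, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )

    def forward(self, neighbours):        # (N, 99)
        return self.fc(neighbours)        # (N, 128): the feature vector F3

f3 = FCBranch()(torch.randn(1, 99))
```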
1.3. Fusion Layer
- The fusion layer integrates the neighboring information.
- First, the 128-dimensional vector F3 is tiled into a matrix so that it matches the size of the CNN feature maps.
- Then, the fusion is done by element-wise product (see the sketch after this list).
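A hedged sketch of the fusion step, assuming the CNN features have 128 channels at the 16×16 chroma resolution and that the final convolution C4 operates on the fused features to produce the two chroma channels.

```python
import torch
import torch.nn as nn

cnn_features = torch.randn(1, 128, 16, 16)   # output of the CNN branch (assumed shape)
f3 = torch.randn(1, 128)                     # F3 from the fully connected branch

# Tile the 128-d vector F3 over the spatial dimensions of the CNN feature map
f3_tiled = f3.view(1, 128, 1, 1).expand_as(cnn_features)

# Fuse the two branches by element-wise product
fused = cnn_features * f3_tiled

# C4: final convolution predicting the chroma block (2 channels for U and V; kernel size assumed)
c4 = nn.Conv2d(128, 2, kernel_size=3, padding=1)
pred_chroma = c4(fused)                      # (1, 2, 16, 16)
```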
1.4. Summary
- The above table summarizes the network in detail.
- Networks are also trained for 4×4 and 8×8 chroma blocks, with different hyperparameters.
- This new chroma mode competes with the linear model (LM) mode in HEVC, and the one with the minimum rate-distortion cost is chosen (see the sketch below).
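As a rough illustration (not HM code), the mode decision can be thought of as comparing rate-distortion costs J = D + λ·R and keeping the cheaper mode; the numbers below are made up.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def choose_chroma_mode(cost_hybridnn: float, cost_lm: float) -> str:
    """Pick whichever chroma prediction mode has the lower RD cost."""
    return "HybridNN" if cost_hybridnn < cost_lm else "LM"

j_nn = rd_cost(distortion=1200.0, rate_bits=30.0, lam=20.0)   # placeholder values
j_lm = rd_cost(distortion=1500.0, rate_bits=20.0, lam=20.0)
print(choose_chroma_mode(j_nn, j_lm))                          # -> "HybridNN"
```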
2. Experimental Results
- DIV2K dataset is used for training.
- No QP-specific models are trained.
- HM-12.0 is used.
- On average, 0.2%, 3.1%, and 2.0% BD-rate reductions are achieved on Y, U, and V, respectively, as shown above.
- The improvement is concentrated on the chroma components, which is expected since the approach specifically targets chroma coding efficiency.
- It is worth noting that the proposed method performs especially well on Classes A and B, which the authors conjecture is because the resolutions of Classes A and B are similar to those of the training images.
- Compared with [7], published in 2016 ICME, HybridNN achieves much larger improvements on chroma.
- HybridNN is selected mostly for regions with rich textures or structures.
- Also, HybridNN can be selected for quite large blocks, while LM is mostly used for smaller blocks.
I read this during office hours and wrote it after work. I will write one more tonight, lol.
This is the 10th story this month!
Reference
[2018 ICIP] [HybridNN, Li ICIP’18]
A Hybrid Neural Network for Chroma Intra Prediction
Codec Intra Prediction
JPEG [MS-ROI]
HEVC [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNMC Yokoyama ICCE’20] [PNNS]
VVC [CNNIF & CNNMC] [Brand PCS’19]