# Reading: HybridNN Li ICIP’18 — Chroma Intra Prediction (HEVC Intra)

**3.1% and 2.0% BD-rate reduction on U and V, respectively**

In this story, **A Hybrid Neural Network for Chroma Intra Prediction (HybridNN, Li ICIP’18)**, by University of Science and Technology of China, is briefly presented. The paper does not name the network; I simply call it HybridNN. I read this because I work on video coding research. In this paper:

**A convolutional neural network** is used to extract features from the **reconstructed luma samples of the current block.** **A fully connected network** is used to extract features from the **neighboring reconstructed luma and chroma samples.**

That’s why it is called a hybrid neural network. This is a paper in **2018 ICIP**. (Sik-Ho Tsang @ Medium)

# Outline

1. **HybridNN: Network Architecture**
2. **Experimental Results**

# 1. HybridNN: Network Architecture

## 1.1. Convolutional Layers

- Take a 32×32 YUV 4:2:0 block as an example.
- **The luma samples are down-sampled by half to have the same resolution as the chroma samples, then fed into the CNN**, as shown in the bottom branch of the above figure.
- The block first goes through the first convolution *C*1.
- The second convolution *C*2 **consists of two grouped convolutions**, *C*21 and *C*22, with different kernel sizes in order to effectively aggregate multi-scale information.
- The third convolution *C*3 is similar to *C*2 but with different multi-scale kernel sizes.
- Finally, the fourth convolution *C*4 is responsible for outputting the predicted chroma.
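The multi-scale grouped convolution idea above can be sketched as two parallel convolutions with different kernel sizes whose outputs are concatenated along the channel axis. This is a minimal numpy sketch, not the paper's implementation: the 3×3/5×5 kernel sizes, single input channel, and averaging kernels are all illustrative assumptions.

```python
import numpy as np

def conv2d_same(x, k):
    """'Same'-padded 2-D cross-correlation of one feature map x with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding keeps the spatial size
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def multi_scale_conv(x, k_small, k_large):
    """Two grouped branches with different kernel sizes (like C21 and C22),
    stacked along the channel axis to aggregate multi-scale information."""
    return np.stack([conv2d_same(x, k_small), conv2d_same(x, k_large)])

# 32x32 luma block down-sampled by 2 to chroma resolution (16x16).
luma_ds = np.random.rand(16, 16)
feats = multi_scale_conv(luma_ds,
                         np.ones((3, 3)) / 9,    # hypothetical small-scale kernel
                         np.ones((5, 5)) / 25)   # hypothetical large-scale kernel
print(feats.shape)  # (2, 16, 16)
```

The point of the grouping is that each branch sees the same input but with a different receptive field, so coarse and fine structure are captured in parallel.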

## 1.2. Fully Connected Layers

- **3 successive FC layers are used.**
- The neighboring reconstructed luma samples at the upper and left boundaries are down-sampled by a factor of 2; in total, 33 samples are used as input.
- Similarly, the neighboring reconstructed chroma samples at the upper and left boundaries, 33×2 samples in total, are also used as input. Thus the input consists of 99 samples.
- The output of the last fully connected layer is a 128-dimensional feature vector.
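The FC branch above can be sketched as follows: 33 down-sampled neighboring luma samples plus 33 samples each for U and V are concatenated into a 99-dimensional input, and three successive fully connected layers map it to the 128-dimensional vector *F*3. The hidden widths, random weights, and ReLU activations are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assemble the 99-sample input: 33 down-sampled luma neighbors + 33 U + 33 V.
luma_neighbors = rng.random(33)   # upper + left boundary, down-sampled by 2
u_neighbors = rng.random(33)
v_neighbors = rng.random(33)
x = np.concatenate([luma_neighbors, u_neighbors, v_neighbors])  # shape (99,)

def relu(z):
    return np.maximum(z, 0.0)

# Three successive FC layers; the last one outputs the 128-dim vector F3.
# Layer widths here are assumed, only the 99-in / 128-out sizes come from the text.
W1 = rng.random((128, 99))
W2 = rng.random((128, 128))
W3 = rng.random((128, 128))
F3 = relu(W3 @ relu(W2 @ relu(W1 @ x)))
print(F3.shape)  # (128,)
```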

## 1.3. Fusion Layer

- The fusion layer integrates the neighboring information.
- First, **the vector** *F*3 is tiled into a matrix.
- Then, the fusion is done by **element-wise product.**
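The two fusion steps can be sketched directly: each entry of *F*3 is tiled into a constant plane of the CNN feature-map size, and the tiled tensor is multiplied element-wise with the CNN features. The 128-channel count matches the FC output above; the 16×16 spatial size is an assumption for a 32×32 block at chroma resolution.

```python
import numpy as np

rng = np.random.default_rng(0)
F3 = rng.random(128)                   # 128-dim vector from the FC branch
cnn_feat = rng.random((128, 16, 16))   # CNN feature maps (channel-first; sizes assumed)

# Step 1: tile F3 so each entry becomes a constant 16x16 plane.
tiled = F3[:, None, None] * np.ones((1, 16, 16))

# Step 2: fuse by element-wise product with the CNN features.
fused = cnn_feat * tiled
print(fused.shape)  # (128, 16, 16)
```

The element-wise product lets the neighboring-sample features modulate the per-channel CNN response at every spatial position.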

## 1.4. Summary

- The above table summarizes the network in detail.
- There are also trained networks for 4×4 and 8×8 chroma blocks, but with different hyperparameters.
- This new chroma mode competes with the linear model (LM) mode in HEVC, and the one with the minimum rate-distortion cost is chosen.
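The mode decision above can be sketched as a standard rate-distortion comparison: the encoder evaluates J = D + λ·R for the LM mode and the HybridNN mode and keeps the cheaper one. The cost numbers and λ below are purely illustrative, not values from the paper.

```python
# Minimal sketch of RD-cost mode selection between LM and HybridNN.
def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate

lam = 10.0  # hypothetical Lagrange multiplier (QP-dependent in a real encoder)

cost_lm = rd_cost(distortion=420.0, rate=35, lam=lam)  # 770.0 (illustrative)
cost_nn = rd_cost(distortion=380.0, rate=30, lam=lam)  # 680.0 (illustrative)

best = "HybridNN" if cost_nn < cost_lm else "LM"
print(best)  # HybridNN, for these made-up numbers
```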

# 2. Experimental Results

- DIV2K dataset is used for training.
- No QP-specific models are trained.
- HM-12.0 is used.

- On average, 0.2%, **3.1%, and 2.0%** BD-rate reduction on Y, **U, and V**, respectively, is achieved, as shown above.
- The improvement is much greater for chroma, since this approach targets chroma coding efficiency.
- It is worth noting that **the proposed method performs especially well on Classes A and B**, which the authors conjecture is due to the similar resolutions of Classes A and B and the training images.

- Compared with [7], which was published in 2016 ICME, HybridNN achieves much larger improvements on chroma.

- HybridNN is selected mostly for **regions with rich textures or structures.**
- Also, **HybridNN can be selected for quite large blocks**, while LM is mostly used for smaller blocks.

I read this during office hours and wrote it up after work. I will write one more tonight, lol.

This is the 10th story in this month!

## Reference

[2018 ICIP] [HybridNN, Li ICIP’18]

A Hybrid Neural Network for Chroma Intra Prediction

## Codec Intra Prediction

**JPEG** [MS-ROI]
**HEVC** [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNMC Yokoyama ICCE’20] [PNNS]

**VVC** [CNNIF & CNNMC] [Brand PCS’19]