Reading: HybridNN Li ICIP’18 — Chroma Intra Prediction (HEVC Intra)

3.1% and 2.0% BD-rate reduction on U and V, respectively

Sik-Ho Tsang
4 min read · Jun 9, 2020

In this story, A Hybrid Neural Network for Chroma Intra Prediction (HybridNN, Li ICIP’18), by the University of Science and Technology of China, is briefly presented. The network has no name in the paper; I simply call it HybridNN. I read this because I work on video coding research. In this paper:

  • A convolutional neural network extracts features from the reconstructed luma samples of the current block.
  • A fully connected network extracts features from the neighboring reconstructed luma and chroma samples.

That’s why it is called a hybrid neural network. This is a paper in 2018 ICIP. (Sik-Ho Tsang @ Medium)

Outline

  1. HybridNN: Network Architecture
  2. Experimental Results

1. HybridNN: Network Architecture

HybridNN: Network Architecture

1.1. Convolutional Layers

  • Take a 32×32 YUV 4:2:0 block as an example.
  • The luma samples are down-sampled by a factor of 2 to the same resolution as the chroma samples, then fed into the CNN, as shown in the bottom branch of the above figure.
  • They first go through the convolution C1.
  • The second convolution C2 consists of two grouped convolutions, C21 and C22, with different kernel sizes so as to effectively aggregate multi-scale information.
  • The third convolution C3 is similar to C2 but with different multi-scale kernel sizes.
  • Finally, the fourth convolution C4 outputs the predicted chroma samples.
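The multi-scale CNN branch above can be sketched roughly as follows. This is an illustrative PyTorch sketch, not the paper’s exact configuration: the kernel sizes and channel counts here are assumptions, and the grouped convolutions C21/C22 and C31/C32 are modeled as parallel convolutions whose outputs are concatenated.

```python
import torch
import torch.nn as nn

class ChromaCNNBranch(nn.Module):
    """Sketch of the CNN branch: extracts features from the
    down-sampled reconstructed luma block (16x16 for a 32x32 CU
    in 4:2:0). Kernel sizes and channel counts are illustrative
    assumptions, not the paper's hyperparameters."""
    def __init__(self):
        super().__init__()
        # C1: plain convolution on the down-sampled luma
        self.c1 = nn.Conv2d(1, 64, kernel_size=3, padding=1)
        # C2: two parallel convolutions with different kernel
        # sizes to aggregate multi-scale information
        self.c21 = nn.Conv2d(64, 32, kernel_size=3, padding=1)
        self.c22 = nn.Conv2d(64, 32, kernel_size=5, padding=2)
        # C3: same multi-scale idea, different kernel sizes
        self.c31 = nn.Conv2d(64, 64, kernel_size=1, padding=0)
        self.c32 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, luma):                      # (N, 1, 16, 16)
        x = self.relu(self.c1(luma))
        x = torch.cat([self.relu(self.c21(x)),
                       self.relu(self.c22(x))], dim=1)
        x = torch.cat([self.relu(self.c31(x)),
                       self.relu(self.c32(x))], dim=1)
        return x                                  # (N, 128, 16, 16)

feat = ChromaCNNBranch()(torch.randn(1, 1, 16, 16))
print(feat.shape)  # torch.Size([1, 128, 16, 16])
```

The final convolution C4 (omitted here) would map these feature maps to the predicted chroma samples.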

1.2. Fully Connected Layers

  • 3 successive FC layers are used.
  • The neighboring reconstructed luma samples at the upper and left boundaries are down-sampled by a factor of 2, giving 33 samples in total as input.
  • Similarly, the neighboring reconstructed chroma samples at the upper and left boundaries, 33×2 samples in total, are also used as input. Thus the input consists of 99 samples.
  • The output of the last fully connected layer is a 128-dimensional feature vector.
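The 99-sample context vector can be assembled as in this small numpy sketch. The sample values are placeholders; the assumption is that each boundary contributes 16 samples per side plus one corner sample (33 per component) for a 16×16 chroma block.

```python
import numpy as np

N = 16  # chroma block size for a 32x32 luma CU in 4:2:0

# Hypothetical reconstructed neighbours (values are placeholders).
# Down-sampled luma boundary: N above + N left + 1 corner = 33.
luma_ctx = np.arange(2 * N + 1, dtype=np.float32)
# Chroma boundaries (U and V): 33 samples each = 66.
u_ctx = np.ones(2 * N + 1, dtype=np.float32)
v_ctx = np.full(2 * N + 1, 2.0, dtype=np.float32)

# Concatenate into the 99-dimensional input of the FC branch.
fc_input = np.concatenate([luma_ctx, u_ctx, v_ctx])
print(fc_input.shape)  # (99,)
```

The FC layers then map this 99-dimensional vector to the 128-dimensional feature F3.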

1.3. Fusion Layer

  • The fusion layer integrates the neighboring information into the CNN features.
  • First, the vector F3 is tiled into a matrix of the same spatial size as the CNN feature maps.
  • Then, the fusion is done by an element-wise product.
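The tiling-and-product fusion can be sketched in a few lines of numpy. The spatial size and the assumption that the CNN branch also produces 128 feature channels (matching the 128-dimensional F3) are illustrative.

```python
import numpy as np

H = W = 16   # spatial size of the CNN feature maps (assumed)
C = 128      # channels, matching the 128-d FC output F3

# F3: 128-d vector from the FC branch; cnn_feat: CNN feature maps.
f3 = np.random.rand(C).astype(np.float32)
cnn_feat = np.random.rand(C, H, W).astype(np.float32)

# Tile the vector so each of its 128 entries becomes a constant
# H x W plane, then fuse by element-wise product.
f3_tiled = np.broadcast_to(f3[:, None, None], (C, H, W))
fused = cnn_feat * f3_tiled
print(fused.shape)  # (128, 16, 16)
```

Each FC feature thus re-weights one entire CNN feature map, letting the neighboring context modulate the luma-derived features.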

1.4. Summary

  • The above table summarizes the network in detail.
  • Networks are also trained for 4×4 and 8×8 chroma blocks, but with different hyperparameters.
  • This new chroma mode competes with the linear model (LM) mode in HEVC, and the encoder chooses the one with the minimum rate-distortion cost.
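The mode competition amounts to a standard rate-distortion decision, as in this minimal sketch. The Lagrange multiplier and the distortion/rate numbers are placeholders, not values from the paper.

```python
def choose_chroma_mode(modes):
    """Pick the chroma intra mode with minimum RD cost
    J = D + lambda * R. 'modes' maps a mode name to its
    (distortion, rate) pair."""
    lam = 10.0  # placeholder Lagrange multiplier
    costs = {name: d + lam * r for name, (d, r) in modes.items()}
    return min(costs, key=costs.get)

# Hypothetical numbers: HybridNN predicts better (lower
# distortion) at a slightly higher rate than LM here.
best = choose_chroma_mode({"HybridNN": (120.0, 4.0),
                           "LM": (200.0, 3.0)})
print(best)  # HybridNN
```

Whichever mode wins the RD check is signalled to the decoder, so the CNN mode only costs bits where it actually helps.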

2. Experimental Results

  • DIV2K dataset is used for training.
  • No QP-specific models are trained.
  • HM-12.0 is used.
BD-Rate (%) on HEVC Test Sequences
  • On average 0.2%, 3.1%, and 2.0% BD-rate reduction on Y, U, and V, respectively, is achieved as shown above.
  • As expected, the gain concentrates on the chroma components, since the approach specifically targets chroma coding efficiency.
  • It is worth noting that the proposed method performs especially well on Classes A and B, which the authors conjecture is due to the similar resolutions of Classes A and B and the training images.
  • Compared with [7], published in 2016 ICME, HybridNN achieves much larger improvements on chroma.
Blue: HybridNN, Red: LM
  • HybridNN is selected mostly for the regions with rich textures or structures.
  • Also, HybridNN can be selected for quite large blocks, whereas LM is mostly used for smaller blocks.

I read this during office hours and wrote it after work. I will write one more tonight, lol.

This is the 10th story this month!
