Reading: IntraNN — CNNs for Video Intra Prediction Using Cross-component Adaptation (HEVC Intra)
2% BD-Rate Reduction for Luma and 1.5% BD-Rate Reduction for Chroma
In this story, Convolutional Neural Networks for Video Intra Prediction Using Cross-component Adaptation (IntraNN), by RWTH Aachen University, is briefly described.
A new intra prediction mode based on a CNN is introduced, which outperforms a fully connected one. With the use of cross-component (CRCO) prediction in the CNN, the new IntraNN mode achieves higher coding efficiency. This is a paper in 2019 ICASSP. (Sik-Ho Tsang @ Medium)
Outline
- Network Architecture
- Some Training Details
- HEVC Implementation
- Experimental Results
1. Network Architecture
1.1. Architecture Variants
- The authors tried 3 versions of the architecture: C0, C1, and C2.
- C0: Only fully connected (FC) layers are used. That means no convolutions.
- C1: Only one 4×4 convolutional layer before FC layers.
- C2: As shown above, two convolutional layers, a 3×3 followed by a 2×2, before the FC layers (see the sketch after this list). As no padding is used, the number of reference lines limits the possible kernel sizes and combinations.
- Leaky ReLU is used after every layer except the last output layer.
- For the luma, it is shown as the upper branch.
- For the chroma, it is shown as the lower branch, where the two chroma components are concatenated before being input into the network. This is called cross-component (CRCO) prediction.
- It is quite special that the input is L-shaped, formed by the reference samples above and to the left of the block.
- It is found that C2 obtains the largest coding gain of 2.01% BD-rate reduction.
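Below is a minimal PyTorch sketch of what the C2 variant could look like. The paper's exact handling of the L-shaped reference area and its layer widths are not detailed here, so the square embedding (with the unknown block region zeroed out), the channel counts, and the hidden size are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class IntraNNC2(nn.Module):
    """Sketch of the C2 variant: two unpadded convolutions (3x3 then 2x2)
    followed by FC layers, with Leaky ReLU everywhere except the output.
    The L-shaped reference area is assumed to be embedded in a square
    tensor with the (unknown) block region zeroed out."""

    def __init__(self, block=8, ref_lines=4, in_ch=1, hidden=1024):
        super().__init__()
        side = block + ref_lines                   # square holding refs + block
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3),   # no padding, shrinks side by 2
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=2),      # no padding, shrinks side by 1
            nn.LeakyReLU(0.2),
        )
        feat = 64 * (side - 3) ** 2
        self.fc = nn.Sequential(
            nn.Linear(feat, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, in_ch * block * block),  # linear output layer
        )
        self.block, self.in_ch = block, in_ch

    def forward(self, ref):                        # ref: (N, in_ch, side, side)
        z = self.conv(ref).flatten(1)
        return self.fc(z).view(-1, self.in_ch, self.block, self.block)

# Luma usage: an 8x8 block with 4 reference lines gives a 12x12 input.
net = IntraNNC2(block=8, ref_lines=4, in_ch=1)
pred = net(torch.randn(2, 1, 12, 12))              # output: (2, 1, 8, 8)
```

For the CRCO chroma branch, setting in_ch=2 would concatenate the two chroma components as input channels, matching the lower branch described above.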
2. Some Training Details
- All networks are trained on samples from 104 sequences with varying resolutions. 11 additional videos are used as a validation set.
- Authors have provided supplementary material about the sequences they use: http://www.ient.rwth-aachen.de/cms/icassp2019/
- Horizontal and vertical flipping are applied for data augmentation.
- The channel-wise mean of the reference area is subtracted from both the reference and the prediction areas.
- Some samples with low variance are excluded from training.
- Two loss functions are tried: L1 and SATD. SATD is the sum of absolute transformed differences, where the residual is Hadamard-transformed before taking the absolute sum. Simply speaking, it is a simplified transform commonly used in video coding for a preliminary cost check of the residual (see the sketch after this list).
- It is found that the SATD loss outperforms the L1 one.
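As a concrete illustration, here is a minimal sketch of an SATD training loss, assuming square power-of-two block sizes so that a plain Hadamard matrix applies. The reduction (mean rather than sum) and the absence of any scale factor are assumptions, not necessarily the paper's exact formulation.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Build an n x n Hadamard matrix (n must be a power of two)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], 1), torch.cat([H, -H], 1)], 0)
    return H

def satd_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """SATD: Hadamard-transform the residual, then sum absolute values.
    pred/target: (N, C, B, B) with B a power of two."""
    B = pred.shape[-1]
    H = hadamard(B).to(pred)
    resid = pred - target
    coeffs = H @ resid @ H.T          # 2-D Hadamard transform of the residual
    return coeffs.abs().mean()        # averaged so it scales like a training loss

# Example with random 8x8 blocks:
loss = satd_loss(torch.randn(2, 1, 8, 8), torch.randn(2, 1, 8, 8))
```

Compared with a plain L1 loss on pixels, this penalizes the residual in a transform domain closer to what the encoder's rate-distortion check actually measures.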
3. HEVC Implementation
- HM-16.9 is used.
- The above CNN-based prediction is treated as a new mode, the 36th intra prediction mode, called IntraNN.
- (To know more about intra prediction, please feel free to read Sections 1 & 2 in IPCNN.)
- The most probable mode (MPM) list is extended to hold a fourth option: an additional bit is sent when the third list position would otherwise have been chosen.
- There are two methods to place the IntraNN mode in the intra prediction mode candidate list: END and UP (see the sketch after this list).
- END: It is always put in the last position. Thus, this END signaling version should cause less overhead when the IntraNN mode is not chosen.
- UP: It is placed directly behind the modes used in the neighborhood. This UP version provides lower signaling costs for the IntraNN mode when in use.
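The following sketch illustrates the two placements conceptually. The function name and the list construction are hypothetical, not HM's actual candidate-list code.

```python
def build_mode_list(neighbor_modes, all_modes, placement="UP"):
    """Order intra modes for signaling. The IntraNN mode (index 35 here,
    i.e. the 36th mode) is inserted either directly after the modes used
    by neighboring blocks ("UP", cheaper when IntraNN is chosen) or at
    the very end ("END", cheaper when it is not)."""
    INTRANN = 35
    candidates = list(dict.fromkeys(neighbor_modes))  # de-duplicated neighbors first
    if placement == "UP":
        candidates.append(INTRANN)
    candidates += [m for m in all_modes if m not in candidates and m != INTRANN]
    if placement == "END":
        candidates.append(INTRANN)
    return candidates

# Example: neighbors used modes 26 and 10; UP places IntraNN right after them.
print(build_mode_list([26, 10], range(35), "UP")[:4])   # [26, 10, 35, 0]
```

Since earlier list positions cost fewer bits to signal, the trade-off is exactly the one described above: UP pays less when IntraNN is picked, END pays less when it is not.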
4. Experimental Results
- As shown above, in most cases, the UP variant gives slightly better BD-rate gains than the END version, especially for the luma channel.
- The cross-component (CRCO) version outperforms the version without network-based chroma prediction on every channel and sequence, on average by -0.60% BD-rate on the luma and by -0.97% and -0.95% on the chroma channels.
During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 20th story in this month. 66.67% progress now, 1/3 of the way to go!! Thanks for visiting my story..
Reference
[2019 ICASSP] [IntraNN]
Convolutional Neural Networks for Video Intra Prediction Using Cross-component Adaptation
Codec Intra Prediction
HEVC [CNNIF] [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [CNNAC] [Li TCSVT’18] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN]
VVC [Brand PCS’19]