Reading: Santamaria ICMEW’20 — Analytic Simplification of Neural Network based Intra-Prediction Modes (Fast VVC Intra)

Simplified Solution for NN-Based Intra Prediction, Much Lower Computational Complexity

Aug 1, 2020

In this story, Analytic Simplification of Neural Network based Intra-Prediction Modes for Video Compression (Santamaria ICMEW’20), by Queen Mary University of London and the British Broadcasting Corporation (BBC), is presented. I read this because I work on video coding research. In this paper:

A simplified, linear version of an existing NN-based intra prediction model is derived, which reduces its computational complexity.

This is a paper in 2020 ICMEW. (Sik-Ho Tsang @ Medium)

Outline

Conventional NN Intra Prediction in VVC
Proposed Simplified Solution
Experimental Results

1. Conventional NN Intra Prediction in VVC

An NN-based intra-prediction method has demonstrated good performance within VVC when used alongside the conventional intra-prediction modes.
While conventional intra modes use one line of reference samples, the NN-based model makes use of multiple reference lines.
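To make the idea of multiple reference lines concrete, the following minimal Python sketch gathers the reference samples for an N×N block from several lines above and to the left of it. The function name `gather_reference` and the layout (rows above extended to the left to cover the corner, no above-right or below-left extension) are hypothetical assumptions for illustration; the reference area of the actual model may differ.

```python
def gather_reference(frame, x0, y0, n, lines):
    """Collect reference samples for the n x n block whose top-left sample is
    frame[y0][x0], using `lines` reference lines (hypothetical layout)."""
    if x0 < lines or y0 < lines:
        raise ValueError("block too close to the frame border for this sketch")
    r = []
    # `lines` rows above the block, extended `lines` samples to the left
    # so that the top-left corner region is included.
    for y in range(y0 - lines, y0):
        r.extend(frame[y][x0 - lines : x0 + n])
    # `lines` columns to the left of the block, read row by row.
    for y in range(y0, y0 + n):
        r.extend(frame[y][x0 - lines : x0])
    return r


# Example: an 8x8 frame whose sample values encode their position.
frame = [[8 * y + x for x in range(8)] for y in range(8)]
one_line = gather_reference(frame, 2, 2, 4, lines=1)
two_lines = gather_reference(frame, 2, 2, 4, lines=2)
print(len(one_line), len(two_lines))  # 9 20
```

In this sketch, the length of the reference vector r, and therefore the input size of any predictor that consumes it, grows with the number of lines.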

In particular, the NN model used as the base of the analysis consists of four layers.
There are K NN-based intra modes.
All K modes share layers 1–3, and each mode has its own final layer.
Given the vector r of reference samples, the common layers 1–3 compute:

$$t_1 = \rho(A_1 r + b_1), \qquad t_2 = \rho(A_2 t_1 + b_2), \qquad t_3 = \rho(A_3 t_2 + b_3)$$

where A_l and b_l are the weight matrix and bias vector of layer l, and ρ is the exponential linear unit (eLU).
For the k-th mode, the last layer computes the prediction:

$$p_k = A_{4,k} t_3 + b_{4,k}$$

where p_k contains the predicted samples of the N×N block.
However, the computational complexity of this model is high.
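The forward pass of the base model can be sketched in plain Python. This is a minimal sketch, assuming fully connected layers; the helper names, the sizes (20 reference samples, 32 hidden units, 16 output samples for a 4×4 block) and the random weights are hypothetical stand-ins, not the trained model from the paper.

```python
import math
import random


def elu(x, alpha=1.0):
    # Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def affine(A, x, b):
    # Fully connected layer A x + b, with A stored as a list of rows.
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def random_layer(out_dim, in_dim, rng):
    # Random weights and biases stand in for trained parameters.
    A = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [rng.uniform(-0.1, 0.1) for _ in range(out_dim)]
    return A, b


def make_model(r_len, hidden, n_samples, num_modes, seed=0):
    rng = random.Random(seed)
    # Layers 1-3 are shared by all K modes.
    shared = [random_layer(hidden, r_len, rng),
              random_layer(hidden, hidden, rng),
              random_layer(hidden, hidden, rng)]
    # One final layer per mode k.
    heads = [random_layer(n_samples, hidden, rng) for _ in range(num_modes)]
    return shared, heads


def predict(model, r, k):
    shared, heads = model
    t = r
    for A, b in shared:
        t = [elu(v) for v in affine(A, t, b)]  # t_l = rho(A_l t_{l-1} + b_l)
    A4k, b4k = heads[k]
    return affine(A4k, t, b4k)  # p_k = A_{4,k} t_3 + b_{4,k}


model = make_model(r_len=20, hidden=32, n_samples=16, num_modes=35)
p = predict(model, [0.5] * 20, k=3)
print(len(p))  # 16
```

Every prediction runs the three shared layers plus one mode-specific layer, with an eLU after each shared layer; this per-block cost is what the simplifications in Section 2 target.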

2. Proposed Simplified Solution

2.1. Linear Model without Intercept


A linear model can be derived as a simplification of the model in Section 1 by removing the non-linearities introduced by the eLU activation functions, together with the biases. The four layers then compose into a single matrix per mode:

$$A_{4,k} A_3 A_2 A_1$$

Due to the removal of the activation functions and biases, the coefficients of this matrix are normalised row-wise to make sure the final prediction samples assume values with energies comparable to the target block. Denoting the normalised matrix by A_k, the prediction p̂_k is obtained as:

$$\hat{p}_k = A_k r$$

Each NN-based predictor k is thus simplified to a single matrix A_k that estimates the target block from the set of reference samples r.
However, such a model may not produce accurate predictions for pixels within the current block that are far from the reference samples.
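The collapse of the four layers into one matrix can be checked numerically. This minimal sketch assumes small hypothetical layer sizes and random non-negative weights, and it assumes that the row-wise normalisation scales each row to sum to 1, which keeps a flat reference at the same level; the paper's exact normalisation may differ.

```python
import math
import random


def matmul(X, Y):
    # Product of X (m x p) and Y (p x q), both stored as lists of rows.
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def matvec(A, x):
    # Matrix-vector product A x.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def collapse(A1, A2, A3, A4k):
    # With the eLU and biases removed, the four layers compose into one
    # matrix: A4k A3 A2 A1.
    return matmul(A4k, matmul(A3, matmul(A2, A1)))


def normalise_rows(A):
    # ASSUMPTION: each row is scaled to sum to 1; the paper's exact
    # row-wise normalisation may differ.
    out = []
    for row in A:
        s = sum(row)
        out.append([a / s for a in row])
    return out


rng = random.Random(1)


def rand_matrix(rows, cols):
    # Hypothetical non-negative weights standing in for trained ones.
    return [[rng.uniform(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


A1, A2, A3 = rand_matrix(8, 20), rand_matrix(8, 8), rand_matrix(8, 8)
A4k = rand_matrix(16, 8)
r = [rng.uniform(0.0, 255.0) for _ in range(20)]

A_k = collapse(A1, A2, A3, A4k)  # a single 16 x 20 matrix for mode k
layer_by_layer = matvec(A4k, matvec(A3, matvec(A2, matvec(A1, r))))
print(all(math.isclose(x, y, rel_tol=1e-9)
          for x, y in zip(matvec(A_k, r), layer_by_layer)))  # True

p_hat = matvec(normalise_rows(A_k), [100.0] * 20)  # flat reference at level 100
print([round(v, 6) for v in p_hat[:3]])  # [100.0, 100.0, 100.0]
```

The collapsed matrix is computed once per mode offline, so at prediction time each block needs only one matrix-vector product.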

2.2. Linear Model with Intercept

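Thus, it may be beneficial to further tune the prediction by adding to the linear prediction model an intercept that does not depend on the reference samples:

$$\tilde{p}_k = A_k r + b_k$$

where b_k is a mode-specific intercept vector with one entry per predicted sample.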
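One natural way to obtain such an intercept, assumed here purely for illustration and not taken from the paper, is to keep the biases while dropping only the eLU activations: a chain of affine layers then folds into a single pair (A_k, b_k). The sketch below uses hypothetical helper names, random weights and no row normalisation. It checks that A_k r + b_k matches applying the affine layers one by one, and that an all-zero reference yields exactly b_k, i.e. the intercept does not depend on r.

```python
import math
import random


def matmul(X, Y):
    # Product of X (m x p) and Y (p x q), both stored as lists of rows.
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def vadd(u, v):
    return [a + b for a, b in zip(u, v)]


def fold_affine(layers):
    """Fold affine layers [(A1, b1), ..., (A4k, b4k)], applied in order,
    into one pair (A_k, b_k) with A_k r + b_k equal to the whole chain."""
    A, b = layers[0]
    for A_next, b_next in layers[1:]:
        # A_next (A r + b) + b_next = (A_next A) r + (A_next b + b_next)
        A = matmul(A_next, A)
        b = vadd(matvec(A_next, b), b_next)
    return A, b


def predict_with_intercept(A_k, b_k, r):
    # Linear prediction plus an intercept that does not depend on r.
    return vadd(matvec(A_k, r), b_k)


rng = random.Random(2)


def rand_layer(out_dim, in_dim):
    # Hypothetical random weights and biases.
    A = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(out_dim)]
    return A, b


layers = [rand_layer(8, 20), rand_layer(8, 8), rand_layer(8, 8), rand_layer(16, 8)]
A_k, b_k = fold_affine(layers)

r = [rng.uniform(0.0, 1.0) for _ in range(20)]
t = r
for A, b in layers:  # affine layers applied one by one, no activation
    t = vadd(matvec(A, t), b)

p_tilde = predict_with_intercept(A_k, b_k, r)
print(all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9)
          for x, y in zip(p_tilde, t)))  # True
```

The intercept costs only one addition per predicted sample, so it adds no multiplications to the linear predictor.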

3. Experimental Results

3.1. BD-Rate

Training was performed using 4×4, 8×8 and 16×16 blocks and 20,000 randomly selected luma patches derived from the DIVerse 2K (DIV2K) dataset.
K = 35 modes were used for each block size.
VTM-1.0 was used.
The maximum block size was set to 16×16, and only square blocks were allowed.
p_k: the base NN model with activation functions.
p̂_k: the linear predictors without intercept.
p̃_k: the linear predictors with intercept.

Compared with the NN-based model, complexity reduction was obtained with the simplified models, especially at the decoder side.
The model without intercept incurs some loss in compression efficiency, whereas the model with intercept incurs only minor losses relative to the NN-based model.

3.2. Mode Usage

For the smaller block sizes of 4×4 and 8×8, the usage of the simplified modes with intercept is on average 2% higher than the usage of the NN-based modes.
Interestingly, in the NN-based model as well as in the linear prediction with intercept, some of the 35 modes were found to be used more often on average than others, showing that there are some predominant modes.

3.3. Computational Complexity

**Number of multiplications required to generate a block.**

The above table shows these values for the different block sizes supported.
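The base model p_k requires 4n(√n + 41) + 32(19√n + 18) multiplications, where n is the number of samples in the block.
The proposed simplifications p̂_k and p̃_k require 8n(√n + 2) multiplications.
Expanding, the base model needs 4n√n + 164n + 608√n + 576 multiplications and the simplified models need 8n√n + 16n. In big O notation both are O(n√n), so the proposed method does not lower the asymptotic order. Its advantage comes from the base model's large lower-order terms, which dominate at the supported block sizes (up to 16×16), where the simplified predictors need far fewer multiplications.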
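Plugging the supported block sizes into the two formulas makes the trend explicit. This short Python sketch simply evaluates the expressions as stated above; the function names are hypothetical, and it does not reproduce the paper's table.

```python
import math


def mults_base(n):
    # Base NN model p_k: 4n(sqrt(n) + 41) + 32(19 sqrt(n) + 18).
    s = math.isqrt(n)
    return 4 * n * (s + 41) + 32 * (19 * s + 18)


def mults_linear(n):
    # Simplified predictors (without and with intercept): 8n(sqrt(n) + 2).
    s = math.isqrt(n)
    return 8 * n * (s + 2)


for size in (4, 8, 16):  # square blocks only, up to 16x16
    n = size * size
    base, lin = mults_base(n), mults_linear(n)
    print(f"{size}x{size}: base={base}, linear={lin}, ratio={base / lin:.1f}")
# 4x4: base=5888, linear=768, ratio=7.7
# 8x8: base=17984, linear=5120, ratio=3.5
# 16x16: base=68672, linear=36864, ratio=1.9
```

The ratio shrinks as blocks grow because both counts scale as n√n; the savings come from the base model's lower-order terms.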


Reference

[2020 ICMEW] [Santamaria ICMEW’20]
Analytic Simplification of Neural Network based Intra-Prediction Modes for Video Compression
