# Reading: Santamaria ICMEW’20 — Analytic Simplification of Neural Network based Intra-Prediction Modes (Fast VVC Intra)

## Simplified Solution for NN-Based Intra Prediction, Much Lower Computational Complexity

In this story, **Analytic Simplification of Neural Network based Intra-Prediction Modes for Video Compression (Santamaria ICMEW’20)**, by Queen Mary University of London and British Broadcasting Corporation, is presented. I read this because I work on video coding research. In this paper:

**A simplified solution for the conventional NN-based intra prediction** is presented such that the computational complexity is reduced.

This is a paper in **2020 ICMEW**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Conventional NN Intra Prediction in VVC**
2. **Proposed Simplified Solution**
3. **Experimental Results**

**1. Conventional NN Intra Prediction in VVC**

- The NN-based intra-prediction method has demonstrated good performance within VVC when used alongside the conventional intra-prediction modes.
- While conventional intra modes use one line of reference samples, the NN-based model makes use of multiple reference lines.

- In particular, the chosen NN model forming the base approach of the analysis consists of four layers.
- There are *K* modes. Each mode is defined with common layers 1–3 and a mode-specific final layer.
- **The first 3 layers** are the common layers shared by all *K* modes, where *ρ* is the exponential Linear Unit (eLU) activation.
- For each *k*-th mode, the prediction *pk* is computed at **the last layer**; *pk* holds the predicted samples of the *N*×*N* block.
- However, the computational complexity of this model is high.
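The four-layer structure described above can be sketched as follows. The hidden width `H`, the reference-vector length `n_ref`, and all weights are placeholders invented for illustration, not values from the paper; only the overall shape (three shared eLU layers, one linear mode-specific head per mode) follows the text.

```python
import numpy as np

def elu(x, alpha=1.0):
    # exponential Linear Unit (eLU): identity for x > 0, alpha*(exp(x)-1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

rng = np.random.default_rng(0)

N = 4            # block is N x N (paper uses 4x4, 8x8, 16x16)
n_ref = 3 * N    # length of the multi-reference-line input vector (assumed)
H = 32           # hidden width (assumed; not given in this summary)
K = 35           # number of NN-based modes

# Layers 1-3: shared by all K modes (random placeholder weights).
W = [rng.normal(size=(H, n_ref)), rng.normal(size=(H, H)), rng.normal(size=(H, H))]
b = [rng.normal(size=H) for _ in range(3)]

# Layer 4: one mode-specific linear head per mode k.
W4 = [rng.normal(size=(N * N, H)) for _ in range(K)]
b4 = [rng.normal(size=N * N) for _ in range(K)]

def predict(r, k):
    """Predict the N x N block for mode k from reference samples r."""
    h = r
    for Wi, bi in zip(W, b):                   # common layers 1-3 with eLU
        h = elu(Wi @ h + bi)
    return (W4[k] @ h + b4[k]).reshape(N, N)   # mode-specific last layer

p0 = predict(rng.normal(size=n_ref), k=0)
```

Every mode pays for the three shared layers plus its own head, which is where the high multiplication count comes from.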

**2. Proposed Simplified Solution**

## 2.1. Linear Model without Intercept

**A linear model** can be derived as a simplification of the model in the previous subsection, **removing the non-linearities given by the eLU** activation functions.

- Due to the removal of activation functions and biases, **the coefficients are normalised row-wise** to make sure the final prediction samples assume values with energies comparable to the target block.
- The prediction *p̂k* is obtained as *p̂k* = *A r*: each *k*-th NN-based predictor is simplified to a matrix *A* that estimates the target block from a set of reference samples *r*.
- However, such a model **may not produce accurate predictions for pixels within the current block that are far from the reference samples.**
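A minimal sketch of this simplified predictor; the matrix values and the exact normalisation rule (here each row is divided by its absolute sum) are assumptions, since the summary does not spell the scheme out.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                       # predicting an N x N block
n_ref = 3 * N               # assumed length of the reference-sample vector r

# Each k-th predictor collapses to a single matrix A (random placeholder).
A = rng.normal(size=(N * N, n_ref))

# Row-wise normalisation (assumed rule: divide by each row's absolute sum),
# so every predicted sample is a bounded combination of the references and
# keeps an energy comparable to the target block.
A = A / np.abs(A).sum(axis=1, keepdims=True)

r = rng.uniform(0, 255, size=n_ref)    # reference samples
p_hat = (A @ r).reshape(N, N)          # prediction p^k = A r, no intercept
```

One matrix-vector product per mode replaces the full forward pass of the network.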

## 2.2. Linear Model with Intercept

- Thus, it may be beneficial to further tune the prediction by **introducing an intercept** *b* into the linear prediction model, a term that does not depend on the reference samples: *p̃k* = *A r* + *b*.
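One way to see the role of the intercept: fit the matrix and the intercept jointly by least squares, by appending a constant-1 column to the reference vectors. The training data and sizes below are invented for illustration; the paper trains on real video patches.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_ref, n_train = 4, 12, 500

# Hypothetical training data: reference vectors R and target blocks P (flattened).
R = rng.uniform(0, 255, size=(n_train, n_ref))
P = rng.uniform(0, 255, size=(n_train, N * N))

# Augment each reference vector with a constant 1 so that least squares
# jointly fits the matrix A and the intercept b (the column for the 1s).
R1 = np.hstack([R, np.ones((n_train, 1))])
coef, *_ = np.linalg.lstsq(R1, P, rcond=None)
A, b = coef[:-1].T, coef[-1]             # A: (N*N, n_ref), b: (N*N,)

p_tilde = (A @ R[0] + b).reshape(N, N)   # prediction for one sample
```

Because *b* is added after the matrix product, it can lift or lower samples far from the references, where *A r* alone tends to be inaccurate, at the cost of only *n* extra additions.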

**3. Experimental Results**

## 3.1. BD-Rate

- Training was performed using 4×4, 8×8 and 16×16 blocks and 20,000 randomly selected luma patches derived from the DIVerse 2K (DIV2K) dataset.
- *K* = 35 modes were used for each block size.
- VTM-1.0 is used.
- The maximum block size was set to 16×16, where only square blocks were allowed.
- *pk*: the base model using activation functions. *p̂k*: the linear predictions without intercept. *p̃k*: the linear predictors with intercept.

- When compared with the NN-based model, complexity reduction was obtained, especially at the decoder side.
- **While the model without intercept produces some compression efficiency losses, only minor losses are obtained with the model with intercept compared with the NN-based model.**

## 3.2. Mode Usage

- The mode usage when using the simplified approach with intercept is on average 2% higher than that of the NN-based model for the smaller block sizes of 4×4 and 8×8.
- Interestingly, in the NN-based model as well as in the linear prediction with intercept, some of the 35 modes were found to be used more often on average than others, showing that there are some predominant modes.

## 3.3. Computational Complexity

- The above table shows these values for the different block sizes supported.
- The base model *pk* requires 4*n*×(√*n*+41) + 32×(19√*n*+18) multiplications, where *n* is the number of samples within a block.
- The proposed simplifications *p̂k* and *p̃k* require 8*n*(√*n*+2) multiplications.
- In big O notation, the complexity of the proposed method is O(*n*√*n*), which is lower than that of the NN-based model, O(*n*√*n*+*n*+√*n*).
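Plugging the supported block sizes into these counts makes the saving concrete; the formulas are from the paper, the quick check is added here.

```python
import math

def mults_base(n):
    # base NN model pk: 4n(sqrt(n)+41) + 32(19 sqrt(n)+18) multiplications
    s = math.isqrt(n)
    return 4 * n * (s + 41) + 32 * (19 * s + 18)

def mults_simplified(n):
    # simplified predictors (with or without intercept): 8n(sqrt(n)+2)
    s = math.isqrt(n)
    return 8 * n * (s + 2)

for bs in (4, 8, 16):                  # supported square block sizes
    n = bs * bs                        # n = number of samples in the block
    base, simp = mults_base(n), mults_simplified(n)
    print(f"{bs}x{bs}: base={base}, simplified={simp}, ratio={base / simp:.1f}x")
```

For 4×4 blocks this gives 5888 vs 768 multiplications (about 7.7×); the gap narrows for larger blocks (about 3.5× at 8×8 and 1.9× at 16×16), consistent with the two expressions sharing the same leading *n*√*n* term.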

This is the 2nd story this month.

## Reference

[2020 ICMEW] [Santamaria ICMEW’20]

Analytic Simplification of Neural Network based Intra-Prediction Modes for Video Compression

## Codec Intra Prediction

**JPEG** [MS-ROI] [Baig JVICU’17] **HEVC** [Xu VCIP’17] [Song VCIP’17] [Li VCIP’17] [Puri EUSIPCO’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [Liu MMM’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNAC TCSVT’19] [CNN-CR] [CNNMC Yokoyama ICCE’20] [PNNS] [CNNCP]

**VVC** [CNNIF & CNNMC] [Brand PCS’19] [Bonnineau ICASSP’20] [Santamaria ICMEW’20]

## Codec Fast Prediction

**H.264 to HEVC** [Wei VCIP’17] [H-LSTM] **HEVC** [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Li ICME’17] [Katayama ICICT’18] [Chang DCC’18] [ETH-CNN & ETH-LSTM] [Zhang RCAR’19] [Kim TCVST’19] [LFHI & LFSD & LFMD Using AK-CNN] [Yang AICAS’20] [H-FCN] **3D-HEVC** [AQ-CNN] [CNN-SENet] **VP9** [H-FCN] **VVC** [Jin VCIP’17] [Jin PCM’17] [Jin ACCESS’18] [Wang ICIP’18] [Galpin DCC’19] [Pooling-Variable CNN] [Lin DCC’20] [Amna JRTIP’20] [DeepQTMT] [Santamaria ICMEW’20]