Reading: Santamaria ICMEW’20 — Analytic Simplification of Neural Network based Intra-Prediction Modes (Fast VVC Intra)

Simplified Solution for NN-Based Intra Prediction, Much Lower Computational Complexity

In this story, Analytic Simplification of Neural Network based Intra-Prediction Modes for Video Compression (Santamaria ICMEW’20), by Queen Mary University of London and the British Broadcasting Corporation (BBC), is presented. I read this because I work on video coding research. In this paper:

  • A simplified solution for conventional NN-based intra prediction is presented, reducing its computational complexity.

This is a paper in 2020 ICMEW. (Sik-Ho Tsang @ Medium)


  1. Conventional NN Intra Prediction in VVC
  2. Proposed Simplified Solution
  3. Experimental Results

1. Conventional NN Intra Prediction in VVC

  • The NN-based intra-prediction method has demonstrated good performance within VVC when used alongside the conventional intra-prediction modes.
  • While conventional intra modes use one line of reference samples, the NN-based model makes use of multiple reference lines.
NN Model in VVC
  • In particular, the chosen NN model forming the base approach of the analysis consists of four layers.
  • There are K modes in NN-based intra prediction.
  • Each mode is defined with common layers 1–3 and a mode-specific final layer.
  • The first 3 layers are common to all K modes:
  • where ρ is the exponential linear unit (ELU) activation.
  • For each k-th mode, the prediction pk is computed at the last layer as:
  • where pk contains the predicted samples for the N×N block.
  • However, the computational complexity is high.
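As a concrete sketch of this base model, the structure above can be written as three shared ELU layers followed by a mode-specific linear head. The layer widths, toy dimensions, and function names below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit, the activation rho used by the base model."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def nn_intra_predict(r, shared, heads, k):
    """Base NN predictor: 3 shared ELU layers, then the k-th mode's final layer.

    r      : 1-D vector of reference samples (multiple reference lines, flattened)
    shared : list of 3 (W, b) pairs for the common layers 1-3
    heads  : list of K (W, b) pairs, one mode-specific final layer per mode
    k      : index of the NN-based intra mode to evaluate
    """
    h = r
    for W, b in shared:          # layers 1-3, common to all K modes
        h = elu(W @ h + b)
    W_k, b_k = heads[k]          # mode-specific layer 4
    p = W_k @ h + b_k            # p has N*N entries for an N x N block
    n = int(np.sqrt(p.size))
    return p.reshape(n, n)

# Toy dimensions (assumed): a 4x4 block predicted from 24 reference samples.
rng = np.random.default_rng(0)
shared = [(rng.standard_normal((32, 24)) * 0.1, np.zeros(32)),
          (rng.standard_normal((32, 32)) * 0.1, np.zeros(32)),
          (rng.standard_normal((32, 32)) * 0.1, np.zeros(32))]
heads = [(rng.standard_normal((16, 32)) * 0.1, np.zeros(16)) for _ in range(35)]
pred = nn_intra_predict(rng.standard_normal(24), shared, heads, k=0)
print(pred.shape)  # (4, 4)
```

Note that one forward pass through the three shared layers is reused by whichever mode k is evaluated; only the final layer differs per mode, which is what makes the later per-mode simplification into a single matrix natural.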

2. Proposed Simplified Solution

2.1. Linear Model without Intercept

  • A linear model can be derived as a simplification of the model in the previous section, removing the non-linearities given by the ELU activation functions:
  • Due to the removal of the activation functions and biases, the coefficients are normalised row-wise so that the final prediction samples have energies comparable to the target block.
  • The prediction ^pk is then obtained as ^pk = Ak·r.
  • That is, each k-th NN-based predictor is simplified to a single matrix Ak that estimates the target block from a set of reference samples r.
  • However, such a model may not produce accurate predictions for pixels within the current block that are far from the reference samples.
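A minimal numpy sketch of this simplification, assuming the collapsed matrix is the product of the layer weight matrices and that "normalised row-wise" means each row of A sums to one (the text does not pin down the exact norm, so treat that choice as an assumption):

```python
import numpy as np

def simplify_head(shared_weights, head_weight):
    """Collapse the four layers into one matrix A: drop the ELUs and biases,
    multiply the weight matrices together, then normalise each row of A
    (here: rows sum to 1 -- an assumed choice of row normalisation)."""
    A = head_weight
    for W in reversed(shared_weights):   # compose layers 3, 2, 1 in matrix form
        A = A @ W
    return A / A.sum(axis=1, keepdims=True)

def linear_predict(A, r):
    """^p_k = A r : one matrix-vector product per mode, no activations."""
    p = A @ r
    n = int(np.sqrt(p.size))
    return p.reshape(n, n)

# Toy setup: 4x4 block from 24 reference samples, positive random weights.
rng = np.random.default_rng(0)
shared = [rng.random((32, 24)), rng.random((32, 32)), rng.random((32, 32))]
A = simplify_head(shared, rng.random((16, 32)))

# With rows summing to 1, a flat (DC-like) set of references is reproduced
# exactly, illustrating why the energy normalisation is needed.
pred = linear_predict(A, np.full(24, 100.0))
print(np.allclose(pred, 100.0))  # True
```

The flat-reference check also hints at the stated weakness: a single fixed matrix ties every predicted pixel to the same reference line, so pixels far from those references can drift from the true content.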

2.2. Linear Model with Intercept

  • Thus, it may be beneficial to further tune the prediction by introducing an intercept, a term of the linear prediction model that does not depend on the reference samples:
  • ~pk = Ak·r + bk, where bk is the intercept of the k-th mode.
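Continuing the sketch, the intercept is just an additive per-mode vector that is independent of the references. The names A, b, and the mid-grey fallback value below are illustrative assumptions:

```python
import numpy as np

def linear_predict_intercept(A, b, r):
    """~p_k = A r + b : linear model plus a reference-independent intercept b,
    which helps pixels far from the reference samples."""
    p = A @ r + b
    n = int(np.sqrt(p.size))
    return p.reshape(n, n)

# Toy 4x4 example: with all-zero references the prediction falls back to the
# intercept alone -- exactly the behaviour the reference-free term provides.
rng = np.random.default_rng(1)
A = rng.random((16, 24))
b = np.full(16, 128.0)          # e.g. a mid-grey fallback (illustrative value)
pred = linear_predict_intercept(A, b, np.zeros(24))
print(pred[0, 0])  # 128.0
```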

3. Experimental Results

3.1. BD-Rate

  • Training was performed using 4×4, 8×8 and 16×16 blocks and 20,000 randomly selected luma patches derived from the DIVerse 2K (DIV2K) dataset.
  • K = 35 modes were used for each block size.
  • VTM-1.0 is used.
  • The maximum block size was set to 16×16, and only square blocks were allowed.
  • pk: the base model using activation functions.
  • ^pk: the linear predictors without intercept.
  • ~pk: the linear predictors with intercept.
BD-Rate (%) & Time (%)
  • When compared with the NN-based model, complexity reduction was obtained, especially at the decoder side.
  • While the model without intercept incurs some compression-efficiency losses, only minor losses are obtained with the model with intercept compared with the NN-based model.

3.2. Mode Usage

Average mode usage per block size.
  • The mode usage when using the simplified approach with intercept is on average 2% higher than that of the NN-based model for the smaller block sizes of 4×4 and 8×8.
  • Interestingly, in the NN-based model as well as in the linear prediction with intercept, some of the 35 modes were found to be used more often on average than others, showing that there are some predominant modes.

3.3. Computational Complexity

Number of multiplications required to generate a block.
  • The above table shows these values for the different block sizes supported.
  • The base model pk requires 4n×(√n+41)+32×(19√n+18) multiplications, where n is the number of samples within a block.
  • The proposed simplifications ^pk and ~pk require 8n(√n+2) multiplications.
  • In big O notation, the complexity of the proposed method is O(n√n), which is much lower than the complexity of the NN-based model, O(n√n + n + √n).
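Plugging the two multiplication counts above into a quick script (formulas taken verbatim from the text; n is the number of samples in the block) makes the per-block-size gap concrete:

```python
import math

def base_muls(n):
    """Multiplications for the base model p_k: 4n(sqrt(n)+41) + 32(19*sqrt(n)+18)."""
    s = math.isqrt(n)
    return 4 * n * (s + 41) + 32 * (19 * s + 18)

def simplified_muls(n):
    """Multiplications for the simplified predictors ^p_k and ~p_k: 8n(sqrt(n)+2)."""
    s = math.isqrt(n)
    return 8 * n * (s + 2)

for size in (4, 8, 16):
    n = size * size
    print(f"{size}x{size}: base={base_muls(n)}, simplified={simplified_muls(n)}")
# 4x4:   base=5888,  simplified=768
# 8x8:   base=17984, simplified=5120
# 16x16: base=68672, simplified=36864
```

These values follow directly from the stated formulas; the dominant term of both counts is n√n, so the saving comes from the much smaller constants of the simplified predictors.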

This is the 2nd story in this month.


