Reading: ScratchCNN — Low Complexity Learned Sub-Pixel Motion Compensation (VVC Inter)

Outperforms SRCNN-Like Network, Up to 4.5% BD-Rate Reduction

Sik-Ho Tsang
3 min read · Jul 8, 2020

In this story, Interpreting CNN for Low Complexity Learned Sub-Pixel Motion Compensation in Video Coding (ScratchCNN), by BBC Research and Development and Dublin City University, is presented. I read this because I work on video coding research. In this paper:

  • CNN is used to improve the interpolation of reference samples needed for fractional precision motion compensation.
  • Complexity reduction of CNN is achieved by interpreting the interpolation filters learned by the networks.

This is a paper in 2020 ICIP, which will be held in October; the authors have already made the paper available on arXiv. (Sik-Ho Tsang @ Medium)

(For background on fractional interpolation in video coding, please feel free to read CNNIF.)

Outline

  1. Proposed ScratchCNN
  2. Experimental Results

1. Proposed ScratchCNN

1.1. ScratchCNN Network Architecture

ScratchCNN: Network Architecture
  • The ScratchCNN network architecture is similar to that of SRCNN.
  • It contains 64 individual 9×9 convolutional kernels in the first layer, 32 individual 1×1 kernels in the second layer, and 32 individual 5×5 kernels in the final layer.
  • Residual learning is used.
  • Notably, ReLU is removed.
  • The bias terms are also removed.
  • No padding is applied.
  • SAD is used as the loss function.

1.2. ScratchCNN Simplication for Complexity Reduction

Left: Interpolation in VVC, Middle: SRCNN-Like Network, Right: ScratchCNN After Simplification
  • With ReLU removed, the network becomes purely linear, so the three convolutional layers can be collapsed into a single non-separable 2D filter M.
  • Therefore, during inference, only M*X is computed; X does not need to go through the 3 layers. As a result, the interpolation process is sped up.
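Because every layer is linear (no ReLU, no bias), the whole network is one linear map, and composing the 9×9, 1×1, and 5×5 convolutions yields a single filter with a (9−1)+(1−1)+(5−1)+1 = 13×13 support. The sketch below (a minimal numpy illustration with random weights standing in for trained ones) extracts M by probing the network with unit impulses and checks that M*X reproduces the full three-layer pass:

```python
import numpy as np

rng = np.random.default_rng(0)

def corr_valid(x, k):
    """2D cross-correlation in 'valid' mode (no padding, as in the paper)."""
    kh, kw = k.shape
    H, W = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def forward(x, w1, w2, w3):
    """Linear 3-layer net, no ReLU and no bias.
    w1: (64, 9, 9), w2: (32, 64), w3: (32, 5, 5)."""
    f1 = np.stack([corr_valid(x, w1[c]) for c in range(64)])   # 64 feature maps
    f2 = np.einsum('oc,chw->ohw', w2, f1)                      # 1x1 conv layer
    return sum(corr_valid(f2[c], w3[c]) for c in range(32))    # 5x5 conv layer

# Hypothetical random weights standing in for the trained ones
w1 = rng.normal(size=(64, 9, 9))
w2 = rng.normal(size=(32, 64))
w3 = rng.normal(size=(32, 5, 5))

# Extract the equivalent 13x13 filter M: response to unit impulses
M = np.empty((13, 13))
for i in range(13):
    for j in range(13):
        delta = np.zeros((13, 13))
        delta[i, j] = 1.0
        M[i, j] = forward(delta, w1, w2, w3)[0, 0]

# Check: the single filter M matches the full 3-layer pass on a random patch
x = rng.normal(size=(20, 20))
assert np.allclose(forward(x, w1, w2, w3), corr_valid(x, M))
```

At inference only the single 13×13 correlation is needed, which is where the complexity reduction comes from.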

1.3. VVC Implementation

  • VTM-6.0 is used.
  • To support different CU sizes and shapes, 60 networks are trained.
  • The selection between the conventional VVC filters and the 13×13 filters is performed at a CU level.
  • The filters are only applied to luma samples.
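Conceptually, the codec keeps a bank of trained 13×13 filters and makes a per-CU choice against the conventional VVC filters. The sketch below is only an illustration of that selection logic; the 4×15 split of the 60 networks (CU classes × fractional positions) and the function names are assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed indexing: 4 CU size/shape classes x 15 fractional positions = 60
NUM_CU_CLASSES, NUM_FRAC_POS = 4, 15
filter_bank = rng.normal(size=(NUM_CU_CLASSES, NUM_FRAC_POS, 13, 13))

def choose_filter(cu_class, frac_pos, cost_vvc, cost_cnn):
    """Hypothetical per-CU decision: return (use_cnn_flag, filter or None).
    The encoder would signal the flag; luma samples only, per the paper."""
    if cost_cnn < cost_vvc:
        return True, filter_bank[cu_class, frac_pos]
    return False, None

flag, filt = choose_filter(2, 7, cost_vvc=100.0, cost_cnn=90.0)
assert flag and filt.shape == (13, 13)
```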

2. Experimental Results

2.1. Ablation Study

BD-Rate (%) and Time (%) on Class D Sequences Under LDB Configurations
  • Using ScratchCNN with SAD as the loss function and with no padding, a BD-rate reduction is obtained.
  • Also, the encoding and decoding times are much lower compared to SRCNN.
  • Rather than integrating a deep learning framework within VTM, all weights and biases (8129 parameters in total) are extracted from each of the 15 trained SRCNNs and implemented in VTM as a series of matrix multiplications.
  • In contrast, each trained ScratchCNN model is condensed into one 2D matrix containing 169 parameters.
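As a sanity check, both parameter counts follow directly from the layer sizes given in Section 1.1 (a 9-1-5 SRCNN with 64/32 channels and biases, versus one collapsed 13×13 filter):

```python
# Parameter arithmetic behind the figures above
srcnn_params = (64 * 9 * 9 + 64          # layer 1: 64 filters of 9x9, + biases
                + 32 * 64 * 1 * 1 + 32   # layer 2: 32 filters of 1x1 over 64 ch
                + 1 * 32 * 5 * 5 + 1)    # layer 3: one 5x5 output over 32 ch
scratch_params = 13 * 13                 # a single collapsed 13x13 filter
print(srcnn_params, scratch_params)      # 8129 169
```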

2.2. BD-Rate (%)

BD-Rate (%) on Class C & D Sequences
  • Using ScratchCNN, up to 4.54% BD-rate reduction can be obtained.

2.3. Hit Ratio

Hit Ratio (%) Under LDP Configuration
  • ScratchCNN is selected for 70% to 80% of CUs, showing that ScratchCNN is useful.

This is the 2nd Story in this month.
