Reading: Zhang ICIP’19 — Dual-Input CNN-Based Interpolation Scheme (HEVC Inter Prediction)

0.9% BD-Rate Reduction, Outperforms Zhang VCIP’17

3 min read · Jun 14, 2020

In this story, “Advanced CNN Based Motion Compensation Fractional Interpolation” (Zhang ICIP’19), by Shanghai Jiao Tong University and University of Missouri-Kansas City, is briefly presented. In this paper:

A dual-input CNN-based interpolation scheme is designed, where the inputs are the prediction and residual parts of the reference blocks.

Finally, better performance than their prior work, Zhang VCIP’17, is obtained. This is a paper in 2019 ICIP. (Sik-Ho Tsang @ Medium)

Outline

Dual-Input CNN Network Architecture
Experimental Results

1. Dual-Input CNN Network Architecture

In order to extract information from both prediction and residual parts of the relative reference block, separate convolution layers are used to handle them.
After several convolution layers, the feature maps of the prediction and residual branches are concatenated to form the input of the following layers.
The network is a fully convolutional network, and ReLU is adopted as the activation function. The convolutions use 64 filters with size 3×3.
The last layer, which combines the previous feature maps and generates the output, contains a single 3×3 filter.
A residual learning strategy is adopted.
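The architecture described above can be sketched as a small, dependency-free forward pass. This is only an illustration under stated assumptions, not the authors' implementation: the number of layers per branch (`branch_depth`) and after concatenation (`trunk_depth`) are hypothetical, since the text only says "several"; biases are omitted; and the residual-learning skip connection is assumed to add the network output to the reconstructed reference block (prediction + residual), which is one plausible reading.

```python
import random


def conv3x3(x, weights, relu=True):
    """3x3 convolution with zero padding, so the output keeps the H x W size.

    x: list of input feature maps, each an H x W list of lists.
    weights[o][i]: 3x3 kernel connecting input map i to output map o.
    """
    h, w = len(x[0]), len(x[0][0])
    out = []
    for kernels in weights:  # one output map per filter
        fmap = [[0.0] * w for _ in range(h)]
        for chan, k in zip(x, kernels):
            for r in range(h):
                for c in range(w):
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < h and 0 <= cc < w:
                                fmap[r][c] += k[dr + 1][dc + 1] * chan[rr][cc]
        if relu:
            fmap = [[max(0.0, v) for v in row] for row in fmap]
        out.append(fmap)
    return out


def init_conv(rng, c_in, c_out, scale=0.05):
    """Random 3x3 kernels with layout [c_out][c_in][3][3] (no biases)."""
    return [[[[rng.uniform(-scale, scale) for _ in range(3)] for _ in range(3)]
             for _ in range(c_in)] for _ in range(c_out)]


def init_dual_input_net(width=64, branch_depth=2, trunk_depth=2, seed=0):
    """Build random weights. The depths are assumptions; width=64 is from the text."""
    rng = random.Random(seed)

    def branch():
        chans = [1] + [width] * branch_depth
        return [init_conv(rng, a, b) for a, b in zip(chans, chans[1:])]

    trunk_chans = [2 * width] + [width] * trunk_depth
    return {
        "pred": branch(),   # branch fed with the prediction part
        "resi": branch(),   # branch fed with the residual part
        "trunk": [init_conv(rng, a, b) for a, b in zip(trunk_chans, trunk_chans[1:])],
        "out": init_conv(rng, trunk_chans[-1], 1),  # single 3x3 output filter
    }


def forward(net, pred_block, resi_block):
    """Run both branches, concatenate their feature maps, and apply the trunk."""
    p, r = [pred_block], [resi_block]
    for wts in net["pred"]:
        p = conv3x3(p, wts)
    for wts in net["resi"]:
        r = conv3x3(r, wts)
    x = p + r  # list concatenation acts as channel-wise concatenation
    for wts in net["trunk"]:
        x = conv3x3(x, wts)
    delta = conv3x3(x, net["out"], relu=False)[0]
    # Residual learning (assumed form): output = (prediction + residual) + delta.
    return [[pv + rv + dv for pv, rv, dv in zip(prow, rrow, drow)]
            for prow, rrow, drow in zip(pred_block, resi_block, delta)]
```

Calling `forward(init_dual_input_net(width=4), pred, resi)` on two small blocks of equal size returns a block of the same size. The pure-Python loops are slow at `width=64`, so a real experiment would use a deep learning framework instead.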
Standard Euclidean loss is used:
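In its usual form, the Euclidean (L2) loss can be written as follows. The symbols are assumptions chosen for illustration and are not taken from the paper: $P_i$ and $R_i$ are the prediction and residual inputs of the $i$-th training sample, $Y_i$ is the ground-truth half-pel block, $F(\cdot;\Theta)$ is the network with parameters $\Theta$, and $N$ is the number of samples. The $1/(2N)$ normalisation is a common convention and may differ from the one the authors used.

```latex
\[
  L(\Theta) \;=\; \frac{1}{2N} \sum_{i=1}^{N}
  \bigl\lVert F(P_i, R_i; \Theta) - Y_i \bigr\rVert_2^2
\]
```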

The training data can be derived at the decoder side of HEVC directly.
Training data: two 4K sequences, TrafficFlow and CampfireParty, and one HEVC test sequence, BlowingBubbles.
HM-16.7 is used under the low delay P configuration.
Only half-pel (yellow) positions are predicted using CNN as shown below:
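In terms of motion vectors, HEVC stores luma motion vectors in quarter-sample units, so the fractional position is given by the two lowest bits of each component. The sketch below shows how a codec might route only half-pel positions to the CNN. It assumes that the half-pel positions are the three with fractional offsets (2, 0), (0, 2) and (2, 2), and that all other fractional positions keep HEVC's default DCT-based interpolation filter (DCTIF). The function names are hypothetical and do not come from HM.

```python
def frac_position(mv_x, mv_y):
    """Fractional part of a quarter-pel motion vector; each component is 0..3."""
    # Python's & on negative ints follows two's-complement semantics,
    # so -6 & 3 == 2, i.e. -1.5 pixels also has a half-pel fractional part.
    return mv_x & 3, mv_y & 3


def is_half_pel(mv_x, mv_y):
    """True for the three half-pel positions (assumed: horizontal, vertical, diagonal)."""
    return frac_position(mv_x, mv_y) in {(2, 0), (0, 2), (2, 2)}


def choose_interpolator(mv_x, mv_y):
    """Pick an interpolation path for one motion vector (illustrative labels only)."""
    if frac_position(mv_x, mv_y) == (0, 0):
        return "integer"   # no interpolation needed
    if is_half_pel(mv_x, mv_y):
        return "cnn"       # half-pel: predicted by the CNN
    return "dctif"         # quarter-pel: assumed to stay with the HEVC default filter
```

For example, `choose_interpolator(6, 4)` selects the CNN path because the fractional offset is (2, 0), while `choose_interpolator(5, 4)` falls back to DCTIF.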