Reading: Zhang ICIP’19 — Dual-Input CNN-Based Interpolation Scheme (HEVC Inter Prediction)

0.9% BD-Rate Reduction, Outperforms Zhang VCIP’17

Sik-Ho Tsang
3 min read · Jun 14, 2020


In this story, “Advanced CNN Based Motion Compensation Fractional Interpolation” (Zhang ICIP’19), by Shanghai Jiao Tong University and University of Missouri-Kansas City, is briefly presented. In this paper:

  • A dual-input CNN-Based Interpolation Scheme is designed where the inputs are the prediction and residual parts of reference blocks.

Finally, better performance than their prior work, Zhang VCIP’17, is obtained. This is a paper in 2019 ICIP. (Sik-Ho Tsang @ Medium)


  1. Dual-Input CNN Network Architecture
  2. Experimental Results

1. Dual-Input CNN Network Architecture

Dual-Input CNN Network Architecture
  • To extract information from both the prediction and residual parts of the corresponding reference block, separate convolution layers are used to handle each of them.
  • After several convolution layers, the feature maps of the prediction and residual channels are concatenated to form the input of the following layers.
  • The network is fully convolutional, and ReLU is adopted as the activation function. The convolutions use 64 filters of size 3×3.
  • The last layer, which combines the preceding feature maps and generates the output, contains a single 3×3 filter.
  • A residual learning strategy is adopted.
  • The standard Euclidean (L2) loss between the network output and the ground truth is used.
  • The training data can be derived at the decoder side of HEVC directly.
  • Training data: two 4K sequences (TrafficFlow and CampfireParty) and one HEVC test sequence (BlowingBubbles).
  • HM-16.7 is used under the low-delay P (LDP) configuration.
  • Only the half-pel positions (yellow in the original figure) are predicted using the CNN, whereas the integer-pel positions (blue) are available at both encoder and decoder.
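The data flow described in the list above can be sketched in a few lines of dependency-free Python. This is only an illustrative sketch, not the authors' implementation: the function names (`conv3x3`, `build_network`, `forward`), the number of layers in each branch and in the shared part (two each), and the random weight initialization are all assumptions, since the summary only says "several" layers. The residual-learning skip connection is left out because the summary does not say which signal it is added to.

```python
import random


def conv3x3(maps, weights, biases, relu=True):
    """3x3 convolution with zero padding over a list of 2-D feature maps.

    maps:    list of C_in feature maps, each a list of rows of floats
    weights: weights[c_out][c_in] is a 3x3 kernel (list of 3 rows)
    """
    h, w = len(maps[0]), len(maps[0][0])
    out = []
    for kernels, bias in zip(weights, biases):      # one output map per filter
        assert len(kernels) == len(maps), "channel count mismatch"
        fmap = [[bias] * w for _ in range(h)]
        for kernel, src in zip(kernels, maps):      # sum over input channels
            for y in range(h):
                for x in range(w):
                    acc = 0.0
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if 0 <= yy < h and 0 <= xx < w:
                                acc += kernel[dy + 1][dx + 1] * src[yy][xx]
                    fmap[y][x] += acc
        if relu:
            fmap = [[max(0.0, v) for v in row] for row in fmap]
        out.append(fmap)
    return out


def init_layer(rng, c_in, c_out):
    """Random 3x3 kernels and zero biases (placeholder for trained weights)."""
    weights = [[[[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
                for _ in range(c_in)] for _ in range(c_out)]
    return weights, [0.0] * c_out


def build_network(width=64, branch_depth=2, trunk_depth=2, seed=0):
    """Layer depths are assumptions; the summary only fixes width and kernel size."""
    assert branch_depth >= 1 and trunk_depth >= 1
    rng = random.Random(seed)

    def stack(c_in, depth):
        layers = []
        for _ in range(depth):
            layers.append(init_layer(rng, c_in, width))
            c_in = width
        return layers

    return {
        "pred": stack(1, branch_depth),          # branch for the prediction part
        "resid": stack(1, branch_depth),         # branch for the residual part
        "trunk": stack(2 * width, trunk_depth),  # layers after concatenation
        "last": init_layer(rng, width, 1),       # single 3x3 output filter
    }


def forward(net, pred_block, resid_block):
    """Run both inputs through their branches, concatenate, and merge."""
    p, r = [pred_block], [resid_block]
    for weights, biases in net["pred"]:
        p = conv3x3(p, weights, biases)
    for weights, biases in net["resid"]:
        r = conv3x3(r, weights, biases)
    x = p + r                                    # channel-wise concatenation
    for weights, biases in net["trunk"]:
        x = conv3x3(x, weights, biases)
    weights, biases = net["last"]
    return conv3x3(x, weights, biases, relu=False)[0]
```

With `width=64` this matches the filter count quoted above, though pure Python is far too slow for real use. A small width such as `build_network(width=4)` is enough to check that one block of prediction samples and one block of residual samples go in and a single block of the same size comes out.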
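To make the half-pel restriction concrete, here is a small sketch of how a codec could decide which interpolation path a motion vector takes. HEVC luma motion vectors are stored in quarter-sample units, so the two lowest bits of each component give the fractional position. The function names and the returned labels are hypothetical, and routing quarter-pel positions to the standard HEVC interpolation filter is an assumption based on the statement that only half-pel positions use the CNN.

```python
def classify_mv(mv_x, mv_y):
    """Classify a luma motion vector given in quarter-sample units."""
    # In Python, & on negative ints behaves like two's complement,
    # so -2 & 3 == 2 (i.e. -0.5 sample has a half-pel fraction).
    frac_x, frac_y = mv_x & 3, mv_y & 3
    if frac_x == 0 and frac_y == 0:
        return "integer"
    if frac_x in (0, 2) and frac_y in (0, 2):
        return "half"
    return "quarter"


def interpolation_path(mv_x, mv_y):
    """Hypothetical routing: CNN for half-pel, standard filter otherwise."""
    kind = classify_mv(mv_x, mv_y)
    if kind == "integer":
        return "copy"          # reference samples are used directly
    if kind == "half":
        return "cnn"           # half-pel positions predicted by the CNN
    return "hevc_filter"       # assumption: quarter-pel keeps the HEVC filter


if __name__ == "__main__":
    for mv in [(4, 8), (2, 0), (-2, 6), (1, 0), (2, 3)]:
        print(mv, classify_mv(*mv), interpolation_path(*mv))
```

Three of the fifteen fractional positions around an integer sample are half-pel, namely (½, 0), (0, ½) and (½, ½), so under this routing the CNN is only invoked for motion vectors whose components are both multiples of half a sample.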

2. Experimental Results

2.1. BD-Rate

BD-Rate Against HEVC
  • The proposed CNN obtains a 0.9% BD-rate reduction against HEVC, whereas Zhang VCIP’17 obtains only a 0.4% BD-rate reduction.

2.2. RD Curves

RD Curves
  • The RD curves show that the proposed CNN is more effective at high bitrates than at low bitrates.

This is the 21st story this month!


