Reading: VI-CNN — Video Interpolation CNN (HEVC Inter Prediction)

Using SepConv, 1.4% BD-Rate Reduction Compared With Conventional HEVC

Sik-Ho Tsang
May 10, 2020


In this story, Video Interpolation CNN (VI-CNN), by Ewha Womans University and the Electronics and Telecommunications Research Institute (ETRI), is briefly described. I read this because I work on video coding research.

The approach in this paper is quite similar to FRUC+DVRF and ES, which I read this morning and this afternoon respectively. (So I read them as a series in one day.. lol.) In this paper:

  • First, a virtual reference (VR) frame is generated by SepConv.
  • Then, this VR frame is put into the reference picture lists, so no additional bits need to be signalled, whereas FRUC+DVRF and ES do need them. Consequently, the bitstream syntax stays the same as the conventional HEVC one.

This is a paper in 2018 APSIPA ASC. (Sik-Ho Tsang @ Medium)


  1. Hierarchical B Structure in HEVC
  2. Proposed Approach: VI-CNN
  3. Experimental Results

1. Hierarchical B Structure in HEVC

Hierarchical B Structure in HEVC
  • When the Hierarchical B Structure is used in HEVC, i.e. in the Random Access (RA) configuration, the coding order is based on the temporal layer (TL): frames with a lower TL are coded first.
  • In this case, I0 and B8 are coded first (TL=0, Red), then B4 (TL=1, Blue), then B2 and B6 (TL=2, Green), and finally B1, B3, B5 and B7 (TL=3, Yellow).
  • With this arrangement, frames with a lower TL can act as reference frames for frames with a higher TL, which makes the compression efficient.
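To make the layer-by-layer ordering concrete, here is a minimal sketch that derives the temporal layer of each frame in a GOP of 8 and sorts the frames accordingly. The function names are my own, and the sketch assumes a dyadic GOP (size is a power of two). A real encoder configuration may interleave the layers differently, as long as every frame's references are coded before the frame itself.

```python
def temporal_layer(poc, gop_size=8):
    """Temporal layer of a frame in a dyadic hierarchical-B GOP.

    poc is the display position of the frame (0..gop_size).
    Assumes gop_size is a power of two.
    """
    if poc % gop_size == 0:
        return 0  # GOP boundary frames (I0 and B8 in the example)
    layer, step = 1, gop_size // 2
    # Halve the step until the frame position falls on the grid.
    while poc % step != 0:
        step //= 2
        layer += 1
    return layer


def coding_order(gop_size=8):
    """Frames sorted by temporal layer first, then by display position."""
    frames = range(gop_size + 1)
    return sorted(frames, key=lambda poc: (temporal_layer(poc, gop_size), poc))


if __name__ == "__main__":
    print(coding_order())
```

For a GOP of 8 this gives 0, 8, 4, 2, 6, 1, 3, 5, 7, which is the order described in the bullets above.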

2. Proposed Approach: VI-CNN

Virtual Reference (VR) Frame Generation
  • First, VI-CNN utilizes SepConv to generate a VR frame at the same time instant as the current frame.
  • Then, instead of being added as an extra reference frame, this VR frame is put into the forward and backward reference picture lists.
  • In this way, there is no modification to the bitstream syntax and there are no extra signalling bits.
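As background on how SepConv synthesizes the in-between frame: as I understand it, each output pixel is obtained by filtering a patch from the previous frame and a patch from the next frame, each with its own pair of 1D kernels (one vertical, one horizontal), and summing the two results. The sketch below shows only this per-pixel step. The function names and the tiny 3x3 patches are assumptions for illustration, and the kernel values are made up; in SepConv the kernels are predicted per pixel by the network.

```python
def separable_local_conv(patch, k_vertical, k_horizontal):
    """Filter one square patch with the outer product of two 1D kernels."""
    n = len(patch)
    return sum(
        k_vertical[i] * k_horizontal[j] * patch[i][j]
        for i in range(n)
        for j in range(n)
    )


def synthesize_pixel(patch_prev, patch_next, kernels_prev, kernels_next):
    """One pixel of the interpolated (VR) frame.

    kernels_prev and kernels_next are (vertical, horizontal) pairs of
    1D kernels; here they are hand-picked, not network outputs.
    """
    return (separable_local_conv(patch_prev, *kernels_prev)
            + separable_local_conv(patch_next, *kernels_next))


if __name__ == "__main__":
    prev_patch = [[100] * 3 for _ in range(3)]  # hypothetical pixel values
    next_patch = [[200] * 3 for _ in range(3)]
    # Each kernel pair picks the centre pixel with weight 0.5.
    centre_half = ([0.0, 1.0, 0.0], [0.0, 0.5, 0.0])
    print(synthesize_pixel(prev_patch, next_patch, centre_half, centre_half))
```

With these made-up kernels the result is simply the average of the two centre pixels. Shifting the non-zero taps away from the centre is what lets the kernels account for motion.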
Details of Putting Virtual Reference (VR) Frames into the Reference Picture Lists
  • The VR frame is put at index 1 of both lists for some specific frames, replacing the original reference frame there, as shown above.
  • Specifically, only the frames B1, B3, B5 and B7 (TL=3, Yellow) utilize the proposed approach since they are non-reference frames, i.e. they are not used as references for other frames. In my opinion, this helps to limit error propagation.
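The list manipulation described above can be sketched in a few lines. This is only my illustration of the idea, not the authors' HM modification: the function name, the "VR<poc>" label and the example list contents are all hypothetical, and frames are identified simply by their display position.

```python
def build_ref_lists(poc, temporal_layer, list0, list1, top_layer=3, vr_index=1):
    """Return (list0, list1) with the VR frame swapped in for top-layer frames.

    Only frames in the highest temporal layer (non-reference frames) get
    the VR frame, which replaces the entry at vr_index in both lists.
    All names and defaults here are illustrative assumptions.
    """
    new_l0, new_l1 = list(list0), list(list1)
    if temporal_layer == top_layer:
        vr_frame = f"VR{poc}"  # hypothetical label for the interpolated frame
        for ref_list in (new_l0, new_l1):
            if vr_index < len(ref_list):
                ref_list[vr_index] = vr_frame
    return new_l0, new_l1


if __name__ == "__main__":
    # Hypothetical reference lists for B1 (TL=3) and B2 (TL=2).
    print(build_ref_lists(1, 3, [0, 2], [2, 4]))
    print(build_ref_lists(2, 2, [0, 4], [4, 8]))
```

Because the lists keep the same length and the VR frame is addressed with an ordinary reference index, the decoder can pick it up without any new syntax element, provided it generates the same VR frame on its side.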

3. Experimental Results

BD-Rate (%) Compared to Conventional HEVC
  • HM-16.9 is used with the RA configuration (i.e. Hierarchical B structure).
  • 1.4% BD-rate reduction is obtained for luma.
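For readers unfamiliar with the metric, here is a minimal sketch of how a BD-rate number is commonly computed: fit a cubic to log10(bitrate) as a function of PSNR for each codec, integrate both fits over the common PSNR range, and convert the average gap back to a percentage. The sketch assumes exactly four rate points per curve (so the cubic passes through all of them) and uses made-up numbers, not the paper's data. Some tools use piecewise interpolation instead of a single cubic, so results can differ slightly.

```python
import math


def fit_cubic(xs, ys):
    """Coefficients c0..c3 of the cubic through four (x, y) points."""
    n = 4
    # Augmented Vandermonde matrix [1, x, x^2, x^3 | y].
    a = [[x ** p for p in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    coeffs = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = a[r][n] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / a[r][r]
    return coeffs


def integrate(coeffs, lo, hi):
    """Definite integral of the polynomial with the given coefficients."""
    def anti(x):
        return sum(c * x ** (p + 1) / (p + 1) for p, c in enumerate(coeffs))
    return anti(hi) - anti(lo)


def bd_rate(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Average bitrate difference in percent (negative means a saving)."""
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Shift PSNR by lo so the fit works with small numbers.
    fit_a = fit_cubic([p - lo for p in psnr_anchor],
                      [math.log10(r) for r in rates_anchor])
    fit_t = fit_cubic([p - lo for p in psnr_test],
                      [math.log10(r) for r in rates_test])
    width = hi - lo
    avg_diff = (integrate(fit_t, 0.0, width) - integrate(fit_a, 0.0, width)) / width
    return (10 ** avg_diff - 1) * 100


if __name__ == "__main__":
    # Hypothetical RD points (kbps, dB); the test codec needs 10% fewer bits.
    psnr = [32.0, 35.0, 38.0, 40.0]
    anchor = [1000.0, 2000.0, 4000.0, 8000.0]
    test = [0.9 * r for r in anchor]
    print(round(bd_rate(anchor, psnr, test, psnr), 3))
```

In this toy case the test codec reaches every PSNR point with 10% fewer bits, so the BD-rate comes out at about -10%. The 1.4% reduction reported above corresponds to a BD-rate of roughly -1.4% on the luma component.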
RD Curves
  • The RD curve of VI-CNN lies above the conventional HEVC one, meaning that it performs better.
Visual quality comparisons of the residues between the original frame and the various reference frames
  • As shown above, the residues obtained by VI-CNN in (d) are much smaller than those in (e) and (f).
  • Smaller residues mean the VR frame is much closer to the current frame, and consequently a smaller coding bitrate can be obtained.
  • However, the coding gain seems smaller than that of FRUC+DVRF and ES. Maybe this is because they do not need to encode motion information such as the merge flag, merge index, AMVP index and motion vector difference. In addition, ES provides frame-level usage to replace the whole current frame.

During the days of coronavirus, let me have a challenge of writing 30 stories again for this month ..? Is it good? This is the 14th story in this month. Thanks for visiting my story..


