Reading: VI-CNN — Video Interpolation CNN (HEVC Inter Prediction)
Using SepConv, 1.4% BD-Rate Reduction Compared With Conventional HEVC
--
In this story, Video Interpolation CNN (VI-CNN), by Ewha Womans University and Electronics and Telecommunications Research Institute (ETRI), is briefly described. I read this because I work on video coding research.
The approach in this paper is quite similar to FRUC+DVRF and ES, which I read this morning and this afternoon respectively. (So I ended up reading them as a series in one day, lol.) In this paper:
- First, a virtual reference (VR) frame is generated by SepConv.
- Then, this VR frame is put into the reference picture lists, so no additional signalling bits are required, whereas FRUC+DVRF and ES do need them. Consequently, the bitstream syntax stays the same as in conventional HEVC.
This is a paper in 2018 APSIPA ASC. (Sik-Ho Tsang @ Medium)
Outline
- Hierarchical B Structure in HEVC
- Proposed Approach: VI-CNN
- Experimental Results
1. Hierarchical B Structure in HEVC
- When the Hierarchical B Structure is used in HEVC, i.e. in the Random Access (RA) configuration, the coding order is based on the temporal layer (TL): frames with a lower TL are coded first.
- In this case, I0 and B8 are coded first (TL=0, Red), then B4 (TL=1, Blue), then B2 and B6 (TL=2, Green), and finally B1, B3, B5 and B7 (TL=3, Yellow).
- With this arrangement, frames with a lower TL can act as reference frames for frames with a higher TL, which enables efficient compression.
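The TL assignment and the layer-by-layer coding order described above can be reproduced for a GOP of 8 with a few lines of Python. This is a simplified sketch, assuming a power-of-two GOP and tracking only the nearest past and future lower-layer frame as references (real HEVC reference picture sets keep more). It codes strictly layer by layer as described in the bullets; HM's RA configuration actually visits the GOP depth-first (8, 4, 2, 1, 3, 6, 5, 7), but both orders code every reference before the frames that use it. The function names are hypothetical.

```python
def temporal_layer(poc, gop_size=8):
    """Temporal layer (TL) of a frame in a dyadic hierarchical-B GOP.

    POC 0 (the I frame) and POC gop_size are TL 0; every halving of the
    distance to the nearest lower-layer frame adds one layer.
    Assumes gop_size is a power of two.
    """
    if poc % gop_size == 0:
        return 0
    depth = gop_size.bit_length() - 1               # log2(gop_size): 3 for a GOP of 8
    trailing_zeros = (poc & -poc).bit_length() - 1  # how many times poc divides by 2
    return depth - trailing_zeros


def layered_coding_order(gop_size=8):
    """Code lower TLs first; ties are broken by display order (POC)."""
    return sorted(range(gop_size + 1), key=lambda p: (temporal_layer(p, gop_size), p))


def nearest_references(poc, gop_size=8):
    """Closest past/future frames of a lower TL (only meaningful for TL > 0)."""
    step = gop_size >> temporal_layer(poc, gop_size)
    return poc - step, poc + step


order = layered_coding_order()
print(order)  # [0, 8, 4, 2, 6, 1, 3, 5, 7]

# Sanity check: every B frame's two references are coded before the frame itself.
for poc in order:
    if temporal_layer(poc) > 0:
        assert all(order.index(r) < order.index(poc) for r in nearest_references(poc))
```

The frames that end up in the top layer (POC 1, 3, 5, 7) are exactly the ones no other frame points to, which is what makes them safe targets for the VI-CNN substitution.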
2. Proposed Approach: VI-CNN
- First, VI-CNN uses SepConv to generate a VR frame at the same time instant as the current frame.
- Then, rather than being added as an extra reference frame, this VR frame is put into both the forward and backward reference picture lists.
- In this way, the bitstream syntax is unchanged and no extra signalling bits are needed.
- Specifically, the VR frame is placed at index 1 of both lists, replacing the original reference frame at that position, as shown above.
- This is done only for frames B1, B3, B5 and B7 (TL=3, Yellow), since they are non-reference frames, i.e. no other frame uses them as a reference. In my opinion, this helps keep errors in the synthesized frame from propagating to other frames.
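The two steps above (synthesize a VR frame, then overwrite index 1 of both lists for TL=3 frames) can be sketched as follows. SepConv predicts, for every output pixel, a pair of 1D kernels (vertical and horizontal) per input frame with a CNN. The CNN itself is not reproduced here, so the kernels are plain inputs and only the separable local-convolution synthesis is shown. The helper names, the edge padding, and the list contents in the example are assumptions for illustration, not details from the paper. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sepconv_synthesize(frame_a, frame_b, kv_a, kh_a, kv_b, kh_b):
    """Separable local convolution (the synthesis step of SepConv).

    frame_a, frame_b: (H, W) planes of the two input frames.
    kv_*, kh_*: (H, W, K) per-pixel vertical / horizontal 1D kernels, K odd.
    In SepConv these come from a CNN; here the caller supplies them.
    Output pixel = sum over both frames of kv^T @ (K x K patch) @ kh.
    """
    k = kv_a.shape[-1]
    pad = k // 2
    out = np.zeros(frame_a.shape, dtype=np.float64)
    for frame, kv, kh in ((frame_a, kv_a, kh_a), (frame_b, kv_b, kh_b)):
        padded = np.pad(frame.astype(np.float64), pad, mode="edge")  # assumed border handling
        patches = sliding_window_view(padded, (k, k))  # (H, W, K, K), centred on each pixel
        out += np.einsum("hwi,hwij,hwj->hw", kv, patches, kh)
    return out


def insert_virtual_reference(list0, list1, vr_frame, temporal_layer, top_layer=3):
    """Hypothetical helper mimicking the VI-CNN list modification.

    Only non-reference frames (the top temporal layer) are touched. For them,
    the entry at index 1 of both lists is replaced by the VR frame, so the
    number of references, and hence the bitstream syntax, stays unchanged.
    """
    if temporal_layer != top_layer:
        return list(list0), list(list1)
    if len(list0) < 2 or len(list1) < 2:
        raise ValueError("both reference lists need at least two entries")
    new0, new1 = list(list0), list(list1)
    new0[1] = vr_frame
    new1[1] = vr_frame
    return new0, new1


# Example for B1 (TL=3). Strings stand in for decoded frames; the list
# contents are illustrative, not copied from HM.
l0, l1 = insert_virtual_reference(["POC0", "POC2"], ["POC2", "POC4"], "VR1", temporal_layer=3)
print(l0, l1)  # ['POC0', 'VR1'] ['POC2', 'VR1']
```

A quick sanity check for the synthesis step: with every kernel set to a single centre tap (vertical weight 1.0, horizontal weight 0.5), the output reduces to the plain average of the two input frames.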
3. Experimental Results
- HM-16.9 is used with the RA configuration (i.e. Hierarchical B structure).
- A 1.4% BD-rate reduction is obtained for the luma component.
- The RD curve of VI-CNN lies above that of conventional HEVC, meaning better rate-distortion performance.
- As shown above, the residues obtained by VI-CNN in (d) are much smaller than those in (e) and (f).
- Smaller residues mean the prediction is much closer to the current frame, which in turn lowers the coding bitrate.
- However, the coding gain seems smaller than that of FRUC+DVRF and ES. Maybe this is because those methods do not need to encode motion information such as the merge flag, merge index, AMVP index and motion vector difference, whereas VI-CNN still does for blocks that reference the VR frame. In addition, ES provides a frame-level mode that replaces the whole current frame.
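The 1.4% figure is a Bjøntegaard delta rate (BD-rate): the average bitrate difference between two RD curves at equal PSNR, where negative means bit savings. The sketch below is the classic cubic-polynomial computation from VCEG-M33; some tools use piecewise cubic interpolation instead, so their numbers can differ slightly. It needs at least four RD points per codec (typically one per QP, e.g. QP 22, 27, 32, 37 in the HEVC common test conditions). The sample numbers are made up and are not the paper's data.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = bitrate saving).

    Fits log(rate) as a cubic polynomial of PSNR for each codec, integrates
    both fits over the overlapping PSNR range, and converts the average
    log-rate gap back into a percentage.
    """
    log_a = np.log(np.asarray(rate_anchor, dtype=float))
    log_t = np.log(np.asarray(rate_test, dtype=float))
    int_a = np.polyint(np.polyfit(psnr_anchor, log_a, 3))  # antiderivative of the anchor fit
    int_t = np.polyint(np.polyfit(psnr_test, log_t, 3))    # antiderivative of the test fit
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Made-up RD points: the "test" codec needs 1.4% fewer bits at every PSNR.
psnr = [32.0, 34.5, 37.0, 39.5]
anchor_kbps = [1000.0, 1800.0, 3200.0, 6000.0]
test_kbps = [r * 0.986 for r in anchor_kbps]
print(round(bd_rate(anchor_kbps, psnr, test_kbps, psnr), 3))  # -1.4
```

Working in the log domain is what makes BD-rate a relative measure: a constant bitrate ratio between the curves shows up as a constant gap, regardless of the absolute bitrate.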
During the days of coronavirus, let me challenge myself to write 30 stories again this month. Is that good? This is the 14th story this month. Thanks for visiting my story.
Reference
[2018 APSIPA ASC] [VI-CNN]
Convolution Neural Network based Video Coding Technique using Reference Video Synthesis
Codec Inter Prediction
HEVC [Zhang VCIP’17] [NNIP] [Ibrahim ISM’18] [FRUC+DVRF] [VI-CNN] [Zhao ISCAS’18 & TCSVT’19] [Ma ISCAS’19] [ES]