Reading: RSVE+POST — CNN-Based Synthesized View Quality Enhancement (3D-HEVC Codec Filtering)

CNN for Reference Synthesized View Enhancement (RSVE) and Post Processing (POST), Outperforms ARCNN and VRCNN

4 min read · May 13, 2020

In this story, Convolutional Neural Network-Based Synthesized View Quality Enhancement for 3D Video Coding (RSVE+POST), by City University of Hong Kong, Chinese Academy of Sciences, and Shandong University, is briefly described. I read this because I work on video coding research.

**3D-HEVC (DIBR: Depth Image Based Rendering)**

3D-HEVC is one of the extensions of HEVC, and it supports 3D video. Given the color/texture videos and depth maps of both the left and right views, any intermediate virtual view in between can be synthesized, so that autostereoscopic or free-viewpoint 3D video can be supported. This format is called multiview video plus depth (MVD). More recently, 3D video technology has been included in the MPEG-I project to support 3DoF and 6DoF video, which will be completed in the coming years.

In this paper, a Convolutional Neural Network (CNN) is proposed for Reference Synthesized View Enhancement (RSVE) during encoder optimization and for Post Processing (POST) in 3D-HEVC. This is a 2018 TIP paper; TIP has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)

Outline

Proposed Framework
Network Architecture
Experimental Results

1. Proposed Framework

First, a CNN is proposed for Reference Synthesized View Enhancement (RSVE) during the encoding process.
Then, a modified Lagrange multiplier is applied to account for the proposed RSVE. (I will not cover this here, since I want to focus on the CNN.)
Finally, a CNN is used to enhance the synthesized virtual view as a post-processing (POST) stage.

2. Network Architecture

The first layer takes three images as input. The left and right reference views are fed to the CNN together with the synthesized view, so that the model can use pixel information from the inter-view domain.
The input I consists of the distorted synthesized image V and the texture images of the left and right reference viewpoints, L and R, i.e., I = {L,V,R}.
A convolutional layer with 64 output feature maps and a 3×3 filter window is used. Its outputs are mapped non-linearly by the Rectified Linear Unit (ReLU) activation function.
Batch normalization is used in the 2nd and 3rd layers.
Residual learning is used.
This network is used for both Reference Synthesized View Enhancement (RSVE) and Post Processing (POST).
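To make the idea of residual learning on I = {L,V,R} concrete, here is a minimal pure-Python sketch. It is not the paper's network: it assumes tiny grayscale images stored as nested lists, one feature map per layer instead of 64, no batch normalization, zero padding, and hand-picked weights instead of trained ones. The names `conv3x3`, `relu`, `enhance` and `center` are made up for illustration.

```python
# Toy sketch of residual learning on the stacked input I = {L, V, R}.
# Assumptions (not from the paper): one feature map per layer, no batch
# normalization, zero padding, and hand-picked instead of trained weights.

def conv3x3(channels, kernels):
    """Sum of zero-padded 3x3 convolutions, one kernel per input channel."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * w for _ in range(h)]
    for img, k in zip(channels, kernels):
        for y in range(h):
            for x in range(w):
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[y][x] += img[yy][xx] * k[dy + 1][dx + 1]
    return out

def relu(fmap):
    """Element-wise Rectified Linear Unit."""
    return [[max(0.0, v) for v in row] for row in fmap]

def enhance(left, virt, right, first_kernels, last_kernels):
    """The layers predict a residual, which is added back to V."""
    hidden = relu(conv3x3([left, virt, right], first_kernels))
    residual = conv3x3([hidden], last_kernels)
    return [[v + r for v, r in zip(v_row, r_row)]
            for v_row, r_row in zip(virt, residual)]

def center(c):
    """3x3 kernel whose only non-zero weight is the center tap."""
    return [[0.0, 0.0, 0.0], [0.0, c, 0.0], [0.0, 0.0, 0.0]]

# V has one damaged pixel (6 instead of 10); L and R are clean.
L = [[10.0] * 3 for _ in range(3)]
R = [[10.0] * 3 for _ in range(3)]
V = [[10.0] * 3 for _ in range(3)]
V[1][1] = 6.0
out = enhance(L, V, R, [center(0.5), center(-1.0), center(0.5)], [center(1.0)])
print(out[1][1])  # 10.0
```

With these hand-picked weights the predicted residual is ReLU(0.5·L − V + 0.5·R), so the damaged pixel is pulled back toward the reference views while undamaged pixels get a zero residual. A trained network learns such a mapping from data instead.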

3. Experimental Results

3.1. Dataset

Since the dataset is small, it is split into two sets: Set 1 and Set 2.
When Set 1 is used for training, Set 2 is used for testing, and vice versa.
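The swap between the two sets can be sketched as below. The sequence names are placeholders, not the paper's actual split, and `two_fold_rounds` is a hypothetical helper.

```python
# Two-fold evaluation sketch: train on one set, test on the other, then swap.
# The sequence names are placeholders, not the paper's actual split.

def two_fold_rounds(set1, set2):
    """Return the two (train, test) pairs used in the swapped evaluation."""
    return [(list(set1), list(set2)), (list(set2), list(set1))]

set1 = ["seq_A", "seq_B", "seq_C"]
set2 = ["seq_D", "seq_E", "seq_F"]

for train, test in two_fold_rounds(set1, set2):
    # No sequence is used for both training and testing in the same round.
    assert not set(train) & set(test)
    print("train:", train, "| test:", test)
```

This way every sequence is tested exactly once, by a model that never saw it during training.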

3.2. Individual Performance

For RSVE, BD-rate reductions of 16.31% and 11.04% are achieved.
For POST, BD-rate reductions of 24.53% and 15.80% are achieved.

3.3. SOTA Comparison

The proposed approach outperforms ARCNN and VRCNN, achieving higher PSNR.
It also outperforms a non-deep-learning-based approach. (But I don’t show it here.)
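For readers unfamiliar with the metric, PSNR can be computed as in the sketch below. It assumes 8-bit images (peak value 255) given as nested lists. The function name `psnr` and the sample pixel values are illustrative only and are not taken from the paper.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    squared_errors = [(r - d) ** 2
                      for ref_row, dist_row in zip(reference, distorted)
                      for r, d in zip(ref_row, dist_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[100, 100], [100, 100]]
distorted = [[101, 99], [100, 100]]
print(round(psnr(reference, distorted), 2))  # 51.14
```

A higher PSNR means the enhanced image is closer to the reference, which is why it is used here to compare the proposed CNN against ARCNN and VRCNN.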

3.4. Visual Quality

Images processed by the proposed CNN look much better than those processed by ARCNN and VRCNN. For example, there are some boundary artifacts below the balloons in Figs. 8(c), (d) and (e), and to the left of the girl’s head in Figs. 9(c), (d) and (e).
By contrast, these artifacts are not apparent in Figs. 8(f) and 9(f).

3.5. Complexity

For RSVE, the computational complexity of the encoder optimization module increases by 294% and 369% on average under Sets 1 and 2, respectively.
For POST with the proposed CNN model, the computational complexity increases significantly, by 1711% and 2098% under Sets 1 and 2, respectively.
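To get a feel for these numbers, the sketch below converts a percentage increase into a run-time multiplier. It assumes that each percentage is an increase relative to the baseline run time of the same module without the CNN. That reading is my assumption, and `runtime_multiplier` is a hypothetical helper.

```python
# Convert "complexity increases by X%" into a run-time multiplier.
# Assumption: X is measured relative to the baseline without the CNN.

def runtime_multiplier(increase_percent):
    """An increase of X% means the new run time is (1 + X/100) times the baseline."""
    return 1.0 + increase_percent / 100.0

reported = {
    "RSVE, Set 1": 294.0,
    "RSVE, Set 2": 369.0,
    "POST, Set 1": 1711.0,
    "POST, Set 2": 2098.0,
}
for name, percent in reported.items():
    print(f"{name}: about {runtime_multiplier(percent):.2f}x the baseline time")
```

Under this assumption, POST takes roughly 18 to 22 times as long as the baseline post-processing, and the RSVE-based encoder optimization roughly 4 to 5 times as long.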

During the days of coronavirus, let me challenge myself to write 30 stories again this month. Is it good? This is the 18th story this month. Thanks for visiting my story.

Reference

[2018 TIP] [RSVE+POST]
Convolutional Neural Network-Based Synthesized View Quality Enhancement for 3D Video Coding

Codec Filtering

JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [Liu PCS’19] [QE-CNN]
3D-HEVC [RSVE+POST]
VVC [Lu CVPRW’19] [Wang APSIPA ASC’19]