Reading: DRNFRUC & DRNWCMC — Frame Rate Up-Conversion (H.264 Inter Prediction)

4 min read · Jun 4, 2020

In this story, Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC), and Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC), by Tsinghua University, Hangzhou Dianzi University, and Tsinghua–UC Berkeley Shenzhen Institute, are briefly presented. In this paper:

After a raw sequence is compressed by H.264, frame rate up-conversion (FRUC) is performed by interpolating a video frame in between the decoded/reconstructed frames.
The interpolated frames are enhanced by DRNFRUC & DRNWCMC.

This is a paper in 2020 TCSVT, a journal with a high impact factor of 4.046. (Sik-Ho Tsang @ Medium)

Outline

Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC)
Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC)
Experimental Results

1. Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC)

Any FRUC method can be used here.
After FRUC, we obtain the interpolated frame. This interpolated frame goes through the deep residual network to enhance its quality.
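The flow above (decode, interpolate, enhance, interleave) can be sketched in a few lines. This is only a minimal sketch under assumptions: `average_frames` is a hypothetical placeholder for the FRUC step (plain frame averaging, not the method in the paper), and `enhance` stands in for the deep residual network, defaulting to identity.

```python
from typing import Callable, List

Frame = List[List[float]]  # grayscale frame as rows of pixel values


def average_frames(prev: Frame, nxt: Frame) -> Frame:
    """Placeholder FRUC (assumption): pixel-wise average of two neighbours."""
    return [[(a + b) / 2.0 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev, nxt)]


def up_convert(decoded: List[Frame],
               interpolate: Callable[[Frame, Frame], Frame] = average_frames,
               enhance: Callable[[Frame], Frame] = lambda f: f) -> List[Frame]:
    """Insert one enhanced interpolated frame between each decoded pair."""
    if not decoded:
        return []
    out = [decoded[0]]
    for prev, nxt in zip(decoded, decoded[1:]):
        # Interpolate first, then pass the result through the enhancer.
        out.append(enhance(interpolate(prev, nxt)))
        out.append(nxt)
    return out


decoded = [[[0.0, 2.0]], [[4.0, 6.0]], [[8.0, 10.0]]]
upconverted = up_convert(decoded)
```

With N decoded frames, the output has 2N − 1 frames, and any FRUC method or enhancement network can be swapped in through the two callables.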

**The Deep Residual Network in DRNFRUC**

The deep residual network shown in the above figure consists of three parts.
The first part, feature extraction, uses 3×3 convolution filters to extract image features as feature maps. A batch normalization layer is then added, followed by ReLU as the activation function, in order to decrease training time.
The second part, feature recursive analysis, widens the receptive field with each recursion, so that image features are analyzed over a larger image region.
The third part, image restoration, uses the output of the feature recursive analysis to obtain the interpolated frame. This part uses only one filter with a 3×3 convolution kernel.
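To see why each recursion widens the receptive field, the growth can be computed directly for stacked 3×3 convolutions. This is a rough sketch, assuming stride 1, no dilation, and two 3×3 convolutions per recursion (`convs_per_recursion` is a hypothetical parameter, not a value confirmed here).

```python
def receptive_field(num_convs: int, kernel: int = 3) -> int:
    """Side length of the receptive field of stacked stride-1 convolutions."""
    # Each k x k convolution adds (k - 1) pixels to the field's side length.
    return 1 + num_convs * (kernel - 1)


def network_receptive_field(recursions: int,
                            convs_per_recursion: int = 2) -> int:
    """Receptive field of extraction + recursive analysis + restoration."""
    # One conv for feature extraction and one for image restoration,
    # plus the convs contributed by the recursive part (assumed layout).
    total_convs = 1 + recursions * convs_per_recursion + 1
    return receptive_field(total_convs)


fields = [network_receptive_field(r) for r in range(4)]
```

Under these assumptions, every extra recursion adds 4 pixels to the side of the region that each output pixel can see.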

**(a) Cov+BN (b) 2Cov+2BN+ReLU (c) 3Cov+3BN+2ReLU.**

**PSNR Obtained by the three residual block variants**

Three residual block variants are tried as shown above.
It is found that 2Cov+2BN+ReLU performs the best with fewer parameters.
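To get a feel for the parameter cost of the three variants, the learnable parameters per block can be counted. This is a back-of-the-envelope sketch, assuming 3×3 convolutions with bias, 64 input and output channels, and batch normalization with one scale and one shift per channel; these settings are assumptions for illustration.

```python
def conv_params(c_in: int, c_out: int, kernel: int = 3,
                bias: bool = True) -> int:
    """Learnable parameters of one k x k convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def bn_params(channels: int) -> int:
    """Batch normalization learns one scale and one shift per channel."""
    return 2 * channels


def block_params(num_convs: int, channels: int = 64) -> int:
    """Parameters of a block made of `num_convs` Conv+BN pairs.

    ReLU has no learnable parameters, so it does not appear here.
    """
    return num_convs * (conv_params(channels, channels) + bn_params(channels))


# (a) Cov+BN, (b) 2Cov+2BN+ReLU, (c) 3Cov+3BN+2ReLU
counts = {name: block_params(n) for name, n in (("a", 1), ("b", 2), ("c", 3))}
```

Under these assumptions the count grows linearly with the number of Conv+BN pairs, so variant (b) uses about two thirds of the parameters of variant (c).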

**PSNR obtained by different number of layers and filters**

Different numbers of layers and filters are tried: c = 8, 16, 20 and n1 = 32, 64, 96.
c = 16 and n1 = 64 are chosen.

2. Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC)

**Forward and backward convolutional neural network in DRNWCMC**

Bilateral motion estimation, a prior-art technique, is used to estimate the motion.
The forward and backward convolutional neural networks, Df and Db, share the same filter weights.
The first stage employs Df and Db to enhance the pixel information Ivf and Ivb, respectively.
The second stage generates an interpolated frame IWCn. The weights wf and wb, which are learned together with the forward and backward convolutional neural networks, are each viewed as a filter with a 1×1 convolutional kernel.
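A 1×1 convolution over two single-channel inputs is just a per-pixel weighted sum, so the second stage can be sketched as below. This is only an illustration: the function name `weighted_blend` and the default weights of 0.5 are assumptions, and in the paper the weights are learned rather than fixed.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel values


def weighted_blend(enh_f: Frame, enh_b: Frame,
                   w_f: float = 0.5, w_b: float = 0.5) -> Frame:
    """Per-pixel weighted sum of the forward and backward enhanced frames.

    Equivalent to a 1x1 convolution (without bias) applied to the two
    frames stacked as channels. The weights here are fixed placeholders.
    """
    return [[w_f * a + w_b * b for a, b in zip(row_f, row_b)]
            for row_f, row_b in zip(enh_f, enh_b)]


forward = [[10.0, 20.0]]   # stands in for the output of Df
backward = [[30.0, 40.0]]  # stands in for the output of Db
equal = weighted_blend(forward, backward)
skewed = weighted_blend(forward, backward, w_f=0.25, w_b=0.75)
```

With equal weights this reduces to plain averaging of the two motion-compensated frames; learning wf and wb lets the network favor whichever direction is more reliable.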

Different numbers of filters and layers in WCMCI, d = 8, 16, 20 and n2 = 32, 64, 96, are tried.
It is concluded that they do not have much impact.

3. Experimental Results

H.264 reference software JM-18.16 is used.
(Since the codec used is H.264 and the sequences used are quite old, I will only show a few results.)
(Also, there is no BD-rate measurement. One of the reasons, I think, is that the frames interpolated by FRUC will not be used as references for other frames.)

DS-ME is a prior-art FRUC method.
With DRNFRUC or DRNWCMC, higher PSNR is obtained (where β is the weight decay hyperparameter).
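For reference, PSNR between an original frame and an interpolated frame can be computed as below. This is a minimal sketch assuming 8-bit frames (peak value 255) stored as nested lists; the exact evaluation setup in the paper (e.g. luma only or all channels) is not stated here.

```python
import math
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel values


def psnr(reference: Frame, test: Frame, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized frames."""
    squared_errors = [(a - b) ** 2
                      for row_r, row_t in zip(reference, test)
                      for a, b in zip(row_r, row_t)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


original = [[0.0, 0.0], [0.0, 0.0]]
interpolated = [[1.0, 1.0], [1.0, 1.0]]
value = psnr(original, interpolated)
```

A higher value means the interpolated frame is closer to the original frame that was dropped before up-conversion.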

Without GPU, about 3 seconds to 56 seconds are needed for DRNFRUC and DRNWCMC depending on the frame size.
With GPU, it is much faster.

It is quite surprising that there is still a Transactions paper published using H.264!!
This is the 4th story in this month.

Reference

[2020 TCSVT] [DRNFRUC & DRNWCMC]
Weighted Convolutional Motion-Compensated Frame Rate Up-Conversion Using Deep Residual Network

Codec Inter Prediction

H.264 [DRNFRUC & DRNWCMC]
HEVC [CNNIF] [Zhang VCIP’17] [NNIP] [Ibrahim ISM’18] [VI-CNN] [FRUC+DVRF] [FRUC+DVRF+VECNN] [RSR] [Zhao ISCAS’18 & TCSVT’19] [Ma ISCAS’19] [ES] [CNN-SR & CNN-UniSR & CNN-BiSR] [DeepFrame] [U+DVPN]
VVC [FRUC+DVRF+VECNN]