Reading: DRNFRUC & DRNWCMC — Frame Rate Up-Conversion (H.264 Inter Prediction)

In this story, Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC) and Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC), by Tsinghua University, Hangzhou Dianzi University, and Tsinghua–UC Berkeley Shenzhen Institute, are briefly presented. In this paper:

  • After a raw sequence is compressed by H.264, frame rate up-conversion (FRUC) is performed by interpolating video frames in between the decoded/reconstructed frames.
  • The interpolated frames are enhanced by DRNFRUC & DRNWCMC.
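The up-conversion step can be pictured with a minimal Python sketch, assuming grayscale frames stored as 2D lists of 8-bit values. The function `average_fruc` is hypothetical: it just averages the two neighbouring decoded frames as a stand-in for whatever FRUC method is actually used, and the DRNFRUC/DRNWCMC enhancement step is left out.

```python
from typing import Callable, List

Frame = List[List[int]]  # grayscale frame: rows of 8-bit pixel values


def average_fruc(prev: Frame, nxt: Frame) -> Frame:
    """Hypothetical placeholder FRUC: rounded pixel-wise average of the neighbours."""
    return [[(a + b + 1) // 2 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev, nxt)]


def up_convert(decoded: List[Frame],
               interpolate: Callable[[Frame, Frame], Frame] = average_fruc) -> List[Frame]:
    """Insert one interpolated frame between every pair of decoded frames,
    so N decoded frames become 2N - 1 output frames."""
    if not decoded:
        return []
    output = [decoded[0]]
    for prev, nxt in zip(decoded, decoded[1:]):
        output.append(interpolate(prev, nxt))  # new in-between frame
        output.append(nxt)                     # original decoded frame
    return output


if __name__ == "__main__":
    frames = [[[0, 0]], [[100, 200]], [[50, 50]]]
    result = up_convert(frames)
    print(len(result))  # 5
    print(result[1])    # [[50, 100]]
```

Passing the interpolation function as a parameter keeps the FRUC method interchangeable.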

This is a paper in 2020 TCSVT, which has a high impact factor of 4.046. ( @ Medium)

Outline

  1. Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC)
  2. Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC)
  3. Experimental Results

1. Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC)

DRNFRUC
  • The FRUC step can be any kind of FRUC method.
  • After FRUC, we obtain the interpolated frame. This interpolated frame goes through the deep residual network to enhance its quality.
The Deep Residual Network in DRNFRUC
  • The deep residual network shown in the above figure consists of three parts.
  • The first part, feature extraction, uses 3×3 convolution filters to extract features of the image as feature maps. A batch normalization layer, followed by ReLU as the activation function, is then added in order to decrease training time.
  • The second part, feature recursive analysis, widens the receptive field with each recursion, so that image features are analyzed over larger image regions.
  • The third part, image restoration, uses the output of the feature recursive analysis to obtain the interpolated frame. This part uses only 1 filter with a 3×3 convolution kernel.
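The feature extraction part (3×3 convolution, then batch normalization, then ReLU) can be illustrated on a single-channel toy image. This is a simplified sketch, not the paper's code: the real network uses n1 filters per layer, and batch normalization is shown here in its inference form with hypothetical, fixed running statistics (`mean`, `var`).

```python
from typing import List

Image = List[List[float]]


def conv3x3(image: Image, kernel: List[List[float]], bias: float = 0.0) -> Image:
    """Single-channel 3x3 convolution (cross-correlation, as in CNN libraries)
    with zero padding, so the output keeps the input size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = bias
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # outside pixels count as 0
                        acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out


def batch_norm(fmap: Image, mean: float, var: float,
               gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5) -> Image:
    """Batch normalization in inference mode, using stored statistics."""
    scale = gamma / (var + eps) ** 0.5
    return [[(v - mean) * scale + beta for v in row] for row in fmap]


def relu(fmap: Image) -> Image:
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in fmap]


def feature_extraction(image: Image, kernel: List[List[float]],
                       mean: float, var: float) -> Image:
    """Conv 3x3 -> BN -> ReLU, in the order described for the first part."""
    return relu(batch_norm(conv3x3(image, kernel), mean, var))


if __name__ == "__main__":
    toy = [[1, 2], [3, 4]]
    identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(feature_extraction(toy, identity_kernel, mean=2.5, var=1.0))
```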
(a) Cov+BN (b) 2Cov+2BN+ReLU (c) 3Cov+3BN+2ReLU.
PSNR obtained by the three residual block variants
  • Three residual block variants are tried as shown above.
  • 2Cov+2BN+ReLU is found to perform the best, with fewer parameters than 3Cov+3BN+2ReLU.
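A rough parameter count shows how the block variants differ in size. The sketch assumes that every convolution in a block is 3×3 with n1 input and output channels plus a bias, and that each BN layer has two learnable values (scale and shift) per channel; the paper's exact layer configuration may differ.

```python
def conv_params(channels: int, kernel: int = 3, bias: bool = True) -> int:
    """Learnable weights of a kernel x kernel conv mapping channels -> channels."""
    return kernel * kernel * channels * channels + (channels if bias else 0)


def bn_params(channels: int) -> int:
    """Learnable BN parameters: scale (gamma) and shift (beta) per channel.
    Running mean/variance are stored statistics, not trained weights."""
    return 2 * channels


# (number of convs, number of BN layers) per variant; ReLU has no parameters.
VARIANTS = {
    "(a) Cov+BN": (1, 1),
    "(b) 2Cov+2BN+ReLU": (2, 2),
    "(c) 3Cov+3BN+2ReLU": (3, 3),
}


def block_params(n_conv: int, n_bn: int, channels: int) -> int:
    """Total learnable parameters of one residual block variant."""
    return n_conv * conv_params(channels) + n_bn * bn_params(channels)


if __name__ == "__main__":
    for name, (n_conv, n_bn) in VARIANTS.items():
        print(name, block_params(n_conv, n_bn, channels=64))
```

With 64 channels this gives 37,056, 74,112 and 111,168 parameters for variants (a), (b) and (c).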
PSNR obtained by different numbers of layers and filters
  • Different numbers of layers and filters, c = 8, 16, 20 and n1 = 32, 64, 96, are tried.
  • c = 16 and n1 = 64 are chosen.
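For stride-1 convolutions, every 3×3 layer enlarges the receptive field by 2 pixels along each axis, which is why more recursions let the network analyze larger image regions. The sketch below computes this; `drnfruc_like_layers` is hypothetical and assumes, for illustration only, that c counts residual blocks of two 3×3 convolutions each, plus one convolution each for feature extraction and image restoration.

```python
from typing import List, Optional


def receptive_field(kernel_sizes: List[int], strides: Optional[List[int]] = None) -> int:
    """Receptive field (pixels along one axis) of a stack of conv layers,
    via r_out = r_in + (k - 1) * jump, where jump is the product of the
    strides of all earlier layers."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def drnfruc_like_layers(c: int) -> List[int]:
    """HYPOTHETICAL layer list: 1 feature-extraction conv, c residual blocks
    of two 3x3 convs, and 1 restoration conv (all 3x3, stride 1)."""
    return [3] + [3, 3] * c + [3]


if __name__ == "__main__":
    for c in (8, 16, 20):
        print(c, receptive_field(drnfruc_like_layers(c)))
    # c = 8 -> 37, c = 16 -> 69, c = 20 -> 85
```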

2. Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC)

DRNWCMC
Forward and backward convolutional neural network in DRNWCMC
  • Bilateral motion estimation, a prior-art method, is used to estimate the motion.
  • The forward and backward convolutional neural networks, Df and Db, share the same filter weights.
  • The first stage employs Df and Db to enhance the pixel information Ivf and Ivb, respectively.
  • The second stage generates the interpolated frame IWCn. The weights wf and wb, which are learned together with the forward and backward convolutional neural networks, can each be viewed as a filter with a 1×1 convolutional kernel.
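The two stages of DRNWCMC can be sketched as follows. This is an illustrative sketch under several assumptions: frames are single-channel 2D lists, bilateral motion estimation is implemented as plain symmetric block matching with a sum-of-absolute-differences (SAD) cost (the paper's bilateral ME may differ in its details), a hypothetical `enhance` function stands in for the shared CNN (Df = Db), and the two weighted branches are assumed to be summed. For a single-channel map, a 1×1 convolution reduces to multiplying every pixel by one weight.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[float]]


def bilateral_me(prev: Frame, nxt: Frame, block: int = 3,
                 search: int = 1) -> Tuple[Frame, Frame]:
    """Symmetric block matching for the frame midway between prev and nxt.

    Each candidate vector v of a block is scored by the SAD between prev at
    (p - v) and nxt at (p + v); the best v gives the forward/backward
    motion-compensated frames (playing the role of Ivf and Ivb)."""
    h, w = len(prev), len(prev[0])
    ivf = [[0.0] * w for _ in range(h)]
    ivb = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            pixels = [(y, x) for y in range(by, min(by + block, h))
                      for x in range(bx, min(bx + block, w))]
            best: Optional[Tuple[float, int, int]] = None
            for vy in range(-search, search + 1):
                for vx in range(-search, search + 1):
                    # Skip vectors that point outside either frame.
                    if any(not (0 <= y - vy < h and 0 <= x - vx < w and
                                0 <= y + vy < h and 0 <= x + vx < w)
                           for y, x in pixels):
                        continue
                    sad = sum(abs(prev[y - vy][x - vx] - nxt[y + vy][x + vx])
                              for y, x in pixels)
                    if best is None or sad < best[0]:
                        best = (sad, vy, vx)
            _, vy, vx = best  # v = (0, 0) is always valid, so best is set
            for y, x in pixels:
                ivf[y][x] = prev[y - vy][x - vx]
                ivb[y][x] = nxt[y + vy][x + vx]
    return ivf, ivb


def weighted_combination(ivf: Frame, ivb: Frame,
                         enhance: Callable[[Frame], Frame],
                         wf: float, wb: float) -> Frame:
    """Stage 1: enhance both inputs with the SAME function (shared weights).
    Stage 2: scale each branch by its 1x1 weight and sum (assumed)."""
    ef, eb = enhance(ivf), enhance(ivb)
    return [[wf * a + wb * b for a, b in zip(rf, rb)] for rf, rb in zip(ef, eb)]


def identity(frame: Frame) -> Frame:
    """Hypothetical stand-in for the trained shared CNN."""
    return frame


if __name__ == "__main__":
    # Toy 1-row frames: a ramp that moves 2 pixels right from prev to nxt.
    prev = [[10 * x for x in range(9)]]
    nxt = [[10 * x - 20 for x in range(9)]]
    ivf, ivb = bilateral_me(prev, nxt)
    print(weighted_combination(ivf, ivb, identity, wf=0.5, wb=0.5)[0][3:6])
    # prints [20.0, 30.0, 40.0]
```

In the real network, wf and wb are learned jointly with Df and Db rather than fixed by hand.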
PSNR obtained by different numbers of layers and filters
  • Different numbers of filters and layers in WCMCI, d = 8, 16, 20 and n2 = 32, 64, 96, are tried.
  • It is concluded that these settings have little impact on the PSNR.

3. Experimental Results

  • H.264 reference software JM-18.16 is used.
  • (Since the codec used is H.264 and the sequences used are quite old, I will only show a few results.)
  • (Also, there is no BD-rate measurement. I think one of the reasons is that the frames interpolated by FRUC are not used as references for other frames.)
PSNR against Frame Number
  • DS-ME is a prior-art FRUC method.
  • Higher PSNR is obtained with DRNFRUC or DRNWCMC. (β denotes the weight decay hyperparameter.)
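PSNR, the quality measure in these plots, is 10·log10(peak² / MSE). A minimal sketch, assuming 8-bit samples (peak value 255); which color component the paper evaluates is not stated here.

```python
import math
from typing import List

Frame = List[List[float]]


def psnr(reference: Frame, test: Frame, peak: float = 255.0) -> float:
    """PSNR in dB between two equally sized frames: 10 * log10(peak^2 / MSE)."""
    squared_error, count = 0.0, 0
    for row_r, row_t in zip(reference, test):
        for r, t in zip(row_r, row_t):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    original = [[0, 0], [0, 0]]
    interpolated = [[255, 0], [0, 0]]  # one pixel completely wrong
    print(round(psnr(original, interpolated), 2))  # 6.02
```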
Average Running Time Per Frame
  • Without a GPU, DRNFRUC and DRNWCMC need about 3 to 56 seconds per frame, depending on the frame size.
  • With a GPU, they are much faster.

It is quite surprising that a transactions paper using H.264 is still being published!

This is the 4th story this month.

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG