Reading: DRNFRUC & DRNWCMC — Frame Rate Up-Conversion (H.264 Inter Prediction)
In this story, Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC) and Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC), by Tsinghua University, Hangzhou Dianzi University, and Tsinghua–UC Berkeley Shenzhen Institute, are briefly presented. In this paper:
- After a raw sequence is compressed by H.264, frame rate up-conversion (FRUC) is performed by interpolating video frames in between the decoded/reconstructed frames.
- The interpolated frames are enhanced by DRNFRUC & DRNWCMC.
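To make the timing concrete, the sketch below doubles the frame rate by inserting one frame between every pair of decoded frames. Plain frame averaging is used only as a stand-in for the FRUC step; it is not the method of the paper, and the function names are hypothetical.

```python
def average_frames(prev_frame, next_frame):
    """Naive FRUC: per-pixel rounded average of two decoded frames (2-D lists of ints)."""
    return [
        [(p + n + 1) // 2 for p, n in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_frame, next_frame)
    ]


def up_convert(decoded_frames):
    """Insert one interpolated frame between every pair of decoded frames."""
    if not decoded_frames:
        return []
    output = [decoded_frames[0]]
    for prev_frame, next_frame in zip(decoded_frames, decoded_frames[1:]):
        # The interpolated frame sits halfway in time between its two neighbours.
        output.append(average_frames(prev_frame, next_frame))
        output.append(next_frame)
    return output
```

In the paper's pipeline, each inserted frame would then be passed through the enhancement network instead of being shown as is.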
This is a 2020 TCSVT paper; TCSVT has a high impact factor of 4.046. (Sik-Ho Tsang @ Medium)
Outline
- Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC)
- Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC)
- Experimental Results
1. Deep Residual Network for the Frame Rate Up-Conversion (DRNFRUC)
- The FRUC step itself can be any kind of FRUC method.
- After FRUC, we obtain the interpolated frame. This interpolated frame goes through the deep residual network to enhance its quality.
- The deep residual network shown in the above figure consists of three parts.
- The first part, feature extraction, uses 3×3 convolution filters to extract image features as feature maps. A batch normalization layer followed by a ReLU activation is then added in order to decrease training time.
- The second part, feature recursive analysis, widens the receptive field with each recursion, so that image features are analyzed over a larger image region.
- The third part, image restoration, uses the output of the feature recursive analysis to obtain the interpolated frame. This part uses only one filter with a 3×3 convolution kernel.
- Three residual block variants are tried as shown above.
- It is found that 2Cov+2BN+ReLU performs the best with fewer parameters.
- Different numbers of layers, c = 8, 16, 20, and filters, n1 = 32, 64, 96, are tried.
- c = 16 and n1 = 64 are chosen.
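To get a feel for what these settings mean, here is a small back-of-the-envelope sketch. It assumes that c counts stacked 3×3, stride-1 convolution layers without dilation and that each such layer maps n1 feature maps to n1 feature maps; the story does not spell this out, so treat the numbers as illustrative rather than as the paper's exact figures. Biases and batch normalization parameters are ignored.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field (pixels per side) of stacked stride-1 convolutions."""
    # Each k x k stride-1 layer extends the window by (k - 1) pixels.
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(in_channels, out_channels, kernel_size=3):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


for c in (8, 16, 20):
    side = receptive_field(c)
    print(f"c = {c:2d}: receptive field {side} x {side}")

for n1 in (32, 64, 96):
    print(f"n1 = {n1}: {conv_weights(n1, n1)} weights per 3x3 layer")
```

Under these assumptions, more layers widen the window each output pixel can see, while the per-layer weight count grows quadratically with the number of filters.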
2. Deep Residual Network with Weighted Convolutional Motion Compensation (DRNWCMC)
- Bilateral motion estimation, a prior-art technique, is used to estimate the motion.
- The forward and backward convolutional neural networks, Df and Db, share the same filter weights.
- The first stage employs Df and Db to enhance the pixel information Ivf and Ivb, respectively.
- The second stage generates an interpolated frame IWCn. The weights wf and wb, which are learned together with the forward and backward convolutional neural networks, are each viewed as a filter with a 1×1 convolutional kernel.
- Different numbers of filters and layers in WCMCI, d = 8, 16, 20 and n2 = 32, 64, 96, are tried.
- It is concluded that these choices have little impact.
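Since a 1×1 convolution with a single weight just scales every pixel, the second stage can be read as a per-pixel weighted sum of the two enhanced predictions. A minimal sketch under that reading: `enhanced_f` and `enhanced_b` stand for the outputs of Df and Db, the weights are plain scalars, any bias term is ignored, and the 0.5 defaults are placeholders rather than learned values from the paper.

```python
def weighted_blend(enhanced_f, enhanced_b, wf=0.5, wb=0.5):
    """Blend forward/backward predictions pixel by pixel: wf * f + wb * b."""
    if len(enhanced_f) != len(enhanced_b):
        raise ValueError("frames must have the same height")
    blended = []
    for row_f, row_b in zip(enhanced_f, enhanced_b):
        if len(row_f) != len(row_b):
            raise ValueError("frames must have the same width")
        # A 1x1 kernel touches one pixel at a time, so no neighbourhood is needed.
        blended.append([wf * f + wb * b for f, b in zip(row_f, row_b)])
    return blended
```

In the paper the two weights are trained jointly with the networks; here they are fixed only to keep the example self-contained.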
3. Experimental Results
- H.264 reference software JM-18.16 is used.
- (Since the codec used is H.264 and the sequences used are quite old, I will only show a few results.)
- (Also, there is no BD-rate measurement. One of the reasons, I think, is that the frames interpolated by FRUC will not be used as references for other frames.)
- DS-ME is a prior-art FRUC method.
- With DRNFRUC or DRNWCMC, higher PSNR is obtained. (β is the weight decay hyperparameter.)
- Without a GPU, DRNFRUC and DRNWCMC need about 3 to 56 seconds, depending on the frame size.
- With a GPU, it is much faster.
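For reference, the PSNR used in such comparisons can be computed as below. This is a generic sketch assuming 8-bit samples (peak value 255) and frames given as 2-D lists; the paper's exact evaluation setup, such as which colour component is measured, is not described in this story.

```python
import math


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized frames given as 2-D lists."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames are empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```

The interpolated frame would be compared against the original frame that was dropped before up-conversion, so a higher value means the interpolation is closer to the true frame.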
It is quite surprising that a Transactions paper using H.264 is still being published!!
This is the 4th story this month.
Reference
[2020 TCSVT] [DRNFRUC & DRNWCMC]
Weighted Convolutional Motion-Compensated Frame Rate Up-Conversion Using Deep Residual Network
Codec Inter Prediction
H.264 [DRNFRUC & DRNWCMC]
HEVC [CNNIF] [Zhang VCIP’17] [NNIP] [Ibrahim ISM’18] [VI-CNN] [FRUC+DVRF] [FRUC+DVRF+VECNN] [RSR] [Zhao ISCAS’18 & TCSVT’19] [Ma ISCAS’19] [ES] [CNN-SR & CNN-UniSR & CNN-BiSR] [DeepFrame] [U+DVPN]
VVC [FRUC+DVRF+VECNN]