Reading: CNNMCR — Convolutional Neural Network-Based Motion Compensation Refinement (HEVC Inter Prediction)
Using the VRCNN Network, Simple CNNMCR & CNNMCR Achieve On Average 1.8% And 2.3% BD-Rate Reduction
In this story, Convolutional Neural Network-Based Motion Compensation Refinement (CNNMCR), by University of Science and Technology of China, is briefly presented. VRCNN is used again, but at a different place within the codec. In this paper:
- Simple CNNMCR: A CNN is used to refine the motion compensated prediction.
- CNNMCR: The neighboring reconstructed region is also utilized.
This is a paper in 2018 ISCAS. (Sik-Ho Tsang @ Medium)
Outline
- Simple CNNMCR
- CNNMCR
- Experimental Results
1. Simple CNNMCR
1.1. VRCNN
- VRCNN is used as the CNN to refine the motion compensated prediction for both Simple CNNMCR and CNNMCR (a minimal sketch is given below).
- (If interested, please read my story about VRCNN.)
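As a refresher, VRCNN is a shallow, fully convolutional, residue-learning network with variable filter sizes in its middle layers. Below is a minimal PyTorch sketch; the layer widths follow the original VRCNN design, but the class and variable names are mine, so treat it as an illustrative approximation rather than the authors' code.

```python
import torch
import torch.nn as nn

class VRCNN(nn.Module):
    """Minimal sketch of VRCNN: a 4-layer residue-learning CNN
    with variable (parallel) filter sizes in the middle layers."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 5, padding=2)       # layer 1: 5x5, 64 maps
        self.conv2_5x5 = nn.Conv2d(64, 16, 5, padding=2)  # layer 2: parallel 5x5 / 3x3
        self.conv2_3x3 = nn.Conv2d(64, 32, 3, padding=1)
        self.conv3_3x3 = nn.Conv2d(48, 16, 3, padding=1)  # layer 3: parallel 3x3 / 1x1
        self.conv3_1x1 = nn.Conv2d(48, 32, 1)
        self.conv4 = nn.Conv2d(48, 1, 3, padding=1)       # layer 4: predict the residue
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.relu(self.conv1(x))
        f = self.relu(torch.cat([self.conv2_5x5(f), self.conv2_3x3(f)], dim=1))
        f = self.relu(torch.cat([self.conv3_3x3(f), self.conv3_1x1(f)], dim=1))
        return x + self.conv4(f)  # residue learning: output = input + predicted residue
```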
1.2. Simple CNNMCR
- The CNN refinement is applied after the motion compensated prediction and before the residue signal is formed.
- Specifically, the traditional motion compensation is performed first, then the trained CNN is applied to refine the prediction signal, and the residue is compressed accordingly.
- Since the CNN is trained such that it approximates the original signal better, the residue is expected to be smaller and thus the compression efficiency is improved.
- The standard mean squared error (MSE) loss is used for training: L(Θ) = (1/N) Σᵢ ‖F(Yᵢ; Θ) − Xᵢ‖²
- where Y is the motion compensated prediction signal, X is the original signal, and F(·; Θ) is the CNN with parameters Θ. (A sketch of this step follows below.)
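To make the data flow concrete, here is a minimal PyTorch sketch of the Simple CNNMCR step and its MSE training loss. The function names (`simple_cnnmcr_step`, `training_loss`) are hypothetical; the traditional motion compensation itself is assumed to be done by the HEVC encoder outside this snippet.

```python
import torch.nn.functional as F

def simple_cnnmcr_step(original, mc_prediction, cnn):
    """Simple CNNMCR: refine the MC prediction with the CNN, then form the
    residue from the refined prediction instead of the raw one."""
    refined = cnn(mc_prediction)   # F(Y; Theta)
    residue = original - refined   # residue to be transformed/quantized
    return refined, residue

def training_loss(cnn, mc_prediction, original):
    """Standard MSE loss: L(Theta) = (1/N) * sum ||F(Y_i; Theta) - X_i||^2."""
    refined = cnn(mc_prediction)
    return F.mse_loss(refined, original)
```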
2. CNNMCR
- In CNNMCR, the neighboring reconstructed region is also utilized to help the refinement.
- Since the spatial correlation is higher for adjacent blocks than for distant ones, a stride on the top and to the left of the current block is used, and the reconstructed image within this stride is taken as additional input to the CNN. In this paper, the stride width is 8 pixels.
- The two inputs are stitched together to form a larger block, which is input into the CNN.
- A layer is added at the end of VRCNN that crops out the part corresponding to the current block.
- Whether to use CNNMCR is selected by rate-distortion (RD) optimization (see the sketch after this list).
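A rough sketch of the stitch-and-crop procedure, assuming (N, 1, H, W) tensors, a block that does not touch the frame border, and the VRCNN sketch above; `cnnmcr_refine` and its arguments are hypothetical names, not the paper's code.

```python
STRIDE = 8  # width of the reconstructed stride (8 pixels in the paper)

def cnnmcr_refine(mc_prediction, recon_frame, x0, y0, cnn):
    """Full CNNMCR sketch: stitch the MC prediction of the current block with
    the reconstructed stride above and to its left, refine, then crop.
    `recon_frame` is the already-reconstructed picture; (x0, y0) is the
    top-left position of the current block (assumed >= STRIDE here)."""
    _, _, h, w = mc_prediction.shape
    # Larger block: copy the (STRIDE + h) x (STRIDE + w) reconstructed area,
    # then overwrite its bottom-right h x w region with the MC prediction.
    stitched = recon_frame[:, :, y0 - STRIDE:y0 + h, x0 - STRIDE:x0 + w].clone()
    stitched[:, :, STRIDE:, STRIDE:] = mc_prediction
    refined = cnn(stitched)                 # VRCNN is fully convolutional
    return refined[:, :, STRIDE:, STRIDE:]  # final layer: crop back to the block
```

The refined prediction then competes with the unrefined one in the encoder's RD decision, per the selection rule above.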
3. Experimental Results
3.1. Training
- Two sequences are used for generating training data, i.e. BlowingBubbles and BQMall. These two sequences are compressed by HM at four different quantization parameters (QPs), 22, 27, 32, and 37, under the low-delay P (LDP) configuration.
- HM-12.0 is used.
- Only CUs with 16×16 pixels in luma are used as training data.
- For each QP, a different model is trained.
- Since the network is fully convolutional, the trained CNNs can be applied to the other CU sizes (8×8, 32×32, 64×64) as well, as shown in the sketch below.
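Putting the training setup together, per-QP model selection reduces to a simple lookup; a hypothetical sketch, reusing the VRCNN class from Section 1.1:

```python
# Hypothetical per-QP model table; in practice each entry would load
# the weights trained at that QP.
models = {qp: VRCNN() for qp in (22, 27, 32, 37)}

def refine_for_qp(mc_prediction, qp):
    """Select the model trained at the block's QP. Because VRCNN is fully
    convolutional, the same model accepts any CU size (8x8 up to 64x64),
    even though training used only 16x16 luma CUs."""
    return models[qp](mc_prediction)
```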
3.2. BD-Rate
- The Simple CNNMCR scheme achieves up to 6.0% BD-rate reduction (BQSquare) and on average 1.8% BD-rate reduction in the luma component (Y).
- The CNNMCR scheme is even better, achieving up to 6.7% BD-rate reduction (BQSquare) and on average 2.3% BD-rate reduction in the luma component.
- CNN-based refinement is not applied to the chroma components, so there is a slight loss in the chroma components (U and V).
- Performance on Class F is clearly lower since these are screen content videos with different characteristics.
- OBMC is one of the most effective methods to refine motion compensation; the OBMC technique [3] alone achieves 3.2% BD-rate reduction.
- OBMC + CNNMCR achieves on average 5.2% BD-rate reduction over the HEVC anchor.
3.3. RD Curves
- The RD curves show that CNNMCR is more advantageous at high bitrates than at low bitrates.
This is the 18th story this month!
Reference
[2018 ISCAS] Convolutional Neural Network-Based Motion Compensation Refinement for Video Coding
Codec Inter Prediction
H.264 [DRNFRUC & DRNWCMC]
HEVC [CNNIF] [Zhang VCIP’17] [NNIP] [Ibrahim ISM’18] [VI-CNN] [CNNMCR] [FRUC+DVRF][FRUC+DVRF+VECNN] [RSR] [Zhao ISCAS’18 & TCSVT’19] [Ma ISCAS’19] [ES] [CNN-SR & CNN-UniSR & CNN-BiSR] [DeepFrame] [U+DVPN] [Multi-Scale CNN]
VVC [FRUC+DVRF+VECNN]