Reading: Fischer QoMEX’20 — Coding Chain with Spatial Up and Down-Scaling (VVC Inter)

Using VDSR or RDN, 12% to 18% BD-Rate Reduction Using VMAF Compared to VVC.

Sep 20, 2020

**Top branch: coding chain with spatial up- and downscaling; Bottom branch: conventional coding**

In this story, On Versatile Video Coding at UHD with Machine-Learning-Based Super-Resolution (Fischer QoMEX’20), by Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), is briefly presented. I read this because I work on video coding research. In this paper:

The input frame is first downsampled and then encoded.
The encoded frame is then decoded and upsampled by a super-resolution network, as shown in the top branch of the figure above.
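The two branches can be sketched in a few lines of Python. This is only a toy illustration of the data flow, not the paper's setup: `toy_codec` is a hypothetical stand-in that replaces VVC encoding and decoding with plain uniform quantization, `upscale_2x` uses pixel replication where the paper uses VDSR or RDN, and the frame is assumed to be a list of rows of luma samples with even width and height.

```python
import math

def downscale_2x(frame):
    """Halve width and height by averaging each 2x2 block of luma samples."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]

def toy_codec(frame, step):
    """Assumed stand-in for encode + decode: uniform quantization, not VVC."""
    return [[round(v / step) * step for v in row] for row in frame]

def upscale_2x(frame):
    """Double width and height by pixel replication (placeholder for VDSR/RDN)."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def psnr(ref, rec, peak=255.0):
    """PSNR in dB between two frames of identical size."""
    n = len(ref) * len(ref[0])
    mse = sum((a - b) ** 2
              for r1, r2 in zip(ref, rec)
              for a, b in zip(r1, r2)) / n
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def scaled_chain(frame, step):
    """Top branch: downscale, code at low resolution, upscale."""
    return upscale_2x(toy_codec(downscale_2x(frame), step))

def conventional_chain(frame, step):
    """Bottom branch: code at full resolution."""
    return toy_codec(frame, step)
```

Calling `psnr(frame, scaled_chain(frame, step))` and `psnr(frame, conventional_chain(frame, step))` on the same frame gives the kind of quality comparison between the two branches that the paper evaluates, here with the toy components above.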

This is a paper in 2020 QoMEX. (Sik-Ho Tsang @ Medium)

Outline

1. VDSR
2. RDN
3. Experimental Results

1. VDSR

In this paper, the authors use VDSR for the upsampling part.
Unlike the original VDSR, this one is trained on the DIV2K dataset.
(If interested, please feel free to read VDSR.)

2. RDN

**RDB structure to extract local features**

The other network the authors use is RDN.
It can be treated as a more powerful upsampling network compared with VDSR.
(If interested, please feel free to read RDN.)

3. Experimental Results

3.1. BD-Rate

**BD-Rate Reduction (%) for two QP ranges using conventional VTM coding chain as anchor**

VTM-5.0 is used with Random Access Configuration.
L-SEABI is the Gaussian filter upsampling approach.
At very low video quality (QPconv = {42, 44, 46, 48}), the investigated coding chain with spatial downscaling achieves a BD-Rate reduction above 9 % with respect to PSNR.
At best, the investigated coding chain with RDN can save 39.5 % for the FoodMarket4 sequence.
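For readers unfamiliar with BD-Rate, here is a minimal sketch of how a Bjøntegaard-style figure is commonly computed: fit log-bitrate as a cubic function of quality for each codec, integrate both curves over the shared quality range, and convert the average log difference to a percentage. It assumes exactly four rate points per curve (so the cubic passes through every point) and distinct quality values. The function names are illustrative, and this is not necessarily the exact tool used in the paper.

```python
import math

def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))

def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Average bitrate difference in percent at equal quality.

    Negative values mean the test codec needs less bitrate than the anchor.
    Assumes four points per curve, so the cubic interpolates them exactly.
    """
    log_a = [math.log(r) for r in anchor_rates]
    log_t = [math.log(r) for r in test_rates]
    # Only the quality interval covered by both curves is compared.
    lo = max(min(anchor_quality), min(test_quality))
    hi = min(max(anchor_quality), max(test_quality))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    int_a = simpson(lambda q: lagrange_eval(anchor_quality, log_a, q), lo, hi)
    int_t = simpson(lambda q: lagrange_eval(test_quality, log_t, q), lo, hi)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0
```

As a sanity check with made-up numbers, a test codec that reaches the same PSNR values at exactly half the anchor's bitrate at every point yields a BD-Rate of -50 %. The quality axis can be PSNR or VMAF; the calculation is the same.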

3.2. Time Complexity

VDSR takes around 1 second to upscale the Y channel of a Full HD frame to 4K resolution on an NVIDIA GeForce RTX 2080 Ti.
On the same GPU, RDN takes between 6 and 8 seconds.
L-SEABI takes around 1 second on an Intel Xeon E3-1275 v6 @ 3.8 GHz to upscale the Y channel in the proposed coding chain.