Reading: Fischer QoMEX’20 — Coding Chain with Spatial Up and Down-Scaling (VVC Inter)

Using VDSR or RDN, 12% to 18% BD-Rate Reduction Using VMAF Compared to VVC.

Sik-Ho Tsang
3 min read · Sep 20, 2020
Top branch: coding chain with spatial up- and downscaling; Bottom branch: conventional coding

In this story, On Versatile Video Coding at UHD with Machine-Learning-Based Super-Resolution (Fischer QoMEX’20), by Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), is briefly presented. I read this because I work on video coding research. In this paper:

  • The input frame is first downsampled, then encoded.
  • The encoded frame is then decoded and upsampled by a super-resolution network, as shown in the top branch of the figure above.
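To make the two branches concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: `fake_codec` is a hypothetical stand-in for VVC encoding plus decoding (simple uniform quantization), `downscale2` uses 2x2 averaging, and `upscale2` uses pixel repetition instead of a super-resolution network. All function names are assumptions made for illustration only.

```python
import math

def downscale2(frame):
    """Halve width and height by averaging each 2x2 block."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1] +
              frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]

def fake_codec(frame, step):
    """Hypothetical stand-in for encode + decode: uniform quantization.
    A real chain would run a VVC encoder and decoder here."""
    return [[round(v / step) * step for v in row] for row in frame]

def upscale2(frame):
    """Double width and height by pixel repetition.
    The paper uses a super-resolution network (VDSR or RDN) at this step."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def psnr(ref, rec, peak=255.0):
    """PSNR in dB between two frames of equal size."""
    n = len(ref) * len(ref[0])
    mse = sum((a - b) ** 2
              for r1, r2 in zip(ref, rec)
              for a, b in zip(r1, r2)) / n
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def scaled_chain(frame, step):
    """Top branch: downscale, code, then upscale."""
    return upscale2(fake_codec(downscale2(frame), step))

def conventional_chain(frame, step):
    """Bottom branch: code at full resolution."""
    return fake_codec(frame, step)

# Toy 8x8 luma frame (made-up gradient, not data from the paper).
frame = [[float((x + y) * 8) for x in range(8)] for y in range(8)]
print("scaled chain PSNR:", psnr(frame, scaled_chain(frame, 4)))
print("conventional PSNR:", psnr(frame, conventional_chain(frame, 4)))
```

In the toy version the scaled branch loses detail in the downscaling step; the idea in the paper is that, at low bitrates, a learned upsampler can recover enough of that detail to outweigh the loss.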

This is a paper in 2020 QoMEX. (Sik-Ho Tsang @ Medium)

Outline

  1. VDSR
  2. RDN
  3. Experimental Results

1. VDSR

VDSR architecture
  • In this paper, the authors use VDSR for the upsampling part.
  • Unlike the original VDSR, this one is trained on the DIV2K dataset.
  • (If interested, please feel free to read VDSR.)

2. RDN

RDN architecture
RDB structure to extract local features
  • The other network the authors use is RDN.
  • We can simply treat it as a more powerful upsampling network compared with VDSR.
  • (If interested, please feel free to read RDN.)

3. Experimental Results

3.1. BD-Rate

BD-Rate Reduction (%) for two QP ranges using conventional VTM coding chain as anchor
  • VTM-5.0 is used with Random Access Configuration.
  • L-SEABI is the Gaussian filter upsampling approach.
  • For very low video quality (QPconv = {42, 44, 46, 48}), the investigated coding chain with spatial downscaling achieves a BD-Rate reduction with respect to PSNR of above 9 %.
  • At best, the investigated coding chain with RDN can save 39.5 % for the FoodMarket4 sequence.
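For readers unfamiliar with the metric, BD-Rate is the average bitrate difference between two rate-quality curves over their common quality range, with negative values meaning bitrate savings. Below is a minimal sketch of the usual calculation, assuming four rate points per curve: fit a cubic through (quality, log bitrate), integrate both curves over the overlapping quality interval, and convert the mean log difference to a percentage. The function names and the sample numbers are made up for illustration and are not taken from the paper.

```python
import math

def lagrange(xs, ys, x):
    """Evaluate the polynomial through (xs, ys) at x.
    With four points this is the cubic used for the fit."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def simpson(f, a, b):
    """Simpson's rule on one interval; exact for cubic polynomials."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))

def bd_rate(anchor, test):
    """anchor, test: lists of four (bitrate, quality) pairs.
    Returns the average bitrate difference of test vs. anchor in percent."""
    qa = [q for _, q in anchor]
    la = [math.log(r) for r, _ in anchor]
    qt = [q for _, q in test]
    lt = [math.log(r) for r, _ in test]
    lo = max(min(qa), min(qt))   # overlapping quality interval
    hi = min(max(qa), max(qt))
    int_a = simpson(lambda q: lagrange(qa, la, q), lo, hi)
    int_t = simpson(lambda q: lagrange(qt, lt, q), lo, hi)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical curves: the test codec needs 20 % less bitrate at every quality.
anchor = [(1000.0, 30.0), (2000.0, 33.0), (4000.0, 36.0), (8000.0, 39.0)]
test = [(0.8 * r, q) for r, q in anchor]
print(f"BD-Rate: {bd_rate(anchor, test):.1f} %")  # about -20 %
```

The quality axis can be PSNR or VMAF; only the quality values fed in change, not the procedure.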

3.2. Time Complexity

  • VDSR takes around 1 second to upscale the Y channel of a full HD frame to 4K resolution on an NVIDIA GeForce RTX 2080 Ti.
  • On the same GPU, RDN takes between 6 and 8 seconds.
  • L-SEABI takes around 1 second on an Intel Xeon E3-1275 v6 @ 3.8 GHz in the proposed coding chain for upscaling the Y channel.
