Review: Liu PCS’19 — Dual Learning-based Video Coding with Inception Dense Blocks (HEVC Codec Filtering)

Inception Blocks, Originated From Inception-v3, Are Used; Outperforms ARCNN, VRCNN, and RHCNN.

Sik-Ho Tsang
5 min read · May 8, 2020

In this story, Dual Learning-based Video Coding with Inception Dense Blocks (Liu PCS’19), by Fudan University and Waseda University, is reviewed. I read this because I work on video coding research. In this paper, two networks are used: one for intra prediction using a fully connected network (FCN), and another for in-loop filtering using a convolutional neural network (CNN).

  • For the FCN, the approach in IPFCN is used, so it is not described here. (If interested, please read Sections 1 & 2 in IPCNN about the importance of video coding and conventional HEVC intra coding.)
  • For the CNN-based filtering, a deeper network using Inception blocks is proposed.

This is an approach submitted to the “Grand Challenge on Short Video Coding” at 2019 PCS, and it is published in 2019 PCS. (Sik-Ho Tsang @ Medium)

Outline

  1. Convolutional Network-based Reconstruction Filtering
  2. Experimental Results

1. Convolutional Network-based Reconstruction Filtering

Left: Network Architecture, Right: Inception Block
  • Pre-Processing (Green): Two convolutional layers of 64 feature maps with a kernel size of 3×3, which help to improve the extraction of basic features.
  • Inception Block (Orange): 12 Inception blocks, originated from Inception-v3 with modifications, are used to further extract features. Each block is composed of three branches, as shown on the right of the figure above (see the sketch after this list).
  • Each branch has a convolutional layer of 32 feature maps with a kernel size of 1×1 as its first layer.
  • In order to extract features from different receptive fields, two convolution layers with kernel sizes of 1×3 and 3×1 are connected to the first layer in one of the branches.
  • In another branch, a convolution layer with a kernel size of 3×3 is connected serially after the first layer, and then two convolution layers with 1×3 and 3×1 kernels are connected in parallel.
  • Different from the original Inception-v3, the pooling layer is removed.
  • Post-Processing (Purple): A convolutional layer with a kernel size of 3×3 is used, and only one feature map is output.
  • ReLU is used for all convolutional layers except the last layer.
  • The input to the designed network is a 32×32 reconstructed block from HM.
  • It is a block-level filtering approach. For YUV 4:2:0, a CTU of size 64×64 is divided into four 32×32 luminance blocks and two 32×32 chrominance blocks.
  • MSE is used as the loss function (written out after this list).
  • (The paper title mentions Dense Blocks, but there is no Dense Block from DenseNet in the network.)
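The MSE loss, written out in my own notation (X_i: the original uncompressed block, Y_i: the 32×32 reconstructed block from HM, F: the network, θ: its parameters, N: the number of training samples):

```latex
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl\| F(Y_i; \theta) - X_i \bigr\|_2^2
```

And below is a minimal PyTorch sketch of one way to read the filtering network above. The paper specifies the layer and kernel sizes listed in the bullets; the exact branch wiring, the concatenation of branch outputs, and the 1×1 fusion back to 64 channels are my assumptions.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Modified Inception-v3 block: three branches, pooling branch removed."""
    def __init__(self, ch_in=64, ch_b=32):
        super().__init__()
        # Branch 1: 1x1 only.
        self.b1 = nn.Sequential(nn.Conv2d(ch_in, ch_b, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1, then 1x3 and 3x1 in parallel.
        self.b2 = nn.Sequential(nn.Conv2d(ch_in, ch_b, 1), nn.ReLU(inplace=True))
        self.b2_1x3 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (1, 3), padding=(0, 1)), nn.ReLU(inplace=True))
        self.b2_3x1 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (3, 1), padding=(1, 0)), nn.ReLU(inplace=True))
        # Branch 3: 1x1 -> 3x3 serially, then 1x3 and 3x1 in parallel.
        self.b3 = nn.Sequential(
            nn.Conv2d(ch_in, ch_b, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch_b, ch_b, 3, padding=1), nn.ReLU(inplace=True))
        self.b3_1x3 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (1, 3), padding=(0, 1)), nn.ReLU(inplace=True))
        self.b3_3x1 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (3, 1), padding=(1, 0)), nn.ReLU(inplace=True))
        # Fuse the concatenated branch outputs back to ch_in channels (my assumption).
        self.fuse = nn.Sequential(nn.Conv2d(5 * ch_b, ch_in, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        b2 = self.b2(x)
        b3 = self.b3(x)
        cat = torch.cat([self.b1(x),
                         self.b2_1x3(b2), self.b2_3x1(b2),
                         self.b3_1x3(b3), self.b3_3x1(b3)], dim=1)
        return self.fuse(cat)

class FilterNet(nn.Module):
    """Pre-processing -> 12 Inception blocks -> post-processing."""
    def __init__(self, n_blocks=12):
        super().__init__()
        # Pre-processing: two 3x3 conv layers with 64 feature maps.
        self.pre = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[InceptionBlock() for _ in range(n_blocks)])
        # Post-processing: one 3x3 conv, a single output map, no ReLU.
        self.post = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, x):  # x: (N, 1, 32, 32) reconstructed block from HM
        return self.post(self.blocks(self.pre(x)))
```

Training it would then pair each reconstructed 32×32 block with its original block under `nn.MSELoss()`.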

2. Experimental Results

2.1. Training

  • The training dataset is DIV2K, which consists of 900 images with 2K resolution.
  • Four models are trained for different Quantization Parameter (QP) bands (a selection sketch follows this list).
  • HM-16.20 is used.
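Dispatching such QP-banded models at test time might look like the sketch below; the band boundaries and checkpoint names are hypothetical, not from the paper:

```python
# Hypothetical QP bands and checkpoint names, for illustration only.
QP_BAND_MODELS = {
    range(20, 26): "filter_qp_band1.pth",
    range(26, 31): "filter_qp_band2.pth",
    range(31, 36): "filter_qp_band3.pth",
    range(36, 42): "filter_qp_band4.pth",
}

def model_for_qp(qp: int) -> str:
    """Pick the filter model trained for the band containing this QP."""
    for band, ckpt in QP_BAND_MODELS.items():
        if qp in band:
            return ckpt
    raise ValueError(f"no model trained for QP {qp}")
```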

2.2. PCS Grand Challenge Short Videos

BD-Rate (%) on PCS Grand Challenge Short Videos
  • AI: All Intra, all frames are coded as intra frames.
  • RA: Random Access, frames are coded with a hierarchical-B structure.
  • AI: BD-rate savings of at most 12.83% and 10.24% on average are obtained for the luminance component. BD-rate savings of 12.41% and 14.24% are obtained for the two chrominance components, respectively.
  • RA: The YUV components obtain BD-rate savings of 3.57%, 5.38% and 4.61% on average, and the luminance component obtains at most 7.09% BD-rate saving with sequence 13. (A sketch of how BD-rate is computed follows this list.)
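BD-rate here is the Bjøntegaard delta rate: each codec’s PSNR vs. log(bitrate) curve is fitted with a cubic polynomial, both fits are integrated over the overlapping PSNR range, and the average log-rate difference is converted into a percentage. A minimal sketch of the standard computation (the common method, not code from the paper):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of the test codec against the anchor.

    Negative values mean the test codec needs fewer bits for the same PSNR.
    """
    # Fit cubic polynomials: log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    # Average log-rate difference -> percentage rate change.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100
```

For example, coding at four QPs (e.g. 22, 27, 32, 37) gives four (rate, PSNR) pairs per codec, exactly enough for the cubic fit.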
Visual Quality
  • Left: In the face area within the blue box shown above, we can clearly see contouring and blocking artifacts.
  • Right: On the other hand, these artifacts are well eliminated, and the face looks smoother and plumper.
  • Moreover, the proposed model provides a higher compression ratio (0.231 bpp for the proposed model versus 0.243 bpp for HM). (bpp: bits per pixel; see the snippet after this list.)
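For reference, bpp is simply the total number of bits in the bitstream divided by the total number of coded pixels; a tiny sketch:

```python
def bits_per_pixel(total_bits: int, width: int, height: int, n_frames: int = 1) -> float:
    """bpp = bits in the bitstream / pixels coded."""
    return total_bits / (width * height * n_frames)
```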
SOTA Comparison in AI Configuration on Short Sequences
  • The proposed approach, using only the proposed filter, already outperforms ARCNN and VRCNN.
  • With the IPFCN-based intra prediction added as well, an even larger margin is obtained.

2.3. HEVC Testing Sequences

BD-rate (%) in AI Configuration on HEVC Test Sequences
  • The filtering model alone also gets BD-rate savings of 9.70%, 11.59% and 13.35%, respectively, on the three YUV components.
Comparison with RHCNN
  • The filter model saves 7.77% BD-rate on average, and up to 11.61% BD-rate saving is obtained by the joint model.
Trainable Parameters Number
  • The number of trainable parameters of the proposed filter model is 475,233, while RHCNN has 3,340,000 trainable parameters. (A quick way to count them is sketched below.)
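In PyTorch, the trainable parameter count can be checked with a one-liner; note that my FilterNet sketch above will not reproduce the paper’s exact 475,233, since its fusion layers are my own assumptions:

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    """Sum of element counts over all parameters that require gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```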

During the days of coronavirus, let me take on the challenge of writing 30 stories again for this month. Is it good? This is the 9th story in this month. Thanks for visiting my story..
