Review: Liu PCS’19 — Dual Learning-based Video Coding with Inception Dense Blocks (HEVC Codec Filtering)

Inception Blocks, Originated From Inception-v3, Are Used; Outperforms ARCNN, VRCNN, and RHCNN.

Sik-Ho Tsang
5 min read · May 8, 2020

In this story, Dual Learning-based Video Coding with Inception Dense Blocks (Liu PCS’19), by Fudan University and Waseda University, is reviewed. I read this because I work on video coding research. In this paper, two networks are used: one for intra prediction using a fully connected network (FCN), and another for in-loop filtering using a convolutional neural network (CNN).

  • For the FCN, the approach in IPFCN is used, so it is not described here. (If interested, please read Sections 1 & 2 in IPCNN about the importance of video coding and conventional HEVC intra coding.)
  • For the CNN-based filtering, a deeper network using Inception blocks is proposed.

This is an approach submitted to the “Grand Challenge on Short Video Coding” at 2019 PCS, and it is published in 2019 PCS. (Sik-Ho Tsang @ Medium)

Outline

  1. Convolutional Network-based Reconstruction Filtering
  2. Experimental Results

1. Convolutional Network-based Reconstruction Filtering

Left: Network Architecture, Right: Inception Block
  • Pre-Processing (Green): Two convolutional layers of 64 feature maps with a kernel size of 3×3, which help to improve the extraction of basic features.
  • Inception Block (Orange): 12 Inception blocks, originated from Inception-v3 with modifications, are used to further extract features. Each block is composed of three branches, as shown on the right of the figure above (see the sketch after this list).
  • Each branch has a convolutional layer of 32 feature maps with a kernel size of 1×1 as its first layer.
  • In order to extract features from different receptive fields, two convolution layers with kernel sizes of 1×3 and 3×1 are connected to the first layer in one of the branches.
  • In another branch, a convolution layer with a kernel size of 3×3 is connected serially after the first layer, and then two convolution layers with 1×3 and 3×1 kernels are connected in parallel.
  • Different from the original Inception-v3, the pooling layer is removed.
  • Post-Processing (Purple): A convolutional layer with a kernel size of 3×3 is used, and only one feature map is output.
  • ReLU is used for all convolutional layers except the last layer.
  • The input to the designed network is a 32×32 reconstructed block from HM.
  • It is a block-level filtering approach. For YUV 4:2:0, a CTU of size 64×64 is divided into four 32×32 luminance blocks and two 32×32 chrominance blocks.
  • MSE is used as the loss function (written out after this list).
  • (The paper title mentions Dense Blocks, but there is no Dense Block from DenseNet in the network.)
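The MSE loss, written out in my own notation (X_i: the original uncompressed block, Y_i: the 32×32 reconstructed block from HM, F: the network, θ: its parameters, N: the number of training samples):

```latex
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl\| F(Y_i; \theta) - X_i \bigr\|_2^2
```

And below is a minimal PyTorch sketch of one way to read the filtering network above. The paper specifies the layer and kernel sizes listed in the bullets; the exact branch wiring, the concatenation of branch outputs, and the 1×1 fusion back to 64 channels are my assumptions.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Modified Inception-v3 block: three branches, pooling branch removed."""
    def __init__(self, ch_in=64, ch_b=32):
        super().__init__()
        # Branch 1: 1x1 only.
        self.b1 = nn.Sequential(nn.Conv2d(ch_in, ch_b, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1, then 1x3 and 3x1 in parallel.
        self.b2 = nn.Sequential(nn.Conv2d(ch_in, ch_b, 1), nn.ReLU(inplace=True))
        self.b2_1x3 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (1, 3), padding=(0, 1)), nn.ReLU(inplace=True))
        self.b2_3x1 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (3, 1), padding=(1, 0)), nn.ReLU(inplace=True))
        # Branch 3: 1x1 -> 3x3 serially, then 1x3 and 3x1 in parallel.
        self.b3 = nn.Sequential(
            nn.Conv2d(ch_in, ch_b, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch_b, ch_b, 3, padding=1), nn.ReLU(inplace=True))
        self.b3_1x3 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (1, 3), padding=(0, 1)), nn.ReLU(inplace=True))
        self.b3_3x1 = nn.Sequential(nn.Conv2d(ch_b, ch_b, (3, 1), padding=(1, 0)), nn.ReLU(inplace=True))
        # Fuse the concatenated branch outputs back to ch_in channels (my assumption).
        self.fuse = nn.Sequential(nn.Conv2d(5 * ch_b, ch_in, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        b2 = self.b2(x)
        b3 = self.b3(x)
        cat = torch.cat([self.b1(x),
                         self.b2_1x3(b2), self.b2_3x1(b2),
                         self.b3_1x3(b3), self.b3_3x1(b3)], dim=1)
        return self.fuse(cat)

class FilterNet(nn.Module):
    """Pre-processing -> 12 Inception blocks -> post-processing."""
    def __init__(self, n_blocks=12):
        super().__init__()
        # Pre-processing: two 3x3 conv layers with 64 feature maps.
        self.pre = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[InceptionBlock() for _ in range(n_blocks)])
        # Post-processing: one 3x3 conv, a single output map, no ReLU.
        self.post = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, x):  # x: (N, 1, 32, 32) reconstructed block from HM
        return self.post(self.blocks(self.pre(x)))
```

Training it would then pair each reconstructed 32×32 block with its original block under `nn.MSELoss()`.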

2. Experimental Results

2.1. Training

  • The training dataset is DIV2K, which consists of 900 images with 2K resolution.
  • Four models are trained for different Quantization Parameter (QP) bands (a selection sketch follows this list).
  • HM-16.20 is used.
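Dispatching such QP-banded models at test time might look like the sketch below; the band boundaries and checkpoint names are hypothetical, not from the paper:

```python
# Hypothetical QP bands and checkpoint names, for illustration only.
QP_BAND_MODELS = {
    range(20, 26): "filter_qp_band1.pth",
    range(26, 31): "filter_qp_band2.pth",
    range(31, 36): "filter_qp_band3.pth",
    range(36, 42): "filter_qp_band4.pth",
}

def model_for_qp(qp: int) -> str:
    """Pick the filter model trained for the band containing this QP."""
    for band, ckpt in QP_BAND_MODELS.items():
        if qp in band:
            return ckpt
    raise ValueError(f"no model trained for QP {qp}")
```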

2.2. PCS Grand Challenge Short Videos

BD-Rate (%) on PCS Grand Challenge Short Videos
  • AI: All Intra, all frames are coded as intra frames.
  • RA: Random Access, frames are coded with a hierarchical-B structure.
  • AI: BD-rate savings of at most 12.83% and 10.24% on average are obtained for the luminance component. BD-rate savings of 12.41% and 14.24% are obtained for the two chrominance components, respectively.
  • RA: The YUV components obtain BD-rate savings of 3.57%, 5.38% and 4.61% on average, and the luminance component obtains at most 7.09% BD-rate saving with sequence 13. (A sketch of how BD-rate is computed follows this list.)
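BD-rate here is the Bjøntegaard delta rate: each codec’s PSNR vs. log(bitrate) curve is fitted with a cubic polynomial, both fits are integrated over the overlapping PSNR range, and the average log-rate difference is converted into a percentage. A minimal sketch of the standard computation (the common method, not code from the paper):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of the test codec against the anchor.

    Negative values mean the test codec needs fewer bits for the same PSNR.
    """
    # Fit cubic polynomials: log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    # Average log-rate difference -> percentage rate change.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100
```

For example, coding at four QPs (e.g. 22, 27, 32, 37) gives four (rate, PSNR) pairs per codec, exactly enough for the cubic fit.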
Visual Quality
  • Left: In the face area within the blue box shown above, we can clearly see contouring and blocking artifacts.
  • Right: On the other hand, these artifacts are well eliminated, and the face looks smoother and plumper.
  • Moreover, the proposed model provides a higher compression ratio (0.231 bpp for the proposed model versus 0.243 bpp for HM). (bpp: bits per pixel; see the snippet after this list.)
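For reference, bpp is simply the total number of bits in the bitstream divided by the total number of coded pixels; a tiny sketch:

```python
def bits_per_pixel(total_bits: int, width: int, height: int, n_frames: int = 1) -> float:
    """bpp = bits in the bitstream / pixels coded."""
    return total_bits / (width * height * n_frames)
```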
SOTA Comparison in AI Configuration on Short Sequences
  • The proposed approach, using only the proposed filter, already outperforms ARCNN and VRCNN.
  • With the IPFCN-based intra prediction added as well, an even larger margin is obtained.

2.3. HEVC Testing Sequences

BD-rate (%) in AI Configuration on HEVC Test Sequences
  • The filtering model alone also gets BD-rate savings of 9.70%, 11.59% and 13.35%, respectively, on the three YUV components.
Comparison with RHCNN
  • The filter model saves 7.77% BD-rate on average, and up to 11.61% BD-rate saving is obtained by the joint model.
Trainable Parameters Number
  • The number of trainable parameters of the proposed filter model is 475,233, while RHCNN has 3,340,000 trainable parameters. (A quick way to count them is sketched below.)
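In PyTorch, the trainable parameter count can be checked with a one-liner; note that my FilterNet sketch above will not reproduce the paper’s exact 475,233, since its fusion layers are my own assumptions:

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    """Sum of element counts over all parameters that require gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```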

During the days of coronavirus, let me take on the challenge of writing 30 stories again for this month. Is it good? This is the 9th story in this month. Thanks for visiting my story..
