Review: Liu PCS’19 — Dual Learning-based Video Coding with Inception Dense Blocks (HEVC Codec Filtering)
Inception Blocks, Originating From Inception-v3, Are Used and Outperform ARCNN, VRCNN, and RHCNN.
In this story, Dual Learning-based Video Coding with Inception Dense Blocks (Liu PCS’19), by Fudan University and Waseda University, is reviewed. I read this because I work on video coding research. In this paper, two networks are used. One is for intra prediction using a fully connected network (FCN). The other is for in-loop filtering using a convolutional neural network (CNN).
- For the FCN, the approach in IPFCN is used, so it is not described here. (If interested, please read Sections 1 & 2 in IPCNN about the importance of video coding and conventional HEVC intra coding.)
- For the CNN-based filtering, a deeper network using Inception blocks is proposed.
This approach entered the “Grand Challenge on Short Video Coding” at 2019 PCS, where the paper is also published. (Sik-Ho Tsang @ Medium)
Outline
- Convolutional Network-based Reconstruction Filtering
- Experimental Results
1. Convolutional Network-based Reconstruction Filtering
- Pre-Processing (Green): Two convolutional layers with 64 feature maps and 3×3 kernels, which help to extract basic features.
- Inception Block (Orange): 12 Inception blocks, originating from Inception-v3 with modifications, are used to further extract features. Each block is composed of three branches, as shown at the right of the figure above (a code sketch follows this architecture description).
- Each branch has a convolutional layer with 32 feature maps and a 1×1 kernel as its first layer.
- To extract features from different receptive fields, in one of the branches, two convolutional layers with 1×3 and 3×1 kernels are connected in parallel to the first layer.
- In another branch, a convolutional layer with a 3×3 kernel is serially connected after the first layer, and then two convolutional layers with 1×3 and 3×1 kernels are connected in parallel.
- Different from the original Inception-v3 block, the pooling branch is removed.
- Post-Processing (Purple): A convolutional layer with a 3×3 kernel is used, and only one feature map is outputted.
- ReLU is used for all convolutional layers except the last layer.
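Below is my minimal PyTorch sketch of this filter, assuming 32 feature maps for every branch layer, zero padding to preserve the 32×32 spatial size, and plain stacking of the 12 blocks on the concatenated 160-map outputs; the class and variable names are mine, not the paper’s:

```python
import torch
import torch.nn as nn


def conv_relu(in_ch, out_ch, kernel, padding):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel, padding=padding),
                         nn.ReLU(inplace=True))


class InceptionBlock(nn.Module):
    """Three-branch block in the spirit of Inception-v3, pooling branch removed.
    Channel widths are assumptions consistent with the 475,233-parameter total."""
    def __init__(self, in_ch, ch=32):
        super().__init__()
        self.b1 = conv_relu(in_ch, ch, 1, 0)             # branch 1: 1x1 only
        self.b2_reduce = conv_relu(in_ch, ch, 1, 0)      # branch 2: 1x1, then
        self.b2_1x3 = conv_relu(ch, ch, (1, 3), (0, 1))  #   1x3 and 3x1 in parallel
        self.b2_3x1 = conv_relu(ch, ch, (3, 1), (1, 0))
        self.b3_reduce = nn.Sequential(                  # branch 3: 1x1 -> 3x3, then
            conv_relu(in_ch, ch, 1, 0), conv_relu(ch, ch, 3, 1))
        self.b3_1x3 = conv_relu(ch, ch, (1, 3), (0, 1))  #   1x3 and 3x1 in parallel
        self.b3_3x1 = conv_relu(ch, ch, (3, 1), (1, 0))

    def forward(self, x):
        b2 = self.b2_reduce(x)
        b3 = self.b3_reduce(x)
        # Concatenated output: 32 + 2*32 + 2*32 = 160 feature maps.
        return torch.cat([self.b1(x), self.b2_1x3(b2), self.b2_3x1(b2),
                          self.b3_1x3(b3), self.b3_3x1(b3)], dim=1)


class ReconstructionFilter(nn.Module):
    def __init__(self, n_blocks=12):
        super().__init__()
        # Pre-processing: two 3x3 conv layers with 64 feature maps.
        self.pre = nn.Sequential(conv_relu(1, 64, 3, 1), conv_relu(64, 64, 3, 1))
        blocks, in_ch = [], 64
        for _ in range(n_blocks):
            blocks.append(InceptionBlock(in_ch))
            in_ch = 5 * 32  # each block outputs 160 concatenated maps
        self.blocks = nn.Sequential(*blocks)
        # Post-processing: one 3x3 conv outputting a single map, no ReLU.
        self.post = nn.Conv2d(in_ch, 1, 3, padding=1)

    def forward(self, x):  # x: (N, 1, 32, 32) reconstructed block from HM
        return self.post(self.blocks(self.pre(x)))
```

With exactly these choices, the trainable parameter count comes out to 475,233, matching the figure quoted in Section 2.3 below, so the assumed wiring is likely the paper’s configuration.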
- The input to the designed network is a 32×32 reconstructed block from HM.
- It is a block-level filtering approach. For YUV 4:2:0, a 64×64 CTU is divided into four 32×32 luminance blocks and two 32×32 chrominance blocks (see the sketch below).
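A small sketch of this block division, assuming the CTU arrives as one 64×64 luma plane plus the co-located U and V planes (already 32×32 after 4:2:0 subsampling); the function name is hypothetical:

```python
import numpy as np

def split_ctu_yuv420(y_ctu, u_ctu, v_ctu):
    """Divide one 64x64 CTU (YUV 4:2:0) into six 32x32 blocks for filtering:
    four luma blocks plus the U and V blocks."""
    assert y_ctu.shape == (64, 64) and u_ctu.shape == v_ctu.shape == (32, 32)
    luma = [y_ctu[r:r + 32, c:c + 32] for r in (0, 32) for c in (0, 32)]
    return luma + [u_ctu, v_ctu]
```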
- MSE is used as the loss function:
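In its standard form (the paper’s exact notation may differ), with x_i a reconstructed input block, y_i the corresponding original block, F the network, and Θ its parameters, over a batch of N samples:

$$L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| F(x_i; \Theta) - y_i \right\|_2^2$$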
- (The paper title mentions Dense Blocks, but there is no Dense Block from DenseNet in the network.)
2. Experimental Results
2.1. Training
- The training dataset is DIV2K, which consists of 900 images at 2K resolution.
- Four models are trained for different Quantization Parameter (QP) bands (a selection sketch follows this list).
- HM-16.20 is used.
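A sketch of how a per-band model could be selected at test time; the band boundaries below are purely hypothetical, since the story does not list the actual QP bands:

```python
# Hypothetical band edges; the paper's actual QP bands are not given in the story.
QP_BAND_EDGES = (27, 32, 37)

def model_index_for_qp(qp: int) -> int:
    """Map a coding QP to one of the four trained filter models (0..3)."""
    return sum(qp > edge for edge in QP_BAND_EDGES)

# e.g. model_index_for_qp(22) -> 0, model_index_for_qp(38) -> 3
```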
2.2. PCS Grand Challenge Short Videos
- AI: All Intra, all frames are coded as intra frames.
- RA: Random Access, frames are coded with a hierarchical-B structure.
- AI: BD-rate savings of at most 12.83% and 10.24% on average are obtained for the luminance component. BD-rate savings of 12.41% and 14.24% are obtained for the two chrominance components, respectively.
- RA: The Y, U, and V components obtain average BD-rate savings of 3.57%, 5.38%, and 4.61%, respectively, and the luminance component obtains at most a 7.09% BD-rate saving, on sequence 13.
- Left: In the face area inside the blue box shown above, contouring and blocking artifacts are clearly visible.
- Right: With the proposed approach, these artifacts are well eliminated, and the face looks smoother and fuller.
- Moreover, the proposed model provides a higher compression ratio (0.231 bpp for the proposed model vs. 0.243 bpp for HM). (bpp: bits per pixel; see the sketch below)
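For reference, a minimal helper computing bits per pixel from a coded bitstream size; the function name and the luma-only pixel count are my assumptions:

```python
def bits_per_pixel(bitstream_bytes: int, width: int, height: int, n_frames: int) -> float:
    """bpp = total coded bits / total number of (luma) pixels."""
    return bitstream_bytes * 8 / (width * height * n_frames)

# e.g. a 1 MB stream for 100 frames of 1280x720 video:
# bits_per_pixel(2**20, 1280, 720, 100) ~= 0.091 bpp
```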
- Using only the proposed filter, the approach already outperforms ARCNN and VRCNN.
- With the IPFCN-based intra prediction added as well, an even larger margin is obtained.
2.3. HEVC Testing Sequences
- The filtering model also obtains BD-rate savings (9.70%, 11.59%, and 13.35%, respectively, on the three YUV components).
- The filter model saves 7.77% BD-rate on average, and up to 11.61% BD-rate saving is obtained by the joint model.
- The number of trainable parameters of the proposed filter model is 475,233, while RHCNN has 3,340,000 trainable parameters (a counting sketch follows).
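A one-liner in standard PyTorch for reproducing such parameter counts, reusing the ReconstructionFilter sketch from Section 1:

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    """Sum the element counts of all trainable parameter tensors."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# With the Section 1 sketch: count_trainable_params(ReconstructionFilter()) == 475233
```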
During the days of coronavirus, let me challenge myself to write 30 stories again this month. Is it good? This is the 9th story this month. Thanks for visiting my story.
Reference
[2019 PCS] [Liu PCS’19]
Dual Learning-based Video Coding with Inception Dense Blocks
Codec Filtering
JPEG: [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC: [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [Liu PCS’19]
VVC: [Lu CVPRW’19] [Wang APSIPA ASC’19]
Codec Intra Prediction
HEVC: [CNNIF] [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [CNNAC] [Li TCSVT’18] [AP-CNN] [MIP] [Wang VCIP’19]
VVC: [Brand PCS’19]