Reading: B-DRRN — Deep Recursive Residual Network with Block Information (Codec Filtering)
Using the DRRN Design With a Double Input, B-DRRN Outperforms ARCNN, VRCNN, DCAD, and DRRN, With Much Fewer Parameters Than Double-Input CNN
In this story, Deep Recursive Residual Network with Block Information (B-DRRN), by Hosei University, is briefly described. I read this because I work on video coding research.
B-DRRN acts as a post-processing filter (not an in-loop filter) that outputs an enhanced-quality frame, as shown above. With the use of recursive units, it has fewer parameters than the Double-Input CNN that I read just a few hours before. This is a paper in 2019 PCS. (Sik-Ho Tsang @ Medium)
Outline
- Mean Mask Frame Input
- B-DRRN Network Architecture
- Experimental Results
1. Mean Mask Frame Input
- This mean-mask-based method originates from Double-Input CNN.
- The mean mask frame is constructed by calculating the mean value of all pixels inside a square region spanning a block, from its start position to its end position. This mean value is then assigned back to every pixel inside that square. The process repeats over all blocks of the decoded frame.
- However, the extra branch in Double-Input CNN makes the model large.
- Thus, in this paper, the network makes use of the DRRN design to solve this problem, i.e., to make the model lighter.
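The mean mask construction above can be sketched as follows. This is a minimal illustration assuming a fixed uniform block grid; in the actual method, the squares follow the codec's block partitioning of the decoded frame.

```python
import numpy as np

def mean_mask_frame(decoded, block_size=16):
    """Build a mean mask frame: every pixel inside a block is replaced
    by the mean of that block. `block_size` is a simplifying assumption;
    the paper uses the coding block partition from the decoder."""
    mask = np.empty_like(decoded, dtype=np.float64)
    h, w = decoded.shape
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            block = decoded[y:y + block_size, x:x + block_size]
            # assign the block mean back to all pixels of the square
            mask[y:y + block_size, x:x + block_size] = block.mean()
    return mask
```

The resulting mask frame is what the extra branch receives alongside the decoded frame, so the network can see where block boundaries lie.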
2. B-DRRN Network Architecture
- The first convolution layer (white color) contains 64 filters of size 3×3. The recursive residual unit has two convolution layers; each also has 64 3×3 filters.
- In the main branch, the recursive residual unit repeats nine times, following the DRRN design.
- To keep the balance between the two branches, the extra branch iterates three times.
- The convolution layers involved in these iterations share their parameters across iterations.
- The outputs of the two branches are combined either by an adding operator or a concatenating operator. With concatenating fusion, the output dimension of the fusion layer is doubled, so an additional convolution layer is needed afterwards to reduce the feature dimension from 128 back to 64.
- Then, the merged branch is processed by a 3×3 convolution and residual reconstruction.
- The activation function is applied before each convolution layer (pre-activation, as in DRRN).
- MSE is used as the loss function.
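The parameter sharing above is what keeps B-DRRN light: a recursive unit contributes the same number of weights whether it is unrolled nine times or three, because every iteration reuses the same two convolution layers. A rough weight count, assuming 64-channel 3×3 convolutions and ignoring biases and the input/output layers:

```python
def conv_params(c_in, c_out, k=3):
    # weight count of one conv layer (biases ignored for simplicity)
    return c_in * c_out * k * k

# One recursive residual unit = two 64->64 3x3 convs with shared weights
unit = 2 * conv_params(64, 64)

# With sharing, 9 unrolled iterations still cost only one unit's weights
shared_main_branch = unit          # 73,728 weights

# Without sharing, each of the 9 iterations would need its own weights
unshared_main_branch = 9 * unit    # 663,552 weights

print(shared_main_branch, unshared_main_branch)
```

The same argument applies to the three-iteration extra branch, which is why adding it is much cheaper than the separate full branch of Double-Input CNN.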
3. Experimental Results
- The training dataset includes 600 sequences with varied video resolutions: QCIF, CIF, 360p, 480p, HD, Full HD, and Ultra HD. (But the authors did not mention which dataset they use.)
- HM-20.0 is used, with four QPs: 22, 27, 32 and 37.
- B-DRRN with the concatenation and adding approaches achieves 6.24% and 6.16% BD-rate reduction, respectively.
- This result outperforms those of ARCNN, VRCNN, DCAD, and DRRN.
- Though B-DRRN obtains less BD-rate reduction than Double-Input CNN, it has a much lighter model.
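For readers unfamiliar with the metric: BD-rate compares two rate-distortion curves by fitting log-rate as a cubic polynomial in PSNR and averaging the gap over the overlapping PSNR range. A simplified sketch of Bjøntegaard's calculation (not the authors' code):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent: average bitrate difference of
    the test RD curve vs. the anchor, over their common PSNR range.
    Negative values mean bitrate savings."""
    # fit log-rate as a cubic polynomial of PSNR for each curve
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # overlapping PSNR interval
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # integrate both fits and take the average log-rate difference
    ia, it = np.polyint(p_a), np.polyint(p_t)
    avg_diff = (np.polyval(it, hi) - np.polyval(it, lo)
                - np.polyval(ia, hi) + np.polyval(ia, lo)) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100.0
```

So a reported -6.24% BD-rate means the filtered codec needs about 6.24% less bitrate for the same quality over the tested QP range.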
During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 25th story this month. 5 stories to go. Thanks for visiting my story.
Reference
[2019 PCS] [B-DRRN]
B-DRRN: A Block Information Constrained Deep Recursive Residual Network for Video Compression Artifacts Reduction
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [Double-Input CNN] [B-DRRN] [Liu PCS’19] [QE-CNN] [EDCNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [Lu CVPRW’19] [Wang APSIPA ASC’19]