Reading: RRCNN — Recursive Residual Convolutional Neural Network (Coded Filtering)

Outperforms VRCNN, RHCNN & CNNF. 8.7% Average BD-Rate Reduction for Luma. More Than 20% Average BD-Rate Reductions for Chroma.

Sik-Ho Tsang
6 min read · Jun 20, 2020

In this story, Recursive Residual Convolutional Neural Network (RRCNN), by Tianjin University and Santa Clara University, is presented. I read this paper because I work on video coding research. In this paper:

  • Residual: Shortcut connections are used to skip a few stacked layers in the CNN.
  • Recursive: The same set of weights is used recursively, so fewer parameters are needed.
  • A single model is trained for various bitrate settings.

This is an early-access article in 2019 TCSVT, where TCSVT has a high impact factor of 4.046. It will be officially published later. (Sik-Ho Tsang @ Medium)

Outline

  1. Residual Learning & Recursive Learning
  2. RRCNN: Multi-QP Network Architecture
  3. HEVC Implementation
  4. Experimental Results

1. Residual Learning & Recursive Learning

(a) VDSR (b) ResNet (c) RecNet (d) RRCNN
  • For RRCNN in (d), a skip connection is added across the entire network, from the input to the output, termed External Residual Learning (ERL).
  • Also, an identity skip connection is attached across a few stacked layers, termed Internal Residual Learning (IRL).
  • Within the green dashed-line boxes, the convolutional layers drawn in yellow and in green share the same weights, respectively, which keeps the number of model parameters from growing. A simplified sketch of this structure follows below.
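To make the structure concrete, below is a minimal PyTorch sketch of the idea (my own simplified re-implementation, not the authors' code; the paper shares weights within groups of units and the exact layer counts differ): a pre-activation residual unit whose weights are reused recursively (IRL + recursive learning), wrapped by a global input-to-output skip connection (ERL).

```python
import torch
import torch.nn as nn

class PreActResidualUnit(nn.Module):
    """Pre-activation residual unit (ReLU -> Conv -> ReLU -> Conv) with an
    identity skip connection: Internal Residual Learning (IRL)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(x))
        out = self.conv2(self.relu(out))
        return x + out  # IRL: identity skip over the stacked convs

class RRCNNSketch(nn.Module):
    """Recursive residual network: the SAME residual unit is applied several
    times (recursive learning, so few parameters), with a global skip
    connection from the input to the output (ERL)."""
    def __init__(self, in_ch=2, channels=64, recursions=7):
        super().__init__()
        self.head = nn.Conv2d(in_ch, channels, 3, padding=1)  # feature extraction
        self.shared_unit = PreActResidualUnit(channels)       # one set of weights, reused
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)      # reconstruction
        self.recursions = recursions

    def forward(self, x):
        luma = x[:, :1]                   # the degraded luma plane
        feat = self.head(x)
        for _ in range(self.recursions):  # recursive reuse of shared_unit
            feat = self.shared_unit(feat)
        residual = self.tail(feat)
        return luma + residual            # ERL: global input-to-output skip

# in_ch=2 assumes the luma patch is stacked with a QP map (see Section 2)
model = RRCNNSketch(in_ch=2, channels=64, recursions=7)
```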
Pre-Activation ResNet is Used
Average BD-PSNR (dB) & Number of Parameters
  • Different networks as shown in the first figure with different network depths are tested.
  • Better performance is achieved as the network goes deeper, since a deeper network has stronger learning ability.
  • Furthermore, RRCNN achieves the best performance at all depths and outperforms the second-best, VDSR, by 0.05 dB BD-PSNR at a depth of 22 while using 10 times fewer parameters, which verifies the effectiveness of the multi-path structure and recursive learning.

2. RRCNN: Multi-QP Network Architecture

RRCNN: Multi-QP Network Architecture for Luma
  • QP map is input together with the luma map.
  • The luma patch and the QP map are first normalized to [0, 1] by min-max normalization (see the sketch after this list).
  • The total depth of the network is set to 16 layers, including 7 residual units.
  • As the depth increases, the performance improves, but the gains are small and grow very slowly as the network goes even deeper.
  • The standard loss function is used (also illustrated below).
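As an illustration (my own sketch, not the authors' code), the input construction and the loss might look like the following, assuming the luma is normalized by the fixed 8-bit range, the QP map by HEVC's maximum QP of 51, and the loss is the usual mean squared error:

```python
import torch

def make_network_input(luma_patch_8bit, qp, qp_max=51.0):
    """Build the 2-channel input: normalized luma plus a constant QP plane.
    Using the fixed ranges [0, 255] and [0, 51] as the min/max values is my
    assumption, not necessarily the paper's exact choice."""
    luma = luma_patch_8bit.float() / 255.0         # min-max normalize luma to [0, 1]
    qp_plane = torch.full_like(luma, qp / qp_max)  # QP map, also in [0, 1]
    return torch.cat([luma, qp_plane], dim=1)      # shape (N, 2, H, W)

def mse_loss(output, target):
    """Standard mean-squared-error loss against the uncompressed original."""
    return torch.mean((output - target) ** 2)

x = make_network_input(torch.randint(0, 256, (1, 1, 64, 64)), qp=37)
```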
RRCNN: Multi-QP Network Architecture for Chroma
  • A separate network is used for chroma filtering to achieve higher chroma reconstruction quality.
  • It is similar to the network for luma, but the luma is downsampled to the same size as the chroma before being fed in.
  • The outputs are the filtered U and V (a rough sketch follows below).
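A rough sketch of how the chroma input could be assembled under 4:2:0 sampling (my own illustration; the exact downsampling filter and set of input channels are assumptions):

```python
import torch
import torch.nn.functional as F

def make_chroma_input(luma, u, v, qp, qp_max=51.0):
    """Downsample luma to the chroma resolution (4:2:0 assumed), then stack it
    with U, V and a QP plane; the chroma network predicts filtered U and V."""
    luma_ds = F.avg_pool2d(luma, kernel_size=2)         # simple 2x downsampling (assumption)
    qp_plane = torch.full_like(u, qp / qp_max)
    return torch.cat([luma_ds, u, v, qp_plane], dim=1)  # shape (N, 4, H/2, W/2)
```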

3. HEVC Implementation

RRCNN Variants
  • Three optional positions for RRCNN.
  • (a) RRCNNF-I: In the first position, RRCNN is placed where DF would be and replaces both DF and SAO.
  • (b) RRCNNF-II: The second position is after DF and replaces SAO.
  • (c) RRCNNF-III: The third position is after SAO and is employed as an additional filter.
  • (d): A CTU-level control flag is added to let the encoder choose the better of DF+SAO and RRCNN for filtering, as sketched below.
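Conceptually, the CTU-level flag in (d) comes from a rate-distortion comparison between the two filtered candidates. A minimal sketch of that decision (my own illustration, not the reference software; the exact cost formulation in the paper may differ):

```python
def choose_ctu_filter(ctu_orig, ctu_df_sao, ctu_rrcnn, lam, flag_bits=1):
    """Pick DF+SAO or RRCNN per CTU by comparing RD cost J = D + lambda * R,
    then signal a one-bit flag in the bitstream."""
    def sse(a, b):
        return float(((a - b) ** 2).sum())  # sum of squared errors as distortion

    cost_df_sao = sse(ctu_orig, ctu_df_sao) + lam * flag_bits
    cost_rrcnn = sse(ctu_orig, ctu_rrcnn) + lam * flag_bits
    use_rrcnn = cost_rrcnn < cost_df_sao     # the flag written to the bitstream
    return (ctu_rrcnn if use_rrcnn else ctu_df_sao), use_rrcnn
```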

4. Experimental Results

4.1. Training

  • Uncompressed Colour Image Database (UCID), which consists of 1338 natural images, is used to generate the training data.
  • They are compressed by HM-16.16 using different QPs with DF and SAO off.

4.2. BD-Rate

BD-Rate (%) on HEVC Test Sequences Under AI Configuration
  • 8.7%, 20.7% and 21.4% average BD-rate reductions are achieved for Y, U and V, respectively, under the AI configuration, which is a large margin.
BD-Rate (%) on HEVC Test Sequences Under RA, LDP, LDB Configurations
  • Under other configurations, large BD-rate reductions can still be obtained.
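For context, BD-rate numbers like these come from the standard Bjøntegaard metric: fit the two rate-distortion curves in the log-rate domain and integrate the gap over the common PSNR range. A minimal sketch of the standard computation (not the authors' script):

```python
import numpy as np

def bd_rate(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Bjontegaard delta rate (%): average bitrate difference between two RD
    curves over their overlapping PSNR range (cubic fit in the log-rate domain).
    A negative value means the test codec saves bitrate."""
    lr_a, lr_t = np.log(rates_anchor), np.log(rates_test)
    p_a = np.polyfit(psnr_anchor, lr_a, 3)   # log-rate as a cubic polynomial of PSNR
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100
```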

4.3. Visual Quality

Left to Right: Ground-truth, No DF & SAO, DF & SAO, and RRCNN
  • Although the HEVC in-loop filtering effectively removes blocking artifacts and suppresses the slight ringing artifacts, there are still serious ringing artifacts and blur near the true edges in filtered images.
  • However, RRCNN not only effectively removes blocking and ringing artifacts but also recovers details, which leads to clearer images.

4.4. SOTA Comparison

BD-Rate (%) on HEVC Test Sequences Under AI Configuration
  • RRCNN outperforms VRCNN and RHCNN with large margins.

4.5. RD Curves

RD Curves
  • The gain is nearly constant from high bitrate to low bitrate.

4.6. Different Positions for RRCNN

BD-Rate (%) on HEVC Test Sequences Under AI Configuration
  • RRCNNF-I, i.e. RRCNN replacing the conventional filters, outperforms the others.

4.7. QP Adaptivity

BD-Rate (%) on HEVC Test Sequences Under AI Configuration
  • With QP+2 and QP-2, there are still coding gains, which shows the robustness of RRCNN.
BD-Rate (%) on HEVC Test Sequences Under AI Configuration
  • RRCNN-S: S means single, i.e. one RRCNN is trained for each single QP.
  • RRCNN-M: M means multi, i.e. one RRCNN is trained for multiple QPs.
  • There is only a small loss for RRCNN-M compared with RRCNN-S, while the need to store multiple models is avoided.

4.8. BD-Rate for Chroma

Average PSNR (dB) and BD-Rate (%) on HEVC Test Sequences Under AI Configuration
  • With the luma model trained and tested on chroma, only 11.5% and 13.8% BD-rate reductions are achieved for Cb and Cr, respectively.
  • With the chroma model trained and tested on chroma, 20.5% and 21.3% BD-rate reductions are achieved for Cb and Cr, respectively.
BD-Rate (%) on HEVC Test Sequences Under AI Configuration
  • Compared with CNNF [31], large coding gains are observed for Cr and Cb using RRCNN.

4.9. Computational Complexity

Computational Complexity
  • With a GPU, the encoding and decoding times are increased.
  • With only a CPU, the encoding and decoding times are significantly increased.

This is the 30th story this month!
