Review: VRCNN — Variable-Filter-Size Residue-Learning CNN (Codec Filtering)
Achieved 4.6% Average BD-Rate Reduction Compared to HEVC Baseline, Outperforming ARCNN and VDSR
In this story, Variable-Filter-Size Residue-Learning CNN (VRCNN), by University of Science and Technology of China, is reviewed. VRCNN is used as an in-loop or out-of-loop filter to improve image quality after decoding and thereby increase coding efficiency. It was published in 2017 MMM with more than 70 citations. (Sik-Ho Tsang @ Medium)
Outline
- VRCNN Network Architecture
- Experimental Results
1. VRCNN Network Architecture
- VRCNN is modified from ARCNN.
- The second layer of ARCNN (fixed 7×7 filters) is replaced with a combination of 5×5 and 3×3 filters. The outputs of the different-sized filters are concatenated and fed into the next layer.
- Variable filter size is also adopted in the third layer, which performs “restoration” of features: the fixed 1×1 filters in ARCNN are replaced by a combination of 3×3 and 1×1 filters.
- The variable-filter-size technique originates from GoogLeNet / Inception-v1.
- The first and the last layers of VRCNN do not use variable filter size, because these two layers perform feature extraction and final reconstruction, respectively.
- Besides, different from ARCNN, residue learning (the concept of ResNet) is also adopted: in artifact reduction, the input (before filtering) and the output (after filtering) are largely similar to each other, so learning the difference between them is easier and more robust.
- Mean Squared Error (MSE) is used as the loss function.
- Unlike the Sample Adaptive Offset (SAO) filter used in HEVC, VRCNN needs no extra signalling bits for its parameters.
- VRCNN can be applied in-loop or out-of-loop.
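The four-layer design above can be sketched in NumPy. The per-layer channel widths below (64; 16+32 concatenated; 16+32 concatenated; 1) follow the configuration reported in the paper; the naive convolution loop and random weights are purely illustrative, not the authors' implementation:

```python
import numpy as np

def conv2d(x, w, relu=True):
    """Zero-padded 'same' convolution. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for u in range(k):
                for v in range(k):
                    out[o] += w[o, i, u, v] * xp[i, u:u + H, v:v + W]
    return np.maximum(out, 0.0) if relu else out  # ReLU on all but the last layer

rng = np.random.default_rng(0)
W = lambda co, ci, k: rng.normal(0.0, 0.01, (co, ci, k, k))  # illustrative weights

def vrcnn(y):
    """y: (1, H, W) decoded luma patch -> filtered patch of the same shape."""
    f1 = conv2d(y, W(64, 1, 5))                       # layer 1: feature extraction
    f2 = np.concatenate([conv2d(f1, W(16, 64, 5)),    # layer 2: variable filter sizes,
                         conv2d(f1, W(32, 64, 3))])   #          outputs concatenated (48 ch)
    f3 = np.concatenate([conv2d(f2, W(16, 48, 3)),    # layer 3: variable filter sizes
                         conv2d(f2, W(32, 48, 1))])   #          again (48 ch)
    r = conv2d(f3, W(1, 48, 3), relu=False)           # layer 4: reconstruct the residue
    return y + r                                      # residue learning: add input back
```

Note how the skip connection makes the network predict only the (small) difference between the decoded and original frames, matching the residue-learning motivation above.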
2. Experimental Results
- 4.6% average Y BD-Rate reduction compared to HEVC baseline.
- Up to 7.6% Y BD-Rate reduction for RaceHorses sequence.
- The authors also trained ARCNN and VDSR for comparison.
- ARCNN obtains a 0.2% increase in Y BD-rate, i.e., it is even worse than the HEVC baseline.
- VDSR achieves only a 3.8% average Y BD-rate reduction.
- VRCNN and ARCNN are both 4-layer networks.
- VRCNN is slightly slower because the second and third layers contain filters of different sizes, which complicates parallel computing.
- Memory cost is an important issue for post-processing especially at the decoder side.
- VRCNN is much shallower than VDSR and also has far fewer parameters than ARCNN.
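BD-rate summarizes the average bitrate difference between two codecs at equal quality. A minimal sketch of the standard Bjøntegaard computation (not the authors' code), fitting cubic polynomials of log-rate as a function of PSNR and integrating over the overlapping quality range:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta rate in percent; negative means the test codec saves bitrate."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Cubic fit of log-rate vs PSNR for each codec (four RD points are typical).
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    # Average log-rate difference over the interval, via the antiderivatives.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

For example, a codec that needs 10% less bitrate at every PSNR yields a BD-rate of -10%; the -4.6% figure above is the analogous average over the test sequences.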
Reference
[2017 MMM] [VRCNN]
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
My Previous Reviews
Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]
Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]
Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [LC] [FC-DenseNet] [IDW-CNN] [SDN]
Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet]
Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]
Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]
Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]
Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN]