Review — MRRN: Quality Enhancement Network via Multi-Reconstruction Recursive Residual Learning for Video Coding (HEVC Filtering)
In this story, Quality Enhancement Network via Multi-Reconstruction Recursive Residual Learning for Video Coding (MRRN), by Shanghai University, is reviewed. In this paper:
- A multi-reconstruction recursive residual network (MRRN) is proposed to capture the multi-scale similarity of compression artifacts; it outputs images with different denoising ratios and fuses them adaptively.
This is a paper in 2019 SPL (IEEE Signal Processing Letters) where SPL has a high impact factor of 3.105. (Sik-Ho Tsang @ Medium)
- MRRN: Network Architecture
- Recursive Residual Structure
- Experimental Results
1. MRRN: Network Architecture
- MRRN consists of four modules: feature extraction, feature enhancement, mapping, and reconstruction, as shown above.
- The feature extraction module represents the input image x as several high-dimensional feature maps.
- A relatively large convolutional kernel is adopted in the feature extraction module to enlarge the receptive field:
- F0 is delivered to the feature enhancement and mapping module to further extract valuable features for reconstruction.
- where r denotes the iteration of the recursive structure; the mapping module has similar formulations.
- To capture multi-scale similarity, the variable-size filters of VRCNN are introduced and improved by replacing each large filter with several stacked small filters, which reduces the parameter cost and improves expressive ability.
- After mapping noisy features to clean feature maps that contain the most valuable information for restoration, the reconstruction module recursively rebuilds several clean images with different denoising ratios and fuses them to generate a high-quality reconstruction.
- where Fr3 is the reconstruction of recursion r, Ffinal is the final output frame, and Wr are the fusion weights, which are automatically learned by the network.
- (a): Among different intermediate reconstructions, the third recursion achieves the best performance.
- (b): Increasing the number of recursions enlarges the receptive field. When the number of recursions is larger than 5, no further performance increase is observed.
- The number of recursions is therefore set to 5, a balanced choice between efficiency and effectiveness.
- The detailed architecture is shown below:
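To make the pipeline concrete, here is a minimal PyTorch sketch of an MRRN-style forward pass: feature extraction with a larger kernel, a shared-weight enhancement block applied recursively with per-recursion shortcuts, one intermediate reconstruction per recursion, and learned fusion weights. All layer sizes, channel counts, and module names are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MRRNSketch(nn.Module):
    """Hedged sketch of an MRRN-style network (sizes/names are assumptions)."""
    def __init__(self, channels=64, recursions=5):
        super().__init__()
        self.recursions = recursions
        # Feature extraction: a relatively large kernel enlarges the receptive field.
        self.extract = nn.Conv2d(1, channels, kernel_size=5, padding=2)
        # Enhancement/mapping block, weight-shared across recursions.
        # A large filter is replaced by two stacked 3x3 convs (same 5x5
        # receptive field, fewer parameters, extra non-linearity).
        self.enhance = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Shared reconstruction head: features -> residual image.
        self.reconstruct = nn.Conv2d(channels, 1, 3, padding=1)
        # One learnable fusion weight per recursion.
        self.fusion = nn.Parameter(torch.full((recursions,), 1.0 / recursions))

    def forward(self, x):
        f = self.extract(x)
        outputs = []
        for _ in range(self.recursions):
            # Recursive residual: shared weights plus a per-recursion shortcut.
            f = f + self.enhance(f)
            # Each recursion yields a reconstruction with a different
            # effective denoising strength.
            outputs.append(x + self.reconstruct(f))
        w = torch.softmax(self.fusion, dim=0)  # normalized fusion weights
        fused = sum(w[r] * outputs[r] for r in range(self.recursions))
        return fused, outputs
```

Running `MRRNSketch()(frame)` on a decoded luma frame would return the fused output plus the five intermediate reconstructions that the fusion weights combine.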
2. Recursive Residual Structure
- (a) Residual Structure: Residual learning with a single shortcut does not have the ability to enlarge the receptive field.
- (b) Recursive Structure: The same convolution weights are shared among different layers to reduce memory cost, but this structure can easily suffer from gradient vanishing or explosion.
- (c) Recursive Residual Structure: Features of different levels are merged to improve the learning performance with a fast convergence speed and a large receptive field.
- It is shown that the proposed recursive residual structure has the highest enhancement performance.
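A quick back-of-envelope calculation shows why the recursive (weight-shared) variants are attractive: the parameter count stays fixed at one convolution's worth while the receptive field keeps growing with each recursion. The helper names below are illustrative, not from the paper.

```python
def conv_params(c_in, c_out, k):
    # weights + biases of one 2-D convolution layer
    return c_in * c_out * k * k + c_out

def receptive_field(num_convs, k=3):
    # stacking n k-by-k convs gives an effective (n*(k-1)+1) receptive field
    return num_convs * (k - 1) + 1

channels, k = 64, 3
# Structures (b)/(c) reuse ONE conv's weights for every recursion, so
# parameters stay constant; an unshared stack grows linearly.
shared = conv_params(channels, channels, k)
for r in (1, 3, 5):
    unshared = r * conv_params(channels, channels, k)
    print(f"recursions={r}: rf={receptive_field(r)}, "
          f"shared={shared}, unshared={unshared}")
```

At 5 recursions the shared structure covers an 11x11 receptive field with the parameters of a single 3x3 layer; structure (c) then adds per-recursion shortcuts so the deeper unrolled computation still trains stably.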
3. Experimental Results
- For I-frames, a collection of 300 images in BSDS500 is chosen for training, and the other 200 images for testing. Each image is augmented into 8 images using flipping, rotation and mirroring.
- For inter frames, 18 videos with different resolutions are chosen from the website as the dataset.
- 14 videos are chosen as the training set, and the other 4 as the test set.
- At QP 22, 27, and 32, the networks are initialized from the network of QP 37.
- MRRN averagely achieves 6.7%, 7.8% and 7.6% BD-rate reduction for AI, LDP and LDB configurations, respectively.
- Experimental results are compared with three other advanced methods, VRCNN, RHCNN and QE-CNN, all of which are retrained and implemented on HM16.12.
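The BD-rate numbers above follow the standard Bjøntegaard metric: fit log-bitrate vs. PSNR for both codecs with a cubic polynomial and integrate the rate gap over the overlapping quality range. A minimal NumPy sketch (function name and sample data are mine, not the paper's):

```python
import numpy as np

def bd_rate(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Bjontegaard delta-rate: average bitrate difference (%) at equal quality.
    Cubic fit of log10(rate) vs PSNR, integrated over the common PSNR range."""
    p_a = np.polyfit(psnr_anchor, np.log10(rates_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rates_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100  # negative => bitrate saving

# Toy example: a codec needing 10% less bitrate at every quality level.
anchor_r = [100, 200, 400, 800]     # kbps (hypothetical)
psnr = [30.0, 32.0, 34.0, 36.0]     # dB
test_r = [r * 0.9 for r in anchor_r]
print(round(bd_rate(anchor_r, psnr, test_r, psnr), 2))  # -> -10.0
```

So MRRN's reported 6.7%/7.8%/7.6% BD-rate reductions mean the enhanced codec needs that much less bitrate, on average, to reach the same PSNR as the HM anchor.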
3.3. Complexity Analyses
- Only the parameter counts of the networks designed for enhancing P frames are listed, since inter coding is more commonly used in practice.
- It can be observed that MRRN not only achieves the best enhancement performance but also has the smallest model complexity compared with other methods.
- It can be observed that MRRN achieves a lower computation cost than RHCNN.
- Although MRRN is a CNN-based network, it still has an acceptable computation cost compared to NALF, even without GPU acceleration.
- The ratio of the coding time to the original coding time for LD is much smaller than that for AI: since the LD configuration itself consumes more encoding time than AI, the fixed network overhead is relatively smaller.
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN] [CAR-DRN]
JPEG-HDR [Han VCIP’20]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [MRRN] [Jia TIP’19] [EDCNN] [VRCNN-BN] [MACNN] [Yue VCIP’20]
AVS3 [Lin PCS’19] [CNNLF]
VVC [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN] [PRN] [DRCNN] [Zhang ICME’20] [MGNLF] [RCAN+PRN+] [Nasiri VCIP’20]