Review: MLSDRN — Multi-channel Long Short-term Dependency Residual Network (Codec Filtering)

DenseNet/MemNet-Like Architecture, Up to 15.9% BD-Rate Reduction

Sik-Ho Tsang
6 min read · Apr 27, 2020

In this story, Multi-channel Long Short-term Dependency Residual Network (MLSDRN), by The Hong Kong University of Science and Technology and University of Electronic Science and Technology of China, is reviewed. In this paper:

  • MLSDRN introduces an update cell to adaptively store and select the long-term and short-term dependency information through an adaptive learning process.
  • It also leverages the block boundary information that is recorded in the bit-stream to improve the filtering performance.

This is a paper in 2018 DCC. (Sik-Ho Tsang @ Medium)

Outline

  1. Network Architecture
  2. Update Cell (UCell)
  3. Block Boundary Information
  4. Experimental Results

1. Network Architecture

MLSDRN Network Architecture
  • As shown above, the proposed MLSDRN consists of three parts: a feature extraction net (FENet), multi-channel update cells (UCells) with stacked dense connections (an idea originating from DenseNet), and a reconstruction fusion net.
  • (The network architecture is also similar to MemNet, which is used for image denoising, super resolution, and JPEG deblocking. But here, MLSDRN is used as an HEVC in-loop filter.)
  • FENet: A convolutional layer with a different kernel size for each channel is used in FENet to extract features from the input image x.
  • (But the paper does not give details about FENet.)
  • UCells: Supposing d UCells are stacked to act as the feature mapping in one channel, we have U_d = f_UCell(U_{d−1}), where U_{d−1} and U_d are the input and output of the d-th UCell, and f_UCell denotes the mapping of one UCell.
  • Reconstruction Fusion Network: The outputs from each UCell of all channels are sent to the reconstruction fusion net to generate the intermediate outputs.
  • The final output is then generated by taking a weighted average of the intermediate outputs.
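
To make the data flow concrete, here is my minimal PyTorch sketch of the three-part layout. The layer width, the number of cells, the per-channel kernel sizes and the residual reconstruction are my assumptions, since the paper gives few details; the real UCell is sketched in Section 2, so only a stub is used here.

```python
import torch
import torch.nn as nn

class UCellStub(nn.Module):
    """Stand-in for the real UCell (see the sketch in Section 2)."""
    def __init__(self, n_feats, n_long_term):
        super().__init__()
        self.fuse = nn.Conv2d(n_feats * n_long_term, n_feats, 1)

    def forward(self, states):              # states = [U_0, ..., U_{d-1}]
        return torch.relu(self.fuse(torch.cat(states, dim=1)))

class MLSDRN(nn.Module):
    def __init__(self, ucell=UCellStub, n_feats=64, n_cells=3,
                 kernel_sizes=(3, 5, 7)):
        super().__init__()
        # FENet: one conv per channel, each with a different kernel size.
        self.fenet = nn.ModuleList(
            nn.Conv2d(1, n_feats, k, padding=k // 2) for k in kernel_sizes)
        # Each channel stacks n_cells UCells with dense connections.
        self.channels = nn.ModuleList(
            nn.ModuleList(ucell(n_feats, d + 1) for d in range(n_cells))
            for _ in kernel_sizes)
        # Reconstruction fusion: map UCell features back to an image.
        self.recon = nn.Conv2d(n_feats, 1, 3, padding=1)
        # Learnable weights for averaging the intermediate outputs.
        n_out = len(kernel_sizes) * n_cells
        self.weights = nn.Parameter(torch.full((n_out,), 1.0 / n_out))

    def forward(self, x):                   # x: decoded luma, N x 1 x H x W
        outs = []
        for fe, cells in zip(self.fenet, self.channels):
            states = [fe(x)]                # U_0: FENet features
            for cell in cells:
                states.append(cell(states)) # U_d from all previous states
                outs.append(self.recon(states[-1]) + x)  # intermediate output
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * oi for wi, oi in zip(w, outs))  # weighted average
```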

2. Update Cell (UCell)

Update Cell (UCell)
  • As shown above, the blue dashed box denotes the recursive unit which generates the short-term dependency information.
  • The green arrows denote all previous UCell information, which is passed directly to the update gate to generate the long-term dependency information.
  • The update gate adaptively remembers and forgets the past information.
  • Variable Filter Size: is used to provide multi-scale information, which suits the HEVC variable block-size transform. This idea originates from VRCNN. Kernel sizes of 5×5, 3×3 and 1×1 are used for 3DConv_a, 2DConv_b and 1DConv_c in each recursion, respectively.
  • Dilated Filter: is used to enlarge the receptive field, an idea originating from DilatedNet and the DeepLab series; directly increasing the filter size would not only introduce more parameters but also increase the computational burden. The dilation factors s for the filters 3DConv_a, 2DConv_b and 1DConv_c in each recursion are set to s = 3, 2 and 1, respectively.
  • Update Gate: is used to adaptively select the dependency information. Each recursion Mk_conv in the recursive unit is a residual block: H_d^r = R(H_d^{r−1}) = F(H_d^{r−1}) + H_d^{r−1}, where H_d^{r−1} and H_d^r denote the input and output of the r-th recursion in the d-th UCell.
  • Each residual function F in the recursive unit contains a convolutional layer with variable kernel size and a pre-activation (Pre-Activation ResNet) structure.
  • Supposing there are L recursions in one UCell, the l-th recursion of the recursive unit is H_d^l = R^(l)(U_{d−1}), i.e. R applied l times to the UCell input, and all the outputs are {H_d^1, H_d^2, …, H_d^L}.
  • These are the multi-level representations of the recursive unit in the d-th UCell.
  • The update gate in each UCell receives the long-term dependency information from the previous UCells of the same channel and the short-term dependency information from the current recursive unit: U_d = f_gate([H_d^1, …, H_d^L, U_0, …, U_{d−1}]), where [·] denotes concatenation and f_gate is the gate function. (A PyTorch sketch of the UCell and the loss function appears after this list.)
  • (But the paper does not give details about what the gate function is.)
  • The loss function is a weighted sum of an SSIM loss and an ℓ1-norm term, where SSIM (Structural SIMilarity) measures the frame quality: the higher the SSIM, the better the quality. The SSIM loss is 1 − SSIM, and multi-scale SSIM (MS-SSIM) is used here. (For the loss function details, please feel free to read the paper.)
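
Below is my PyTorch sketch of one UCell, which can replace UCellStub in the Section 1 skeleton. Chaining the three dilated convolutions in series, sharing the recursion weights across recursions (as in MemNet/DRRN), and using a 1×1 convolution as the gate are all assumptions, since the paper does not pin these down; the loss at the end likewise hedges the weighting factor alpha.

```python
import torch
import torch.nn as nn

class Recursion(nn.Module):
    """One recursion: a pre-activation residual block built from the three
    dilated convs (5x5 with s=3, 3x3 with s=2, 1x1 with s=1)."""
    def __init__(self, n_feats):
        super().__init__()
        def pre_act(k, s):  # BN -> ReLU -> dilated conv, size-preserving
            return nn.Sequential(
                nn.BatchNorm2d(n_feats), nn.ReLU(inplace=True),
                nn.Conv2d(n_feats, n_feats, k,
                          padding=s * (k - 1) // 2, dilation=s))
        self.body = nn.Sequential(
            pre_act(5, 3),   # "3DConv_a"
            pre_act(3, 2),   # "2DConv_b"
            pre_act(1, 1))   # "1DConv_c"

    def forward(self, h):
        return self.body(h) + h  # H_d^r = F(H_d^{r-1}) + H_d^{r-1}

class UCell(nn.Module):
    """Recursive unit (short-term) + update gate (long-term)."""
    def __init__(self, n_feats, n_long_term, n_recursions=3):
        super().__init__()
        self.recursion = Recursion(n_feats)  # weights shared across recursions
        # Gate function: assumed to be a 1x1 conv over the concatenated
        # short- and long-term states (the paper leaves it unspecified).
        in_ch = n_feats * (n_recursions + n_long_term)
        self.gate = nn.Conv2d(in_ch, n_feats, 1)
        self.n_recursions = n_recursions

    def forward(self, states):               # states = [U_0, ..., U_{d-1}]
        h, short_term = states[-1], []
        for _ in range(self.n_recursions):   # H_d^l = R^(l)(U_{d-1})
            h = self.recursion(h)
            short_term.append(h)
        # U_d = f_gate([H_d^1..H_d^L, U_0..U_{d-1}])
        return self.gate(torch.cat(short_term + list(states), dim=1))

def mlsdrn_loss(pred, target, ms_ssim_fn, alpha=0.8):
    """Weighted MS-SSIM + L1 loss; alpha is a guess, the paper only says the
    loss is a weighted sum. ms_ssim_fn can be e.g. pytorch_msssim.ms_ssim."""
    ssim_loss = 1.0 - ms_ssim_fn(pred, target)    # higher SSIM -> lower loss
    l1_loss = torch.mean(torch.abs(pred - target))
    return alpha * ssim_loss + (1.0 - alpha) * l1_loss
```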

3. Block Boundary Information

  • HEVC only applies the Deblocking Filter (DF) to samples adjacent to PU or TU boundaries. As a result, more pixels are changed in edge or sharp areas, and fewer pixels are changed in smooth areas.
  • Therefore, the TU and CU boundary information recorded in the bit-stream is used to further improve the filtering performance and guarantee the robustness of the model.
  • (But the paper does not give details about how this information is used within the network; one plausible usage is sketched below.)
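
As a guessed illustration only: one plausible way to exploit this side information is to rasterize the CU/TU partitions parsed from the bit-stream into a binary boundary mask and feed it to the network as an extra input channel. The paper does not confirm this design.

```python
import numpy as np

def boundary_mask(height, width, blocks):
    """Rasterize CU/TU partitions into a binary boundary mask.

    `blocks` is a list of (x, y, w, h) rectangles parsed from the
    bit-stream; how MLSDRN actually consumes the mask is not described
    in the paper, so treat this as an illustration.
    """
    mask = np.zeros((height, width), dtype=np.float32)
    for x, y, w, h in blocks:
        mask[y, x:x + w] = 1.0               # top edge
        mask[y + h - 1, x:x + w] = 1.0       # bottom edge
        mask[y:y + h, x] = 1.0               # left edge
        mask[y:y + h, x + w - 1] = 1.0       # right edge
    return mask

# Example: a 16x16 area split into four 8x8 CUs
m = boundary_mask(16, 16, [(0, 0, 8, 8), (8, 0, 8, 8),
                           (0, 8, 8, 8), (8, 8, 8, 8)])
```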

4. Experimental Results

4.1. Dataset & Training

  • The Hollywood2 Scenes dataset is used, which is one of the most famous datasets in the video community, with 570 training sequences and 582 test sequences.
  • Only the middle 30 consecutive frames of each sequence are taken for compression to generate the training data.
  • They are compressed by the HM reference software (HM-7.0) at four quantization parameters (QPs), 22, 27, 32 and 37, under the AI, LD and RA configurations.
  • Training data are extracted at QP−1, QP and QP+1 to train the model for each QP, as shown in the first figure.
  • The models are trained on the Y channel only, and all three YUV channels are recovered with them.
  • The trained model is placed between DF and SAO.

4.2. Results

BD-rate results of the proposed MLSDRN, sequence by sequence
  • MLSDRN improves the coding efficiency: it achieves 6.0%, 7.4% and 8.1% overall bit-rate savings for the luminance component on all the test sequences under the AI, RA and LD configurations, respectively.
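
As a side note, the BD-rate figures above summarize the average bit-rate difference between two rate-distortion curves at equal quality (negative numbers mean savings). The standard cubic-fit computation looks like this; it is general measurement tooling, not code from the paper:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = bit-rate saving)."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Fit log-rate as a cubic polynomial of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Average each fit over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    avg_a = np.polyval(np.polyint(p_a), [lo, hi])
    avg_t = np.polyval(np.polyint(p_t), [lo, hi])
    diff = ((avg_t[1] - avg_t[0]) - (avg_a[1] - avg_a[0])) / (hi - lo)
    return (np.exp(diff) - 1.0) * 100.0

# Example: four RD points per codec (one per QP)
print(bd_rate([1000, 1800, 3200, 6000], [32.0, 34.1, 36.2, 38.0],
              [950, 1700, 3000, 5600], [32.1, 34.2, 36.3, 38.1]))
```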
BD-rate results comparison with NALF
  • Compared with NALF, MLSDRN also shows appealing performance.
Subjective image quality comparison
  • MLSDRN produces much sharper edges than the HEVC baseline, without obvious ringing or blocking artifacts.
  • The above table shows the average encoding/decoding complexity ratio of the proposed method on an Intel Xeon E5-2620V4 CPU and a GeForce GTX 1080Ti GPU.
  • Overall, MLSDRN is faster than NALF, and it is easier to implement on a GPU.

During the days of coronavirus, I hope to write 30 stories in this month to give myself a small challenge. This is the 29th story in this month. Thanks for visiting my story…
