Reading: Double-Input CNN — Mean-based Mask (MM)+Add-based Fusion (AF) (Codec Filtering)

With Mask Input, Outperforms VRCNN and QE-CNN.


  1. Framework Variants
  2. Double-Input CNN Network Architecture
  3. Experimental Results

1. Framework Variants

Overall Framework
  • Because HEVC performs block-wise transform and quantization during encoding, the quality degradation of compressed frames is closely tied to the coding unit splitting. The partition information therefore carries useful clues for removing the artifacts introduced by encoding.
  • There are several ways to generate a mask from this information and to fuse it with the decoded frame.

1.1. Mask Generation

(a) original image with partition information (b) Mean-based mask (c) Boundary-based mask
  • (b) Mean-based mask (MM): Each partition block in a frame is filled with the mean value of all decoded pixels inside this partition.
  • (c) Boundary-based mask (BM): The boundary pixels between partitions are filled with value 1 and the remaining non-boundary pixels are filled with value 0. The width of the boundary is set to 2.
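
The two mask types above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the `(y, x, h, w)` block representation and the function names are hypothetical, and the boundary mask assumes that marking the outermost pixel ring of every block is an acceptable way to get a 2-pixel-wide boundary between neighbouring blocks (the paper's exact rule is not given here).

```python
import numpy as np

def mean_based_mask(frame, blocks):
    """Fill each partition block with the mean of its decoded pixels."""
    mask = np.zeros(frame.shape, dtype=np.float32)
    for y, x, h, w in blocks:  # hypothetical (top, left, height, width) tuples
        mask[y:y + h, x:x + w] = frame[y:y + h, x:x + w].mean()
    return mask

def boundary_based_mask(shape, blocks):
    """Mark the outer pixel ring of each block with 1, everything else 0.

    Assumption: two adjacent blocks each contribute one pixel, so the
    boundary between them is 2 pixels wide.
    """
    mask = np.zeros(shape, dtype=np.float32)
    for y, x, h, w in blocks:
        mask[y, x:x + w] = 1.0          # top row of the block
        mask[y + h - 1, x:x + w] = 1.0  # bottom row of the block
        mask[y:y + h, x] = 1.0          # left column of the block
        mask[y:y + h, x + w - 1] = 1.0  # right column of the block
    return mask

# Toy 4x8 "decoded frame" split into two 4x4 blocks (made-up data).
frame = np.arange(32, dtype=np.float32).reshape(4, 8)
blocks = [(0, 0, 4, 4), (0, 4, 4, 4)]
mm = mean_based_mask(frame, blocks)
bm = boundary_based_mask(frame.shape, blocks)
```

In a real pipeline the block list would come from the CU partition information exported by the decoder, which this sketch does not cover.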

1.2. Mask-frame Fusion Strategies

Mask-frame Fusion Strategies
  • (a) Add-based fusion (AF): First extract the feature maps of the mask using a CNN, then combine them with the feature maps of the input frame using an element-wise add layer.
  • (b) Concatenate-based fusion (CF): Concatenate the mask and the frame into a two-channel image, which is then fed to the CNN.
  • (c) Early fusion (EF): First extract the features of the mask alone using three convolutional layers, then integrate them into the network.
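
To make the difference between AF and CF concrete, here is a shape-level NumPy sketch. It is an assumption-laden toy: a single pointwise (1×1) convolution with random weights stands in for each feature-extraction stream, the helper `conv1x1` is hypothetical, and EF is left out because the description above does not say where the mask features are injected.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, weight):
    """Pointwise convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

# Single-channel decoded frame and its mask (random toy data).
frame = rng.random((1, 8, 8))
mask = rng.random((1, 8, 8))

# Add-based fusion: each input has its own stream, features are summed.
w_frame = rng.standard_normal((64, 1))
w_mask = rng.standard_normal((64, 1))
af_features = conv1x1(frame, w_frame) + conv1x1(mask, w_mask)

# Concatenate-based fusion: stack frame and mask into a 2-channel input
# and feed that to a single stream.
cf_input = np.concatenate([frame, mask], axis=0)
w_cf = rng.standard_normal((64, 2))
cf_features = conv1x1(cf_input, w_cf)
```

Both paths end with 64 feature maps of the same spatial size. What differs is where the two inputs meet: after separate feature extraction for AF, at the very first layer for CF.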

2. Double-Input CNN Network Architecture

Double-Input CNN Network Architecture
  • This CNN contains two streams in the feature extracting stage so as to extract features for the decoded frame and its corresponding mask, respectively.
  • Each residual block in the feature extracting stage has two convolutional layers with 3×3 kernels and 64 feature maps, followed by batch-normalization layers and ReLU, as shown in the grey block at the bottom right of the figure.
  • Then the feature maps of the mask and the decoded frame are fused by the add-based fusion strategy and fed to the remaining three convolutional layers.
  • MSE is used as the loss function.
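
As a reminder of what the MSE loss measures, here is a minimal NumPy sketch. It assumes the loss is taken between the network output and the uncompressed frame; the function name and the toy pixel values are made up for illustration.

```python
import numpy as np

def mse_loss(restored, original):
    """Mean squared error between the restored and the original frame."""
    diff = restored.astype(np.float64) - original.astype(np.float64)
    return float(np.mean(diff ** 2))

# Toy 2x2 frames (hypothetical pixel values).
original = np.array([[10, 20], [30, 40]], dtype=np.uint8)
restored = np.array([[12, 20], [30, 36]], dtype=np.uint8)
loss = mse_loss(restored, original)  # (4 + 0 + 0 + 16) / 4 = 5.0
```

The cast to float before subtracting matters: subtracting `uint8` arrays directly would wrap around for negative differences.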

3. Experimental Results

3.1. Some Training Details

Training Dataset Samples
  • The dataset is derived from 600 video clips with various resolutions, as shown above. (The authors do not state explicitly which dataset is used for training.)
  • All raw video clips are encoded by HM-16.0 in the Low-delay P configuration at QP = 22, 27, 32, and 37.
  • An individual CNN is trained for each QP. The QP 37 model is trained first, and the models for the other QPs are obtained by fine-tuning it.

3.2. Ablation Study

ΔPSNR Obtained by Double-Input CNN Variants
  • 1-in: No mask input, obtains the lowest PSNR improvement.
  • 2-in+BM+AF: provides no noticeable improvement (0.08 dB over 1-in), because marking only the boundary pixels in a mask is less effective at highlighting the partition modes in a frame.
  • The concatenate-fusion (2-in+MM+CF) and early-fusion (2-in+MM+EF) strategies obtain only small gains, similar to 2-in+BM+AF. This is probably because these fusion strategies are less compatible with the CNN model used in this paper.
  • In contrast, the mean-based mask with add-based fusion (2-in+MM+AF) obtains a more obvious PSNR improvement (0.15 dB over 1-in).
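
For readers unfamiliar with the metric, the following sketch shows how a ΔPSNR figure can be computed. It assumes ΔPSNR means the PSNR of the filtered frame minus the PSNR of the unfiltered decoded frame, both measured against the raw frame, and that frames are 8-bit (peak value 255). The toy arrays are made up and are not results from the paper.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: the filter halves the per-pixel error of the decoded frame.
raw = np.full((4, 4), 100.0)
decoded = raw + 4.0    # error of 4 per pixel, MSE = 16
filtered = raw + 2.0   # error of 2 per pixel, MSE = 4

delta_psnr = psnr(raw, filtered) - psnr(raw, decoded)
```

Because PSNR is logarithmic in the MSE, the gains of 0.08 dB and 0.15 dB reported above correspond to small relative reductions in squared error.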

3.3. SOTA Comparison

BD-Rate (%) Compared to Original HEVC
  • The full version of the proposed approach (our+2-in+MM+AF) achieves the best performance among all the compared methods.
  • Specifically, it obtains over 9.76% BD-rate reduction relative to standard HEVC and a 4% BD-rate reduction compared with the state-of-the-art QECNN.
  • When the partition-mask strategy is integrated into VRCNN, the resulting VRCNN+MM+AF also obtains a 3% BD-rate improvement over the original VRCNN.
  • The baseline single-input method (our+1-in) also obtains satisfactory results compared with the existing methods (VRCNN, QECNN-P).
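
BD-rate summarizes the average bitrate change at equal quality across several rate points (here, the four QPs). Below is a minimal sketch of the commonly used Bjøntegaard cubic-fit procedure. Whether the authors used exactly this variant is not stated, and the rate/PSNR numbers are invented toy values rather than results from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of test vs. anchor at equal PSNR."""
    log_a = np.log(np.asarray(rate_anchor, dtype=np.float64))
    log_t = np.log(np.asarray(rate_test, dtype=np.float64))

    # Fit log-rate as a cubic polynomial of PSNR for each RD curve.
    poly_a = np.polyfit(psnr_anchor, log_a, 3)
    poly_t = np.polyfit(psnr_test, log_t, 3)

    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Toy RD points for four QPs: the test codec needs 10% fewer bits
# than the anchor at every PSNR level (hypothetical numbers).
psnr_points = [32.0, 35.0, 38.0, 41.0]
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
test_rates = [0.9 * r for r in anchor_rates]

bd = bd_rate(anchor_rates, psnr_points, test_rates, psnr_points)
```

A negative value means the test method saves bitrate relative to the anchor, which is why the reductions above are reported as improvements.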


