Reading: EDCNN — Enhanced Deep Convolutional Neural Network (Codec Filtering)

Using ResNeXt Block, 6.45% BDBR Reduction, Outperforms SRResNet and RHCNN

EDCNN is proposed to replace the original in-loop filter (DF & SAO) in HEVC

In this story, Enhanced Deep Convolutional Neural Network (EDCNN), by Nanjing University of Information Science and Technology, Chinese Academy of Sciences, Sungkyunkwan University, and City University of Hong Kong, is described. In this paper, a CNN-based in-loop filter is proposed to replace the original in-loop filters, the deblocking filter (DF) and sample adaptive offset (SAO), in conventional HEVC.

This is a 2020 TIP paper, and TIP has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)


  1. EDCNN Network Architecture
  2. Weight Normalization
  3. Feature Information Fusion Block
  4. Mixed MSE and MAE Loss Function
  5. Experimental Results

1. EDCNN Network Architecture

EDCNN Network Architecture
  • The input is the picture before filtering; the output is the picture after filtering, which has higher image quality.
  • EDCNN consists of 7 blocks; each fusion block contains 4 convolution and ReLU layers.
  • Weight normalization is applied to each convolution layer.
  • (The fusion block and weight normalization are described in later sections.)
  • The overall proposed network has 16 layers.
  • The detailed network parameters are as follows:
EDCNN Network Parameters

2. Weight Normalization

  • For batch normalization (BN), the output of each neuron (before the nonlinearity is applied) is normalized by the mean and standard deviation of the outputs computed over the examples in the minibatch. However, because these statistics vary from minibatch to minibatch, noise is added to the gradient.
  • Weight normalization normalizes the weights themselves rather than the layer outputs.
Loss of Weight Normalization and Batch Normalization
  • The loss obtained with weight normalization is found to be lower than the loss obtained with BN.
  • Thus, weight normalization is adopted.
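
The weight normalization formula did not survive in this copy of the story. Below is a minimal sketch, assuming the standard reparameterization w = (g / ||v||) · v from the original weight normalization work by Salimans and Kingma, where v sets the direction of the weight vector and the scalar g sets its length. The function name and the sample values are hypothetical and are not taken from the EDCNN paper.

```python
import math

def weight_normalize(v, g):
    """Return w = (g / ||v||) * v for a flat list of weights v and a scalar gain g."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction vector v must be non-zero")
    return [g * x / norm for x in v]

# Hypothetical flattened filter weights and gain, for illustration only.
v = [3.0, 4.0]
w = weight_normalize(v, g=2.0)

# The length of w equals g, regardless of how v is scaled.
w_norm = math.sqrt(sum(x * x for x in w))
```

Unlike BN, nothing here depends on minibatch statistics, so the same input always produces the same output.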

3. Feature Information Fusion Block

3.1. 1×1 Conv Then 3×3 Conv Fusion Block

  • A ResNeXt block is used as the feature information fusion block.
  • (Please feel free to read my story about ResNeXt.)
Number of branches α
  • Here α is the number of branches tested. It is found that α=4 obtains the highest PSNR, as shown in the table above.
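
The branch counts are compared by PSNR. For reference, here is a small sketch of the usual PSNR definition, 10·log10(MAX² / MSE), assuming 8-bit samples (MAX = 255). The function name and pixel values are hypothetical.

```python
import math

def psnr(reference, filtered, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(filtered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - f) ** 2 for r, f in zip(reference, filtered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 8-bit pixel values, for illustration only.
original = [52, 55, 61, 59]
decoded = [54, 55, 60, 57]
quality_db = psnr(original, decoded)
```

A higher PSNR means the filtered picture is closer to the original.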

3.2. 3×3 Conv Fusion Block

Another Fusion Block Variant
  • Another fusion block variant is also tried, which has no 1×1 conv to reduce the dimensionality. Instead, the 3×3 conv alone performs both dimensionality reduction and feature extraction, using a larger stride.
  • It is found that the fusion block using 1×1 conv followed by 3×3 conv gives the better result.
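
To get a feel for the two variants, the sketch below counts convolution weights per fusion block. The channel widths are assumptions made up for illustration (the paper's parameter table is not reproduced here), biases are ignored, and the helper names are hypothetical. It only illustrates the cost side of the design and does not explain the PSNR difference.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * in_ch * out_ch

def branch_1x1_then_3x3(in_ch, mid_ch):
    # The 1x1 conv reduces the channels, then the 3x3 conv extracts features.
    return conv_weights(1, in_ch, mid_ch) + conv_weights(3, mid_ch, mid_ch)

def branch_3x3_only(in_ch, mid_ch):
    # A single 3x3 conv does the reduction and the feature extraction together.
    return conv_weights(3, in_ch, mid_ch)

# Hypothetical widths (not from the paper): 64 input channels, 16 per branch, 4 branches.
IN_CH, MID_CH, BRANCHES = 64, 16, 4
with_1x1 = BRANCHES * branch_1x1_then_3x3(IN_CH, MID_CH)
only_3x3 = BRANCHES * branch_3x3_only(IN_CH, MID_CH)
```

Under these assumed widths, the 1×1 then 3×3 branch needs fewer weights, because the 3×3 kernel only sees the reduced channel count.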

3.3. With or Without Fusion Block

  • NF: Network with fusion block.
  • NWF: Network without fusion block. (However, the paper does not make clear whether the whole fusion block is removed from the network, replaced by a 3×3 conv, or just set to α=1.)
  • Of course, as shown above, NF is better than NWF.

4. Mixed MSE and MAE Loss Function

4.1. Mean Square Error (MSE) loss

  • However, MSE over-penalizes large errors by squaring them, and it has been shown that MSE cannot capture the intricate characteristics of the human visual system (HVS).

4.2. Mean Absolute Error (MAE) loss

  • With MAE, the network obtains precise results more easily because MAE is not sensitive to outliers. However, MAE is harder to descend during training.

4.3. Mixed MSE and MAE Loss

  • The mixed loss combines MSE and MAE through δ, an adaptive parameter set according to loss convergence.
  • In the rule for δ, N is the number of consecutive epochs and equals 3; c is the index of the current epoch; L is the loss value; ξ is the threshold, which is used to control the performance of the loss function.
  • To find the optimal ξ, a group of ξ values from 0.009 to 0.018 is tested.
  • It is found that ξ=0.015 has the best performance.
  • Among MSE, MAE and the proposed mixed loss, the proposed one obtains the highest PSNR.
(a) Ground Truth, (b) MSE, (c) MAE, (d) Proposed Mixed MSE and MAE Loss
  • The zoomed regions for the proposed loss function show the best quality, with fewer artifacts.
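
The loss equations did not survive in this copy, so the following is only a sketch under stated assumptions. MSE and MAE are the standard definitions. The mixing form δ·MSE + (1 − δ)·MAE and the switching rule (use MSE while the loss is still changing by more than ξ on average over the last N epochs, then switch to MAE) are assumptions chosen to match the description above, not the paper's exact formula. All function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mae(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def choose_delta(loss_history, n=3, xi=0.015):
    """ASSUMED rule: 1.0 (MSE) while the loss is still moving, else 0.0 (MAE).

    'Still moving' means the mean absolute change of the loss over the
    last n epochs is larger than the threshold xi.
    """
    if len(loss_history) < n + 1:
        return 1.0  # not enough epochs yet, start with MSE
    recent = loss_history[-(n + 1):]
    mean_change = sum(abs(recent[i + 1] - recent[i]) for i in range(n)) / n
    return 1.0 if mean_change > xi else 0.0

def mixed_loss(pred, target, delta):
    """ASSUMED mixing form: delta * MSE + (1 - delta) * MAE."""
    return delta * mse(pred, target) + (1.0 - delta) * mae(pred, target)

# Hypothetical per-epoch loss values, for illustration only.
early_delta = choose_delta([0.9, 0.5, 0.3, 0.2])     # loss still dropping quickly
late_delta = choose_delta([0.2, 0.19, 0.185, 0.18])  # loss has nearly converged

pred, target = [0.0, 2.0], [1.0, 0.0]
early_loss = mixed_loss(pred, target, early_delta)
late_loss = mixed_loss(pred, target, late_delta)
```

The idea behind such a schedule is to use the easier-to-descend MSE early on and the outlier-robust MAE once training has settled.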

5. Experimental Results

5.1. BDBR (BD-Rate)

BDBR (BD-Rate) (%) and BDPSNR (dB) Using Low Delay Configuration
BDBR (BD-Rate) (%) and BDPSNR (dB) Using Random Access Configuration
  • The BDBR reduction by EDCNN ranges from 1.77% to 12.06%, with 6.27% on average, using the low delay configuration.
  • The BDBR reduction by EDCNN ranges from 0.41% to 12.31%, with 6.62% on average, using the random access configuration.
  • EDCNN outperforms SRResNet [26] and RHCNN [23] for both configurations.
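
For readers unfamiliar with BDBR, here is a rough sketch of the usual Bjøntegaard calculation: fit log10(bitrate) as a cubic in PSNR for each codec, integrate both curves over the shared PSNR range, and convert the average gap back to a percentage. This version assumes exactly four rate-distortion points per codec, so the cubic fit reduces to interpolation and Simpson's rule integrates it exactly. The RD points and function names are hypothetical, and this is not the evaluation script used in the paper.

```python
import math

def lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def simpson(f, lo, hi):
    """Simpson's rule on [lo, hi]; exact for polynomials up to degree 3."""
    return (hi - lo) / 6.0 * (f(lo) + 4.0 * f((lo + hi) / 2.0) + f(hi))

def bd_rate(anchor, test):
    """BD-rate in percent from two lists of four (bitrate, psnr) points."""
    a_psnr = [p for _, p in anchor]
    a_log = [math.log10(r) for r, _ in anchor]
    t_psnr = [p for _, p in test]
    t_log = [math.log10(r) for r, _ in test]
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(a_psnr), min(t_psnr))
    hi = min(max(a_psnr), max(t_psnr))
    int_a = simpson(lambda x: lagrange(a_psnr, a_log, x), lo, hi)
    int_t = simpson(lambda x: lagrange(t_psnr, t_log, x), lo, hi)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0

# Hypothetical RD points (kbps, dB): the test codec needs 10% fewer bits at each PSNR.
anchor_points = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
test_points = [(0.9 * r, p) for r, p in anchor_points]
saving = bd_rate(anchor_points, test_points)
```

A negative BD-rate means the tested codec needs fewer bits for the same PSNR, so a reported "BDBR reduction of 6.27%" corresponds to a value of about −6.27 here.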

5.2. Visual Quality

Visual Quality
  • The compared areas in HM16.9 have obvious artifacts, including ringing artifacts and color excursion.
  • The other two algorithms reduce most of the artifacts, but some obvious ones remain. SRResNet [26] still contains some blocking artifacts, while RHCNN [23] makes the image blurrier and eliminates many details in the image crops.

5.3. Computational Complexity

Computational Complexity (%) Against Original HEVC
  • On average, the proposed EDCNN increases the encoder complexity by 172% and 247% using the low delay and random access configurations, respectively.
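
As a quick reading aid, the snippet below converts these percentages into time multipliers. It assumes the table reports the relative increase in encoding time, (T_EDCNN − T_HEVC) / T_HEVC, which is an assumption about the table and is not confirmed here. The function name is hypothetical.

```python
def relative_time(increase_percent):
    """Convert a percentage increase into a multiple of the baseline time."""
    return 1.0 + increase_percent / 100.0

# Averages quoted above; the interpretation as a relative increase is assumed.
low_delay_factor = relative_time(172.0)
random_access_factor = relative_time(247.0)
```

Under that reading, encoding takes roughly 2.7 times as long as the original HEVC encoder in low delay and roughly 3.5 times as long in random access.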

5.4. Model Size and GPU Memory

  • The model size and GPU memory of the proposed EDCNN are 18.2 MB and 5193 MB, respectively, both smaller than those of RHCNN.

During the days of coronavirus, let me have a challenge of writing 30 stories again for this month ..? Is it good? This is the 23rd story in this month. Thanks for visiting my story..


