Reading: EDCNN — Enhanced Deep Convolutional Neural Network (Codec Filtering)
In this story, Enhanced Deep Convolutional Neural Network (EDCNN), by Nanjing University of Information Science and Technology, Chinese Academy of Sciences, Sungkyunkwan University, and City University of Hong Kong, is described. In this paper, a CNN-based in-loop filter is proposed to replace the original in-loop filters, i.e. the deblocking filter (DF) and sample adaptive offset (SAO), in conventional HEVC.
This is a paper in 2020 TIP, a journal with a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)
Outline
- EDCNN Network Architecture
- Weight Normalization
- Feature Information Fusion Block
- Mixed MSE and MAE Loss Function
- Experimental Results
1. EDCNN Network Architecture
- The input is the picture before filtering; the output is the filtered picture with higher image quality.
- EDCNN consists of 7 blocks; each fusion block contains 4 convolution-plus-ReLU layers.
- Each convolution layer has an operation of weight normalization.
- (The fusion block and weight normalization will be mentioned later.)
- The overall proposed network has 16 layers.
- The detailed network parameters are listed in the table in the paper. A minimal sketch of the overall layout is given below.
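Since the parameter table lives in the paper, the following is a minimal PyTorch sketch of the layout only, assuming 64 feature channels, single-channel (luma) input, and a global residual connection from input to output; these three details are assumptions for illustration, not values confirmed by the table.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Placeholder fusion block: 4 convolution + ReLU layers (see Section 3)."""
    def __init__(self, channels=64):
        super().__init__()
        layers = []
        for _ in range(4):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

class EDCNNSketch(nn.Module):
    """7 fusion blocks between a head and a tail convolution."""
    def __init__(self, channels=64, num_blocks=7):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)  # luma input (assumption)
        self.blocks = nn.Sequential(*[FusionBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x):
        # Predict a correction to the unfiltered picture (global residual
        # learning is an assumption; a common design for in-loop filters).
        return x + self.tail(self.blocks(self.head(x)))
```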
2. Weight Normalization
- For batch normalization (BN), the output of each neuron (before application of the nonlinearity) is normalized by the mean and standard deviation of the outputs calculated over the examples in the mini-batch. However, this introduces noise into the gradient.
- Weight normalization instead reparameterizes each weight vector as w = (g / ‖v‖) · v, where g is a learned scalar magnitude and v is a learned direction vector.
- Thus, no dependencies on the mini-batch are introduced, and weight normalization can be viewed as a cheaper and less noisy approximation to BN.
- (If interested, please read the paper about weight normalization: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, a 2016 NIPS paper.)
- It is found that the loss obtained with weight normalization is lower than that obtained with BN.
- Thus, weight normalization is adopted; a usage sketch is given below.
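In PyTorch, this reparameterization is available as a built-in utility. A minimal sketch of applying it to one convolution layer follows; whether the authors used this exact utility is not stated in the paper.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Reparameterize the conv weight as w = (g / ||v||) * v, so that the
# magnitude g and the direction v are learned separately; unlike BN,
# no mini-batch statistics are involved.
conv = weight_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))

x = torch.randn(1, 64, 32, 32)
y = conv(x)  # forward/backward work as usual
print(conv.weight_g.shape, conv.weight_v.shape)  # g and v are separate parameters
```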
3. Feature Information Fusion Block
3.1. 1×1 Conv Then 3×3 Conv Fusion Block
- A ResNeXt block is used as the feature information fusion block.
- (Please feel free to read my story about ResNeXt.)
- Here, α is the number of branches tested. It is found that α = 4 obtains the highest PSNR among the tested values. A sketch of the block follows.
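A minimal PyTorch sketch of such a block, assuming each of the α branches is a 1×1 conv (channel reduction) followed by a 3×3 conv, with the branch outputs summed and an identity skip connection added; the channel widths here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ResNeXtFusionBlock(nn.Module):
    """ResNeXt-style fusion block with alpha parallel branches."""
    def __init__(self, channels=64, alpha=4, reduced=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, reduced, kernel_size=1), nn.ReLU(),
                nn.Conv2d(reduced, channels, kernel_size=3, padding=1), nn.ReLU(),
            )
            for _ in range(alpha)
        ])

    def forward(self, x):
        # Aggregate the branch outputs and keep an identity skip connection.
        return x + sum(branch(x) for branch in self.branches)
```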
3.2. 3×3 Conv Fusion Block
- Another fusion-block variant is also tried, which does not have the 1×1 conv to reduce the dimensionality. Instead, a 3×3 conv alone performs both dimensionality reduction and feature extraction, using a larger stride; see the sketch below.
- It is found that the fusion block using the 1×1 conv plus 3×3 conv gives the better result.
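For comparison, a branch in this variant could look like the following; this is a sketch only, and the exact channel widths and stride are assumptions not given in the story.

```python
import torch.nn as nn

# Variant branch: a single 3x3 conv performs channel reduction and feature
# extraction in one step, with no 1x1 bottleneck (widths are assumptions).
branch = nn.Sequential(
    nn.Conv2d(64, 16, kernel_size=3, padding=1),
    nn.ReLU(),
)
```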
3.3. With or Without Fusion Block
- NF: Network with fusion block.
- NWF: Network without fusion block. (However, it is not clear in the paper whether the whole fusion block is removed from the network, replaced by a 3×3 conv, or simply set to α = 1.)
- Of course, as shown above, NF is better than NWF.
4. Mixed MSE and MAE Loss Function
4.1. Mean Square Error (MSE) loss
- However, MSE over-penalizes large errors due to the squaring, and it has been shown that MSE cannot capture the intricate characteristics of the human visual system (HVS).
4.2. Mean Absolute Error (MAE) loss
- Since MAE is not sensitive to outliers, the network more easily obtains precise results. However, the MAE loss is hard to descend: its gradient magnitude is constant, which slows convergence near the optimum.
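For reference, with ground truth y, network output ŷ, and n samples, the two standard losses are:

```latex
L_{\mathrm{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
L_{\mathrm{MAE}} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
```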
4.3. Mixed MSE and MAE Loss
- The mixed loss combines MSE and MAE, where δ is an adaptive parameter set according to loss convergence.
- In the adaptation rule, N is the number of consecutive epochs (set to 3), c is the index of the current epoch, L is the loss value, and ξ is a threshold used to control the behavior of the loss function.
- To find the optimal ξ, a group of ξ values from 0.009 to 0.018 is tested:
- It is found that ξ=0.015 has the best performance.
- Among MSE, MAE and the proposed mixed loss, the proposed one obtains the highest PSNR.
- In the zoomed regions, the proposed loss function gives the best performance, with fewer artifacts. A sketch of the mixed loss is given below.
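The exact equations are in the paper; the following Python sketch shows one plausible reading of the rule, assuming the mixed loss is a convex combination δ·L_MSE + (1−δ)·L_MAE, and that δ switches once the loss change over the last N = 3 consecutive epochs falls below ξ. Both assumptions are for illustration only.

```python
import torch.nn.functional as F

def mixed_loss(pred, target, delta):
    # Convex combination of MSE and MAE (assumed form of the mixed loss).
    return delta * F.mse_loss(pred, target) + (1 - delta) * F.l1_loss(pred, target)

def update_delta(epoch_losses, delta, N=3, xi=0.015):
    """Adapt delta from loss convergence: once the loss change over the
    last N consecutive epochs stays below the threshold xi, switch from
    MSE-dominated to MAE-dominated training (assumed switching rule)."""
    if delta > 0 and len(epoch_losses) > N:
        recent = epoch_losses[-(N + 1):]
        changes = [abs(recent[i + 1] - recent[i]) for i in range(N)]
        if max(changes) < xi:
            return 0.0  # hand over to MAE
    return delta
```

Here δ would start at 1.0 (pure MSE), and update_delta would be called once per epoch with the running list of epoch losses.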
5. Experimental Results
5.1. BDBR (BD-Rate)
- The BDBR reduction by EDCNN ranges from 1.77% to 12.06%, with 6.27% on average, under the low-delay configuration.
- The BDBR reduction by EDCNN ranges from 0.41% to 12.31%, with 6.62% on average, under the random-access configuration.
- EDCNN outperforms SRResNet [26] and RHCNN [23] for both configurations.
5.2. Visual Quality
- The compared areas in HM16.9 have obvious artifacts, including ringing artifacts and color excursion.
- The other two algorithms can reduce most of the artifacts; however, some obvious artifacts remain. SRResNet [26] still contains some blocking artifacts, while RHCNN [23] makes the image more blurred and eliminates many details in the image crops.
5.3. Computational Complexity
- On average, the proposed EDCNN increases the encoder complexity by 172% and 247% under the low-delay and random-access configurations, respectively.
5.4. Model Size and GPU Memory
- The model size and GPU memory usage of the proposed EDCNN are 18.2 MB and 5193 MB respectively, both of which are smaller than those of RHCNN [23].
During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 23rd story this month. Thanks for visiting my story.
Reference
[2020 TIP] [EDCNN]
Efficient In-Loop Filtering Based on Enhanced Deep Convolutional Neural Networks for HEVC
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [Liu PCS’19] [QE-CNN] [EDCNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [Lu CVPRW’19] [Wang APSIPA ASC’19]