Reading: AResNet — Attention Residual Neural Network (Codec Filtering)
2.77%, 7.01% and 8.64% BD-Rate Reductions for Luma and the Two Chroma Components Under the RA Configuration in JEM
In this story, Attention Residual Neural Network (AResNet), by Hikvision Research Institute, is briefly presented. I read this because I work on video coding research. In this paper:
- An in-loop filter for inter frames is proposed to completely replace all conventional filters in the codec.
- Moreover, a greedy heuristic approach is used during training.
This is a paper in 2019 ICME. (Sik-Ho Tsang @ Medium)
Outline
- Attention Residual Neural Network (AResNet)
- Greedy Heuristic Approach
- Some Training Details
- Experimental Results
1. Attention Residual Neural Network (AResNet)
- Based on CNNF, an attention net is added to control the filter strength according to the input content and the QP map.
- An attention residual network with 7 blocks is used (a hedged sketch of the structure follows this list).
- Each block contains a convolutional layer followed by batch normalization (BN) and a ReLU non-linear activation.
- The kernel number of each block is set to 64, and the kernel size is set to 3×3, except for the first block, which uses 5×5 to enlarge the receptive field.
- For the last three convolutional layers, the kernel numbers are set to 256, 256 and 1 respectively, i.e., K_L = 256 and K_{L+1} = 1.
- For I frames, CNNF is used.
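From the description above, a minimal PyTorch sketch of such a filter is given below. This is my own reconstruction, not the authors' code: the exact form of the attention net and the way its output scales the predicted residual are assumptions.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv -> BN -> ReLU, the basic block described above."""
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class AResNetSketch(nn.Module):
    def __init__(self, n_blocks=7, width=64):
        super().__init__()
        # First block uses a 5x5 kernel; input channels = reconstruction + QP map.
        blocks = [ConvBlock(2, width, 5)]
        blocks += [ConvBlock(width, width, 3) for _ in range(n_blocks - 1)]
        self.backbone = nn.Sequential(*blocks)
        # Last three convolutional layers with 256, 256 and 1 kernels.
        self.tail = nn.Sequential(
            nn.Conv2d(width, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 3, padding=1),
        )
        # Assumed attention net: maps content + QP map to a [0, 1] strength mask.
        self.attention = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rec, qp_map):
        x = torch.cat([rec, qp_map], dim=1)      # N x 2 x H x W
        residual = self.tail(self.backbone(x))   # predicted correction
        strength = self.attention(x)             # per-pixel filter strength
        return rec + strength * residual         # filtered reconstruction
```

Calling `AResNetSketch()(rec, qp_map)` on a luma patch `rec` of shape (N, 1, H, W) and a QP map of the same shape returns a filtered patch of the same shape.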
2. Greedy Heuristic Approach
- To train the network, instead of the standard L2 norm, the authors use a weighted loss with two terms, one measuring the distance to the original content and one measuring the distance to the previous-step reconstruction (a hedged training sketch follows this list):
- where x represents the original content without compression, and x̂ represents the reconstructed content filtered by the loop filter.
- x̃ represents the reconstruction filtered by the loop filter obtained in the previous greedy training step, which is described below.
- The training procedure follows the algorithm shown above.
- For the first training (Steps 06 to 11): all conventional filters of JEM 7.0 are enabled to obtain the reconstructions before and after filtering as training data pairs, which leads to faster convergence.
- For the following trainings (Steps 05 to 14): all conventional filters are disabled and only the CNN filter is kept to generate the data for the next round of recurrent training.
- After each training step, λ1 is increased and λ2 is decreased, so that the target gradually approaches the original content.
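As a minimal sketch, assuming the loss is a weighted sum of two L2 terms (one toward x with weight λ1, one toward x̃ with weight λ2) and assuming a simple linear schedule for the weights, the recurrent greedy training could look like the following; the exact formula and schedule are not taken from the paper.

```python
import torch.nn.functional as F

def greedy_loss(x_hat, x_orig, x_prev, lam1, lam2):
    """Assumed form: weighted L2 toward the original x and toward the previous-step x_tilde."""
    return lam1 * F.mse_loss(x_hat, x_orig) + lam2 * F.mse_loss(x_hat, x_prev)

def recurrent_greedy_training(model, optimizer, loader, n_rounds=4):
    for step in range(n_rounds):
        # Hypothetical schedule: lam1 grows and lam2 shrinks, so the target
        # gradually moves from the previous reconstruction to the original.
        lam1 = (step + 1) / n_rounds
        lam2 = 1.0 - lam1
        for rec, qp_map, x_orig, x_prev in loader:
            # rec: unfiltered reconstruction; x_prev: reconstruction filtered by the
            # previous greedy step (the conventional JEM filters in the first round).
            x_hat = model(rec, qp_map)
            loss = greedy_loss(x_hat, x_orig, x_prev, lam1, lam2)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # After each round, x_prev would be regenerated by running the codec with the
        # conventional filters disabled and the newly trained CNN filter enabled.
```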
3. Some Training Details
- 807 video sequences are compressed using JEM 7.0 under the RA configuration with four QPs (22, 27, 32, 37).
- A model compression strategy is adopted to reduce the complexity of the AResNet filter.
- First, the parameters of the BN layers are merged into the preceding convolutional layers (a sketch of this folding follows below):
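This is the standard BN-folding step: a convolution followed by batch normalization is equivalent to a single convolution with rescaled weights and a shifted bias. A minimal sketch (the paper's exact equation is not reproduced here) is:

```python
import torch

def fold_bn_into_conv(w, b, gamma, beta, running_mean, running_var, eps=1e-5):
    """Merge y = BN(conv(x)) into a single conv with fused weights and bias."""
    scale = gamma / torch.sqrt(running_var + eps)   # one scale per output channel
    w_fused = w * scale.view(-1, 1, 1, 1)           # rescale each output filter
    b_fused = (b - running_mean) * scale + beta     # rescale and shift the bias
    return w_fused, b_fused
```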
4. Experimental Results
4.1. BD-Rate
- With only the L2 norm as loss function, a 7.77% BD-rate increase is obtained.
- AResNet achieves a 7.91% BD-rate reduction compared to the L2-norm-only version.
- A 2.77% BD-rate reduction is obtained by AResNet compared to JEM.
- With ALF enabled, AResNet even achieves a 4.95% BD-rate reduction compared to JEM.
4.2. PSNR After Each Training Step
- With the greedy recurrent training approach, the PSNR of the reconstructions increases after each training step.
4.3. Visual Quality
- AResNet produces a better image with some compression artifacts removed.
This is the 24th story in this month!
Reference
[2019 ICME] [AResNet]
An Attention Residual Neural Network with Recurrent Greedy Approach as Loop Filter for Inter Frames
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [QE-CNN] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN]