Reading: RRCNN: Recursive Residual Convolutional Neural Network (Codec Filtering)
Outperforms VRCNN, RHCNN & CNNF. 8.7% Average BD-Rate Reduction for Luma. More Than 20% Average BD-Rate Reductions for Chroma.
In this story, Recursive Residual Convolutional Neural Network (RRCNN), by Tianjin University and Santa Clara University, is presented. I read this paper because I work on video coding research. In this paper:
- Residual: Shortcut connections are used to skip a few stacked layers in the CNN.
- Recursive: The same set of weights is used recursively, so the number of parameters stays small.
- A single model is trained for various bitrate settings.
This is an early access article in 2019 TCSVT, a journal with a high impact factor of 4.046. It will be officially published in the future. (Sik-Ho Tsang @ Medium)
Outline
- Residual Learning & Recursive Learning
- RRCNN: Multi-QP Network Architecture
- HEVC Implementation
- Experimental Results
1. Residual Learning & Recursive Learning
- For RRCNN in (d), a skip connection spans the entire network from input to output, termed External Residual Learning (ERL).
- In addition, an identity skip connection is attached across a few stacked layers, termed Internal Residual Learning (IRL).
- Within the green dashed-line boxes, the convolutional layers in yellow share one set of weights, and those in green share another. This keeps the number of model parameters from growing with depth.
- As shown above, the Pre-Activation ResNet design is used, where BN and ReLU are applied before each convolution.
- Networks with the structures shown in the first figure are tested at different depths.
- Performance improves as the network goes deeper, because a deeper network has stronger learning ability.
- Furthermore, RRCNN achieves the best performance at all depths and outperforms the second-best VDSR by 0.05dB (BD-PSNR) at a depth of 22 while utilizing 10 times fewer parameters, which verifies the effectiveness of the multi-path structure and recursive learning.
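To make ERL, IRL and recursive weight sharing concrete, here is a toy Python sketch, not the authors' code. It runs on a single-channel 1-D signal, omits BN, and uses hypothetical helper names (`residual_unit`, `recursive_block`, `filter_with_erl`); the exact wiring of the identity paths in RRCNN may differ. The parameter count at the end assumes 3x3 convolutions and a hypothetical width of 64 channels.

```python
from typing import List

Vec = List[float]


def conv1d(x: Vec, w: Vec, b: float) -> Vec:
    """Zero-padded 'same' 1-D convolution (cross-correlation, as in CNN layers)."""
    pad = len(w) // 2
    padded = [0.0] * pad + x + [0.0] * pad
    return [sum(wj * padded[i + j] for j, wj in enumerate(w)) + b for i in range(len(x))]


def relu(x: Vec) -> Vec:
    return [max(0.0, v) for v in x]


def residual_unit(x: Vec, wa: Vec, ba: float, wb: Vec, bb: float) -> Vec:
    """Pre-activation unit: ReLU -> conv -> ReLU -> conv, plus an identity skip (IRL).
    BN is omitted in this toy version."""
    h = conv1d(relu(x), wa, ba)
    h = conv1d(relu(h), wb, bb)
    return [xi + hi for xi, hi in zip(x, h)]


def recursive_block(x: Vec, wa: Vec, ba: float, wb: Vec, bb: float, num_units: int) -> Vec:
    """Recursive learning: every unit reuses the SAME weights (wa, ba, wb, bb)."""
    for _ in range(num_units):
        x = residual_unit(x, wa, ba, wb, bb)
    return x


def filter_with_erl(x: Vec, wa: Vec, ba: float, wb: Vec, bb: float,
                    w_out: Vec, b_out: float, num_units: int) -> Vec:
    """ERL: the network predicts a residual that is added back to its own input."""
    feat = recursive_block(x, wa, ba, wb, bb, num_units)
    residual = conv1d(relu(feat), w_out, b_out)
    return [xi + ri for xi, ri in zip(x, residual)]


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of one k x k 2-D convolution layer."""
    return k * k * c_in * c_out + c_out


def plain_params(num_layers: int, width: int, k: int = 3) -> int:
    """Plain VDSR-style stack: 1->width, (num_layers - 2) x width->width, width->1."""
    return (conv_params(k, 1, width)
            + (num_layers - 2) * conv_params(k, width, width)
            + conv_params(k, width, 1))


def recursive_params(width: int, k: int = 3) -> int:
    """Head conv + ONE shared pair of convs + tail conv; independent of recursion depth."""
    return (conv_params(k, 1, width)
            + 2 * conv_params(k, width, width)
            + conv_params(k, width, 1))
```

Under these assumptions, `plain_params(20, 64)` gives 665,921 parameters for a VDSR-style 20-layer stack, while `recursive_params(64)` gives 75,073 no matter how many times the shared pair is reused, roughly 9 times fewer. These are illustrative numbers, not the counts reported in the paper.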
2. RRCNN: Multi-QP Network Architecture
- A QP map is input together with the luma patch.
- The luma patch and the QP map are first normalized to [0, 1] by min-max normalization: $\hat{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$.
- The total depth of the network is set to 16 layers, including 7 residual units.
- Performance increases with depth, but beyond this point the gains grow very slowly and remain small.
- The standard loss function is used.
- A separate network is used for chroma filtering to achieve higher chroma reconstruction quality.
- It is similar to the luma network, except that the luma component is downsampled to the chroma size before being input.
- The outputs are the filtered U and V.
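Below is a minimal sketch of the input preparation described above, in plain Python with hypothetical function names. The value ranges are assumptions: 8-bit samples with min/max 0 and 255, and the HEVC QP range 0 to 51 for the QP map. The 2x2 averaging is a hypothetical choice of downsampling filter for 4:2:0 content, and since it is not stated here whether the chroma network also receives a QP map, it is left out.

```python
from typing import List

Plane = List[List[float]]


def min_max_normalize(plane: Plane, vmin: float, vmax: float) -> Plane:
    """x' = (x - vmin) / (vmax - vmin), mapping [vmin, vmax] onto [0, 1]."""
    scale = vmax - vmin
    return [[(v - vmin) / scale for v in row] for row in plane]


def qp_map(qp: int, height: int, width: int, qp_min: int = 0, qp_max: int = 51) -> Plane:
    """Constant plane holding the normalized QP (assumed range 0..51)."""
    value = (qp - qp_min) / (qp_max - qp_min)
    return [[value] * width for _ in range(height)]


def luma_input(luma: Plane, qp: int, bit_depth: int = 8) -> List[Plane]:
    """Two-plane input for the luma network: normalized luma + normalized QP map."""
    peak = (1 << bit_depth) - 1
    return [min_max_normalize(luma, 0, peak), qp_map(qp, len(luma), len(luma[0]))]


def downsample_2x2(plane: Plane) -> Plane:
    """Average each 2x2 block (assumed 4:2:0 sampling; even dimensions required)."""
    return [[(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]


def chroma_input(luma: Plane, u: Plane, v: Plane, bit_depth: int = 8) -> List[Plane]:
    """Chroma-network input: luma downsampled to chroma size, stacked with U and V."""
    peak = (1 << bit_depth) - 1
    y_small = downsample_2x2(luma)
    if len(y_small) != len(u) or len(y_small[0]) != len(u[0]):
        raise ValueError("downsampled luma must match the chroma plane size")
    return [min_max_normalize(p, 0, peak) for p in (y_small, u, v)]
```

Feeding the QP as a constant plane lets one set of weights see the quantization strength of every patch, which is what allows a single model to serve several bitrate settings.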
3. HEVC Implementation
- Three candidate positions for RRCNN are considered.
- (a) RRCNNF-I: In the first position, RRCNN is placed where DF would be, replacing both DF and SAO.
- (b) RRCNNF-II: In the second position, RRCNN is placed after DF and replaces SAO.
- (c) RRCNNF-III: In the third position, RRCNN is placed after SAO and serves as an additional filter.
- (d) A CTU-level control flag is added so that the encoder can choose, for each CTU, the better of DF+SAO and RRCNN.
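The CTU-level switch in (d) can be sketched as a per-CTU distortion comparison. This is a simplified sketch with hypothetical names, not HM code: it assumes the flag costs the same number of bits whichever value is chosen, so the rate term cancels and SSE alone decides; a real encoder would use its own RD cost and entropy coder.

```python
from typing import List, Tuple

Block = List[List[int]]


def sse(a: Block, b: Block) -> int:
    """Sum of squared errors between two equally sized sample blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def choose_ctu_filter(orig: Block, dfsao: Block, rrcnn: Block) -> Tuple[int, Block]:
    """Return (flag, chosen reconstruction): flag=1 selects RRCNN, 0 keeps DF+SAO.

    Assumes the flag costs the same bits for both values, so comparing
    distortion alone is equivalent to comparing D + lambda * R.
    """
    if sse(orig, rrcnn) < sse(orig, dfsao):
        return 1, rrcnn
    return 0, dfsao


def filter_frame(ctus: List[Tuple[Block, Block, Block]]) -> Tuple[List[int], List[Block]]:
    """Apply the per-CTU decision to (orig, dfsao, rrcnn) triples; flags go to the bitstream."""
    flags: List[int] = []
    recon: List[Block] = []
    for orig, dfsao, rrcnn in ctus:
        flag, block = choose_ctu_filter(orig, dfsao, rrcnn)
        flags.append(flag)
        recon.append(block)
    return flags, recon
```

The decoder only needs to parse the flag for each CTU and run whichever filter it selects, so the original frame is used at the encoder side only.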
4. Experimental Results
4.1. Training
- Uncompressed Colour Image Database (UCID), which consists of 1338 natural images, is used to generate the training data.
- The images are compressed by HM-16.16 at different QPs with DF and SAO turned off.
4.2. BD-Rate
- 8.7%, 20.7% and 21.4% average BD-rate reductions are achieved for Y, U and V respectively under the All Intra (AI) configuration, which is a large margin.
- Under other configurations, large BD-rate reductions can still be obtained.
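For readers new to the metric, BD-rate can be computed with the classic Bjøntegaard method: fit log-rate as a cubic polynomial of PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference into a percentage. The sketch below follows that recipe in plain Python with hypothetical function names. The paper's exact BD-rate tool is not stated, and newer tools use piecewise-cubic interpolation instead, so small numeric differences are possible.

```python
import math
from typing import List, Sequence


def _polyfit3(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares cubic fit y ~ c0 + c1*x + c2*x^2 + c3*x^3 via normal equations."""
    n = 4
    m = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))] for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coeffs[i] = (m[i][n] - sum(m[i][j] * coeffs[j] for j in range(i + 1, n))) / m[i][i]
    return coeffs


def _integral(coeffs: Sequence[float], a: float, b: float) -> float:
    """Definite integral of sum_k c_k x^k from a to b."""
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(coeffs))


def bd_rate(rate_anchor: Sequence[float], psnr_anchor: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Bjontegaard delta rate in percent (negative means a bitrate saving)."""
    # Center PSNR around a common value to keep the normal equations well conditioned.
    center = (sum(psnr_anchor) + sum(psnr_test)) / (len(psnr_anchor) + len(psnr_test))
    pa = [p - center for p in psnr_anchor]
    pt = [p - center for p in psnr_test]
    ca = _polyfit3(pa, [math.log10(r) for r in rate_anchor])
    ct = _polyfit3(pt, [math.log10(r) for r in rate_test])
    lo, hi = max(min(pa), min(pt)), min(max(pa), max(pt))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    avg_diff = (_integral(ct, lo, hi) - _integral(ca, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

For example, if the test codec needs exactly 90% of the anchor bitrate at every PSNR point, `bd_rate` returns -10, i.e. a 10% BD-rate reduction, the same kind of figure as the 8.7% luma reduction above.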
4.3. Visual Quality
- Although HEVC in-loop filtering effectively removes blocking artifacts and suppresses slight ringing artifacts, serious ringing artifacts and blur remain near true edges in the filtered images.
- In contrast, RRCNN not only removes blocking and ringing artifacts effectively but also recovers details, leading to clearer images.
4.4. SOTA Comparison
- RRCNN outperforms VRCNN, RHCNN and CNNF.
4.5. RD Curves
- The gain is nearly constant from high bitrate to low bitrate.
4.6. Different Positions for RRCNN
- RRCNNF-I, i.e. RRCNN replacing the conventional filters, outperforms the other two positions.
4.7. QP Adaptivity
- Even with QP+2 and QP-2, coding gains remain, which shows the robustness of RRCNN.
- RRCNN-S: S means single; one RRCNN is trained for each QP.
- RRCNN-M: M means multi; one RRCNN is trained for multiple QPs.
- RRCNN-M shows only a small loss compared with RRCNN-S, while avoiding the need to store multiple models.
4.8. BD-Rate for Chroma
- With the luma model trained and tested on chroma, only 11.5% and 13.8% BD-rate reductions are obtained for Cb and Cr respectively.
- With the chroma model trained and tested on chroma, 20.5% and 21.3% BD-rate reductions are obtained for Cb and Cr respectively.
- Compared with CNNF [31], RRCNN achieves much larger coding gains for Cb and Cr.
4.9. Computational Complexity
- With a GPU, both encoding and decoding times increase.
- With a CPU only, both encoding and decoding times increase significantly.
This is the 30th story this month!
Reference
[2019 TCSVT] [RRCNN] (Early Access, Maybe 2020 Officially Published)
Recursive Residual Convolutional Neural Network-Based In-Loop Filtering for Intra Frames
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [AResNet] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN]