Reading: CNNF — Convolutional Neural Network Filter (Codec Filtering)
3.14%, 5.21% and 6.28% BD-Rate Savings for Luma and the Two Chroma Components, Respectively, Under the AI Configuration
--
In this paper, the Convolutional Neural Network Filter (CNNF), by Hikvision Research Institute, is presented. I read this because I work on video coding research. Prior CNN approaches for filtering have a few problems:
- Different models are applied for different QPs, which is expensive for hardware.
- Floating-point operations are used, which leads to inconsistency between encoding and decoding across different platforms.
- Redundancy within the CNN model consumes precious computational resources.
In this paper, a single CNN with low redundancy is proposed:
- Both the reconstruction and the QP are taken as inputs.
- The obtained model is compressed to reduce redundancy.
- To ensure consistency, dynamic fixed point (DFP) arithmetic is adopted when testing the CNN.
This is a paper in 2018 ICIP, and it was also submitted to a JVET meeting. (Sik-Ho Tsang @ Medium)
Outline
- CNNF: Network Architecture
- Model Compression
- Dynamic Fixed Point (DFP) Inference
- Experimental Results
1. CNNF: Network Architecture
- CNNF takes two inputs: the reconstruction and a QP map, which makes it possible to use a single set of parameters to adapt to reconstructions of different qualities. The QP map is generated by QPMap(x, y) = QP, i.e. every position is filled with the slice QP (see the sketch after this list).
- Both the two inputs are normalized to [0,1] for better convergence.
- A simple CNN with 8 convolution layers and residual learning is used, where the number of feature maps KL is set to 64.
- The network is like VDSR, but with the QP input, batch normalization (BN), and a shallower depth.
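To make the two-input design concrete, below is a minimal PyTorch sketch of such a network. The exact layer layout, the names, and the channel-wise concatenation of the QP map with the reconstruction are my assumptions from the description above, not the authors' code.

```python
import torch
import torch.nn as nn

class CNNFSketch(nn.Module):
    """Sketch: 8 conv layers, 64 feature maps, BN, residual learning,
    with the reconstruction and a QP map as the two input planes."""
    def __init__(self, channels=64, num_layers=8):
        super().__init__()
        layers = [nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]  # predicts the residual
        self.body = nn.Sequential(*layers)

    def forward(self, rec, qp_map):
        # Both inputs are assumed to be normalized to [0, 1] already.
        x = torch.cat([rec, qp_map], dim=1)
        return rec + self.body(x)  # residual learning: output = input + correction

# Usage: a 35x35 luma patch at QP 37 (normalized here by 51, an assumption).
rec = torch.rand(1, 1, 35, 35)
qp_map = torch.full((1, 1, 35, 35), 37 / 51.0)
filtered = CNNFSketch()(rec, qp_map)
```

Feeding the QP in as an input plane is what allows a single set of parameters to serve all QPs, instead of one dedicated model per QP.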
2. Model Compression
- For efficient compression, the loss function Loss adds, on top of the data-fidelity and weight-decay terms, two additional regularizers: Loss = g2(F(X; Θ) − Y) + λw·g2(W) + λs·g1(S) + λlda·Llda (averaged over each batch of size M), where g1 and g2 denote the L1 or L2 norm, W the weights, and F(X; Θ) the network output.
- λw, λs and λlda are set to 1e-5, 5e-8 and 3e-6, respectively.
- S is the scale parameter in the BN layers. With the first additional regularizer, the learned scale parameters in the BN layers tend toward zero.
- The second additional regularizer, i.e. the linear discriminant analysis (LDA) term, makes the learned parameters friendly to the following low rank approximation.
- Then, singular value decomposition (SVD) is applied for low rank approximation. After that, the filters are reconstructed from a much lower-rank basis (see the sketch after this list).
- The number of parameters is reduced to 51% of the original model.
- Experimental results report that performance only changes by about -0.08%, -0.19% and 0.25% on average for the Y, U and V components of classes B, C, D and E on JEM 7.0.
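As an illustration of the low rank approximation step, the sketch below factorizes a convolution layer's flattened weights with truncated SVD and rebuilds the layer as a rank-r basis convolution followed by a 1×1 mixing convolution. The reshaping convention and the chosen rank are my assumptions, not the paper's exact procedure.

```python
import numpy as np

def low_rank_factorize(weights, rank):
    """Truncated-SVD factorization of a conv weight tensor
    (out_ch, in_ch, kh, kw) into two smaller factors."""
    out_ch, in_ch, kh, kw = weights.shape
    w2d = weights.reshape(out_ch, in_ch * kh * kw)  # one row per output filter
    u, s, vt = np.linalg.svd(w2d, full_matrices=False)
    # Keep only the top-`rank` singular vectors: W ~= (U.S) @ Vt.
    basis = vt[:rank].reshape(rank, in_ch, kh, kw)                 # rank filters
    mixing = (u[:, :rank] * s[:rank]).reshape(out_ch, rank, 1, 1)  # 1x1 recombination
    return basis, mixing

# Usage: a 64->64 3x3 layer reduced to rank 32.
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
basis, mixing = low_rank_factorize(w, rank=32)
```

For this 64→64 3×3 layer, rank 32 keeps 32·64·9 + 64·32 = 20480 of the original 36864 parameters, i.e. about 56%.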
3. Dynamic Fixed Point Inference
- A value V in dynamic fixed point is described by V = (−1)^s · 2^(−FL) · Σ 2^i·xi, with the sum running over i = 0, …, B−2,
- where B denotes the bit width used to represent the DFP value, s the sign bit, FL the fractional length, and xi the mantissa binary bits.
- Each floating-point value among the model parameters and layer outputs is quantized and clipped to convert it to DFP (a sketch follows after this list).
- First, the bit widths for weights, Bw, and biases, Bb, are set to 8 and 32, respectively.
- For layer outputs, the bit width is set to 16.
- Each group in the same layer shares one common FL, which is estimated from available training data and layer parameters.
- The FLs for the concat and summation layers are both set to 15.
- Since CPUs and GPUs do not support DFP natively, it is simulated with floating point, similar to [10].
- With a shorter fractional length, computation can be saved.
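Below is a minimal sketch of simulating DFP with floating point, in the spirit of the quantize-and-clip description above; the function name and the symmetric clipping range are my assumptions.

```python
import numpy as np

def quantize_dfp(x, bit_width, fl):
    """Simulate dynamic fixed point in float: round to a 2^-fl grid,
    then clip to the range representable with `bit_width` bits."""
    step = 2.0 ** -fl
    max_val = (2.0 ** (bit_width - 1) - 1) * step  # one bit is the sign
    min_val = -(2.0 ** (bit_width - 1)) * step
    return np.clip(np.round(x / step) * step, min_val, max_val)

# Usage: 8-bit DFP for weights, 16-bit DFP for layer outputs.
w_q = quantize_dfp(np.random.randn(64), bit_width=8, fl=6)    # fl per group, estimated offline
y_q = quantize_dfp(np.random.randn(64), bit_width=16, fl=15)  # fl = 15 as for concat/summation
```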
4. Experimental Results
4.1. Training
- Training data: Visual Genome (VG) [17], DIV2K [18] and ILSVRC2012 [19].
- Each image is intra encoded at QPs 22, 27, 32 and 37 on JEM 7.0 with BF, DF, SAO and ALF turned off, and 35×35 patches are extracted.
- Batch size M is set to 64.
- 3.6 million training samples are generated, which include 600 thousand luma samples and 300 thousand chroma samples for each QP.
- Training is stopped after 32 epochs.
4.2. QP-Independent vs QP-Dependent
- ‘Best’: multiple dedicated models are trained for dedicated QPs and tested on the same QP.
- CNNF obtains a 3.99% BD-rate reduction, which is close to ‘Best’ (BD-rate is the standard Bjøntegaard metric; a sketch of its computation follows below).
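Since all the results below are reported as BD-rate, here is a sketch of the standard Bjøntegaard delta rate computation from two rate-distortion curves (negative means a bitrate saving at equal quality); the RD points are made up for illustration.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) between two RD curves
    over their overlapping PSNR range (Bjontegaard metric)."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Fit cubic polynomials of log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the common PSNR interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100  # negative = saving

# Usage with made-up RD points (kbps, dB) at four QPs:
print(bd_rate([1000, 1800, 3200, 6000], [34.0, 36.5, 39.0, 41.5],
              [950, 1700, 3050, 5800], [34.1, 36.7, 39.2, 41.7]))
```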
4.3. BD-Rate Under AI Configuration
- CNNF, with the all intra (AI) configuration, achieves 3.14%, 5.21% and 6.28% BD-rate savings for the luma and the two chroma components, respectively.
- Sharper edges for the umbrellas are observed.
4.4. BD-Rate When ALF On
- CNNF is only applied to intra frames. Due to the inter-frame dependency within ALF, ALF is not replaced. For B and P frames, the filters are configured the same as in JEM 7.0.
- 3.57%, 6.17% and 7.06% average gains are observed with the AI configuration.
- Though it is only applied to intra frames, CNNF achieves 1.23%, 3.65% and 3.88% gains with the RA configuration.
4.5. Complexity
- With a GPU, the EncT decreases and the DecT increases a little.
- Even when testing with a CPU, the EncT only increases a little.
- Though the DecT is extremely high on a CPU, the authors believe that with the development of deep-learning-specific hardware, it will not be a problem.
This is the 23rd story this month!
Reference
[2018 ICIP] [CNNF]
A practical convolutional neural network as loop filter for intra frame
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [QE-CNN] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN]