# Reading: CNNF — Convolutional Neural Network Filter (Codec Filtering)

## 3.14%, 5.21% and 6.28% BD-Rate Savings for Luma and 2 Chroma Respectively Under AI Configuration

In this paper, **Convolutional Neural Network Filter (CNNF)**, by Hikvision Research Institute, is presented. I read this because I work on video coding research. **Prior CNN-based filtering methods** have a few problems:

- **Different models** are applied for different QPs, which is **expensive for hardware.**
- **Floating-point operations** are used, which leads to **inconsistency** between encoding and decoding across different platforms.
- **Redundancy within the CNN model** consumes precious computational resources.

In this paper, a single CNN with low redundancy is proposed:

- **Both the reconstruction and the QP are taken as inputs.**
- The obtained **model is compressed** to reduce redundancy.
- To ensure consistency, **dynamic fixed point (DFP) is adopted** when testing the CNN.

This is a paper in **2018 ICIP**, and it was also submitted to a JVET meeting. (Sik-Ho Tsang @ Medium)

# Outline

1. **CNNF: Network Architecture**
2. **Model Compression**
3. **Dynamic Fixed Point (DFP) Inference**
4. **Experimental Results**

# 1. CNNF: Network Architecture

- CNNF has two inputs: the reconstruction and a QP map, which makes it possible to use a single set of parameters to adapt to reconstructions of different quality. The QP map is generated by QPMap(x, y) = QP.
- Both inputs are normalized to [0,1] for better convergence.
- A simple CNN with 8 convolution layers and residual learning is used, where *KL* is set to 64.
- The network is like VDSR, but with a QP input, BN, and fewer layers.
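The two-input idea can be illustrated with a minimal NumPy sketch. It is not the authors' code: the helper name `build_cnnf_input` and the normalization constants (255 for 8-bit samples, 51 as the largest QP) are assumptions, since the exact constants are not given here.

```python
import numpy as np

MAX_PIXEL = 255.0  # assumption: 8-bit samples
MAX_QP = 51.0      # assumption: QP is divided by the largest allowed QP

def build_cnnf_input(recon: np.ndarray, qp: int) -> np.ndarray:
    """Stack a normalized reconstruction with a constant QP plane."""
    recon_norm = recon.astype(np.float32) / MAX_PIXEL
    # QPMap(x, y) = QP for every position, then normalized to [0, 1]
    qp_map = np.full(recon.shape, qp / MAX_QP, dtype=np.float32)
    return np.stack([recon_norm, qp_map], axis=0)  # shape: (2, H, W)

# A random 35x35 patch stands in for a decoded block
patch = np.random.default_rng(0).integers(0, 256, size=(35, 35), dtype=np.uint8)
x = build_cnnf_input(patch, qp=37)
print(x.shape)
```

Because the QP plane is just another input channel, the same set of weights can see which quality level it is filtering, which is what removes the need for one model per QP.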

# 2. Model Compression

- For efficient compression, the loss function, *Loss*, includes two additional regularizers:

- where *g*1 and *g*2 denote the L1 or L2 norm. *λw*, *λs* and *λlda* are set to 1e-5, 5e-8 and 3e-6, respectively. *S* is the scale parameter in the BN layer.
- **With the first additional regularizer**, **the learned scale parameters in the BN layers tend toward zero.**
- **The second additional regularizer, i.e. the linear discriminant analysis (LDA) term**, makes the learned parameters friendly to the subsequent low-rank approximation.
- Then singular value decomposition (SVD) is applied for low-rank approximation. After that, filters are reconstructed using a much smaller basis.

- The number of parameters is reduced to 51% of the original model.
- Experimental results report that performance **changes by only about -0.08%, -0.19% and 0.25% on average for the Y, U and V components** of classes B, C, D and E on JEM 7.0.
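The SVD step can be sketched as follows. This is a generic truncated-SVD factorization of one convolution layer, not the paper's exact procedure: the 3×3 kernel size, the rank of 24 and the function name `low_rank_filters` are assumptions chosen only for illustration.

```python
import numpy as np

def low_rank_filters(weights: np.ndarray, rank: int):
    """Factor conv weights (out_ch, in_ch, kh, kw) into `rank` basis filters."""
    out_ch = weights.shape[0]
    flat = weights.reshape(out_ch, -1)             # one row per output filter
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    coeffs = u[:, :rank] * s[:rank]                # (out_ch, rank) mixing weights
    basis = vt[:rank]                              # (rank, in_ch*kh*kw) basis filters
    approx = (coeffs @ basis).reshape(weights.shape)
    return coeffs, basis, approx

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64, 3, 3)).astype(np.float32)  # hypothetical layer
coeffs, basis, approx = low_rank_filters(w, rank=24)

kept = (coeffs.size + basis.size) / w.size
rel_err = np.linalg.norm(approx - w) / np.linalg.norm(w)
print(f"parameters kept: {kept:.2f}, relative error: {rel_err:.3f}")
```

Random weights like these are close to full rank, so the truncation error here is large. The regularizers described above are intended to make the trained weights much more compressible than this toy example.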

# 3. Dynamic Fixed Point Inference

- A value V in dynamic fixed point is described by:

- where *Bf* denotes the bit width used to represent the DFP value, *s* the sign bit, *FL* the fractional length, and *xi* the mantissa bits.
- Each floating-point value within the model parameters and outputs is quantized and clipped to be converted to DFP.
- First, the bit widths for weights *Bw* and biases *Bb* are set to 8 and 32, respectively.
- For layer outputs, the bit width is set to 16.
- **Each group in the same layer shares one common FL, which is estimated from available training data and layer parameters.**
- *FL* for the concat and summation layers are both set to 15.
- Since CPUs and GPUs do not support DFP, it is simulated with floating point, similar to [10].
- **With shorter fractional length, computation can be saved.**
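The quantize-and-clip step can be simulated in floating point with a short NumPy sketch. It assumes the commonly used DFP definition, V = (-1)^s · 2^(-FL) · Σ_{i=0}^{Bf-2} 2^i · x_i, and it estimates FL from the largest magnitude in a group. Both the max-based estimate and the helper names `estimate_fl` and `to_dfp` are assumptions, not the paper's procedure.

```python
import numpy as np

def estimate_fl(x: np.ndarray, bit_width: int) -> int:
    """Choose FL from the largest magnitude in a group (assumed heuristic)."""
    max_abs = float(np.max(np.abs(x)))
    # Smallest number of integer bits such that max_abs < 2**int_bits
    int_bits = int(np.floor(np.log2(max_abs))) + 1 if max_abs > 0 else 0
    return bit_width - 1 - int_bits   # one bit is reserved for the sign

def to_dfp(x: np.ndarray, bit_width: int, fl: int) -> np.ndarray:
    """Quantize to a signed bit_width-bit integer grid, clip, and rescale."""
    scale = 2.0 ** fl
    q_max = 2 ** (bit_width - 1) - 1
    q = np.clip(np.round(x * scale), -q_max - 1, q_max)
    return q / scale                  # DFP value simulated as a float

weights = np.array([0.5, -0.25, 0.1, -0.03])   # made-up example group
fl = estimate_fl(weights, bit_width=8)
print(fl, to_dfp(weights, bit_width=8, fl=fl))
```

Every value in the group shares the same `fl`, so the integer arithmetic inside a layer needs no per-value exponent, which is what makes the result bit-exact across platforms.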

# 4. Experimental Results

## 4.1. Training

- Training data: Visual Genome (VG) [17], DIV2K [18] and ILSVRC2012 [19].
- Each image is intra-encoded at QPs 22, 27, 32 and 37 on JEM 7.0 with BF, DF, SAO and ALF off, and split into patches of size 35×35.
- Batch size *M* is set to 64.
- 3.6 million training samples are generated, comprising 600 thousand luma samples and 300 thousand chroma samples for each QP (900 thousand × 4 QPs).
- Training is stopped after 32 epochs.

## 4.2. QP-Independent vs QP-Dependent

- **‘Best’**: Multiple dedicated models are trained, one per QP, and each is tested on the same QP.
- CNNF obtains a 3.99% BD-rate reduction, which is close to ‘Best’.

## 4.3. BD-Rate Under AI Configuration

- CNNF, under the all intra (AI) configuration, achieves **3.14%, 5.21% and 6.28% BD-rate savings** for luma and the two chroma components, respectively.

- Sharper edges for the umbrellas are observed.
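For readers unfamiliar with the metric behind these numbers, here is a minimal sketch of the usual Bjøntegaard delta rate calculation with a cubic fit. The function name `bd_rate` and the rate/PSNR points are made up for illustration and are not results from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Average bitrate change (%) of test vs. anchor at equal PSNR."""
    # Fit log-rate as a cubic polynomial of PSNR for each curve
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the PSNR interval the two curves share
    lo = max(np.min(psnr_anchor), np.min(psnr_test))
    hi = min(np.max(psnr_anchor), np.max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)

# Made-up points for four QPs, only to exercise the function
psnr = np.array([32.0, 35.0, 38.0, 41.0])
rate_anchor = np.array([1000.0, 2000.0, 4000.0, 8000.0])
rate_test = 0.97 * rate_anchor   # test needs 3% fewer bits at each PSNR
print(round(bd_rate(rate_anchor, psnr, rate_test, psnr), 2))
```

A negative value means the tested codec needs fewer bits for the same quality, so a "3.14% BD-rate saving" corresponds to a value of about -3.14 from a function like this.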

## 4.4. BD-Rate When ALF On

- **CNNF is only applied to intra frames.**
- Because of the inter dependency within ALF, ALF is not replaced. For B and P frames, filters are configured the same as in JEM 7.0.
- **3.57%, 6.17% and 7.06% average gains are observed with the AI configuration.**
- Though only applied to intra frames, **CNNF achieves 1.23%, 3.65% and 3.88% gains with the RA configuration.**

## 4.5. Complexity

- **With GPU, the EncT decreases and DecT increases a little.**
- Even when testing **with CPU, the EncT only increases a little.**
- Though **DecT is extremely high on CPU**, the authors believe that with the development of deep-learning-specific hardware it will not be a problem.

This is the 23rd story this month!

## Reference

[2018 ICIP] [CNNF]

A practical convolutional neural network as loop filter for intra frame

## Codec Filtering

**JPEG** [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]

**HEVC** [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [QE-CNN] [EDCNN] [VRCNN-BN] [MACNN]

**3D-HEVC** [RSVE+POST]

**AVS3** [Lin PCS’19]

**VVC** [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN]