Reading: CNNF — Convolutional Neural Network Filter (Codec Filtering)

3.14%, 5.21% and 6.28% BD-Rate Savings for Luma and 2 Chroma Respectively Under AI Configuration

Sik-Ho Tsang
Jun 15, 2020


In this paper, the Convolutional Neural Network Filter (CNNF), by Hikvision Research Institute, is presented. I read this because I work on video coding research. Prior CNN-based filtering approaches have a few problems:

  • Different models are applied for different QPs, which is expensive for hardware.
  • Floating-point operations are used, which leads to inconsistency between encoding and decoding across different platforms.
  • Redundancy within the CNN model consumes precious computational resources.

In this paper, a single CNN with low redundancy is proposed:

  • Both reconstruction and QP are taken as inputs.
  • The obtained model is compressed to reduce redundancy.
  • To ensure consistency, dynamic fixed point (DFP) arithmetic is adopted when testing the CNN.

This is a paper in 2018 ICIP, and it was also submitted to a JVET meeting. (Sik-Ho Tsang @ Medium)


  1. CNNF: Network Architecture
  2. Model Compression
  3. Dynamic Fixed Point (DFP) Inference
  4. Experimental Results

1. CNNF: Network Architecture

CNNF: Network Architecture
  • CNNF takes two inputs: the reconstruction and a QP map, which makes it possible to use a single set of parameters to adapt to reconstructions of different quality. The QP map is constant over the frame: QPMap(x, y) = QP.
  • Both inputs are normalized to [0,1] for better convergence.
  • A simple CNN with 8 convolutional layers and residual learning is used, where KL is set to 64.
  • The network resembles VDSR, but with the QP input, BN, and fewer layers.
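To make the two inputs concrete, here is a minimal NumPy sketch of how they could be prepared. The divisors are my assumptions (8-bit samples, QP range 0 to 51), and `make_cnnf_input` is a hypothetical helper name; the two planes are simply stacked for illustration, which is not necessarily where the actual network merges them.

```python
import numpy as np

MAX_QP = 51        # assumption: QP range 0..51
MAX_PIXEL = 255.0  # assumption: 8-bit samples

def make_cnnf_input(recon, qp):
    """Return the normalized reconstruction and a constant QP map, stacked."""
    recon_norm = recon.astype(np.float32) / MAX_PIXEL
    # QPMap(x, y) = QP at every position, then normalized to [0, 1].
    qp_map = np.full(recon.shape, qp / MAX_QP, dtype=np.float32)
    return np.stack([recon_norm, qp_map], axis=0)  # shape: (2, H, W)

# Random stand-in for a decoded 35x35 patch.
patch = np.random.default_rng(0).integers(0, 256, size=(35, 35))
x = make_cnnf_input(patch, qp=37)
print(x.shape, x[1, 0, 0])
```

Because the QP enters as an extra input plane, the same weights can see QP 22 and QP 37 patches in one training run.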

2. Model Compression

  • For efficient compression, the loss function, Loss, includes two additional regularizers:
  • where g1 and g2 denote the L1 or L2 norm.
  • λw, λs and λlda are set to 1e-5, 5e-8 and 3e-6, respectively.
  • S is the scale parameter in the BN layer. With the first additional regularizer, the learned scale parameters in the BN layers tend toward zero.
  • The second additional regularizer, i.e. the linear discriminant analysis (LDA) term, makes the learned parameters friendly to the subsequent low-rank approximation.
  • Then singular value decomposition (SVD) is applied for low-rank approximation. After that, filters are reconstructed from a much smaller basis.
Compressed filter number for each convolution layer
  • The number of parameters is reduced to 51% of the original model.
  • Experimental results report that performance only changes by about -0.08%, -0.19% and 0.25% on average for the Y, U and V components of classes B, C, D and E on JEM 7.0.
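The SVD step can be sketched as follows. This is only an illustration of generic low-rank approximation of a convolution layer, not the authors' exact procedure: the 3×3 kernel size, the rank of 24, and the helper names `low_rank_conv` and `reconstruct` are all assumptions of mine.

```python
import numpy as np

def low_rank_conv(weights, rank):
    """Factor conv weights (out_ch, in_ch, k, k) into `rank` basis filters and mixing coefficients."""
    out_ch = weights.shape[0]
    flat = weights.reshape(out_ch, -1)                       # one row per filter
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:rank].reshape((rank,) + weights.shape[1:])   # (rank, in_ch, k, k)
    coeffs = u[:, :rank] * s[:rank]                          # (out_ch, rank)
    return basis, coeffs

def reconstruct(basis, coeffs):
    """Rebuild every filter as a linear combination of the basis filters."""
    flat = coeffs @ basis.reshape(basis.shape[0], -1)
    return flat.reshape((coeffs.shape[0],) + basis.shape[1:])

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64, 3, 3))       # hypothetical layer: 64 filters of 64x3x3
basis, coeffs = low_rank_conv(w, rank=24)     # hypothetical rank
w_hat = reconstruct(basis, coeffs)
print((basis.size + coeffs.size) / w.size)    # fraction of parameters kept
```

In practice the basis can be run as a convolution with fewer output channels, followed by a 1×1 convolution that applies the coefficients, which is where the savings in parameters and computation come from.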

3. Dynamic Fixed Point Inference

  • A value V in dynamic fixed point is described by:
  • where Bf denotes the bit width used to represent the DFP value, s the sign bit, FL the fractional length and xi the mantissa bits.
  • Each floating-point value in the model parameters and layer outputs is quantized and clipped to convert it to DFP.
  • The bit widths for weights, Bw, and biases, Bb, are set to 8 and 32, respectively.
  • For layer outputs, the bit width is set to 16.
  • Each group in the same layer shares one common FL, which is estimated from available training data and layer parameters.
Estimated FL for each convolution layer
  • The FLs for the concat and summation layers are both set to 15.
  • Since CPUs and GPUs do not support DFP natively, it is simulated with floating point, similar to [10].
  • With a shorter fractional length, computation can be saved.
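A small sketch of how DFP can be simulated in floating point. I am assuming the commonly used sign-magnitude form V = (-1)^s · 2^(-FL) · Σ_{i=0}^{Bf-2} 2^i · xi, as found in Ristretto-style quantization; whether the paper uses exactly this expression and this rounding is an assumption, and `to_dfp` is a hypothetical helper name.

```python
import numpy as np

def to_dfp(x, bit_width, frac_len):
    """Simulate DFP in floating point: scale by 2^FL, round, clip, scale back."""
    scale = 2.0 ** frac_len
    max_int = 2 ** (bit_width - 1) - 1   # largest magnitude with one sign bit
    q = np.round(np.asarray(x, dtype=np.float64) * scale)
    q = np.clip(q, -max_int, max_int)    # values outside the range saturate
    return q / scale

weights = np.array([0.3, -0.731, 2.0])
print(to_dfp(weights, bit_width=8, frac_len=7))
```

With 8 bits and FL = 7, everything must fit in roughly (-1, 1), so the value 2.0 saturates. This is why FL has to be estimated per layer (or per group) from the actual range of the data.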

4. Experimental Results

4.1. Training

  • Training data: Visual Genome (VG) [17], DIV2K [18] and ILSVRC2012 [19].
  • Each image is intra-encoded at QPs 22, 27, 32 and 37 on JEM 7.0 with BF, DF, SAO and ALF turned off, and 35×35 patches are extracted.
  • Batch size M is set to 64.
  • 3.6 million training samples are generated in total: 600 thousand luma samples and 300 thousand chroma samples for each of the 4 QPs.
  • Training is stopped after 32 epochs.
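For illustration, the patch extraction and the sample count above can be sketched like this. The random cropping, the use of the original image as the training target, and the helper name `sample_patch_pair` are my assumptions; the summary only states the patch size and the totals.

```python
import numpy as np

PATCH = 35  # patch size from the training setup

def sample_patch_pair(recon, original, rng):
    """Crop the same random PATCH x PATCH window from a reconstruction and its original."""
    h, w = recon.shape
    y = rng.integers(0, h - PATCH + 1)
    x = rng.integers(0, w - PATCH + 1)
    window = (slice(y, y + PATCH), slice(x, x + PATCH))
    return recon[window], original[window]

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(128, 128))
# Stand-in for a JEM-decoded image: the original plus small errors.
recon = np.clip(original + rng.integers(-3, 4, size=original.shape), 0, 255)
inp, target = sample_patch_pair(recon, original, rng)

qps = [22, 27, 32, 37]
total = len(qps) * (600_000 + 300_000)  # luma + chroma samples per QP
print(inp.shape, total)
```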

4.2. QP-Independent vs QP-Dependent

BD-Rate (%) on Test Sequences
  • ‘Best’: A dedicated model is trained for each QP and tested on that same QP.
  • CNNF obtains a 3.99% BD-rate reduction, which is close to ‘Best’.
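For readers new to the metric, here is a minimal sketch of how a BD-rate number can be computed with Bjøntegaard's cubic-fit method. The paper's figures presumably come from the standard JVET tooling, which may differ in details such as the interpolation used; `bd_rate` is a hypothetical helper and the rate/PSNR points are made up for illustration.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of test vs. anchor at equal PSNR (negative = savings)."""
    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    poly_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(np.min(psnr_anchor), np.min(psnr_test))
    hi = min(np.max(psnr_anchor), np.max(psnr_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    # Mean log-rate gap, converted back to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Made-up points for four QPs (bitrate in kbps, PSNR in dB).
anchor_rate = np.array([1000.0, 2000.0, 4000.0, 8000.0])
anchor_psnr = np.array([32.0, 35.0, 38.0, 41.0])
test_rate = 0.96 * anchor_rate   # same quality at 4% less rate
print(bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr))
```

A negative value means the tested codec needs less bitrate than the anchor for the same quality, which is what "BD-rate reduction" refers to.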

4.3. BD-Rate Under AI Configuration

BD-Rate (%) on Test Sequences
  • Under the all intra (AI) configuration, CNNF achieves 3.14%, 5.21% and 6.28% BD-rate savings for the luma and the two chroma components, respectively.
Visual Quality
  • Sharper edges for the umbrellas are observed.

4.4. BD-Rate When ALF On

BD-Rate (%) on Test Sequences Under AI Configuration
BD-Rate (%) on Test Sequences Under RA Configuration
  • CNNF is only applied to intra frames. Because of the inter-frame dependency within ALF, ALF is not replaced. For B and P frames, the filters are configured the same as in JEM 7.0.
  • 3.57%, 6.17% and 7.06% average gains are observed with the AI configuration.
  • Though only applied to intra frames, CNNF achieves 1.23%, 3.65% and 3.88% gains with the RA configuration.

4.5. Complexity

  • With a GPU, the encoding time (EncT) decreases and the decoding time (DecT) increases a little.
  • Even when testing on a CPU, EncT only increases a little.
  • Though DecT is extremely high on a CPU, the authors believe that with the development of deep-learning-specific hardware it will not be a problem.

This is the 23rd story in this month!


