Reading: CNNIF & CNNMC — Image Compression Using VVC, 1st & 2nd Places in CVPR 2018 CLIC (Codec Filtering & Intra Prediction)

27.90% to 30.80% Average BD-Rate Reduction Compared with BPG (Better Portable Graphics)

Sik-Ho Tsang
6 min read · May 27, 2020

In this story, image compression using a CNN-based in-loop filter (CNNIF) and CNN-based mode coding (CNNMC), by Wuhan University and Tencent Media Lab, is presented. I read this because I work on video coding research. In this paper:

  • Two CNN-based approaches, CNNIF and CNNMC, are combined into the hybrid video codec JEM-7.1, forming the “iipTiramisu” solution.
  • A simplified version is also proposed, forming the “iipTiramisuS” solution.
  • Finally, the proposed approaches participated in the CVPR 2018 Challenge on Learned Image Compression (CLIC), where it is claimed that they ranked №1 and №2 respectively on the leaderboard. (I found they are ranked №1 and №3 now.)

This is a paper in 2018 CVPRW (CVPR Workshop). (Sik-Ho Tsang @ Medium)

Outline

  1. Codec Implementation
  2. CNN based In-loop Filter (CNNIF)
  3. CNN based Mode Coding (CNNMC)
  4. Uncertainty based Resource Allocation (UNRA)
  5. Experimental Results

1. Codec Implementation

Hybrid Video Coder With CNNMC and CNN In-Loop Filter (CNNIF)
  • CNN based mode coding (CNNMC): It is applied at the intra prediction stage to improve the coding efficiency of intra prediction mode signaling.
  • CNN based in-loop filter (CNNIF): It is applied to improve the reconstructed/decoded image before output.

2. CNN based In-loop Filter (CNNIF)

2.1. Dense Residual Unit (DRU)

Dense Residual Unit (DRU)
  • DRU is similar to the residual unit that originated in ResNet, as shown above.
  • The outputs of the first and the last convolutional layers are first added and then concatenated with the original input to generate the final output of the unit.
  • Each DRU receives the outputs from all preceding units, similar to the idea in DenseNet.
  • The 1×1 convolutional layer acts as a bottleneck layer to save computational resources. No activation layer is appended after the bottleneck layer.
  • This 1×1 convolutional layer inside each unit combines the information from all those inputs by weighting and, at the same time, reduces the number of parameters of the network model.
  • Batch normalization (BN) is removed since it normalizes the input signals and may lead to a difference between the input and the target.
  • Stacked feature maps and the dimension reduction achieved by the 1×1 convolutional layer are the key to balancing filtering performance against computational cost.
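To make the connectivity concrete, here is a minimal NumPy sketch of one DRU. The shapes, weight layout, and the choice of concatenating the bottleneck output (as the unit's “original input”) are my own reading of the description above, not the authors' exact implementation:

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution: per-pixel channel mixing; w has shape (C_out, C_in).
    return np.tensordot(w, x, axes=([1], [0]))

def conv3x3(x, w):
    # Naive 3x3 convolution with zero padding; w has shape (C_out, C_in, 3, 3).
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.tensordot(w[:, :, i, j], xp[:, i:i + h, j:j + wd],
                                axes=([1], [0]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def dense_residual_unit(inputs, params, m=4):
    """One DRU (illustrative shapes, not the paper's exact layer counts).

    `inputs` is the list of outputs of all preceding units (dense connectivity).
    A 1x1 bottleneck (no activation, no BN) fuses them down to M channels, then
    3x3 conv layers follow; the outputs of the first and last 3x3 convs are
    added and then concatenated with the unit input to form the unit output.
    """
    x = np.concatenate(inputs, axis=0)           # stack all preceding outputs
    b = conv1x1(x, params["w_bottleneck"])       # bottleneck, no activation
    f1 = relu(conv3x3(b, params["w1"]))          # first 3x3 conv
    f2 = relu(conv3x3(f1, params["w2"]))         # last 3x3 conv
    return np.concatenate([b, f1 + f2], axis=0)  # add, then concat with input
```

With M = 4 feature maps and two preceding units as input, the unit emits a 2M-channel map, which later units again receive through their bottleneck layers.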

2.2. CNNIF Network Architecture

CNNIF Network Architecture
  • N and M denote the number of DRUs and the number of feature maps, respectively.
  • The input of the network is the decoded image in RGB color space.
  • Apart from the last convolutional layer, which outputs a 3-channel image, all layers generate the same number of feature maps.
  • The network is mainly constructed by stacking multiple dense residual units (DRUs).
  • Residual learning contributes to faster convergence.

2.3. CNNIF_S

  • For the simplified CNNIF, the number of DRUs is reduced from 8 to 2, and the number of feature maps from 64 to 32.

3. CNN based Mode Coding (CNNMC)

CNNMC Network Architecture
  • In the conventional JEM codec, a heuristic method is used to derive the Most Probable Modes (MPMs). An MPM index then indicates which MPM intra prediction mode is used, coded with fewer bits than the non-MPM modes. This heuristic may not be accurate enough.
  • In CNNMC, a CNN based approach is used to determine which modes are to be the MPM.
  • As shown above, it utilizes the above, left, and above-left 128×128 reconstructed blocks as input.
  • The intra prediction modes of these blocks are also used as input.
  • These intra prediction modes are on a 4×4 basis, so they have a size of 32×32 units.
  • The output of the network is a probability distribution P over all the modes for each unit, with size 32×32×67, since there are 67 intra prediction modes, as in VVC (Versatile Video Coding).
  • The overall probability of a particular mode is summed up according to the CU size and position, as shown above. The top-K modes are the MPMs.
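The mode-summation step can be sketched in a few lines of NumPy. The function name and the unit-based CU coordinates are my own illustration of the description above:

```python
import numpy as np

def derive_mpms(prob_map, cu_row, cu_col, cu_h, cu_w, k=3):
    """Derive K MPMs from the CNNMC output (illustrative interface).

    prob_map: (32, 32, 67) per-unit mode distribution, one 4x4 unit per cell.
    The CU position and size are given in 4x4 units; per-mode probabilities
    are summed over the units the CU covers, and the top-K modes by overall
    probability become the MPMs.
    """
    region = prob_map[cu_row:cu_row + cu_h, cu_col:cu_col + cu_w, :]
    scores = region.sum(axis=(0, 1))     # overall probability per mode
    return np.argsort(scores)[::-1][:k]  # indices of the top-K modes
```

For example, an 8×8 CU at the top-left corner covers a 2×2 region of units, so its mode scores are summed over `prob_map[0:2, 0:2, :]`.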

4. Uncertainty based Resource Allocation (UNRA)

  • Since CNNIF and CNNMC are used, the rate-distortion (RD) relationship, i.e. the RD model, is changed.
  • Thus, the objective of uncertainty based resource allocation (UNRA) is to minimize the total expected distortion subject to the rate (coding bitrate/file size) constraint: min Σᵢ Pᵢ·Dᵢ(Qᵢ) s.t. Σᵢ Pᵢ·Rᵢ(Qᵢ) ≤ T,
  • where Qᵢ is the QP of the i-th image, Dᵢ(Qᵢ) and Rᵢ(Qᵢ) stand for the distortion (MSE) and rate (bpp) of the i-th image encoded with Qᵢ, Pᵢ represents the number of pixels in the i-th image, and T means the total target bits.
  • A hyperbolic-function-based R-D model following [7], together with the corresponding optimal-QP relationship, is used to obtain new R-QP and D-QP models. (Since this is not closely related to CNN, I do not focus on it.)
  • The idea is to treat the proposed codec as a new codec and find the optimal RD models.
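As an illustration of the allocation idea (not the paper's closed-form R-QP/D-QP solution), a greedy marginal-analysis allocator over per-image R-D tables could look like this:

```python
def allocate_qps(rd_tables, pixels, target_bits):
    """Greedy bit allocation sketch: minimize total distortion subject to a
    total bit budget, using measured per-image (rate, distortion) points.

    rd_tables[i] is a list of (rate_bpp, mse) pairs for image i, sorted from
    lowest rate (highest QP) to highest rate. Starting from the cheapest
    point per image, repeatedly take the single upgrade with the largest
    distortion drop per extra bit until the budget T is exhausted.
    """
    idx = [0] * len(rd_tables)  # current operating point per image
    spent = sum(tab[0][0] * px for tab, px in zip(rd_tables, pixels))
    while True:
        best, best_gain, best_extra = None, 0.0, 0.0
        for i, tab in enumerate(rd_tables):
            if idx[i] + 1 < len(tab):
                r0, d0 = tab[idx[i]]
                r1, d1 = tab[idx[i] + 1]
                extra = (r1 - r0) * pixels[i]           # extra bits needed
                gain = (d0 - d1) * pixels[i] / extra    # MSE drop per bit
                if spent + extra <= target_bits and gain > best_gain:
                    best, best_gain, best_extra = i, gain, extra
        if best is None:           # budget exhausted or no upgrades left
            return idx, spent
        idx[best] += 1
        spent += best_extra
```

This greedy rule is a standard approximation for such constrained allocation problems; the paper instead fits analytic R-QP and D-QP models and solves for the QPs directly.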

5. Experimental Results

5.1. Workshop and Challenge on Learned Image Compression (CLIC) of CVPR 2018

  • The proposed approaches participated in the Workshop and Challenge on Learned Image Compression (CLIC) of CVPR 2018.
  • This challenge provides two image datasets: Dataset P (“professional”) and Dataset M (“mobile”).
  • 1633 images are used for training, 102 for validation, and 286 for testing.

The participants are required to submit an encoded file for each test image and the total file size should be less than 0.15 bpp.
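As a side note, the 0.15 bpp constraint is on the total size over all test images; a tiny helper (my own, not the challenge's evaluation script) to check it:

```python
def total_bpp(file_sizes_bytes, image_sizes):
    """Bits per pixel over a whole submission: total encoded bits divided
    by the total pixel count of all images."""
    bits = 8 * sum(file_sizes_bytes)
    pixels = sum(w * h for w, h in image_sizes)
    return bits / pixels
```

For instance, a single 1500-byte file for a 200×400 image gives 12000 / 80000 = 0.15 bpp, exactly at the limit.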

The performance of the proposed approaches.
  • BPG: Better Portable Graphics, a file format for coding digital images created in 2014.
  • UN-RA+CNNIF+CNNMC obtains 1 dB higher average PSNR than BPG under the 0.15 bpp constraint.
  • UN-RA+CNNIF_S also obtains 1 dB higher average PSNR than BPG.
  • The details can be found on the leaderboard (http://www.compression.cc/leaderboard/).

5.2. RD Curves

RD Curves
  • As seen, the large gap between the curves indicates the large improvement obtained by the proposed approaches.

5.3. BD-Rate

  • In terms of BD-rate, a 27.90% to 30.80% average BD-rate reduction is achieved compared with BPG.
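For reference, BD-rate is conventionally computed with Bjøntegaard's cubic-fit method; a small NumPy sketch (my own helper, not the paper's script):

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjøntegaard delta-rate (classic cubic-fit variant).

    Fits log(rate) as a cubic polynomial of PSNR for both codecs, integrates
    both fits over the overlapping PSNR range, and reports the average rate
    difference in percent (negative = the test codec saves bits).
    """
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping PSNR range
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100
```

As a sanity check, a codec that reaches the same PSNRs at exactly half the rates yields a BD-rate of about −50%.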

5.4. Visual Quality

  • As seen above, artifacts can be observed in the images compressed by BPG (the first row).
  • With the proposed approach, the artifacts are removed.

During the days of coronavirus, the challenges of writing 30 and then 35 stories for this month have been accomplished. Let me challenge 40 stories!! This is the 38th story in this month. Thanks for visiting my story.

