Reading: CNNCP — CNN-Based Coefficient Predictions (HEVC Intra)

1.8%, 4.1%, and 4.5% BD-Rate Reduction in Y, U, V, respectively, Compared With the HEVC Baseline in All-Intra Configuration

Sik-Ho Tsang
5 min read · Jun 25, 2020

In this paper, “CNN-Based Coefficient Predictions” (CNNCP), by University of Science and Technology of China, University of Missouri-Kansas City, and New York University, is presented. I read this because I work on video coding research.

  • There exist correlations between the transform coefficients of the current block and those of its neighboring blocks, and these correlations cannot be completely exploited by intra prediction.
  • In this paper, a CNN is used to predict the coefficients from these correlations so as to boost the coding efficiency.

This is a paper in 2020 DCC. (Sik-Ho Tsang @ Medium)

Outline

  1. HEVC Implementation
  2. CNNCP: Overall Scheme
  3. CNNCP: Network Architecture
  4. Experimental Results

1. HEVC Implementation

HEVC Implementation
  • As shown in the above figure, the newly added part, marked in red, is the coefficient prediction module.
  • In this paper, a CNN is used for coefficient prediction.
  • An additional flag is added at the CU level so that each CU can choose whether to use the CNN or the conventional approach based on rate-distortion optimization (RDO).
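The CU-level decision can be sketched as a standard Lagrangian rate-distortion comparison. This is a minimal sketch, not the paper's implementation: the cost values and the one-bit flag accounting are assumptions for illustration.

```python
def rd_cost(distortion: float, rate_bits: float, lmbda: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lmbda * rate_bits

def choose_coding_mode(d_conv, r_conv, d_cnn, r_cnn, lmbda):
    """Pick the cheaper of conventional coding and CNN coefficient
    prediction for one CU; both modes pay one bit for the CU-level flag.
    Returns (mode, flag_bit)."""
    j_conv = rd_cost(d_conv, r_conv + 1, lmbda)  # flag = 0
    j_cnn = rd_cost(d_cnn, r_cnn + 1, lmbda)     # flag = 1
    return ('cnn', 1) if j_cnn < j_conv else ('conv', 0)
```

The encoder evaluates both paths per CU and signals only the winning mode, so the CNN is used exactly where it pays off in RD terms.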

2. CNNCP: Overall Scheme

CNNCP: Overall Scheme
  • After transform and quantization, the coefficient energy is mainly concentrated in the lower-frequency coefficients.
  • The coefficients are divided into several groups, with finer granularity at lower frequencies and coarser granularity at higher frequencies, and each group is predicted separately.
  • Denoting the coefficient groups of a transform block from lower to higher frequency as C0, C1, …, Cn−1, the corresponding CNN-predicted coefficients are represented as Ĉ0, Ĉ1, …, Ĉn−1.
  • The differences between the original and predicted coefficients are represented as D0, D1, …, Dn−1, where n is the number of groups. These differences are entropy coded and sent to the decoder.
  • The table below shows the group division for each Transform Unit (TU) size and each component.
List of Coefficients Division Groups for Different TU Sizes and Different Channels
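The grouping step above can be sketched as scanning the quantized coefficients from low to high frequency and slicing them into groups; the group sizes below are hypothetical (the paper's actual splits per TU size are in the table), and the zig-zag-style scan is a plausible ordering, not necessarily the paper's.

```python
import numpy as np

def lowfreq_scan(n):
    """Low-to-high frequency scan order for an n x n coefficient block,
    walking anti-diagonals with alternating direction (zig-zag style)."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))

def split_into_groups(coeffs, group_sizes):
    """Split a square coefficient block into frequency groups C0..Cn-1.
    group_sizes is a hypothetical division; finer groups sit at the
    low-frequency end. The residual D_i = C_i - C_hat_i is what gets
    entropy coded after CNN prediction."""
    flat = np.array([coeffs[r, c] for r, c in lowfreq_scan(coeffs.shape[0])])
    groups, start = [], 0
    for s in group_sizes:
        groups.append(flat[start:start + s])
        start += s
    return groups
```

With a 4×4 TU and sizes like [1, 3, 12], the DC coefficient lands alone in C0 while the highest frequencies share one coarse group, matching the finer-at-low-frequency idea.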

3. CNNCP: Network Architecture

CNNCP: Network Architecture for TU size 4×4
  • The current block and the neighboring left-top, top, right-top, left, and left-down reconstructed blocks are concatenated as input. If some pixels are unavailable, their values are set to 128.
  • The outputs are the predicted coefficient values.
  • The network is divided into input layer, middle layer and output layer.
  • Input layer:
  • where BN is batch normalization.
  • Middle layer:
  • where the concept of multiple kernels is used, similar to VRCNN.
  • Output layer:
  • where the pooling layer globally down-samples the feature maps to 1×1 size, i.e. global pooling.
  • The specific network structure for TU size 4×4 is presented in the above figure.
  • For TU sizes 8×8, 16×16, and 32×32, respectively 2, 3, and 4 middle layers are cascaded to handle the gradually larger sizes, with the input and output layers the same as for TU size 4×4.
  • L1 norm is used as the loss function:
  • The images in UCID and DIV2K, compressed by HM 12.0 with Sign Data Hiding (SDH) turned off, are used as training data.
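The input assembly and the L1 training objective can be sketched as follows. The 3×3 block layout with 128-padding in the unused right/bottom positions is an assumption for illustration; the paper only states that the six blocks are concatenated and unavailable pixels are set to 128.

```python
import numpy as np

PAD_VALUE = 128  # value used for unavailable neighboring pixels

def assemble_input(current, neighbors):
    """Stack the current n x n block with its five reconstructed neighbors
    (left-top, top, right-top, left, left-down) into a 3n x 3n input.
    Missing neighbors, and the unused right/bottom positions in this
    sketch's layout, are filled with 128."""
    n = current.shape[0]
    pad = np.full((n, n), PAD_VALUE, dtype=current.dtype)
    get = lambda k: neighbors.get(k, pad)
    top = np.hstack([get('left_top'), get('top'), get('right_top')])
    mid = np.hstack([get('left'), current, pad])
    bot = np.hstack([get('left_down'), pad, pad])
    return np.vstack([top, mid, bot])

def l1_loss(pred, target):
    """L1 norm loss used to train the coefficient-prediction CNN."""
    return np.mean(np.abs(pred - target))
```

L1 loss is a common choice for coefficient regression because it is less sensitive to the occasional large-magnitude coefficient than L2.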

4. Experimental Results

4.1. BD-Rate

BD-Rate (%) on Test Data
  • The test data includes 18 8-bit natural sequences from the HEVC Common Test Conditions (CTC) and 7 sequences from the UVG test set.
  • The BD-rate reductions of the proposed method reach up to 7.1%, 13.0%, and 13.0% for the Y, U, and V channels. The average BD-rate reductions over all classes are 1.8%, 4.1%, and 4.5%, while those for the Class UVG sequences are even higher: 2.9%, 6.5%, and 6.6%.
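The BD-rate numbers above follow the standard Bjøntegaard metric: fit each RD curve with a cubic polynomial of log-rate versus PSNR, integrate the difference over the overlapping PSNR range, and convert the average log-rate gap back to a percentage. A rough numpy sketch of that computation (not the paper's code):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%): average bitrate difference between two
    RD curves over their overlapping PSNR range. Negative = bitrate saving."""
    lr1, lr2 = np.log10(rate_anchor), np.log10(rate_test)
    p1 = np.polyfit(psnr_anchor, lr1, 3)   # log-rate as cubic in PSNR
    p2 = np.polyfit(psnr_test, lr2, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int1 = np.polyval(np.polyint(p1), hi) - np.polyval(np.polyint(p1), lo)
    int2 = np.polyval(np.polyint(p2), hi) - np.polyval(np.polyint(p2), lo)
    avg_log_diff = (int2 - int1) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100
```

For example, a codec that reaches the same PSNR at 10% lower bitrate at every point yields a BD-rate of −10%.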

4.2. RD Curves

RD Curves
  • Authors mention that there is consistent gain along the curves.
  • But it seems that CNNCP should be more efficient at high-bitrate conditions than at low-bitrate conditions, since fewer high-frequency AC coefficients survive quantization at high QPs.

4.3. Visualization

Red Blocks are using CNNCP
  • Blocks which contain regular textures usually use the proposed CNNCP method for intra coding, such as the tree branches in (a) and the posters in (b).
  • In contrast, the blocks which only contain smooth contents do not use the proposed CNNCP method for intra coding, such as the smooth backgrounds.

This is the 34th story in this month!
