Review: CNNAC — CNN-based Arithmetic Coding for DC Coefficients (HEVC Intra Coding)
DenseNet-like Network, Average 22.47% Bits Saving for DC Coefficients, Average 1.6% BD-Rate Reduction
In this story, CNN-based Arithmetic Coding (CNNAC) for DC coefficients, by the University of Science and Technology of China, is reviewed. The paper uses a DenseNet-like network to estimate the probability distribution of the DC coefficients obtained after the DCT. I read it because I work on video coding research. It is a 2018 ICIP paper. (Sik-Ho Tsang @ Medium)
- DC Coefficients
- Network Architecture
- Experimental Results
1. DC Coefficients
1.1. DC & AC Coefficients
- In HEVC, DC coefficients in intra-predicted residues are encoded as a part of the entire transform coefficient coding scheme.
- The DC coefficient is at the top-left corner of the block after the DCT and usually has a large value to be coded. The remaining entries are AC coefficients, which usually have relatively small values and represent increasingly high-frequency components of the image block toward the bottom-right corner.
- Specifically, for a transform unit (TU), a flag is first coded to indicate whether the quantized TU contains any non-zero coefficient.
- If the flag is true, the last non-zero coefficient position, the locations of the non-zero coefficients, the coefficient levels and the signs are encoded in succession.
- Otherwise, the encoding of the TU is finished since there are no non-zero coefficients.
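To make the position and size of the DC coefficient concrete, here is a minimal sketch of an orthonormal 2-D DCT-II in plain Python. The 8×8 residual block is a made-up example (a flat offset plus a gentle horizontal ramp), and HEVC itself uses an integer approximation of the DCT, so this only illustrates the idea.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Scaling factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):        # vertical frequency index
        for v in range(n):    # horizontal frequency index
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs

# Hypothetical 8x8 residual block: offset of 10 plus a horizontal ramp 0..7.
residual = [[10 + y for y in range(8)] for _ in range(8)]
coeffs = dct_2d(residual)

dc = coeffs[0][0]  # top-left entry; equals (sum of all samples) / 8 here
largest_ac = max(abs(coeffs[u][v])
                 for u in range(8) for v in range(8) if (u, v) != (0, 0))
print("DC:", dc, "largest |AC|:", largest_ac)
```

For this block the samples sum to 864, so the DC coefficient is about 108, while every AC coefficient is much smaller in magnitude.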
1.2. Encoding of DC Coefficients
- The syntax elements for recording the DC coefficient are composed of:
- significant_coeff_flag: whether the DC coefficient is non-zero.
- coeff_abs_level_greater1_flag: whether the absolute DC coefficient value is larger than 1.
- coeff_abs_level_greater2_flag: whether the absolute DC coefficient value is larger than 2.
- coeff_abs_level_remaining: the remaining absolute value beyond the base level already signaled by the preceding flags, and
- coeff_sign_flag: the sign of the DC coefficient.
- significant_coeff_flag, coeff_abs_level_greater1_flag and coeff_abs_level_greater2_flag are encoded with regular mode.
- coeff_abs_level_remaining and coeff_sign_flag are encoded with bypass mode.
- (If interested, please read Section 1 in Song VCIP’17 for regular mode and bypass mode.)
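The decomposition above can be sketched as a small function. This is a simplified illustration, not the HM implementation: it assumes all three flags are coded for the DC coefficient (HEVC limits how many greater1 and greater2 flags are coded per sub-block), and it assumes the HEVC convention that the remaining level is the absolute value minus a base level of 1 + greater1 + greater2. The function name `dc_syntax_elements` is hypothetical.

```python
def dc_syntax_elements(dc_level):
    """Split a quantized DC level into syntax elements (simplified sketch)."""
    abs_level = abs(dc_level)
    elements = {"significant_coeff_flag": int(abs_level != 0)}
    if abs_level == 0:
        return elements  # nothing else is signaled for a zero coefficient

    greater1 = int(abs_level > 1)
    elements["coeff_abs_level_greater1_flag"] = greater1

    greater2 = 0
    if greater1:
        greater2 = int(abs_level > 2)
        elements["coeff_abs_level_greater2_flag"] = greater2

    if greater2:
        # Assumed base level conveyed by the three flags above (here 3).
        base_level = 1 + greater1 + greater2
        elements["coeff_abs_level_remaining"] = abs_level - base_level

    # Sign flag: 1 for a negative level, 0 otherwise.
    elements["coeff_sign_flag"] = int(dc_level < 0)
    return elements

for level in (0, 1, -5):
    print(level, dc_syntax_elements(level))
```

Under these assumptions, a level of -5 needs all five elements, while a level of 0 needs only significant_coeff_flag. CNNAC replaces this whole chain with a single multi-level symbol.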
2. Network Architecture
- Instead of coding the many syntax elements listed in Section 1, a CNN is used to predict the probability distribution of the DC coefficient.
- The DC coefficient value, together with the estimated probability distribution, is then fed into a multi-level arithmetic codec to perform entropy coding.
- This approach is similar to Song VCIP’17, but Song VCIP’17 encodes the intra prediction mode, whereas CNNAC encodes the DC coefficient of each TU.
- The network is DenseNet-like and is built from dense blocks.
- Between dense blocks, a transition layer is used to down-sample the feature maps.
- At the end of the last dense block, a softmax layer is attached to predict the probability distribution over all candidate DC values.
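To illustrate why a good probability estimate saves bits, here is a minimal sketch of the softmax output and the ideal arithmetic-coding cost of a symbol, which is roughly -log2(p) bits for a symbol with probability p. The candidate range and the logits are made-up values, not outputs of the paper's network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ideal_code_length(probabilities, symbol_index):
    """Approximate bits an arithmetic coder spends on one symbol."""
    return -math.log2(probabilities[symbol_index])

# Hypothetical candidate DC values and hypothetical network scores.
candidates = list(range(-3, 4))
logits = [0.1, 0.5, 1.5, 3.0, 1.5, 0.5, 0.1]
probs = softmax(logits)

bits_likely = ideal_code_length(probs, candidates.index(0))
bits_unlikely = ideal_code_length(probs, candidates.index(3))
print("bits for DC = 0:", bits_likely)
print("bits for DC = 3:", bits_unlikely)
```

The more probability the network assigns to the DC value that actually occurs, the fewer bits the multi-level arithmetic coder needs for it.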
2.3. Synthesized Images
- In video coding, a quantization parameter (QP) controls the bitrate: a higher QP gives a lower bitrate, and vice versa.
- The range of DC coefficient values differs with the QP. Before the actual coding of DC coefficients with CNNAC, two synthetic images are used at different QPs to calculate the minimal and maximal possible DC values. The softmax layer in the CNN is then sized to correspond to this range of possible DC values.
- The two synthetic images are composed of white and black color values.
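As a rough back-of-the-envelope sketch of why the DC range shrinks as the QP grows, the snippet below estimates the range analytically instead of encoding synthetic images. It is not the paper's procedure. It assumes 8-bit samples (extreme residual of ±255), an orthonormal 8×8 DCT (DC of a flat block is 8 times the sample value), and the common approximation Qstep = 2^((QP - 4) / 6); all function names are hypothetical.

```python
def qstep(qp):
    """Approximate quantization step size for a given QP (assumed relation)."""
    return 2.0 ** ((qp - 4) / 6.0)

def approx_dc_range(qp, block_size=8, max_residual=255):
    """Estimated (min, max) quantized DC level for a flat extreme residual."""
    dc_max = block_size * max_residual  # orthonormal DCT of a flat block
    level_max = int(round(dc_max / qstep(qp)))
    return -level_max, level_max

def num_softmax_classes(qp):
    """Number of candidate DC values the softmax would need to cover."""
    low, high = approx_dc_range(qp)
    return high - low + 1

for qp in (22, 27, 32, 37):
    print(qp, approx_dc_range(qp), num_softmax_classes(qp))
```

Under these assumptions the estimated range at QP 22 is -255 to 255, and it narrows steadily at higher QPs, which is why the output layer has to be matched to the QP.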
2.4. Training Data
- Uncompressed Color Image Database (UCID) and DIV2K are used to prepare the training data.
- Specifically, 40 DIV2K images and 120 UCID images are compressed to generate training data.
- Then, 1,000,000 8×8 blocks are used as training data and 50,000 blocks as validation data, both randomly selected across different QPs.
- The above network is only used for 8×8 TUs.
3. Experimental Results
- CNNAC obtains an average bit saving of 22.47% for DC coefficients compared with the conventional HEVC reference software HM-12.0.
- An average BD-rate reduction of 1.6% is achieved.
- The R-D performance is better at lower bit rates, because DC coefficients account for a larger percentage of the bits among all syntax elements there.
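The link between the DC bit share and the overall gain can be sketched with simple arithmetic: if only the DC bits shrink, the fraction of total bits saved is the DC share multiplied by the DC saving. The shares below are hypothetical numbers chosen for illustration, not figures from the paper, and this is plain bit saving rather than BD-rate.

```python
def overall_bit_saving(dc_share, dc_saving):
    """Fraction of total bits saved when only the DC bits are reduced."""
    return dc_share * dc_saving

DC_SAVING = 0.2247  # average DC bit saving reported above

# Hypothetical shares of DC bits in the whole bitstream (not from the paper).
share_low_rate = 0.10   # low bit rate: DC bits are a larger part of the total
share_high_rate = 0.04  # high bit rate: AC bits dominate

saving_low = overall_bit_saving(share_low_rate, DC_SAVING)
saving_high = overall_bit_saving(share_high_rate, DC_SAVING)
print("overall saving at low rate: ", saving_low)
print("overall saving at high rate:", saving_high)
```

With the same 22.47% DC saving, the overall saving is larger wherever DC bits make up a larger share of the bitstream, which matches the trend reported at lower bit rates.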