Reading: Liu MMM’18 — CNN-Based DCT Using AutoEncoder for JPEG (JPEG Image Compression)
CNN-Based DCT Using AutoEncoder Outperforms JPEG With 38.03% BD-Rate Reduction
In this story, CNN-Based DCT-Like Transform for Image Compression (Liu MMM’18), by the University of Science and Technology of China, is presented. I read this because I work on video coding research. In this paper:
- A CNN-based transform is proposed to replace the conventional transform.
- The transform, quantization and inverse transform are trained jointly.
This is a paper in 2018 MMM. (Sik-Ho Tsang @ Medium)
Outline
- Loss Function
- Network Architecture
- Experimental Results
1. Loss Function
- An original image block B is transformed into coefficients C.
- C is quantized into integers I.
- The quantized coefficients I are inversely transformed to reconstruct the image block ^B.
- To optimize the joint rate-distortion cost, distortion is measured by the MSE between the original block and its reconstruction: D = MSE(B, ^B) = ||B - ^B||^2.
- The rate is approximated by the l1-norm of the quantized coefficients: R = ||I||_1.
- The total loss is: L = D + λ·R.
- where λ is the Lagrangian multiplier. With different λ, different coding rates and image qualities can be obtained.
- In this paper, quantization is fixed as rounding, so training only focuses on the transform and inverse transform (a loss sketch is given after this list).
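Below is a minimal PyTorch-style sketch of this joint rate-distortion loss, just to make the objective concrete. The straight-through trick for the rounding gradient and the encoder/decoder callables are my assumptions for illustration; the paper's exact training details may differ.

```python
import torch

def round_ste(x):
    # Straight-through rounding: round in the forward pass, pass the gradient
    # through unchanged in the backward pass. (Assumption: the paper's exact
    # handling of the non-differentiable rounding is not reproduced here.)
    return x + (torch.round(x) - x).detach()

def rd_loss(block, encoder, decoder, lam):
    """Joint rate-distortion loss L = D + lambda * R for one batch of blocks."""
    coeffs = encoder(block)             # C: transform coefficients
    quantized = round_ste(coeffs)       # I: quantized (rounded) coefficients
    recon = decoder(quantized)          # ^B: reconstructed block
    distortion = torch.mean((block - recon) ** 2)   # D: MSE(B, ^B)
    rate = torch.mean(torch.abs(quantized))         # R: l1-norm of I (averaged)
    return distortion + lam * rate
```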
2. Network Architecture
2.1. Network Architecture
- The network consists of an encoder and a decoder, i.e. an autoencoder structure.
- At the encoder, the image block B goes through a few convolutional layers and one fully connected layer with 1024 outputs.
- Then, the 1024-length feature vector is quantized; this is I, which is what needs to be encoded. The first coefficient is treated as DC and the remaining ones as AC, so that conventional entropy coding can be applied to encode I.
- At the decoder, the quantized feature vector is passed through convolutional layers to give the reconstructed block ^B, as shown above (and sketched in code after this list).
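A minimal sketch of the autoencoder described above, again in PyTorch. Only the overall shape (a few convolutions plus a 1024-dimensional fully connected layer at the encoder, and convolutions at the decoder) follows the description; the layer counts, channels, kernel sizes, and the assumed 32×32 block size (so that the 1024 coefficients match one per pixel) are illustrative guesses, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CNNTransform(nn.Module):
    """Autoencoder-style transform / inverse transform for a grayscale block.
    Layer sizes are illustrative assumptions, not the paper's exact config."""

    def __init__(self, block_size=32):  # 32x32 assumed so 1024 = one coeff per pixel
        super().__init__()
        self.block_size = block_size
        # Encoder: a few convolutional layers, then a fully connected layer of 1024.
        self.enc_conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.enc_fc = nn.Linear(32 * block_size * block_size, 1024)
        # Decoder: map the 1024 coefficients back and convolve to ^B.
        self.dec_fc = nn.Linear(1024, 32 * block_size * block_size)
        self.dec_conv = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def encode(self, block):
        h = self.enc_conv(block)
        return self.enc_fc(h.flatten(1))  # C: 1024-length coefficient vector

    def decode(self, coeffs):
        h = self.dec_fc(coeffs).view(-1, 32, self.block_size, self.block_size)
        return self.dec_conv(h)           # ^B: reconstructed block

# Toy usage (with the rd_loss sketch from Section 1):
model = CNNTransform()
blocks = torch.rand(8, 1, 32, 32)
loss = rd_loss(blocks, model.encode, model.decode, lam=0.01)
loss.backward()
```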
2.2. Two Variants
- For the simplified encoder variant, there are no convolutions at the encoder. This suits the case where the encoder runs on a mobile device with limited computational power.
- For the simplified decoder variant, there are no convolutions at the decoder. This suits the case where the image is downloaded from the cloud, which has much higher computational power for the encoding side. A rough sketch of both variants follows this list.
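Continuing the hypothetical CNNTransform sketch above, the two variants could be expressed by replacing one side with a purely fully connected mapping. These class names and sizes are my own illustration of the description, not the paper's exact layers.

```python
import torch.nn as nn

class SimplifiedEncoder(nn.Module):
    """Variant 1: no convolutions at the encoder; a single FC layer maps the
    flattened block directly to the 1024 coefficients (cheap mobile encoding)."""
    def __init__(self, block_size=32):
        super().__init__()
        self.fc = nn.Linear(block_size * block_size, 1024)

    def forward(self, block):
        return self.fc(block.flatten(1))

class SimplifiedDecoder(nn.Module):
    """Variant 2: no convolutions at the decoder; a single FC layer maps the
    1024 quantized coefficients back to the flattened block."""
    def __init__(self, block_size=32):
        super().__init__()
        self.block_size = block_size
        self.fc = nn.Linear(1024, block_size * block_size)

    def forward(self, coeffs):
        return self.fc(coeffs).view(-1, 1, self.block_size, self.block_size)
```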
3. Experimental Results
- UCID, which contains 1338 natural images, is used as training data.
- Only grayscale images are considered.
- The 24 images in the Kodak image dataset are used for evaluation.
3.1. RD Curves
- By training with different λ values, different coding rates and image qualities can be achieved, which traces out the RD curve (a measurement sketch follows this list).
- We can see that the proposed approach, in all three variants, outperforms JPEG and one state-of-the-art approach, Toderici et al. [7].
- But it is not trivial to outperform JPEG 2000 and HEVC Intra.
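To make the RD measurement concrete: each λ gives one trained model, and each model contributes one (bits-per-pixel, PSNR) point on the curve. Below is a small PSNR helper in PyTorch with a toy call; the entropy coding that gives the actual rate is not shown, and the fake reconstruction is only a placeholder.

```python
import math
import torch

def psnr(orig, recon, peak=1.0):
    """PSNR in dB between an original and a reconstructed block
    (peak = 1.0 for images scaled to [0, 1])."""
    mse = torch.mean((orig - recon) ** 2).item()
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy check with a fake reconstruction: each trained model would be evaluated
# like this on the Kodak images to get the quality axis of one RD point.
orig = torch.rand(1, 1, 32, 32)
recon = (orig + 0.01 * torch.randn_like(orig)).clamp(0, 1)
print(round(psnr(orig, recon), 2))
```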
3.2. BD-Rate
- A 38.03% BD-rate reduction is achieved compared with JPEG, which is a very large margin (a sketch of how BD-rate is computed is given below).
- Though [7, 8, 10] are compared indirectly using Kodak color images here, their BD-rates are actually not comparable…
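For reference, BD-rate here is the standard Bjøntegaard delta rate: fit a cubic of log-rate against PSNR for both codecs and integrate the gap over the overlapping quality range. The sketch below is the usual definition in NumPy, not code from the paper; for the 38.03% figure the anchor would be JPEG and the test codec the proposed transform.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta rate (%) of the test codec vs. the anchor.
    Each argument is a list of at least 4 RD points; rates in any consistent unit."""
    log_ra = np.log10(rate_anchor)
    log_rt = np.log10(rate_test)
    # Cubic fit of log-rate as a function of quality (PSNR).
    pa = np.polyfit(psnr_anchor, log_ra, 3)
    pt = np.polyfit(psnr_test, log_rt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the overlapping PSNR range.
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (it - ia) / (hi - lo)
    return (10 ** avg_diff - 1) * 100.0   # negative => bitrate saving vs. anchor
```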
I read this during a CNN model training process during office hours, and I have just written it up after work today, lol.
This is the 11th story in this month!
Reference
[2018 MMM] [Liu MMM’18]
CNN-Based DCT-Like Transform for Image Compression
Codec Intra Prediction
JPEG [MS-ROI]
HEVC [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [Liu MMM’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNMC Yokoyama ICCE’20] [PNNS]
VVC [CNNIF & CNNMC] [Brand PCS’19]