Reading: Liu MMM’18 — CNN-Based DCT Using AutoEncoder for JPEG (JPEG Image Compression)
CNN-Based DCT Using AutoEncoder Outperforms JPEG With 38.03% BD-Rate Reduction
In this story, CNN-Based DCT-Like Transform for Image Compression (Liu MMM’18), by the University of Science and Technology of China, is presented. I read this because I work on video coding research. In this paper:
- A CNN-based transform is proposed to replace the conventional transform.
- The transform, quantization and inverse transform are trained jointly.
This is a paper in 2018 MMM. (Sik-Ho Tsang @ Medium)
Outline
- Loss Function
- Network Architecture
- Experimental Results
1. Loss Function
- An original image block B is transformed into coefficients C.
- C is quantized into integers I.
- The quantized coefficients I are inversely transformed to reconstruct the image block ^B.
- To optimize the joint rate-distortion cost, distortion is measured by the MSE between the original block and its reconstruction: D = MSE(B, ^B) = ||B - ^B||^2.
- The rate is approximated by the l1-norm of the quantized coefficients: R = ||I||_1.
- The total loss is: L = D + λ·R.
- where λ is the Lagrangian multiplier. With different λ, different coding rates and image qualities can be obtained.
- In this paper, quantization is fixed as rounding, so training only focuses on the transform and inverse transform (a loss sketch is given after this list).
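Below is a minimal PyTorch-style sketch of this joint rate-distortion loss, just to make the objective concrete. The straight-through trick for the rounding gradient and the encoder/decoder callables are my assumptions for illustration; the paper's exact training details may differ.

```python
import torch

def round_ste(x):
    # Straight-through rounding: round in the forward pass, pass the gradient
    # through unchanged in the backward pass. (Assumption: the paper's exact
    # handling of the non-differentiable rounding is not reproduced here.)
    return x + (torch.round(x) - x).detach()

def rd_loss(block, encoder, decoder, lam):
    """Joint rate-distortion loss L = D + lambda * R for one batch of blocks."""
    coeffs = encoder(block)             # C: transform coefficients
    quantized = round_ste(coeffs)       # I: quantized (rounded) coefficients
    recon = decoder(quantized)          # ^B: reconstructed block
    distortion = torch.mean((block - recon) ** 2)   # D: MSE(B, ^B)
    rate = torch.mean(torch.abs(quantized))         # R: l1-norm of I (averaged)
    return distortion + lam * rate
```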
2. Network Architecture
2.1. Network Architecture
- The network consists of an encoder and a decoder, i.e. an autoencoder structure.
- At the encoder, the image block B goes through a few convolutional layers and one fully connected layer with 1024 outputs.
- Then, the 1024-length feature vector is quantized; this is I, which is what needs to be encoded. The first coefficient is treated as DC and the remaining ones as AC, so that conventional entropy coding can be applied to encode I.
- At the decoder, the quantized feature vector is passed through convolutional layers to give the reconstructed block ^B, as shown above (and sketched in code after this list).
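A minimal sketch of the autoencoder described above, again in PyTorch. Only the overall shape (a few convolutions plus a 1024-dimensional fully connected layer at the encoder, and convolutions at the decoder) follows the description; the layer counts, channels, kernel sizes, and the assumed 32×32 block size (so that the 1024 coefficients match one per pixel) are illustrative guesses, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CNNTransform(nn.Module):
    """Autoencoder-style transform / inverse transform for a grayscale block.
    Layer sizes are illustrative assumptions, not the paper's exact config."""

    def __init__(self, block_size=32):  # 32x32 assumed so 1024 = one coeff per pixel
        super().__init__()
        self.block_size = block_size
        # Encoder: a few convolutional layers, then a fully connected layer of 1024.
        self.enc_conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.enc_fc = nn.Linear(32 * block_size * block_size, 1024)
        # Decoder: map the 1024 coefficients back and convolve to ^B.
        self.dec_fc = nn.Linear(1024, 32 * block_size * block_size)
        self.dec_conv = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def encode(self, block):
        h = self.enc_conv(block)
        return self.enc_fc(h.flatten(1))  # C: 1024-length coefficient vector

    def decode(self, coeffs):
        h = self.dec_fc(coeffs).view(-1, 32, self.block_size, self.block_size)
        return self.dec_conv(h)           # ^B: reconstructed block

# Toy usage (with the rd_loss sketch from Section 1):
model = CNNTransform()
blocks = torch.rand(8, 1, 32, 32)
loss = rd_loss(blocks, model.encode, model.decode, lam=0.01)
loss.backward()
```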
2.2. Two Variants
- For the simplified encoder variant, there are no convolutions at the encoder. This suits the case where the encoder runs on a mobile device with limited computational power.
- For the simplified decoder variant, there are no convolutions at the decoder. This suits the case where the image is downloaded from the cloud, which has much higher computational power for the encoding side. A rough sketch of both variants follows this list.
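Continuing the hypothetical CNNTransform sketch above, the two variants could be expressed by replacing one side with a purely fully connected mapping. These class names and sizes are my own illustration of the description, not the paper's exact layers.

```python
import torch.nn as nn

class SimplifiedEncoder(nn.Module):
    """Variant 1: no convolutions at the encoder; a single FC layer maps the
    flattened block directly to the 1024 coefficients (cheap mobile encoding)."""
    def __init__(self, block_size=32):
        super().__init__()
        self.fc = nn.Linear(block_size * block_size, 1024)

    def forward(self, block):
        return self.fc(block.flatten(1))

class SimplifiedDecoder(nn.Module):
    """Variant 2: no convolutions at the decoder; a single FC layer maps the
    1024 quantized coefficients back to the flattened block."""
    def __init__(self, block_size=32):
        super().__init__()
        self.block_size = block_size
        self.fc = nn.Linear(1024, block_size * block_size)

    def forward(self, coeffs):
        return self.fc(coeffs).view(-1, 1, self.block_size, self.block_size)
```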
3. Experimental Results
- UCID, which contains 1338 natural images, is used as training data.
- Only grayscale images are considered.
- The 24 images in the Kodak image dataset are used for evaluation.
3.1. RD Curves
- By training with different λ values, different coding rates and image qualities can be achieved, which traces out the RD curve (a measurement sketch follows this list).
- We can see that the proposed approach, in all three variants, outperforms JPEG and one state-of-the-art approach, Toderici et al. [7].
- But it is not trivial to outperform JPEG 2000 and HEVC Intra.
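To make the RD measurement concrete: each λ gives one trained model, and each model contributes one (bits-per-pixel, PSNR) point on the curve. Below is a small PSNR helper in PyTorch with a toy call; the entropy coding that gives the actual rate is not shown, and the fake reconstruction is only a placeholder.

```python
import math
import torch

def psnr(orig, recon, peak=1.0):
    """PSNR in dB between an original and a reconstructed block
    (peak = 1.0 for images scaled to [0, 1])."""
    mse = torch.mean((orig - recon) ** 2).item()
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy check with a fake reconstruction: each trained model would be evaluated
# like this on the Kodak images to get the quality axis of one RD point.
orig = torch.rand(1, 1, 32, 32)
recon = (orig + 0.01 * torch.randn_like(orig)).clamp(0, 1)
print(round(psnr(orig, recon), 2))
```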
3.2. BD-Rate
- A 38.03% BD-rate reduction is achieved compared with JPEG, which is a very large margin (a sketch of how BD-rate is computed is given below).
- Though [7, 8, 10] are compared indirectly using Kodak color images here, their BD-rates are actually not comparable…
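For reference, BD-rate here is the standard Bjøntegaard delta rate: fit a cubic of log-rate against PSNR for both codecs and integrate the gap over the overlapping quality range. The sketch below is the usual definition in NumPy, not code from the paper; for the 38.03% figure the anchor would be JPEG and the test codec the proposed transform.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta rate (%) of the test codec vs. the anchor.
    Each argument is a list of at least 4 RD points; rates in any consistent unit."""
    log_ra = np.log10(rate_anchor)
    log_rt = np.log10(rate_test)
    # Cubic fit of log-rate as a function of quality (PSNR).
    pa = np.polyfit(psnr_anchor, log_ra, 3)
    pt = np.polyfit(psnr_test, log_rt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the overlapping PSNR range.
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (it - ia) / (hi - lo)
    return (10 ** avg_diff - 1) * 100.0   # negative => bitrate saving vs. anchor
```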
I read this during a CNN model training process during office hours, and I have just written it up after work today, lol.
This is the 11th story in this month!
Reference
[2018 MMM] [Liu MMM’18]
CNN-Based DCT-Like Transform for Image Compression
Codec Intra Prediction
JPEG [MS-ROI]
HEVC [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [Liu MMM’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNMC Yokoyama ICCE’20] [PNNS]
VVC [CNNIF & CNNMC] [Brand PCS’19]