# Review — Improving Compression Artifact Reduction via End-to-End Learning of Side Information (JPEG Filtering)

**Using Artifact Descriptors as Additional Side Information to Outperform ARCNN, DnCNN, MemNet, and EDSR-baseline\***

In this story, **Improving Compression Artifact Reduction via End-to-End Learning of Side Information** (Ma VCIP’20), by the University of Science and Technology of China, is reviewed. In this paper:

- The **side information** consists of **artifact descriptors** that are **obtained** by analyzing the original and compressed images in the **encoder**.
- In the **decoder**, the **received descriptors** are used as additional input to a well-designed **conditional post-processing neural network**.

This is a paper in **2020 VCIP**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Overall Scheme**
2. **Artifact Descriptor Extraction**
3. **Quantization**
4. **Rate Estimation**
5. **Artifact Feature Mapping**
6. **Conditional Residual Block**
7. **Experimental Results**

# 1. Overall Scheme

- **Encoder**: in the conditional post-processing pipeline, **the floating-point artifact descriptors are first obtained by analyzing the original and compressed images** with a neural network.
- These **descriptors** are then **quantized** and encoded into the bitstream as **side information**.
- **Decoder**: the received bitstream is decompressed to **reconstruct the descriptors**, which are then mapped to artifact features by another neural network; these features are used as conditions by the **conditional post-processing neural network**.
- A **training loss** similar to [15] is used: *L* = *D* + *λR*, where *D* is the mean squared error (MSE) between the outputs and the original images, and *R* measures the bit rate of the side information.
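Assuming the loss takes the common rate-distortion form *L* = *D* + *λR*, a minimal sketch of how the two terms combine might look as follows (the `lam` value here is purely illustrative; the paper’s setting is not given in this review):

```python
import numpy as np

def rd_loss(output, original, rate_bits, lam=0.01):
    # L = D + lambda * R: MSE distortion plus weighted side-information rate.
    # lam is an illustrative trade-off weight, not the paper's value.
    distortion = np.mean((output - original) ** 2)  # D: MSE to the original image
    return distortion + lam * rate_bits             # R: estimated bit rate in bits
```

A larger `lam` pushes the encoder toward cheaper side information at the cost of distortion, and vice versa.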

# 2. Artifact Descriptor Extraction

- A layer marked with “2↓” represents a convolutional layer with a stride of 2.
- The original image and the compressed image respectively pass through **three convolutional layers** to **extract features** *f_ori* and *f_rec*. Then *f_ori*, *f_rec*, and their difference are **stacked together** and input to the subsequent layers.
- A **1×1 convolution** and **softmax** operation are used in the final step to map features to floating-point probability vectors, named **artifact descriptors**.
- The channel dimension of the artifact descriptors is set to 16.
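The data flow above can be sketched in numpy. Note this is a structural sketch only: the learned stride-2 convolutions are replaced by a simple downsampling stub, and the final 1×1 convolution weight `w_final` is a hypothetical parameter, so only the shapes and the stack/softmax steps mirror the paper:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def downsample2x(x):
    # stand-in for a learned stride-2 ("2↓") convolution: just halves H and W
    return x[:, ::2, ::2]

def extract_descriptors(ori, rec, w_final):
    # ori, rec: (C, H, W) original and compressed images
    f_ori, f_rec = ori, rec
    for _ in range(3):                      # three stride-2 conv layers per branch
        f_ori, f_rec = downsample2x(f_ori), downsample2x(f_rec)
    # stack f_ori, f_rec, and their difference along channels
    stacked = np.concatenate([f_ori, f_rec, f_ori - f_rec], axis=0)
    logits = np.einsum('oc,chw->ohw', w_final, stacked)  # final 1x1 convolution
    return softmax(logits, axis=0)          # 16-channel probability vectors
```

Each spatial position of the output is a 16-dimensional probability vector, matching the descriptor dimension stated above.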

# 3. Quantization

- **Every artifact descriptor** *x* ∈ *R*¹⁶ extracted from an image needs to be **quantized to one of the 16 one-hot vectors** in *C* = {*c*₁, *c*₂, …, *c*₁₆} ⊂ *B*¹⁶.
- The KL distance is used here, as it is more suitable for measuring the distance between two probability distributions.
- In the **forward pass**, *x* is quantized to the nearest one-hot vector: *x̂* = argminᵢ *d*(*x*, *cᵢ*), where *d* is the KL distance.
- In the **backward pass**, the hard quantization above is **approximated by soft quantization to ensure back-propagation**: *x̃* = Σᵢ (exp(−*σ·d*(*x*, *cᵢ*)) / Σⱼ exp(−*σ·d*(*x*, *cⱼ*))) · *cᵢ*, where *σ* = 1 in this paper.
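A minimal numpy sketch of the hard/soft quantization pair, under the assumption that the KL distance is smoothed with a small `eps` so that distances to one-hot codewords stay finite (the paper does not specify this detail):

```python
import numpy as np

def kl_distance(x, c, eps=1e-9):
    # KL divergence between probability vectors, smoothed so one-hot targets are finite
    return np.sum(x * (np.log(x + eps) - np.log(c + eps)))

def hard_quantize(x, codebook):
    # forward pass: snap x to the nearest one-hot codeword under KL distance
    dists = np.array([kl_distance(x, c) for c in codebook])
    return codebook[np.argmin(dists)]

def soft_quantize(x, codebook, sigma=1.0):
    # backward-pass surrogate: softmax-weighted mixture of codewords (sigma = 1)
    dists = np.array([kl_distance(x, c) for c in codebook])
    w = np.exp(-sigma * (dists - dists.min()))  # shift for numerical stability
    w /= w.sum()
    return w @ codebook
```

The soft version is differentiable in `x`, which is what allows gradients to flow through the quantizer during training.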

# 4. Rate Estimation

- The idea of the context model in [17] is adopted to accurately estimate the bit rate of the transmitted side information.
- Specifically, one 5 × 5 masked convolution layer and three 1 × 1 convolutional layers are used to implement the context model.
- The outputs have 16 channels and are normalized along the channel direction by softmax to obtain the **probability vectors** *p* ∈ *R*¹⁶.
- In the forward pass, the bit rate is calculated as *R* = −(1/||*X*||) Σ_{*x̂*∈*X*} log₂(*x̂* · *p*_*x̂*), where *X* is the set of all quantized artifact descriptors of one image, *x̂* · *p*_*x̂* denotes the dot product between *x̂* and *p*_*x̂*, and ||*X*|| is the size of *X*.
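Given the context model’s predicted probability vectors, the forward-pass rate formula can be sketched directly (the context model itself is assumed to have already produced `probs`):

```python
import numpy as np

def estimate_rate(quantized, probs, eps=1e-12):
    # quantized: (N, 16) one-hot descriptors x_hat
    # probs:     (N, 16) context-model probability vectors p
    # R = -(1/||X||) * sum over x_hat of log2(x_hat . p_x_hat)
    dots = np.sum(quantized * probs, axis=1)  # predicted probability of each codeword
    return -np.mean(np.log2(dots + eps))      # average bits per descriptor
```

As a sanity check, a uniform context model over 16 codewords should cost log₂ 16 = 4 bits per descriptor.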

# 5. Artifact Feature Mapping

- “2↑” and “8↑” represent 2× and 8× up-sampling.
- The convolutional layers identified by “2↑” are implemented as sub-pixel convolutional layers, as in ESPCN [18].
- The output artifact features have the same resolution as the compressed images.
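The core of an ESPCN-style “r↑” layer is the pixel-shuffle rearrangement that follows an ordinary convolution: channels are traded for spatial resolution. A minimal sketch of that rearrangement (the preceding learned convolution is omitted):

```python
import numpy as np

def pixel_shuffle(x, r):
    # sub-pixel rearrangement used in ESPCN-style "r↑" layers:
    # (C*r*r, H, W) -> (C, H*r, W*r)
    C = x.shape[0] // (r * r)
    H, W = x.shape[1], x.shape[2]
    x = x.reshape(C, r, r, H, W)
    x = x.transpose(0, 3, 1, 4, 2)  # interleave to (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)
```

Stacking one 2× shuffle is the “2↑” case; an “8↑” layer can be realized with a larger factor or repeated 2× stages.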

# 6. Conditional Residual Block

- The up-sampling layer of the **EDSR**-baseline network in EDSR [12] is removed; the remaining network, which has **16 residual blocks**, is used as the post-processing **backbone** and is called **EDSR-baseline\***.
- For the conditional form, the **artifact features** are **directly multiplied with the input features** of the residual blocks after a 1×1 convolutional layer.
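The conditioning mechanism can be sketched as follows. This is a structural sketch under simplifying assumptions: the block’s 3×3 convolutions are reduced to per-pixel 1×1 maps, and all weights (`w1`, `w2`, `w_cond`) are hypothetical parameters:

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution = per-pixel linear map over channels; x: (Cin,H,W), w: (Cout,Cin)
    return np.einsum('oc,chw->ohw', w, x)

def conditional_residual_block(feat, cond, w1, w2, w_cond):
    # feat: (C,H,W) block input features; cond: (Cc,H,W) artifact features
    # The condition passes through a 1x1 conv, then multiplies the input features.
    scale = conv1x1(cond, w_cond)                     # (C,H,W) modulation map
    h = np.maximum(conv1x1(feat * scale, w1), 0.0)    # conv + ReLU (3x3 simplified to 1x1)
    return feat + conv1x1(h, w2)                      # residual connection, EDSR-style
```

The multiplicative gating lets the artifact features spatially modulate how strongly each block filters the input, while the residual connection preserves the EDSR backbone’s behavior.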

# 7. Experimental Results

## 7.1. Complexity Analysis

- From the above table, it can be found that **artifact descriptor extraction, artifact feature mapping, and the introduced conditional mechanism** bring only **small storage and computation overheads**.

- The alternative of training multiple models and selecting one at the encoder is also compared.
- In the **encoder**, the proposed method incurs **much smaller storage and computation overheads**.
- In the **decoder**, by **increasing the computational complexity slightly**, the proposed method **decreases the storage space significantly**.

## 7.2. Rate Distortion Performance

- A simplified baseline model, named S-EDSR, is also tried by reducing the number of residual blocks in EDSR-baseline\* from 16 to 4.
- Accordingly, the two conditional test models are named **EDSR-baseline\*+side** and **S-EDSR+side**.
- On the **LIVE1** dataset, EDSR-baseline\* **outperforms ARCNN [4], DnCNN [5], and MemNet**.
- On the **DIV2K** validation dataset, when the JPEG quality factor is set to 10, **the PSNR of the output images of both S-EDSR and EDSR-baseline\* is increased by more than 0.9 dB by using side information**.
- Even when the transmission overhead is considered, the RD performance is still improved thanks to the side information. For example, on the **DIV2K** validation dataset, **S-EDSR+side (with 4 residual blocks) is even comparable to EDSR-baseline\* (with 16 residual blocks)**.

## 7.3. Analysis of Descriptors

- Different colors indicate different descriptors.
- It can be seen that different descriptors are used at different positions.

## Reference

[2020 VCIP] [Ma VCIP’20]

Improving Compression Artifact Reduction via End-to-End Learning of Side Information

## Codec Filtering

**JPEG** [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN] [CAR-DRN] [LIU4K] [Ma VCIP’20]
**JPEG-HDR** [Han VCIP’20]
**HEVC** [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [Jia TIP’19] [EDCNN] [VRCNN-BN] [MACNN] [Yue VCIP’20] [SEFCNN] [LIU4K]
**3D-HEVC** [RSVE+POST]
**AVS3** [Lin PCS’19] [CNNLF]
**VVC** [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN] [PRN] [DRCNN] [Zhang ICME’20] [MGNLF] [RCAN+PRN+] [Nasiri VCIP’20]