# Review — Improving Compression Artifact Reduction via End-to-End Learning of Side Information (JPEG Filtering)

**Using Artifact Descriptors as Additional Side Information to Outperform ARCNN, DnCNN, MemNet, and EDSR-baseline\***

In this story, **Improving Compression Artifact Reduction via End-to-End Learning of Side Information** (Ma VCIP’20), by the University of Science and Technology of China, is reviewed. In this paper:

- The **side information** consists of **artifact descriptors** that are **obtained** by analyzing the original and compressed images in the **encoder**.
- In the **decoder**, the **received descriptors** are used as additional input to a well-designed **conditional post-processing neural network**.

This is a paper in **2020 VCIP**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Overall Scheme**
2. **Artifact Descriptor Extraction**
3. **Quantization**
4. **Rate Estimation**
5. **Artifact Feature Mapping**
6. **Conditional Residual Block**
7. **Experimental Results**

# 1. Overall Scheme

- **Encoder**: in the conditional post-processing pipeline, **the floating-point artifact descriptors are first obtained by analyzing the original and compressed images** with a neural network.
- These **descriptors** are then **quantized** and encoded into the bitstream as **side information**.
- **Decoder**: the received bitstream is decompressed to **reconstruct the descriptors**, which are then mapped to artifact features by another neural network; these features are used as conditions by the **conditional post-processing neural network**.
- A **training loss** similar to [15] is used: *L* = *D* + *λR*, where *D* is the mean squared error (MSE) between the outputs and the original images, and *R* measures the bit rate of the side information.
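Assuming the loss takes the common rate-distortion form *L* = *D* + *λR*, a minimal sketch of how the two terms combine might look as follows (the `lam` value here is purely illustrative; the paper’s setting is not given in this review):

```python
import numpy as np

def rd_loss(output, original, rate_bits, lam=0.01):
    # L = D + lambda * R: MSE distortion plus weighted side-information rate.
    # lam is an illustrative trade-off weight, not the paper's value.
    distortion = np.mean((output - original) ** 2)  # D: MSE to the original image
    return distortion + lam * rate_bits             # R: estimated bit rate in bits
```

A larger `lam` pushes the encoder toward cheaper side information at the cost of distortion, and vice versa.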

# 2. Artifact Descriptor Extraction

- A layer marked with “2↓” represents a convolutional layer with a stride of 2.
- The original image and the compressed image respectively pass through **three convolutional layers** to **extract features** *f_ori* and *f_rec*. Then *f_ori*, *f_rec*, and their difference are **stacked together** and input to the subsequent layers.
- A **1×1 convolution** and **softmax** operation are used in the final step to map features to floating-point probability vectors, named **artifact descriptors**.
- The channel dimension of the artifact descriptors is set to 16.
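The data flow above can be sketched in numpy. Note this is a structural sketch only: the learned stride-2 convolutions are replaced by a simple downsampling stub, and the final 1×1 convolution weight `w_final` is a hypothetical parameter, so only the shapes and the stack/softmax steps mirror the paper:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def downsample2x(x):
    # stand-in for a learned stride-2 ("2↓") convolution: just halves H and W
    return x[:, ::2, ::2]

def extract_descriptors(ori, rec, w_final):
    # ori, rec: (C, H, W) original and compressed images
    f_ori, f_rec = ori, rec
    for _ in range(3):                      # three stride-2 conv layers per branch
        f_ori, f_rec = downsample2x(f_ori), downsample2x(f_rec)
    # stack f_ori, f_rec, and their difference along channels
    stacked = np.concatenate([f_ori, f_rec, f_ori - f_rec], axis=0)
    logits = np.einsum('oc,chw->ohw', w_final, stacked)  # final 1x1 convolution
    return softmax(logits, axis=0)          # 16-channel probability vectors
```

Each spatial position of the output is a 16-dimensional probability vector, matching the descriptor dimension stated above.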

# 3. Quantization

- **Every artifact descriptor** *x* ∈ *R*¹⁶ extracted from an image needs to be **quantized to one of the 16 one-hot vectors** in *C* = {*c*₁, *c*₂, …, *c*₁₆} ⊂ *B*¹⁶.
- The KL distance is used here, as it is more suitable for measuring the distance between two probability distributions.
- In the **forward pass**, *x* is quantized to the nearest one-hot vector: *x̂* = argminᵢ *d*(*x*, *cᵢ*), where *d* is the KL distance.
- In the **backward pass**, the hard quantization above is **approximated by soft quantization to ensure back-propagation**: *x̃* = Σᵢ (exp(−*σ·d*(*x*, *cᵢ*)) / Σⱼ exp(−*σ·d*(*x*, *cⱼ*))) · *cᵢ*, where *σ* = 1 in this paper.
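A minimal numpy sketch of the hard/soft quantization pair, under the assumption that the KL distance is smoothed with a small `eps` so that distances to one-hot codewords stay finite (the paper does not specify this detail):

```python
import numpy as np

def kl_distance(x, c, eps=1e-9):
    # KL divergence between probability vectors, smoothed so one-hot targets are finite
    return np.sum(x * (np.log(x + eps) - np.log(c + eps)))

def hard_quantize(x, codebook):
    # forward pass: snap x to the nearest one-hot codeword under KL distance
    dists = np.array([kl_distance(x, c) for c in codebook])
    return codebook[np.argmin(dists)]

def soft_quantize(x, codebook, sigma=1.0):
    # backward-pass surrogate: softmax-weighted mixture of codewords (sigma = 1)
    dists = np.array([kl_distance(x, c) for c in codebook])
    w = np.exp(-sigma * (dists - dists.min()))  # shift for numerical stability
    w /= w.sum()
    return w @ codebook
```

The soft version is differentiable in `x`, which is what allows gradients to flow through the quantizer during training.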

# 4. Rate Estimation

- The idea of the context model in [17] is adopted to accurately estimate the bit rate of the transmitted side information.
- Specifically, one 5 × 5 masked convolution layer and three 1 × 1 convolutional layers are used to implement the context model.
- The outputs have 16 channels and are normalized along the channel direction by softmax to obtain the **probability vectors** *p* ∈ *R*¹⁶.
- In the forward pass, the bit rate is calculated as *R* = −(1/||*X*||) Σ_{*x̂*∈*X*} log₂(*x̂* · *p*_*x̂*), where *X* is the set of all quantized artifact descriptors of one image, *x̂* · *p*_*x̂* denotes the dot product between *x̂* and *p*_*x̂*, and ||*X*|| is the size of *X*.
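Given the context model’s predicted probability vectors, the forward-pass rate formula can be sketched directly (the context model itself is assumed to have already produced `probs`):

```python
import numpy as np

def estimate_rate(quantized, probs, eps=1e-12):
    # quantized: (N, 16) one-hot descriptors x_hat
    # probs:     (N, 16) context-model probability vectors p
    # R = -(1/||X||) * sum over x_hat of log2(x_hat . p_x_hat)
    dots = np.sum(quantized * probs, axis=1)  # predicted probability of each codeword
    return -np.mean(np.log2(dots + eps))      # average bits per descriptor
```

As a sanity check, a uniform context model over 16 codewords should cost log₂ 16 = 4 bits per descriptor.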

# 5. Artifact Feature Mapping

- “2↑” and “8↑” represent 2× and 8× up-sampling.
- The convolutional layers identified by “2↑” are implemented as sub-pixel convolutional layers, as in ESPCN [18].
- The output artifact features have the same resolution as the compressed images.
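The core of an ESPCN-style “r↑” layer is the pixel-shuffle rearrangement that follows an ordinary convolution: channels are traded for spatial resolution. A minimal sketch of that rearrangement (the preceding learned convolution is omitted):

```python
import numpy as np

def pixel_shuffle(x, r):
    # sub-pixel rearrangement used in ESPCN-style "r↑" layers:
    # (C*r*r, H, W) -> (C, H*r, W*r)
    C = x.shape[0] // (r * r)
    H, W = x.shape[1], x.shape[2]
    x = x.reshape(C, r, r, H, W)
    x = x.transpose(0, 3, 1, 4, 2)  # interleave to (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)
```

Stacking one 2× shuffle is the “2↑” case; an “8↑” layer can be realized with a larger factor or repeated 2× stages.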

# 6. Conditional Residual Block

- The up-sampling layer of the **EDSR**-baseline network in EDSR [12] is removed; the remaining network, which has **16 residual blocks**, is used as the post-processing **backbone** and is called **EDSR-baseline\***.
- For the conditional form, the **artifact features** are **directly multiplied with the input features** of the residual blocks after a 1×1 convolutional layer.
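The conditioning mechanism can be sketched as follows. This is a structural sketch under simplifying assumptions: the block’s 3×3 convolutions are reduced to per-pixel 1×1 maps, and all weights (`w1`, `w2`, `w_cond`) are hypothetical parameters:

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution = per-pixel linear map over channels; x: (Cin,H,W), w: (Cout,Cin)
    return np.einsum('oc,chw->ohw', w, x)

def conditional_residual_block(feat, cond, w1, w2, w_cond):
    # feat: (C,H,W) block input features; cond: (Cc,H,W) artifact features
    # The condition passes through a 1x1 conv, then multiplies the input features.
    scale = conv1x1(cond, w_cond)                     # (C,H,W) modulation map
    h = np.maximum(conv1x1(feat * scale, w1), 0.0)    # conv + ReLU (3x3 simplified to 1x1)
    return feat + conv1x1(h, w2)                      # residual connection, EDSR-style
```

The multiplicative gating lets the artifact features spatially modulate how strongly each block filters the input, while the residual connection preserves the EDSR backbone’s behavior.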

# 7. Experimental Results

## 7.1. Complexity Analysis

- From the above table, it can be found that **artifact descriptor extraction, artifact feature mapping, and the introduced conditional mechanism** bring only **small storage and computation overheads**.

- The alternative of training multiple models and selecting one at the encoder is also compared.
- In the **encoder**, the proposed method incurs **much smaller storage and computation overheads**.
- In the **decoder**, by **increasing the computational complexity slightly**, the proposed method **decreases the storage space significantly**.

## 7.2. Rate Distortion Performance

- A simplified baseline model, named S-EDSR, is also tried by reducing the number of residual blocks in EDSR-baseline\* from 16 to 4.
- Accordingly, the two conditional test models are named **EDSR-baseline\*+side** and **S-EDSR+side**.
- On the **LIVE1** dataset, EDSR-baseline\* **outperforms ARCNN [4], DnCNN [5], and MemNet**.
- On the **DIV2K** validation dataset, when the JPEG quality factor is set to 10, **the PSNR of the output images of both S-EDSR and EDSR-baseline\* is increased by more than 0.9 dB by using side information**.
- Even when the transmission overhead is considered, the RD performance is still improved thanks to the side information. For example, on the **DIV2K** validation dataset, **S-EDSR+side (with 4 residual blocks) is even comparable to EDSR-baseline\* (with 16 residual blocks)**.

## 7.3. Analysis of Descriptors

- Different colors indicate different descriptors.
- It can be seen that different descriptors are used at different positions.

## Reference

[2020 VCIP] [Ma VCIP’20]

Improving Compression Artifact Reduction via End-to-End Learning of Side Information

## Codec Filtering

**JPEG** [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN] [CAR-DRN] [LIU4K] [Ma VCIP’20]
**JPEG-HDR** [Han VCIP’20]
**HEVC** [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [Jia TIP’19] [EDCNN] [VRCNN-BN] [MACNN] [Yue VCIP’20] [SEFCNN] [LIU4K]
**3D-HEVC** [RSVE+POST]
**AVS3** [Lin PCS’19] [CNNLF]
**VVC** [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN] [PRN] [DRCNN] [Zhang ICME’20] [MGNLF] [RCAN+PRN+] [Nasiri VCIP’20]