Reading: ARTN — Artifact Reduction Temporal Network (Codec Filtering)
In this story, Artifact Reduction Temporal Network (ARTN), by Seoul National University and Samsung Electronics Co. Ltd., is briefly presented. I read this because I work on video coding research. In this paper:
- A simple motion search, i.e. the three-step search (TSS), is used to find similar patches in the neighboring frames.
- These similar patches, together with the patch from the current frame, are input into the temporal CNN for artifact reduction.
This is a paper in 2018 IEEE ACCESS, which is an open access journal with a high impact factor of 4.098. (Sik-Ho Tsang @ Medium)
Outline
- ARTN: Network Architecture
- Experimental Results
1. ARTN: Network Architecture
- For a deep network that maps an input X to the output F(X|θ), training means finding the set of parameters θ that makes F(X|θ) as close as possible to the desired signal Y (a minimal training sketch follows this list).
- The input is the set of three consecutive frames X(t-1), X(t), and X(t+1), and the desired output, denoted Y(t), is the original (uncompressed) frame.
- Three branches of convolution layers are deployed, which are fed with the related patches from X(t-1), X(t), and X(t+1), respectively.
- These are named “temporal branches”, whose outputs are then concatenated and fed to the following “aggregation stage”.
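Below is a minimal sketch of this training objective, assuming PyTorch and an L2 (MSE) loss (a common choice for restoration networks; the paper’s exact loss may differ). The names `artn` and `training_step` are hypothetical.

```python
import torch.nn.functional as F

# One optimization step toward the desired signal Y(t). `artn` stands in
# for the network F(.|theta); the MSE loss is an assumption.
def training_step(artn, optimizer, x_prev, x_cur, x_next, y):
    optimizer.zero_grad()
    out = artn(x_prev, x_cur, x_next)   # F(X|theta) on the three frames
    loss = F.mse_loss(out, y)           # distance to the uncompressed Y(t)
    loss.backward()                     # gradients w.r.t. theta
    optimizer.step()                    # update theta
    return loss.item()
```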
1.1. Temporal Branches
- The number of feature maps for the current frame is twice that of the previous or next frame (64 vs. 32), to put more emphasis on the current frame.
- The patch size is 64×64 with a stride of 48, so each patch overlaps its neighbors by 1/4 of its width/height (16 pixels).
- The simple three-step search (TSS) algorithm is used to find the closest matching patches (see the sketch after this list).
- When an abrupt change causes motion estimation (ME) to fail, i.e. when the mean absolute difference between the patches exceeds a threshold of 25,500, the previous (or next) frame’s patch is discarded and replaced by the current frame’s patch.
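Below is a minimal NumPy sketch of TSS with the scene-change fallback, on single-channel frames. The SAD cost and the function names are assumptions; only the three-step schedule and the 25,500 threshold come from the text above.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized patches."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def three_step_search(ref, cur_patch, top, left, step=4):
    """Classic three-step search: probe the 3x3 grid of offsets around
    the current best position, halve the step, repeat until step 1.
    Returns the best (top, left) in `ref` and its SAD cost."""
    h, w = cur_patch.shape
    best = (top, left)
    best_cost = sad(ref[top:top + h, left:left + w], cur_patch)
    while step >= 1:
        cy, cx = best
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = cy + dy, cx + dx
                if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                    cost = sad(ref[y:y + h, x:x + w], cur_patch)
                    if cost < best_cost:
                        best_cost, best = cost, (y, x)
        step //= 2
    return best, best_cost

def matched_patch(ref, cur_patch, top, left, threshold=25_500):
    """Patch from the neighboring frame, falling back to the current
    frame's patch when the match is too poor (e.g., a scene change).
    The 25,500 threshold is quoted from the paper; applying it to the
    SAD computed here is an assumption about the exact metric."""
    (y, x), cost = three_step_search(ref, cur_patch, top, left)
    if cost > threshold:
        return cur_patch
    h, w = cur_patch.shape
    return ref[y:y + h, x:x + w]
```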
1.2. Aggregation Stage
- An Inception block, modified from GoogLeNet, is used.
- The max-pooling of the original architecture is removed; instead, a larger 7×7 kernel filter is added. This helps to keep the features needed for tasks such as JPEG artifact removal or skin detection.
- The network is designed such that its number of parameters is close to that of ARCNN (a hedged sketch of the overall structure follows this list).
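Below is a hedged PyTorch sketch of this structure. Only the 64-vs-32 channel split, the branch concatenation, and the pooling-free inception block with an added 7×7 branch follow the description above; layer counts, kernel sizes, and all names are illustrative, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception-style block without the max-pooling path; a 7x7 branch
    takes its place, as described above. Branch widths are illustrative."""
    def __init__(self, in_ch, width=32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, width, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, width, 1), nn.ReLU(True),
                                nn.Conv2d(width, width, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, width, 1), nn.ReLU(True),
                                nn.Conv2d(width, width, 5, padding=2))
        self.b7 = nn.Sequential(nn.Conv2d(in_ch, width, 1), nn.ReLU(True),
                                nn.Conv2d(width, width, 7, padding=3))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.b7(x)], dim=1)

class ARTNSketch(nn.Module):
    """Skeleton only: one conv per temporal branch and one inception block;
    the paper's exact depths and kernel sizes differ."""
    def __init__(self):
        super().__init__()
        # Temporal branches: 64 feature maps for the current frame, 32 for
        # each neighbor (the 2x emphasis from Section 1.1), assuming
        # single-channel (luma) input patches.
        self.branch_prev = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(True))
        self.branch_cur  = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(True))
        self.branch_next = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(True))
        # Aggregation stage over the concatenated features (32+64+32 = 128).
        self.aggregate = InceptionBlock(128)
        self.reconstruct = nn.Conv2d(4 * 32, 1, 3, padding=1)  # back to one channel

    def forward(self, x_prev, x_cur, x_next):
        feats = torch.cat([self.branch_prev(x_prev),
                           self.branch_cur(x_cur),
                           self.branch_next(x_next)], dim=1)
        return self.reconstruct(self.aggregate(feats))
```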
1.3. Merge Stage
- The final output frame is formed by merging the output patches with a weighted sum.
- As stated previously, each block overlaps its horizontal/vertical neighbors by 1/4 of its size, so Gaussian weights centered at the block center are used to compute the weighted sum: W(i,j) = exp(−d(i,j)²/(2σ²)), where d(i,j) is the Euclidean distance from the block center to position (i,j) (a merge sketch follows this list).
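Below is a minimal NumPy sketch of this merge, assuming the Gaussian form given above; the value of σ and the function names are assumptions.

```python
import numpy as np

def gaussian_weights(size=64, sigma=16.0):
    """Weight map W(i,j) = exp(-d(i,j)^2 / (2*sigma^2)), where d(i,j) is
    the Euclidean distance from the patch center; sigma is an assumption."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    d2 = (yy - c) ** 2 + (xx - c) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def merge_patches(patches, positions, frame_shape):
    """Merge overlapping 64x64 output patches (stride 48, so a 16-pixel
    overlap) into one frame as a normalized weighted sum."""
    acc = np.zeros(frame_shape, dtype=np.float64)
    norm = np.zeros(frame_shape, dtype=np.float64)
    w = gaussian_weights(patches[0].shape[0])
    for patch, (top, left) in zip(patches, positions):
        h, pw = patch.shape
        acc[top:top + h, left:left + pw] += w[:h, :pw] * patch
        norm[top:top + h, left:left + pw] += w[:h, :pw]
    return acc / np.maximum(norm, 1e-12)  # normalize by accumulated weights
```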
2. Experimental Results
2.1. Dataset Preparation
- HM-16.9 is used with the random access (RA) main profile and a GOP size of 8.
- QPs of 34, 37, 42, and 47 are used, with one model trained for each QP.
- The JCT-VC sequences are divided into a training set and a test set.
2.2. PSNR & SSIM
- Baseline: patches obtained without the three-step search.
- ARTN: patches obtained with the three-step search.
- The baseline already obtains high PSNR/SSIM.
- ARTN outperforms ARCNN, VRCNN, and VSRNet.
- Compared with DS-CNN, ARTN also obtains the best performance.
2.3. Running Time
- ARCNN, VRCNN, and the baseline network take 0.11, 0.12, and 0.25 seconds, respectively, to process an FHD (1920×1080) frame.
- ARTN takes about 2 seconds per frame, with most of the computation time spent on motion estimation (ME).
2.4. Visual Quality
- ARTN exhibits far fewer blocking artifacts.
There are many visualized results in the paper, as well as results on the flickering effect. Please feel free to visit the paper if interested.
This is the 12th story this month!
Reference
[2018 ACCESS] [ARTN]
Reduction of Video Compression Artifacts Based on Deep Temporal Networks
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [QE-CNN] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [Lu CVPRW’19] [Wang APSIPA ASC’19]