Reading: DRCNN — Dark Video Restoration CNN (VVC Codec Filtering)
33.07% BD-Rate Reduction Against VTM-7.0
In this story, A CNN-Based Post-Processing Algorithm for Dark Video (DRCNN), by Shanghai Jiao Tong University, is presented. I read this because I work on video coding research. In this paper:
- A multi-scale residue learning dark video restoration CNN (DRCNN) is designed to enhance the quality of the decoded video.
- A loss function is designed to preserve the details of dark video.
DRCNN participated in the Grand Challenge at 2020 ICME: GC4, Encoding in the Dark. This is a paper in 2020 ICME. (Sik-Ho Tsang @ Medium)
Outline
- DRCNN: Network Architecture
- DRCNN: Loss Function
- Experimental Results
1. DRCNN: Network Architecture
- To balance between model complexity and filter gain, here a moderately deep network structure is proposed, referred to as DRCNN.
- There are eighteen layers in DRCNN, including nine combination layers (each combination layer consists of either a convolution layer and a PReLU layer, or a convolution layer, a BatchNorm layer, and a PReLU layer). At the end of DRCNN, the output feature map is added to the decoded picture, and the result is the post-processed picture.
- Nine convolution blocks, four max-pooling layers, and four upsampling layers are linked to map the reconstructed frame non-linearly to the residual frame.
- The structure of the model is divided into two parts.
- The first half applies a pooling layer after every two convolution layers. After each pooling layer, higher-level features of the input image are extracted.
- The second half of the model achieves upsampling by deconvolution after every two convolutions, restoring the feature map to the same size as the input.
- The network is similar to U-Net and other networks with an encoder-decoder structure.
- PReLU is used as the activation function; Tanh is used at the output.
- The table below shows the details:
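To make the global residual-learning idea concrete, here is a minimal numpy sketch (not the authors' code): a stack of "combination layers" (convolution + PReLU; BatchNorm omitted for brevity) predicts a residual, which is squashed by Tanh and added to the decoded picture. The kernel sizes and layer count here are toy-scale placeholders, not the paper's configuration.

```python
import numpy as np

def conv2d_same(img, kernel):
    """2D convolution with 'same' output size via edge padding."""
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    H, W = img.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * pad[i:i + H, j:j + W]
    return out

def prelu(x, a=0.25):
    """PReLU activation: identity for x >= 0, slope a for x < 0."""
    return np.where(x >= 0, x, a * x)

def combination_layer(x, kernel, a=0.25):
    """One 'combination layer': convolution followed by PReLU."""
    return prelu(conv2d_same(x, kernel), a)

def drcnn_like(decoded, kernels):
    """Hypothetical tiny stand-in for DRCNN's residual post-processing:
    conv+PReLU stack -> Tanh-bounded residual -> add to decoded picture."""
    f = decoded
    for k in kernels:
        f = combination_layer(f, k)
    residual = np.tanh(f)          # Tanh at the output, as in the paper
    return decoded + residual      # global residual connection
```

The pooling/upsampling halves of the real network are omitted here; the sketch only illustrates the residual mapping from decoded picture to restored picture.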
2. DRCNN: Loss Function
- The loss function has 3 parts:
- The first part is the MSE between the two images, which reflects the PSNR value between the output image and the ground truth.
- The second part is the SSIM value between the two images, and the third part is the texture information of the image obtained through the Scharr filter.
- The Scharr filter, like the Sobel operator, extracts the edges of the image.
- Because the structural and texture information in dark video is relatively weak, the latter two terms are added to the loss function to ensure that this information is not damaged during training.
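The three-part loss can be sketched as below. This is an illustrative numpy version under stated assumptions: a simplified global SSIM (rather than the windowed variant), Scharr gradient magnitude as the texture term, and hypothetical weights `w_ssim` and `w_tex` (the paper's exact weighting is not reproduced here).

```python
import numpy as np

def conv2d_valid(img, kernel):
    """2D 'valid' convolution via shifted accumulation."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + H - kh + 1, j:j + W - kw + 1]
    return out

SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float)
SCHARR_Y = SCHARR_X.T

def scharr_mag(img):
    """Edge strength from horizontal and vertical Scharr responses."""
    gx = conv2d_valid(img, SCHARR_X)
    gy = conv2d_valid(img, SCHARR_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def ssim_global(x, y, L=1.0):
    """Single-window (global) SSIM; a simplification of windowed SSIM."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def dark_loss(pred, gt, w_ssim=0.1, w_tex=0.1):
    """Three-part loss: MSE + (1 - SSIM) + Scharr-texture MSE.
    w_ssim / w_tex are assumed weights, not the paper's values."""
    mse = ((pred - gt) ** 2).mean()
    ssim_term = 1.0 - ssim_global(pred, gt)
    tex = ((scharr_mag(pred) - scharr_mag(gt)) ** 2).mean()
    return mse + w_ssim * ssim_term + w_tex * tex
```

For identical images the loss is zero (MSE = 0, SSIM = 1, texture difference = 0), and each term grows as the restored picture drifts from the ground truth in intensity, structure, or edges.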
3. Experimental Results
- Two models are trained. One for luma, one for chroma.
3.1. BD-Rate
- Using Y-PSNR, a 33.07% BD-rate reduction is achieved.
- Using the average PSNR, a 36.08% BD-rate reduction is achieved.
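BD-rate summarizes the average bitrate change at equal quality between two rate-distortion curves (negative means bitrate savings). A standard sketch of the Bjøntegaard delta-rate computation, using a cubic fit of log-bitrate versus PSNR (this is the common procedure, not code from the paper):

```python
import numpy as np

def bd_rate(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Bjontegaard delta rate (%): average bitrate change of the test
    curve vs. the anchor at equal PSNR. Negative = bitrate savings."""
    lr_a = np.log10(np.asarray(rates_anchor, dtype=float))
    lr_t = np.log10(np.asarray(rates_test, dtype=float))
    # Fit log10(rate) as a cubic polynomial of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100
```

For example, a test curve that reaches the same PSNR at 80% of the anchor's bitrate everywhere yields a BD-rate of about -20%.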
3.2. PSNR & VMAF
- The average PSNR and VMAF results of the proposed approach are higher than the VTM-7.0 baseline at every bitrate.
3.3. RD Curves Using PSNR & VMAF
- DRCNN performs particularly well at low bitrates.
- In particular, at bitrate = 70, the average PSNR and VMAF of the proposed approach are 4.2 dB and 25.76 higher than the baseline (VTM-7.0), respectively.
This is my 26th story this month.
Reference
[2020 ICME] [DRCNN]
A CNN-Based Post-Processing Algorithm for Dark Video
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [Jia TIP’19] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN] [DRCNN]