Reading: DRCNN — Dark Video Restoration CNN (VVC Codec Filtering)
33.07% BD-Rate Reduction Against VTM-7.0
In this story, A CNN-Based Post-Processing Algorithm for Dark Video (DRCNN), by Shanghai Jiao Tong University, is presented. I read this because I work on video coding research. In this paper:
- A multi-scale residue learning dark video restoration CNN (DRCNN) is designed to enhance the quality of the decoded video.
- A loss function is designed to preserve the details of dark video.
DRCNN participated in the Grand Challenge at 2020 ICME: GC4, Encoding in the Dark. This is a paper in 2020 ICME. (Sik-Ho Tsang @ Medium)
Outline
- DRCNN: Network Architecture
- DRCNN: Loss Function
- Experimental Results
1. DRCNN: Network Architecture
- To balance between model complexity and filter gain, here a moderately deep network structure is proposed, referred to as DRCNN.
- There are eighteen layers in DRCNN, including nine combination layers (each combination layer consists of either a convolution layer and a PReLU layer, or a convolution layer, a BatchNorm layer, and a PReLU layer). At the end of DRCNN, the output feature map is added to the decoded picture, and the result is the post-processed picture.
- Nine convolution blocks, four max-pooling layers, and four upsampling layers are linked to map the reconstructed frame non-linearly to the residual frame.
- The structure of the model is divided into two parts.
- The first half applies a pooling layer after every two convolution layers. After each pooling layer, higher-level features of the input image are extracted.
- The second half of the model achieves upsampling by deconvolution after every two convolutions, restoring the feature map to the same size as the input.
- The network is similar to U-Net and other networks with an encoder-decoder structure.
- PReLU is used as the activation function; Tanh is used at the output.
- The table below shows the details:
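To make the global residual-learning idea concrete, here is a minimal numpy sketch (not the authors' code): a stack of "combination layers" (convolution + PReLU; BatchNorm omitted for brevity) predicts a residual, which is squashed by Tanh and added to the decoded picture. The kernel sizes and layer count here are toy-scale placeholders, not the paper's configuration.

```python
import numpy as np

def conv2d_same(img, kernel):
    """2D convolution with 'same' output size via edge padding."""
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    H, W = img.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * pad[i:i + H, j:j + W]
    return out

def prelu(x, a=0.25):
    """PReLU activation: identity for x >= 0, slope a for x < 0."""
    return np.where(x >= 0, x, a * x)

def combination_layer(x, kernel, a=0.25):
    """One 'combination layer': convolution followed by PReLU."""
    return prelu(conv2d_same(x, kernel), a)

def drcnn_like(decoded, kernels):
    """Hypothetical tiny stand-in for DRCNN's residual post-processing:
    conv+PReLU stack -> Tanh-bounded residual -> add to decoded picture."""
    f = decoded
    for k in kernels:
        f = combination_layer(f, k)
    residual = np.tanh(f)          # Tanh at the output, as in the paper
    return decoded + residual      # global residual connection
```

The pooling/upsampling halves of the real network are omitted here; the sketch only illustrates the residual mapping from decoded picture to restored picture.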
2. DRCNN: Loss Function
- The loss function has 3 parts:
- The first part is the MSE between the two images, which reflects the PSNR value between the output image and the ground truth.
- The second part is the SSIM value between the two images, and the third part is the texture information of the image obtained through the Scharr filter.
- The Scharr filter, like the Sobel operator, extracts the edges of the image.
- Because the structural and texture information in dark video is relatively weak, the latter two terms are added to the loss function to ensure that this information is not damaged during training.
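The three-part loss can be sketched as below. This is an illustrative numpy version under stated assumptions: a simplified global SSIM (rather than the windowed variant), Scharr gradient magnitude as the texture term, and hypothetical weights `w_ssim` and `w_tex` (the paper's exact weighting is not reproduced here).

```python
import numpy as np

def conv2d_valid(img, kernel):
    """2D 'valid' convolution via shifted accumulation."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + H - kh + 1, j:j + W - kw + 1]
    return out

SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float)
SCHARR_Y = SCHARR_X.T

def scharr_mag(img):
    """Edge strength from horizontal and vertical Scharr responses."""
    gx = conv2d_valid(img, SCHARR_X)
    gy = conv2d_valid(img, SCHARR_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def ssim_global(x, y, L=1.0):
    """Single-window (global) SSIM; a simplification of windowed SSIM."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def dark_loss(pred, gt, w_ssim=0.1, w_tex=0.1):
    """Three-part loss: MSE + (1 - SSIM) + Scharr-texture MSE.
    w_ssim / w_tex are assumed weights, not the paper's values."""
    mse = ((pred - gt) ** 2).mean()
    ssim_term = 1.0 - ssim_global(pred, gt)
    tex = ((scharr_mag(pred) - scharr_mag(gt)) ** 2).mean()
    return mse + w_ssim * ssim_term + w_tex * tex
```

For identical images the loss is zero (MSE = 0, SSIM = 1, texture difference = 0), and each term grows as the restored picture drifts from the ground truth in intensity, structure, or edges.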
3. Experimental Results
- Two models are trained. One for luma, one for chroma.
3.1. BD-Rate
- Using Y-PSNR, a 33.07% BD-rate reduction is achieved.
- Using the average PSNR, a 36.08% BD-rate reduction is achieved.
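BD-rate summarizes the average bitrate change at equal quality between two rate-distortion curves (negative means bitrate savings). A standard sketch of the Bjøntegaard delta-rate computation, using a cubic fit of log-bitrate versus PSNR (this is the common procedure, not code from the paper):

```python
import numpy as np

def bd_rate(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Bjontegaard delta rate (%): average bitrate change of the test
    curve vs. the anchor at equal PSNR. Negative = bitrate savings."""
    lr_a = np.log10(np.asarray(rates_anchor, dtype=float))
    lr_t = np.log10(np.asarray(rates_test, dtype=float))
    # Fit log10(rate) as a cubic polynomial of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100
```

For example, a test curve that reaches the same PSNR at 80% of the anchor's bitrate everywhere yields a BD-rate of about -20%.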
3.2. PSNR & VMAF
- The average PSNR and VMAF results of the proposed approach are higher than the VTM-7.0 baseline at every bitrate.
3.3. RD Curves Using PSNR & VMAF
- DRCNN performs particularly well at low bitrates.
- In particular, at bitrate = 70, the average PSNR and VMAF of the proposed approach are 4.2 dB and 25.76 higher than the baseline (VTM-7.0), respectively.
This is my 26th story this month.
Reference
[2020 ICME] [DRCNN]
A CNN-Based Post-Processing Algorithm for Dark Video
Codec Filtering
JPEG [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [CNNF] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [ARTN] [Double-Input CNN] [CNNIF & CNNMC] [B-DRRN] [Residual-VRN] [Liu PCS’19] [DIA_Net] [RRCNN] [QE-CNN] [Jia TIP’19] [EDCNN] [VRCNN-BN] [MACNN]
3D-HEVC [RSVE+POST]
AVS3 [Lin PCS’19]
VVC [AResNet] [Lu CVPRW’19] [Wang APSIPA ASC’19] [ADCNN] [DRCNN]