Review: Lu CVPRW’19 — Multi-Scale Spatial Priors (Codec Filtering)
Multi-Scale Spatial Priors CNN for Versatile Video Coding (VVC)
In this story, a post-processing module using Multi-Scale Spatial Priors for the Versatile Video Coding (VVC) standard, by Nanjing University, is briefly reviewed. This is a 2019 CVPRW paper, which participated in the Workshop and Challenge on Learned Image Compression (CLIC) at CVPR 2019. (Sik-Ho Tsang @ Medium)
Outline
- Multi-Scale Spatial Priors
- Some Training Details
- Experimental Results
1. Multi-Scale Spatial Priors
1.1. Convolutional Neural Network (CNN) as Post Processing Module
- First, the input RGB image is converted into YUV 4:4:4 or YUV 4:2:0 for encoding.
- Then, the encoded bitstream is sent to the decoder (or end user).
- After decoding, the decoded YUV is converted back into RGB.
- Finally, the RGB image is enhanced by the post-processing module, which is a CNN (see the sketch below).
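Below is a minimal PyTorch sketch of the decoder-side part of this pipeline. The BT.601 colour matrix, the `enhance_net` placeholder, and the assumption that chroma has already been upsampled to 4:4:4 are mine, not details from the paper:

```python
import torch

def yuv444_to_rgb(yuv):
    """BT.601 full-range YUV 4:4:4 -> RGB (colour matrix assumed; the paper
    does not specify it in this summary). yuv: (N, 3, H, W) in [0, 1]."""
    y, u, v = yuv[:, 0:1], yuv[:, 1:2] - 0.5, yuv[:, 2:3] - 0.5
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    return torch.cat([r, g, b], dim=1).clamp(0.0, 1.0)

def postprocess(decoded_yuv, enhance_net):
    """Decoder-side post-processing: convert the decoded YUV back to RGB,
    then enhance it with the restoration CNN (enhance_net is a placeholder)."""
    rgb = yuv444_to_rgb(decoded_yuv)
    with torch.no_grad():
        return enhance_net(rgb).clamp(0.0, 1.0)
```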
1.2. Multi-Scale Spatial Priors CNN Network Architecture
- Scale-specific convolution kernel sizes are used to capture the multi-scale spatial priors.
- 3×3 convolutions are applied at 1/16 of the original image size, 5×5 convolutions at 1/4 of the original image size, and 7×7 convolutions at the original size.
- The authors claim that this allows features at different scales to be extracted more precisely.
- These kernel sizes also coincide with the variable-size Coding Units (CUs) in Versatile Video Coding (VVC).
- Four modified residual blocks (originating from ResNet) are used at the different scales.
- 256 output channels are used for each convolutional layer at 1/16 of the original size, 128 channels at 1/4 of the original size, and 64 channels at the original size (an architecture sketch follows below).
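The PyTorch sketch below reflects only what is stated above (kernel sizes, channel counts, four residual blocks per scale); the downsampling factors, the coarse-to-fine fusion, and the global residual connection are my assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Residual block with a scale-specific kernel size (assumed form)."""
    def __init__(self, ch, k):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, k, padding=k // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, k, padding=k // 2),
        )

    def forward(self, x):
        return x + self.body(x)


class MultiScalePriorNet(nn.Module):
    """Coarse-to-fine sketch: 3x3 convs with 256 channels at 1/16 of the image
    area (stride 4 per side), 5x5 convs with 128 channels at 1/4 of the area,
    and 7x7 convs with 64 channels at full resolution; four residual blocks
    per scale. Fusion/upsampling choices are assumptions."""
    def __init__(self, n_blocks=4):
        super().__init__()
        self.down16 = nn.Conv2d(3, 256, 3, stride=4, padding=1)
        self.stage16 = nn.Sequential(*[ResBlock(256, 3) for _ in range(n_blocks)])
        self.up4 = nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1)
        self.down4 = nn.Conv2d(3, 128, 5, stride=2, padding=2)
        self.stage4 = nn.Sequential(*[ResBlock(128, 5) for _ in range(n_blocks)])
        self.up1 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        self.head1 = nn.Conv2d(3, 64, 7, padding=3)
        self.stage1 = nn.Sequential(*[ResBlock(64, 7) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, x):
        # x: (N, 3, H, W) with H and W divisible by 4 (pad otherwise).
        f16 = self.stage16(self.down16(x))                # 1/16-area features
        f4 = self.stage4(self.down4(x) + self.up4(f16))   # 1/4-area features
        f1 = self.stage1(self.head1(x) + self.up1(f4))    # full-resolution features
        return x + self.tail(f1)                          # global residual


if __name__ == "__main__":
    x = torch.rand(1, 3, 128, 128)
    print(MultiScalePriorNet()(x).shape)  # torch.Size([1, 3, 128, 128])
```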
1.3. Loss Function
- MSE is used as the loss initially.
- Then the L1 norm replaces MSE for fine-tuning (see the sketch below).
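A minimal sketch of this two-stage loss schedule, assuming PyTorch; the exact switching point is not specified in this summary:

```python
import torch.nn as nn

mse_loss = nn.MSELoss()  # stage 1: initial training with MSE
l1_loss = nn.L1Loss()    # stage 2: fine-tuning with the L1 norm

def restoration_loss(pred, target, fine_tuning=False):
    """Return MSE during initial training, L1 during fine-tuning."""
    return l1_loss(pred, target) if fine_tuning else mse_loss(pred, target)
```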
2. Some Training Details
- The image restoration network is trained on the DIV2K dataset.
- Images compressed by VVC intra coding are used as the inputs, and the original images as the labels.
- Several QPs (e.g., 25, 30, 35, etc.) are adopted to cover different segments of the bit-rate range.
- An i7-7700K CPU and an NVIDIA Quadro P5000 GPU are used, with the Adam optimizer, a batch size of 16, etc.
- The network is trained in a transfer-learning manner: models for higher QPs are initialized from the parameters of models trained at lower QPs.
- e.g., the network parameters at QP 22 are used to initialize the model at QP 27 (see the sketch below).
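A minimal sketch of this QP-wise transfer learning, reusing the hypothetical MultiScalePriorNet from the architecture sketch above; the checkpoint path and learning rate are assumptions (only Adam and the batch size of 16 come from the paper):

```python
import torch

# Warm start: initialize the QP-27 model from the QP-22 checkpoint
# (path and model class are placeholders; see the architecture sketch above).
model_qp27 = MultiScalePriorNet()
model_qp27.load_state_dict(torch.load("model_qp22.pth"))

optimizer = torch.optim.Adam(model_qp27.parameters(), lr=1e-4)  # lr assumed
batch_size = 16  # from the paper
# ... continue training on DIV2K images compressed at QP 27 ...
```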
3. Experimental Results
- The test dataset (P/M), with 330 images in total, was released by the Computer Vision Lab of ETH Zurich.
- The proposed network achieves about 0.3 dB gain at each bit-rate point and an average 6.5% BD-Rate reduction over default VVC intra coding (a BD-Rate sketch is given after this list).
- In contrast, the ARCNN curve almost overlaps with that of VVC intra.
- The proposed network also achieves about 0.5 dB gain at each bit-rate point, corresponding to an average 12.2% BD-Rate reduction.
- For the example images shown in the paper, the PSNR gains are 0.2 dB, 0.2 dB, 0.25 dB and 0.15 dB, respectively.
- The BD-Rate is reduced by 4.35%, 4.03%, 4.56% and 2.99%, respectively, against VVC with YUV 4:4:4 input.
- The corresponding challenge leaderboard is at: http://challenge.compression.cc/leaderboard/lowrate/valid/
- It seems that many SOTA approaches have appeared since then; thus, their entry (team name: NJUVisionPSNR) can no longer be found on the leaderboard.
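BD-Rate measures the average bit-rate difference between two rate-distortion curves at equal PSNR. Below is a generic NumPy sketch of the standard Bjøntegaard computation (not the authors' evaluation script; the example numbers are made up):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%): average bit-rate difference of the test
    curve relative to the anchor over the overlapping PSNR range."""
    # Fit cubic polynomials of log-rate as a function of PSNR.
    p_anchor = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_anchor = np.polyint(p_anchor)
    int_test = np.polyint(p_test)
    avg_diff = ((np.polyval(int_test, hi) - np.polyval(int_test, lo)) -
                (np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo))) / (hi - lo)
    return (10 ** avg_diff - 1) * 100  # negative = bit-rate saving

# Illustrative RD points only (not the paper's data): the test curve is
# ~0.3 dB better at each rate, which yields a negative (saved) BD-Rate.
r = [0.10, 0.15, 0.25, 0.40]       # bpp
d_vvc = [30.0, 31.5, 33.0, 34.5]   # PSNR (dB), anchor
d_prop = [30.3, 31.8, 33.3, 34.8]  # PSNR (dB), proposed
print(f"BD-Rate: {bd_rate(r, d_vvc, r, d_prop):.2f}%")
```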
Reference
[2019 CVPRW] [Lu CVPRW’19]
Learned Image Restoration for VVC Intra Coding