Review: Wang APSIPA ASC’19 — CNN-Based Loop Filtering for Versatile Video Coding (Codec Filtering)
Using Squeeze & Excitation and Skip Connection
In this story, CNN-Based Loop Filtering (LF) for Versatile Video Coding (VVC), which uses squeeze & excitation together with a skip connection, is briefly reviewed, since I am working on video coding research. This is a paper in 2019 APSIPA ASC. (Sik-Ho Tsang @ Medium)
Outline
- Squeeze & Excitation (SE) Basic Block
- Network Architecture & Loss Function
- Experimental Results
1. Squeeze & Excitation (SE) Basic Block
- SE Block in SENet is utilized for building the basic block.
- Given a feature map X with shape H×W×C, two convolutional layers with a Rectified Linear Unit (ReLU) between them are first applied, producing a feature map Y2.
- Each channel of Y2 is then squeezed to a single numeric value using Global Average Pooling (GAP).
- Next, a fully connected layer followed by a ReLU adds the necessary nonlinearity; its number of output channels is reduced by a reduction ratio r, which is set to 4 in this paper.
- A second fully connected layer followed by a sigmoid activation gives each channel a smooth gating ratio Y5 in the range [0, 1].
- Each channel of Y2 is scaled by the corresponding gating ratio in Y5.
- Finally, a skip connection (as in ResNet) adds the input directly to the output, so that the block learns a residual.
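The SE basic block described above can be sketched in numpy as follows. The convolutional layers are passed in as plain functions, and the fully connected weights (w1, b1, w2, b2) and their shapes are illustrative assumptions, not the paper's trained parameters:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def se_rescale(y2, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map y2 of shape (H, W, C).

    w1: (C, C//r), w2: (C//r, C), with reduction ratio r = 4 as in the paper.
    Returns y2 with each channel scaled by a sigmoid gate in [0, 1].
    """
    z = y2.mean(axis=(0, 1))        # squeeze: GAP over H, W -> (C,)
    y4 = relu(z @ w1 + b1)          # FC + ReLU, channels reduced by r
    y5 = sigmoid(y4 @ w2 + b2)      # FC + sigmoid: gating ratios in [0, 1]
    return y2 * y5                  # excitation: per-channel rescaling

def se_basic_block(x, conv1, conv2, w1, b1, w2, b2):
    """Full basic block: conv-ReLU-conv, SE rescaling, then skip connection."""
    y2 = conv2(relu(conv1(x)))
    return x + se_rescale(y2, w1, b1, w2, b2)  # residual learning
```

With identity functions standing in for the two convolutions, a random (H, W, C) input passes through the block and keeps its shape, which is what the skip connection requires.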
2. Network Architecture & Loss Function
- A two-stage, three-branch CNN is designed.
2.1. First Stage
- In the first stage, the U/V components are upsampled to match the size of the Y component, since in YUV 4:2:0 format the width and height of the U/V components are only half those of the Y component.
- The QPmap is concatenated at this stage.
- QPmap is a feature map of the same size as the input, filled with the normalized QP value of the current frame.
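A minimal sketch of this first stage: nearest-neighbour 2× upsampling brings U/V to the Y resolution, and a constant QPmap is built at the same size. Normalizing by the VVC maximum QP of 63 is an assumption here; the paper's exact normalization may differ:

```python
import numpy as np

def upsample2x(plane):
    """Nearest-neighbour 2x upsampling so U/V match the Y plane (YUV 4:2:0)."""
    return plane.repeat(2, axis=0).repeat(2, axis=1)

def qp_map(h, w, qp, qp_max=63.0):
    """Constant map of the frame's normalized QP, same size as the input.
    Dividing by the VVC maximum QP (63) is an assumed normalization."""
    return np.full((h, w), qp / qp_max, dtype=np.float32)

def first_stage_input(y, u, v, qp):
    """Stack Y, upsampled U/V, and the QPmap into one (H, W, 4) tensor."""
    return np.stack([y, upsample2x(u), upsample2x(v),
                     qp_map(*y.shape, qp)], axis=-1)
```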
2.2. Second Stage
- In the second stage, the main pipeline is split into three branches.
- Each branch handles one component and is fused with its own CUmap.
- CUmap is a feature map in which coding unit (CU) boundary positions are filled with 1 and all other positions with 0.5, as shown in the above figure.
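Constructing such a CUmap is straightforward once the CU partitioning is known. The sketch below assumes a hypothetical representation of the partitioning as a list of (top, left, height, width) rectangles:

```python
import numpy as np

def cu_map(h, w, cu_rects):
    """Build a CUmap: 1 at CU boundary samples, 0.5 everywhere else.

    cu_rects: list of (top, left, height, width) CU rectangles -- an
    assumed representation of the decoder's CU partitioning.
    """
    m = np.full((h, w), 0.5, dtype=np.float32)
    for top, left, ch, cw in cu_rects:
        m[top, left:left + cw] = 1.0            # top edge
        m[top + ch - 1, left:left + cw] = 1.0   # bottom edge
        m[top:top + ch, left] = 1.0             # left edge
        m[top:top + ch, left + cw - 1] = 1.0    # right edge
    return m
```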
2.3. Loss Function
- Since improving the Y component matters more than improving the U/V components, a larger weight is assigned to the Y component in the loss function.
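A component-weighted MSE loss along these lines can be sketched as follows. The specific weight values (0.75 for Y, 0.125 each for U and V) are assumptions for illustration, not the paper's exact choices:

```python
import numpy as np

def weighted_mse(pred, target, weights=(0.75, 0.125, 0.125)):
    """Weighted MSE over (Y, U, V) predictions; the larger Y weight
    reflects its greater importance. Weight values are assumed."""
    return sum(w * np.mean((p - t) ** 2)
               for w, p, t in zip(weights, pred, target))
```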
3. Experimental Results
- Anchor: H.266/VVC anchor with DBF, SAO and ALF enabled.
- [26]: Filter located between DBF and SAO.
- [27]: Only replace DBF and SAO, but ALF is enabled.
- [28]: Located between DBF and SAO.
- The proposed approach, used with DBF, SAO and ALF all disabled, obtains 6.46%, 10.40% and 12.79% BD-rate reductions on the luma and two chroma components, respectively, outperforming [26–28], which are proposals submitted during VVC standardization.
Reference
[2019 APSIPA ASC] [Wang APSIPA ASC’19]
An Integrated CNN-based Post Processing Filter For Intra Frame in Versatile Video Coding
My Previous Reviews
Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]
Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]
Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [DeepLabv3+]
Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net] [Attention U-Net] [RU-Net & R2U-Net] [VoxResNet] [DenseVoxNet][UNet++] [H-DenseUNet] [DUNet]
Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]
Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet] [SR+STN]
Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM] [FCGN] [IEF] [Newell ECCV’16 & Newell POCV’16]
Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN] [Lu CVPRW’19] [Wang APSIPA ASC’19]
Generative Adversarial Network [GAN]