Review: IFCNN — In-Loop Filtering Using Convolutional Neural Network (Codec Filtering)
Average BD-rate gain of 1.9% to 2.8% for Low Delay and 1.6% to 2.6% for Random Access
In this story, IFCNN, by Korea Advanced Institute of Science and Technology (KAIST), is presented. Convolutional Neural Network (CNN) based in-loop filtering is proposed for denoising/deblocking to further improve coding efficiency. The paper was published at the 2016 IVMSP Workshop and has more than 50 citations. (Sik-Ho Tsang @ Medium)
Outline
- Proposed IFCNN
- Experimental Results
1. Proposed IFCNN
- Instead of being applied to the output reconstructed image/video as post-processing, IFCNN is placed inside the encoding loop. That’s why it is called in-loop CNN.
- In the proposed codec structure, Sample Adaptive Offset (SAO) is replaced by IFCNN.
- A one-bit flag is signalled to switch IFCNN on or off.
- IFCNN is a very shallow CNN with three convolutional layers.
- W1 has size 9×9×64 and B1 is a 64-dimensional vector.
- W2 has size 64×3×3×32 and B2 is a 32-dimensional vector.
- W3 has size 32×5×5×1 and B3 is a 1-dimensional vector.
- And ReLU is not applied.
- Mean squared error (MSE) is used as the loss function L for training.
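To make the layer sizes and the MSE criterion above more concrete, here is a minimal bookkeeping sketch in plain Python. It only counts parameters and evaluates MSE; it is not the authors' implementation. Two things are assumptions that do not come from the paper: the input is taken to be single-channel (the text gives W1 only as 9×9×64), and `choose_ifcnn_flag` is a hypothetical encoder rule for setting the one-bit on/off flag, shown only to illustrate how such a flag could be decided.

```python
# Layer shapes from the list above, written as
# (in_channels, kernel_h, kernel_w, out_channels).
# ASSUMPTION: the input is single-channel; the text gives W1 only as 9x9x64.
LAYERS = [
    (1, 9, 9, 64),   # W1 (B1: 64 biases)
    (64, 3, 3, 32),  # W2 (B2: 32 biases)
    (32, 5, 5, 1),   # W3 (B3: 1 bias)
]


def total_params(layers):
    """Count all weights plus one bias per output channel."""
    return sum(i * kh * kw * o + o for i, kh, kw, o in layers)


def receptive_field(layers):
    """Input window side length seen by one output sample (stride 1, no dilation)."""
    return 1 + sum(kh - 1 for _, kh, _, _ in layers)


def mse(a, b):
    """Mean squared error between two equally sized flat sample lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_ifcnn_flag(original, unfiltered, filtered):
    """HYPOTHETICAL encoder rule: signal 1 (on) only if filtering lowers the MSE."""
    return 1 if mse(filtered, original) < mse(unfiltered, original) else 0


if __name__ == "__main__":
    print(total_params(LAYERS))     # 24513
    print(receptive_field(LAYERS))  # 15
    # Toy sample values, made up for illustration only.
    print(choose_ifcnn_flag([10, 20, 30], [12, 18, 33], [10, 21, 30]))  # 1
```

Under these assumptions the network has roughly 24.5K parameters and each output sample depends on a 15×15 input window, which is consistent with calling it a very shallow CNN.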
2. Experimental Results
- Average BD-rate gain of 4.8% for the All Intra configuration.
- Average BD-rate gain of 1.9% to 2.8% for the Low Delay P configuration.
- Average BD-rate gain of 1.6% to 2.6% for the Random Access configuration.
- Using the vl_nnconv() function in MatConvNet, the IFCNN structure takes 0.4 seconds per frame for the 416×240 sequences and 1.2 seconds per frame for the 832×480 sequences on a PC with a 3 GHz CPU and 32 GB RAM.
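Returning to the runtime figures reported in the results, the short sketch below puts them next to a rough operation count for the three layers described in Section 1. It is back-of-the-envelope arithmetic, not a measurement. It assumes a single-channel input, an output the same size as the input, and that only width × height samples per frame are filtered; none of these are stated in the text.

```python
# Runtime figures quoted in the results: (width, height, seconds per frame).
TIMINGS = [
    (416, 240, 0.4),
    (832, 480, 1.2),
]

# Multiply-accumulates (MACs) per output sample for the three layers.
# ASSUMPTIONS: single-channel input, output size equal to input size.
MACS_PER_SAMPLE = 1 * 9 * 9 * 64 + 64 * 3 * 3 * 32 + 32 * 5 * 5 * 1


def frame_stats(width, height, seconds):
    """Return (samples per frame, frames per second, MACs per frame)."""
    samples = width * height
    return samples, 1.0 / seconds, samples * MACS_PER_SAMPLE


if __name__ == "__main__":
    for w, h, s in TIMINGS:
        samples, fps, macs = frame_stats(w, h, s)
        print(f"{w}x{h}: {samples} samples, {fps:.2f} frames/s, "
              f"{macs / 1e9:.2f} GMACs/frame")
```

The larger resolution has four times as many samples but takes only three times as long per frame, so the reported per-sample cost is somewhat lower at 832×480.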
Reference
[2016 IVMSP Workshop] [IFCNN]
CNN-based In-loop Filtering for Coding Efficiency Improvement