Reading: ScratchCNN — Low Complexity Learned Sub-Pixel Motion Compensation (VVC Inter)
Outperforms SRCNN-Like Network, Up to 4.5% BD-Rate Reduction
In this story, Interpreting CNN for Low Complexity Learned Sub-Pixel Motion Compensation in Video Coding (ScratchCNN), by BBC Research and Development and Dublin City University, is presented. I read this because I work on video coding research. In this paper:
- A CNN is used to improve the interpolation of reference samples needed for fractional-precision motion compensation.
- The complexity of the CNN is reduced by interpreting the interpolation filters learned by the networks.
This is a paper in 2020 ICIP, which will be held in October; the authors have already made the paper available on arXiv. (Sik-Ho Tsang @ Medium)
(For background on fractional interpolation in video coding, please feel free to read CNNIF.)
Outline
- Proposed ScratchCNN
- Experimental Results
1. Proposed ScratchCNN
1.1. ScratchCNN Network Architecture
- The ScratchCNN network architecture is similar to that of SRCNN (a minimal sketch follows this list).
- It contains 64 individual 9×9 convolutional kernels in the first layer, 32 individual 1×1 kernels in the second layer, and 32 individual 5×5 kernels in the final layer.
- Residual learning is used.
- One special point is that the ReLU activations are removed.
- The biases are also removed.
- No padding is applied.
- SAD is used as the loss function.
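Based on the bullets above, here is a minimal PyTorch sketch of such a network. The layer shapes, the single-channel input, and the way the residual connection is wired (adding the cropped input back to the output) are my assumptions from the description, not the authors' code:

```python
import torch
import torch.nn as nn

class ScratchCNN(nn.Module):
    """Sketch: 3 convolutional layers, no ReLU, no bias, no padding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, bias=False),
            nn.Conv2d(64, 32, kernel_size=1, bias=False),
            nn.Conv2d(32, 1, kernel_size=5, bias=False),
        )

    def forward(self, x):
        # Residual learning (assumed form): predict the difference from
        # the reference samples; crop the input to match the valid output.
        crop = 6  # valid convolutions shrink each side by (8 + 0 + 4) / 2
        return x[..., crop:-crop, crop:-crop] + self.net(x)

# SAD loss: sum of absolute differences (L1 summed, not averaged).
def sad_loss(pred, target):
    return (pred - target).abs().sum()

model = ScratchCNN()
x = torch.randn(1, 1, 32, 32)        # toy reference block
target = torch.randn(1, 1, 20, 20)   # toy ground-truth sub-pixel block
sad_loss(model(x), target).backward()
```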
1.2. ScratchCNN Simplification for Complexity Reduction
- With the ReLU activations and biases removed, every layer is linear, so the three convolutions collapse into a single non-separable 2D filter M of size 13×13 (9 + 1 + 5 − 2 = 13), obtained by convolving the learned kernels of the three layers together (summing over channels).
- Therefore, during inference only M ∗ X is computed; X does not need to go through the 3 layers. As a result, the interpolation process is sped up (see the check below).
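To make the collapse concrete, here is a small PyTorch check. The layer shapes follow the sketch above (without the residual connection); recovering M from an impulse response is just one convenient way to compute it, not necessarily how the authors derive it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Linear 3-layer stack: no bias, no ReLU, no padding.
net = nn.Sequential(
    nn.Conv2d(1, 64, 9, bias=False),
    nn.Conv2d(64, 32, 1, bias=False),
    nn.Conv2d(32, 1, 5, bias=False),
)

with torch.no_grad():
    # Feed a centred unit impulse through the network; flipping the
    # 13x13 response gives the equivalent single filter M.
    impulse = torch.zeros(1, 1, 25, 25)
    impulse[0, 0, 12, 12] = 1.0
    M = torch.flip(net(impulse), dims=(-2, -1))  # shape (1, 1, 13, 13)

    # One convolution with M reproduces the whole 3-layer network.
    x = torch.randn(1, 1, 32, 32)
    assert torch.allclose(net(x), F.conv2d(x, M), atol=1e-5)

print(M.numel())  # 169 parameters
```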
1.3. VVC Implementation
- VTM-6.0 is used.
- To support different CU sizes and shapes, 60 networks are trained.
- The selection between the conventional VVC filters and the 13×13 filters is performed at the CU level (a hypothetical sketch of such a decision follows this list).
- The filters are applied only to luma samples.
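The story does not spell out the decision rule, but a CU-level selection of this kind typically compares the cost of each predictor and signals a flag. A minimal sketch, where the function names, the SAD-based cost, and the 1-bit flag model are all illustrative assumptions rather than the actual VTM implementation:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def choose_interpolation(orig, pred_vvc, pred_cnn, lambda_rd=1.0, flag_bits=1):
    """Pick the cheaper predictor for this CU and return a 1-bit flag
    that would be signalled in the bitstream (illustrative cost model)."""
    cost_vvc = sad(orig, pred_vvc)
    cost_cnn = sad(orig, pred_cnn) + lambda_rd * flag_bits
    use_cnn = cost_cnn < cost_vvc
    return (pred_cnn if use_cnn else pred_vvc), int(use_cnn)

# Toy usage with random blocks standing in for the two predictions:
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (16, 16))
pred, flag = choose_interpolation(
    orig, rng.integers(0, 256, (16, 16)), rng.integers(0, 256, (16, 16))
)
```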
2. Experimental Results
2.1. Ablation Study
- Using ScratchCNN with SAD as the loss function and with no padding, a BD-rate reduction is obtained.
- Also, the encoding and decoding times are much lower than with SRCNN.
- Rather than integrating deep learning software within VTM, all weights and biases (8129 parameters in total) are extracted from each of the 15 trained SRCNNs and implemented in VTM as a series of matrix multiplications.
- In contrast, each trained ScratchCNN model is condensed into a single 2D matrix that contains 169 parameters (see the quick check below).
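Both parameter counts can be verified from the layer shapes given earlier (assuming single-channel input):

```python
# SRCNN-like model: weights plus biases for each of the 3 layers.
srcnn = (64 * 1 * 9 * 9 + 64) + (32 * 64 * 1 * 1 + 32) + (1 * 32 * 5 * 5 + 1)
# ScratchCNN after simplification: one 13x13 filter, no biases.
scratchcnn = 13 * 13
print(srcnn, scratchcnn)  # 8129 169
```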
2.2. BD-Rate (%)
- Using ScratchCNN, up to 4.54% BD-rate reduction can be obtained.
2.3. Hit Ratio
- ScratchCNN is selected 70% to 80% of the time, showing that it is useful.
This is the 2nd story this month.
Reference
[2020 ICIP] [ScratchCNN]
Interpreting CNN for Low Complexity Learned Sub-Pixel Motion Compensation in Video Coding
Codec Inter Prediction
H.264 [DRNFRUC & DRNWCMC]
HEVC [CNNIF] [Zhang VCIP’17] [NNIP] [GVTCNN] [Ibrahim ISM’18] [VC-LAPGAN] [VI-CNN] [CNNMCR] [FRUC+DVRF] [FRUC+DVRF+VECNN] [RSR] [Zhao ISCAS’18 & TCSVT’19] [Ma ISCAS’19] [Xia ISCAS’19] [Zhang ICIP’19] [ES] [GVCNN] [FRCNN] [Pham ACCESS’19] [CNNInvIF / InvIF] [CNN-SR & CNN-UniSR & CNN-BiSR] [DeepFrame] [U+DVPN] [Multi-Scale CNN]
VVC [FRUC+DVRF+VECNN] [ScratchCNN]