Reading: CNN-CR — CNN for Image Compact-Resolution (HEVC Intra)
VDSR-Like Network, Outperforms EDSR & Li TCSVT’18
In this story, Learning a Convolutional Neural Network for Image Compact-Resolution (CNN-CR), by University of Science and Technology of China, and University of Missouri-Kansas City, is presented. I read this because I work on video coding research. In this paper:
- Image CR provides a low-resolution version of a high-resolution image.
- Two applications of image CR can be realized, i.e., low-bit-rate image compression and image retargeting.
- Image/video compression can encode the CR image instead of the full-resolution image to save bitrate.
- Image retargeting can retarget the CR image to different display devices with higher visual quality.
This is a paper in 2019 TIP where TIP has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)
Outline
- CNN-CR: Loss Function
- CNN-CR: Network Architecture
- Separate Training & Joint Training & Progressive Training
- Application Realizations
- Experimental Results
- Results for Image Retargeting
- Results for Image/Video Compression
1. CNN-CR: Loss Function
- There is no “ground-truth” for the compact-resolved image; instead, two loss functions are defined.
- Reconstruction Loss: Denote the original image as x, the mapping function of image CR as f, and the mapping function of up-scaling as g; the reconstruction loss is then L_rec = ||g(f(x)) − x||².
- Thus, f and g are learned jointly, i.e. joint learning.
- Regularization Loss: It ensures the visual quality of the compact-resolved image: the low-resolution image generated by f should be smooth and free of aliasing.
- Bicubic down-sampling, denoted b, is used as the reference, giving L_reg = ||f(x) − b(x)||².
- Combined Loss: L = L_rec + λ·L_reg,
- where λ is a parameter that controls the relative weight of the regularization loss.
- λ is set to 0.7, which achieves a good trade-off between the visual quality of the compact-resolved image and the final reconstruction quality.
With the above loss functions, image CR generates a low-resolution image that better preserves the high-frequency components of the original image, so that higher quality is obtained when the CR image is up-sampled.
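The structure of the combined objective can be sketched numerically. In the minimal NumPy sketch below, `bicubic_like_downsample` is a simple block-average stand-in for true bicubic down-sampling, and `f`/`g` are placeholder callables, not the paper's actual networks; only the loss structure matches the description above.

```python
import numpy as np

def bicubic_like_downsample(x, factor=2):
    # Stand-in for bicubic down-sampling b(x): a simple block average,
    # since only the structure of the loss matters here.
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def combined_loss(x, f, g, lam=0.7):
    """Combined CNN-CR loss: reconstruction + lambda * regularization.

    f : image CR mapping (high-res -> low-res), placeholder for CNN-CR
    g : up-scaling mapping (low-res -> high-res), e.g. bilinear or CNN-SR
    """
    y = f(x)                                              # compact-resolved image
    rec = np.mean((g(y) - x) ** 2)                        # reconstruction loss
    reg = np.mean((y - bicubic_like_downsample(x)) ** 2)  # regularization loss
    return rec + lam * reg

# Toy check: with f = block average and g = nearest-neighbour up-scaling,
# the regularization term vanishes and only the reconstruction loss remains.
x = np.arange(16.0).reshape(4, 4)
f = lambda im: bicubic_like_downsample(im, 2)
g = lambda im: np.kron(im, np.ones((2, 2)))
loss = combined_loss(x, f, g, lam=0.7)
```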
2. CNN-CR: Network Architecture
- CNN-CR consists of several convolutional layers, all of which except the first and the last are of the same configuration: 64 filters with kernel size 3×3, followed by ReLU.
- The first layer operates on the input image and serves as a resolution decreasing layer. For example, in the case of 2× down-sizing, the filters in the first layer will be equipped with stride = 2.
- The last layer is used for generating the compact-resolved image, thus contains a single filter with kernel size 3×3.
- It is similar to VDSR but with downsampling.
- After trying different network depths, 10 layers are selected.
- Down-sizing is performed by convolution with a stride of 2 instead of pooling, since it gives better performance in the authors' comparison.
- Residual learning is used which can have better training and faster convergence.
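The resolution-decreasing first layer can be illustrated with a plain strided convolution. This NumPy sketch uses a single channel and a fixed averaging kernel (both simplifications, not the paper's learned filters) to show how stride = 2 halves the spatial size without any pooling layer.

```python
import numpy as np

def conv2d_stride(x, kernel, stride=2, pad=1):
    # Single-channel 2-D convolution with zero padding; stride=2 halves
    # the spatial resolution, mirroring CNN-CR's pooling-free first layer.
    x = np.pad(x, pad)
    k = kernel.shape[0]
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.random.rand(8, 8)
kernel = np.full((3, 3), 1.0 / 9.0)   # an averaging 3x3 filter as a stand-in
small = conv2d_stride(img, kernel, stride=2, pad=1)   # 8x8 -> 4x4
```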
3. Separate Training, Joint Training & Progressive Training
3.1. Separate Training
- In separate training, the up-scaling function g is fixed to bilinear interpolation, and only CNN-CR is trained.
3.2. Joint Training
- In joint training, EDSR is used as CNN-SR, so CNN-CR and CNN-SR are trained jointly.
3.3. Progressive Training
- First, CNN-SR is trained using Separate Training.
- Then, by fixing the parameters of CNN-SR, CNN-CR is trained.
- Finally, the entire end-to-end network is fine-tuned.
- With progressive training, the performance is better than direct training, i.e. training the network end-to-end at the very beginning.
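The three-stage schedule above can be sketched as follows; `train_step` and the network handles are hypothetical placeholders, and only the order of what is trainable at each stage follows the description.

```python
# Progressive-training schedule as described above; train_step and the
# network objects are placeholders, not the authors' actual code.

def progressive_training(cnn_sr, cnn_cr, train_step, epochs=(10, 10, 5)):
    # Stage 1: train CNN-SR alone (separate training).
    for _ in range(epochs[0]):
        train_step(trainable=[cnn_sr])
    # Stage 2: freeze CNN-SR, train CNN-CR against the fixed up-scaler.
    for _ in range(epochs[1]):
        train_step(trainable=[cnn_cr])
    # Stage 3: fine-tune the whole pipeline end-to-end.
    for _ in range(epochs[2]):
        train_step(trainable=[cnn_cr, cnn_sr])

# Usage: record which sub-networks are trainable at each step.
calls = []
progressive_training("SR", "CR",
                     lambda trainable: calls.append(tuple(trainable)),
                     epochs=(2, 2, 1))
```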
4. Application Realizations
4.1. Image Retargeting
- Retargeting in general refers to the task of changing resolution to suit different display devices.
- The only issue is how to support arbitrary output resolutions in CNN-CR, which is solved by replacing the first (strided) layer of CNN-CR with a differentiable re-sampling layer.
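A differentiable re-sampling layer of this kind can be sketched with bilinear interpolation to an arbitrary target size. This is a simplified single-channel NumPy version, not the authors' exact layer:

```python
import numpy as np

def bilinear_resample(x, out_h, out_w):
    # Bilinear re-sampling to an arbitrary target size; every operation is
    # differentiable, so such a layer could replace CNN-CR's strided first
    # layer for retargeting (a sketch, not the authors' exact layer).
    in_h, in_w = x.shape
    ys = np.linspace(0, in_h - 1, out_h)      # source row coordinates
    xs = np.linspace(0, in_w - 1, out_w)      # source column coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(16.0).reshape(4, 4)
out = bilinear_resample(img, 3, 5)   # arbitrary, non-integer-ratio size
```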
4.2. Image/Video Compression
- (a) Frame-level: The whole frame is down-sized by CNN-CR and encoded; after decoding, it is super-resolved by CNN-SR.
- (b) Block-level: The coding tree unit (CTU) is used as the basic unit.
- First, each CTU can be either down-sampled and coded, or coded directly at native resolution.
- Second, if coded at low resolution, either CNN-CR or a simple down-sampling filter can be used for down-sizing.
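The per-CTU decision can be sketched as a standard rate-distortion cost comparison, J = D + λ·R, with the cheapest mode winning. The candidate distortion/rate numbers below are purely illustrative, not measured values.

```python
# RD-optimized mode selection per CTU, as in the block-level scheme above:
# each CTU picks the mode minimizing J = D + lambda * R.

def choose_ctu_mode(candidates, lmbda):
    """candidates: dict mode_name -> (distortion, rate_bits)."""
    return min(candidates,
               key=lambda m: candidates[m][0] + lmbda * candidates[m][1])

ctu_candidates = {
    "native":      (120.0, 900),   # full-resolution coding
    "cnn_cr_down": (150.0, 500),   # down-sample with CNN-CR, then code
    "filter_down": (180.0, 480),   # simple down-sampling filter, then code
}
best = choose_ctu_mode(ctu_candidates, lmbda=0.1)
# J(native) = 210, J(cnn_cr_down) = 200, J(filter_down) = 228
```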
5. Experimental Results
- The whole DIV2K dataset is used for training.
5.1. PSNR for CNN-CRSep (Sep means Separate Training)
- CNN-CRSep outperforms bicubic down-sampling, and achieves on average 1.25 dB improvement.
- CNN-CRSep + bicubic up-sampling also performs better than bicubic down-sampling + bicubic up-sampling. This shows that the preserved information introduced by CNN-CRSep can boost the reconstruction quality.
- (d): Bicubic down-sampling then bilinear up-sampling.
- (e): CNN-CRSep then bilinear up-sampling, which yields a sharper image.
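The PSNR figures above follow the usual definition. For reference, a small NumPy helper (assuming 8-bit images, i.e. a peak value of 255):

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    # PSNR in dB between a reference image and its reconstruction,
    # the metric used in the comparisons above.
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 100.0)
rec = ref + 5.0          # uniform error of 5 -> MSE = 25
value = psnr(ref, rec)   # 10*log10(255^2/25) ~ 34.15 dB
```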
5.2. PSNR for CNN-CRJoint (Joint means Joint Training)
- CNN-CRJoint plus CNN-SR can outperform the EDSR one [4] by a considerable margin.
6. Results for Image Retargeting
- 30 subjects participate.
- 5 discrete levels of scores are given: −2, −1, 0, 1, 2, standing for better, slightly better, indistinguishable, slightly worse, and worse, respectively.
- CNN-CRJoint has higher scores compared to one representative method called Seam Carving.
7. Results for Image/Video Compression
7.1. RD Curves
- HM-12.1 is used.
- Both frame-level and block-level approaches work well at low bitrates.
- The frame-level one has a large PSNR drop at high bitrates, while the block-level one still maintains coding performance at high bitrates, thanks to adaptive mode switching based on rate-distortion optimization (RDO).
7.2. BD-Rate
- Frame-level method brings on average 7.0% and 3.1% BD-rate reduction for HEVC and UHD test sequences, respectively.
- Block-level method brings on average 6.9% and 10.4% BD-rate reduction for HEVC and UHD test sequences, respectively.
- Both outperform Li TCSVT’18 [41].
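BD-rate summarizes the average bitrate difference at equal quality. A common way to compute it, following Bjøntegaard's method (cubic fit of log-rate over PSNR, integrated over the overlapping quality range), is sketched below; this is a generic implementation, not the authors' evaluation script.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    # Bjontegaard-delta rate: average bitrate difference (%) at equal PSNR,
    # from cubic polynomial fits of log-rate as a function of PSNR.
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR range
    hi = min(max(psnr_anchor), max(psnr_test))
    ia = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    it = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (it - ia) / (hi - lo)             # mean log-rate gap
    return (np.exp(avg_diff) - 1.0) * 100.0      # negative = bitrate saving

# Toy check: a codec using 10% less rate at every quality point
# should report roughly -10% BD-rate.
r_anchor = np.array([100.0, 200.0, 400.0, 800.0])
q = np.array([30.0, 33.0, 36.0, 39.0])
delta = bd_rate(r_anchor, q, 0.9 * r_anchor, q)
```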
7.3. Hitting Ratios
- A certain proportion of CUs choose the proposed method during encoding.
7.4. Computational Complexity
- Even with a GPU, the encoding/decoding time still increases by a large amount.
Reference
[2019 TIP] [CNN-CR]
Learning a Convolutional Neural Network for Image Compact-Resolution
Codec Intra Prediction
JPEG [MS-ROI]
HEVC [Xu VCIP’17] [Song VCIP’17] [Li VCIP’17] [Puri EUSIPCO’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [Liu MMM’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNAC TCSVT’19] [CNN-CR] [CNNMC Yokoyama ICCE’20] [PNNS]
VVC [CNNIF & CNNMC] [Brand PCS’19]