Reading: QE-CNN — Quality Enhancement Convolutional Neural Network (Codec Filtering)
In this story, Quality Enhancement Convolutional Neural Network (QE-CNN), by Beihang University, is briefly reviewed. In this paper:
- QE-CNN-I: QE-CNN for I frames is proposed.
- QE-CNN-P: QE-CNN for P/B frames is also proposed.
- A time-constrained QE-CNN is also proposed for real-time scenarios.
This is an extension of the DS-CNN paper in 2018 ICME, and it is published in 2019 TCSVT, where TCSVT has a high impact factor of 4.046. (Sik-Ho Tsang @ Medium)
Outline
- Introduction to ARCNN
- QE-CNN-I: Network Architecture
- QE-CNN-P: Network Architecture
- Experimental Results
- Results for the Time-Constrained (Real-Time) Scenario
2. QE-CNN-I: Network Architecture
- As shown above, QE-CNN-I uses one more convolutional layer than ARCNN.
- Thus, this architecture is denoted QE-CNN-I (9–7–3–1–5), where the numbers are the kernel sizes of the five convolutional layers.
- Among the tested variants, AR-CNN-3, i.e. QE-CNN-I (9–7–3–1–5) with PReLU activations, obtains the largest PSNR gain. Therefore, this architecture is adopted.
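To make the 9–7–3–1–5 structure concrete, here is a minimal PyTorch sketch. The kernel sizes and PReLU activations follow the description above; the channel widths and the single-channel (luma) input are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of QE-CNN-I (9-7-3-1-5) with PReLU.
# Channel widths are assumptions for illustration only.
import torch
import torch.nn as nn

class QECNNI(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 128, kernel_size=9, padding=4), nn.PReLU(),
            nn.Conv2d(128, 64, kernel_size=7, padding=3), nn.PReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.PReLU(),
            nn.Conv2d(64, 32, kernel_size=1, padding=0), nn.PReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),  # Conv 5 produces the enhanced output
        )

    def forward(self, x):
        return self.layers(x)

# Example: enhance a 64x64 luma CTU (batch of 1).
ctu = torch.rand(1, 1, 64, 64)
print(QECNNI()(ctu).shape)  # torch.Size([1, 1, 64, 64])
```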
3. QE-CNN-P: Network Architecture
- Similar to QE-CNN-I, it uses the 9–7–3–1–5 network architecture.
- Different from QE-CNN-I, it has an additional path, shown in green.
- At the end of the network, the outputs of Conv 4 and Conv 8 are concatenated, and the concatenated features are convolved by Conv 9.
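A minimal PyTorch sketch of this two-path structure is given below, assuming the additional path (Conv 5–8) mirrors the main path's kernel sizes; the channel widths are again illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of QE-CNN-P: a second feature-extraction path whose output
# is concatenated with Conv 4's output and fused by Conv 9.
import torch
import torch.nn as nn

def conv_prelu(c_in, c_out, k):
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=k // 2), nn.PReLU())

class QECNNP(nn.Module):
    def __init__(self):
        super().__init__()
        # Main path (Conv 1-4), mirroring QE-CNN-I's first four layers.
        self.main = nn.Sequential(
            conv_prelu(1, 128, 9), conv_prelu(128, 64, 7),
            conv_prelu(64, 64, 3), conv_prelu(64, 32, 1),
        )
        # Additional path (Conv 5-8); kernel sizes assumed to mirror the main path.
        self.extra = nn.Sequential(
            conv_prelu(1, 128, 9), conv_prelu(128, 64, 7),
            conv_prelu(64, 64, 3), conv_prelu(64, 32, 1),
        )
        # Conv 9 fuses the concatenated features into the enhanced frame.
        self.fuse = nn.Conv2d(64, 1, kernel_size=5, padding=2)

    def forward(self, x):
        feats = torch.cat([self.main(x), self.extra(x)], dim=1)  # 32 + 32 channels
        return self.fuse(feats)

# Example: enhance a decoded P-frame patch.
x = torch.rand(1, 1, 64, 64)
print(QECNNP()(x).shape)  # torch.Size([1, 1, 64, 64])
```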
4. Experimental Results
4.1. Objective Quality
- LDP (Low Delay P): apart from I frames, all other frames are P frames, which are encoded using information from previously coded frames.
- As shown above, QE-CNN-I obtains the highest PSNR gain on I frames, compared with ARCNN, VRCNN, and DCAD.
- Likewise, QE-CNN-P obtains the highest PSNR gain on P frames, compared with ARCNN, VRCNN, and DCAD.
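For context, the reported PSNR gain is simply the PSNR of the enhanced frame minus the PSNR of the decoded (compressed) frame, both measured against the uncompressed original. A toy NumPy sketch with random stand-in frames:

```python
# PSNR gain = PSNR(original, enhanced) - PSNR(original, decoded).
import numpy as np

def psnr(ref, img, peak=255.0):
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
original = rng.integers(0, 256, (64, 64))                      # toy 8-bit frame
decoded = np.clip(original + rng.normal(0, 8, (64, 64)), 0, 255)   # heavier distortion
enhanced = np.clip(original + rng.normal(0, 5, (64, 64)), 0, 255)  # lighter distortion

print(f"PSNR gain: {psnr(original, enhanced) - psnr(original, decoded):.2f} dB")
```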
4.2. Subjective Quality
- 12 non-expert subjects were involved in the test.
- During the test, sequences were displayed in random order. After viewing each sequence, the subjects were asked to give a subjective score.
- The rating score includes excellent (100–81), good (80–61), fair (60–41), poor (40–21), and bad (20–1).
- Again, QE-CNN outperforms ARCNN and DCAD in terms of DMOS (lower is better).
4.3. Time Analysis
- The running time of the AR-CNN method is 0.70 ms per Coding Tree Unit (CTU), and that of DCAD is 0.64 ms per CTU. In contrast, the QE-CNN-I model requires approximately 1.53 ms per CTU, and QE-CNN-P consumes 3.90 ms per CTU.
- Thus, the performance improvement of QE-CNN comes at the expense of computational time.
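A back-of-the-envelope calculation shows the scale of the problem, assuming 1080p frames split into 64×64 CTUs (an assumption about the test conditions; HEVC also allows smaller CTU sizes):

```python
# Rough per-frame processing time vs. the 60 fps real-time budget.
import math

ctus = math.ceil(1920 / 64) * math.ceil(1080 / 64)  # 30 * 17 = 510 CTUs per 1080p frame
budget_ms = 1000.0 / 60.0                           # ~16.67 ms per frame at 60 fps

for name, ms_per_ctu in [("AR-CNN", 0.70), ("DCAD", 0.64),
                         ("QE-CNN-I", 1.53), ("QE-CNN-P", 3.90)]:
    print(f"{name:9s}: {ctus * ms_per_ctu:7.1f} ms/frame (budget {budget_ms:.2f} ms)")
```

Under these assumptions, even the cheapest filter exceeds the 60 fps per-frame budget by more than an order of magnitude when applied to every CTU, which motivates the per-CTU filter selection in the next section.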
5. Results for the Time-Constrained (Real-Time) Scenario
- Under the time-constrained scenario, one of three options can be chosen for the k-th CTU:
- n = 2: QE-CNN-P (highest complexity)
- n = 1: QE-CNN-I (moderate complexity)
- n = 0: no filtering (lowest complexity)
- Hence, the time constraint is formulated as an optimization problem (a reconstruction is given after this list):
- where k indexes the CTUs (k = 1, ..., N), n = 0, 1, 2 indicates which filter is used as listed above, t is the time needed to process that CTU with the chosen option, and N is the total number of CTUs within one frame.
- The main idea is that the total time consumed by all CTUs should be smaller than or equal to the time constraint T, while the total MSE reduction (ΔMSE) over all CTUs should be maximized. (I don’t go into details about this because I just want to focus on the CNN. If interested, please read the original paper. The hyperlink is at the bottom.)
- For 60 fps, T = 16.67 ms per frame. With 600 frames, 10 seconds of video are encoded.
- Under this scenario, the time-constrained QE-CNN can still obtain 1.41% to 6.83% BD-rate reduction while meeting the 60 fps time constraint (10 seconds in total).
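Since the equation image is not shown here, the formulation can be written out from the bullets above; the notation below is a reconstruction and may differ from the paper:

```latex
% Reconstructed from the description above; notation may differ from the paper.
\max_{n_1,\dots,n_N} \; \sum_{k=1}^{N} \Delta\mathrm{MSE}_{k,n_k}
\quad \text{subject to} \quad \sum_{k=1}^{N} t_{k,n_k} \le T,
\qquad n_k \in \{0, 1, 2\}
```

Here n_k selects no filtering (0), QE-CNN-I (1), or QE-CNN-P (2) for the k-th CTU, t_{k,n_k} is its processing time under that choice, and T is the per-frame budget (16.67 ms at 60 fps).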
During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 15th story this month: 50% progress!! Thanks for visiting my story.
Reference
[2019 TCSVT] [QE-CNN]
Enhancing Quality for HEVC Compressed Videos
Codec Filtering
JPEG: [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC: [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN] [Liu PCS’19] [QE-CNN]
VVC: [Lu CVPRW’19] [Wang APSIPA ASC’19]