Reading: IntraNN — CNNs for Video Intra Prediction Using Cross-component Adaptation (HEVC Intra)
2% BD-Rate Reduction for Luma and 1.5% BD-Rate Reduction for Chroma
In this story, Convolutional Neural Networks for Video Intra Prediction Using Cross-component Adaptation (IntraNN), by RWTH Aachen University, is briefly described.
A new intra prediction mode based on a CNN is introduced, which outperforms a fully connected one. With the use of cross-component (CRCO) prediction in the CNN, the new IntraNN mode achieves higher coding efficiency. This is a paper in 2019 ICASSP. (Sik-Ho Tsang @ Medium)
Outline
- Network Architecture
- Some Training Details
- HEVC Implementation
- Experimental Results
1. Network Architecture
1.1. Architecture Variants
- The authors tried 3 versions of the architecture: C0, C1, and C2.
- C0: Only fully connected (FC) layers are used. That means no convolutions.
- C1: Only one 4×4 convolutional layer before FC layers.
- C2: As shown above, two convolutional layers, a 3×3 followed by a 2×2, before the FC layers (see the sketch after this list). As no padding is used, the number of reference lines limits the possible kernel sizes and combinations.
- Leaky ReLU is used after every layer except the last output layer.
- For the luma, it is shown as the upper branch.
- For the chroma, it is shown as the lower branch, where the two chroma components are concatenated before being input into the network. This is called cross-component (CRCO) prediction.
- It is quite special that the input is L-shaped, formed by the reference samples above and to the left of the block.
- It is found that C2 obtains the largest coding gain of 2.01% BD-rate reduction.
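Below is a minimal PyTorch sketch of what the C2 variant could look like. The paper's exact handling of the L-shaped reference area and its layer widths are not detailed here, so the square embedding (with the unknown block region zeroed out), the channel counts, and the hidden size are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class IntraNNC2(nn.Module):
    """Sketch of the C2 variant: two unpadded convolutions (3x3 then 2x2)
    followed by FC layers, with Leaky ReLU everywhere except the output.
    The L-shaped reference area is assumed to be embedded in a square
    tensor with the (unknown) block region zeroed out."""

    def __init__(self, block=8, ref_lines=4, in_ch=1, hidden=1024):
        super().__init__()
        side = block + ref_lines                   # square holding refs + block
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3),   # no padding, shrinks side by 2
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=2),      # no padding, shrinks side by 1
            nn.LeakyReLU(0.2),
        )
        feat = 64 * (side - 3) ** 2
        self.fc = nn.Sequential(
            nn.Linear(feat, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, in_ch * block * block),  # linear output layer
        )
        self.block, self.in_ch = block, in_ch

    def forward(self, ref):                        # ref: (N, in_ch, side, side)
        z = self.conv(ref).flatten(1)
        return self.fc(z).view(-1, self.in_ch, self.block, self.block)

# Luma usage: an 8x8 block with 4 reference lines gives a 12x12 input.
net = IntraNNC2(block=8, ref_lines=4, in_ch=1)
pred = net(torch.randn(2, 1, 12, 12))              # output: (2, 1, 8, 8)
```

For the CRCO chroma branch, setting in_ch=2 would concatenate the two chroma components as input channels, matching the lower branch described above.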
2. Some Training Details
- All networks are trained on samples from 104 sequences with varying resolutions. 11 additional videos are used as a validation set.
- Authors have provided supplementary material about the sequences they use: http://www.ient.rwth-aachen.de/cms/icassp2019/
- Horizontal and vertical flipping are applied for data augmentation.
- The channel-wise mean of the reference area is subtracted from both the reference and the prediction areas.
- Some samples with low variance are excluded from training.
- Two loss functions are tried: L1 and SATD. SATD is the sum of absolute transformed differences, where the residual is Hadamard-transformed before taking the absolute sum. Simply speaking, it is a simplified transform commonly used in video coding for a preliminary cost check of the residual (see the sketch after this list).
- It is found that the SATD loss outperforms the L1 one.
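As a concrete illustration, here is a minimal sketch of an SATD training loss, assuming square power-of-two block sizes so that a plain Hadamard matrix applies. The reduction (mean rather than sum) and the absence of any scale factor are assumptions, not necessarily the paper's exact formulation.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Build an n x n Hadamard matrix (n must be a power of two)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], 1), torch.cat([H, -H], 1)], 0)
    return H

def satd_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """SATD: Hadamard-transform the residual, then sum absolute values.
    pred/target: (N, C, B, B) with B a power of two."""
    B = pred.shape[-1]
    H = hadamard(B).to(pred)
    resid = pred - target
    coeffs = H @ resid @ H.T          # 2-D Hadamard transform of the residual
    return coeffs.abs().mean()        # averaged so it scales like a training loss

# Example with random 8x8 blocks:
loss = satd_loss(torch.randn(2, 1, 8, 8), torch.randn(2, 1, 8, 8))
```

Compared with a plain L1 loss on pixels, this penalizes the residual in a transform domain closer to what the encoder's rate-distortion check actually measures.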
3. HEVC Implementation
- HM-16.9 is used.
- The above CNN-based prediction is treated as a new mode, the 36th intra prediction mode, called IntraNN.
- (To know more about intra prediction, please feel free to read Sections 1 & 2 in IPCNN.)
- The most probable mode (MPM) list is extended to hold a fourth option: an additional bit is sent when the third list position would otherwise have been chosen.
- There are two methods to place the IntraNN mode in the intra prediction mode candidate list: END and UP (see the sketch after this list).
- END: It is always put in the last position. Thus, this END signaling version should cause less overhead when the IntraNN mode is not chosen.
- UP: It is placed directly behind the modes used in the neighborhood. This UP version provides lower signaling costs for the IntraNN mode when in use.
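The following sketch illustrates the two placements conceptually. The function name and the list construction are hypothetical, not HM's actual candidate-list code.

```python
def build_mode_list(neighbor_modes, all_modes, placement="UP"):
    """Order intra modes for signaling. The IntraNN mode (index 35 here,
    i.e. the 36th mode) is inserted either directly after the modes used
    by neighboring blocks ("UP", cheaper when IntraNN is chosen) or at
    the very end ("END", cheaper when it is not)."""
    INTRANN = 35
    candidates = list(dict.fromkeys(neighbor_modes))  # de-duplicated neighbors first
    if placement == "UP":
        candidates.append(INTRANN)
    candidates += [m for m in all_modes if m not in candidates and m != INTRANN]
    if placement == "END":
        candidates.append(INTRANN)
    return candidates

# Example: neighbors used modes 26 and 10; UP places IntraNN right after them.
print(build_mode_list([26, 10], range(35), "UP")[:4])   # [26, 10, 35, 0]
```

Since earlier list positions cost fewer bits to signal, the trade-off is exactly the one described above: UP pays less when IntraNN is picked, END pays less when it is not.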
4. Experimental Results
- As shown above, in most cases, the UP variant gives slightly better BD-rate gains than the END version, especially for the luma channel.
- The cross-component (CRCO) version outperforms the version without network-based chroma prediction on every channel and sequence, on average by -0.60% BD-rate on the luma and by -0.97% and -0.95% on the chroma channels.
During the days of coronavirus, let me take on the challenge of writing 30 stories again this month. Is it good? This is the 20th story in this month. 66.67% progress now, 1/3 of the way to go!! Thanks for visiting my story..
Reference
[2019 ICASSP] [IntraNN]
Convolutional Neural Networks for Video Intra Prediction Using Cross-component Adaptation
Codec Intra Prediction
HEVC [CNNIF] [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [CNNAC] [Li TCSVT’18] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN]
VVC [Brand PCS’19]