Review: MIP — Multiple Linear Regression Intra Prediction (HEVC Prediction)
0.4% BD-Rate Reduction Compared to the Conventional HEVC HM-16.0
In this story, Multiple linear regression Intra Prediction (MIP), by University of Missouri Kansas City, University of Science and Technology of China, and Tencent America, is reviewed. I read this because I work on video coding research.
Linear regression is one of the most basic topics to learn before deep neural networks. In this paper, the authors show how they formulate a video coding problem, specifically intra prediction, as a linear regression problem. This paper was published in 2019 ICASSP. (Sik-Ho Tsang @ Medium)
- HEVC Intra Prediction
- Proposed Approach
- Experimental Results
1. HEVC Intra Prediction
- HEVC inherits the block-based scheme from previous video coding frameworks, with the main difference being the Coding Tree Unit (CTU) concept.
- HEVC intra coding is based on spatial interpolation of samples from previously decoded image blocks.
- It supports up to 35 intra prediction modes: planar, DC, and 33 angular prediction modes.
- (To know more about video coding and intra prediction, please feel free to read Sections 1 & 2 in IPCNN.)
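To make the idea of predicting a block from its decoded neighbors concrete, here is a minimal sketch of DC prediction, the simplest of the 35 modes. The function name and the `top`/`left` inputs are hypothetical, and the boundary smoothing that HEVC applies to small luma blocks is left out on purpose.

```python
def dc_predict(top, left):
    """Predict an N×N block as the rounded mean of its reference samples.

    `top` and `left` are hypothetical lists holding the N reconstructed
    samples directly above and directly to the left of the block.
    Simplification: HEVC's DC boundary smoothing is not applied here.
    """
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left must both hold N samples")
    # Integer mean with rounding: (sum + N) // (2N).
    dc = (sum(top) + sum(left) + n) // (2 * n)
    # Every sample of the predicted block takes the same DC value.
    return [[dc] * n for _ in range(n)]


block = dc_predict(top=[100, 102, 104, 106], left=[98, 99, 101, 102])
```

Angular modes follow the same principle but copy the reference samples along a direction instead of averaging them.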
2. Proposed Approach
- The reference pixels and the best conventional intra prediction are flattened and concatenated into a single input X, which is fed into the Multiple Linear Regression (MLR) model.
- The ground truth Y is the N×N block.
- The MLR model is therefore trained by minimizing a loss between its output and the ground truth Y.
- Separate models are trained for the Planar and DC modes due to their special texture characteristics.
- Since neighboring angular modes share a lot of similarities, every 3 adjacent angular modes are combined into a single MIP angular mode to avoid singularity. Each HEVC intra prediction mode index n is thus mapped to an MIP mode index m.
- Therefore, there are 13 MIP modes in total: Planar, DC, and 33/3 = 11 angular groups.
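The grouping of angular modes can be sketched as a small mapping function. This is only an illustration under an assumption: that the groups start at angular mode 2, i.e. {2,3,4}, {5,6,7}, ..., {32,33,34}. The exact formula in the paper may differ, and `mip_mode` is a hypothetical name.

```python
PLANAR, DC = 0, 1                    # HEVC indices of the non-angular modes
FIRST_ANGULAR, LAST_ANGULAR = 2, 34  # HEVC angular modes are 2..34


def mip_mode(n):
    """Map an HEVC intra mode index n (0..34) to an MIP mode index m (0..12).

    Assumed grouping (not confirmed by the paper):
    {2,3,4} -> 2, {5,6,7} -> 3, ..., {32,33,34} -> 12.
    """
    if not 0 <= n <= LAST_ANGULAR:
        raise ValueError("HEVC intra mode index must be in 0..34")
    if n in (PLANAR, DC):
        return n  # Planar and DC keep their own dedicated models
    return (n - FIRST_ANGULAR) // 3 + 2


# Collect the distinct MIP modes reached from all 35 HEVC modes.
mip_modes = sorted({mip_mode(n) for n in range(35)})
```

Under this grouping, the 35 HEVC modes collapse to 13 MIP modes, which matches the count given above.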
(The network is quite similar to IPFCN. But here, only 1 layer is used, with multiple models. In IPFCN, a single model with more than 1 hidden layer is used.)
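A single linear layer like this can be fitted in closed form. The sketch below assumes an ordinary least-squares loss, a bias term, and an HEVC-style reference layout of 4N+1 samples. None of these details are confirmed by the paper, and all data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                  # block size
n_ref = 4 * N + 1      # assumed reference layout: 2N above, 2N left, 1 corner
n_samples = 500        # hypothetical number of training blocks

# Synthetic stand-ins for real training data.
refs = rng.uniform(0, 255, size=(n_samples, n_ref))    # flattened reference pixels
preds = rng.uniform(0, 255, size=(n_samples, N * N))   # flattened best HEVC prediction
# Concatenate both inputs and append a constant column for the bias.
X = np.hstack([refs, preds, np.ones((n_samples, 1))])

# Synthetic ground truth: a hidden linear map plus noise.
W_true = rng.normal(scale=0.05, size=(X.shape[1], N * N))
Y = X @ W_true + rng.normal(scale=0.5, size=(n_samples, N * N))

# Ordinary least squares: W minimizes ||X W - Y||^2 (assumed loss).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)


def mlr_predict(ref_pixels, best_pred):
    """Apply the fitted linear map to one block and reshape to N×N."""
    x = np.concatenate([ref_pixels, best_pred.ravel(), [1.0]])
    return (x @ W).reshape(N, N)


mse = float(np.mean((X @ W - Y) ** 2))
refined = mlr_predict(refs[0], preds[0].reshape(N, N))
```

In the proposed scheme, one such weight matrix would be trained per MIP mode, which is what "multiple models" refers to.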
- An additional one-bit flag is encoded to indicate whether MIP is selected.
- Therefore, for each block, either a conventional intra prediction mode or the proposed MIP can be selected according to the Rate Distortion (RD) cost.
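The RD-based choice can be sketched with the usual Lagrangian cost J = D + λR. The function names, the distortion and bit values, and λ are all made up for illustration. Counting the flag as exactly one bit is also a simplification, since a real encoder would use the entropy coder's rate estimate.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def choose_predictor(hevc, mip, lam, flag_bits=1):
    """Return the cheaper candidate and its RD cost.

    `hevc` and `mip` are hypothetical (distortion, mode_bits) pairs.
    The MIP on/off flag is signalled for every block, so its cost
    (assumed to be `flag_bits`) is added to both candidates.
    """
    costs = {
        "hevc": rd_cost(hevc[0], hevc[1] + flag_bits, lam),
        "mip": rd_cost(mip[0], mip[1] + flag_bits, lam),
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


# Made-up numbers: MIP lowers the distortion at the same mode cost.
choice, cost = choose_predictor(hevc=(1200.0, 6), mip=(1100.0, 6), lam=20.0)
```

Because the flag is paid on every block, MIP only helps overall when its distortion savings outweigh that extra bit across the sequence.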
3. Experimental Results
3.1. Settings
- HM-16.0 is used.
- All-intra configuration is adopted.
- Quantization parameters (QPs): 22, 27, 32, and 37. A higher QP gives a lower bitrate and lower quality, and vice versa.
- The DIV2K dataset of high-quality 2K-resolution images is used to generate the training dataset. DIV2K contains 800 training images, 100 validation images, and 100 testing images covering a wide range of content.
- A separate set of MIP models is trained for each combination of QP and block size.
- Each set contains 13 linear transformations, one per MIP mode.
- Therefore, with 4 QPs, 4 block sizes, and 13 MIP modes, there are 4×4×13 = 208 MIP models in total.
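The model count above can be checked by enumerating the lookup keys an encoder would need. The QPs come from the settings, but the four block sizes listed here are an assumption (the usual HEVC intra sizes), since they are not spelled out above.

```python
from itertools import product

QPS = (22, 27, 32, 37)         # from the experimental settings
BLOCK_SIZES = (4, 8, 16, 32)   # assumption: the four block sizes used
MIP_MODES = range(13)          # Planar, DC, and 11 angular groups

# One linear model would be stored per (qp, block_size, mip_mode) key.
model_keys = list(product(QPS, BLOCK_SIZES, MIP_MODES))
print(len(model_keys))  # 208
```

At encoding time, the model for a block would then be looked up by its QP, block size, and MIP mode.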
3.2. Results
- On average, the proposed method achieves BD-rates of -0.4%, -0.6%, and -0.8% for the Luma and two Chroma components respectively, i.e. bitrate savings of 0.4%, 0.6%, and 0.8% at the same quality.
- The largest gain, a BD-rate of -0.9%, is observed on Traffic from Class A, while Class C only reaches a BD-rate of -0.2%.
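For readers unfamiliar with BD-rate, here is a minimal sketch of the commonly used Bjøntegaard calculation: fit log-rate as a cubic polynomial of PSNR for each codec, then average the gap over the shared PSNR range. The four rate/PSNR points are made up and are not results from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of test vs. anchor at equal PSNR."""
    # Cubic fit of log10(rate) as a function of PSNR for each curve.
    fit_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    fit_t = np.polyfit(psnr_test, np.log10(rate_test), 3)

    # Integrate both fits over the overlapping PSNR interval only.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the mean log-rate gap back to a percentage.
    return float((10 ** (avg_t - avg_a) - 1) * 100)


# Made-up curves for the four QPs: the test codec needs 1% fewer bits.
psnr = [32.0, 35.0, 38.0, 41.0]
rate_anchor = [1000.0, 1800.0, 3200.0, 6000.0]
rate_test = [0.99 * r for r in rate_anchor]
bd = bd_rate(rate_anchor, psnr, rate_test, psnr)  # about -1.0
```

A negative value means the tested codec needs fewer bits than the anchor for the same PSNR, which is why the savings above are reported as negative numbers.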
During the days of coronavirus, I hope to write 30 stories in this month to give myself a small challenge. This is the 27th story in this month. Thanks for visiting my story…
Reference
[2019 ICASSP] [MIP]
Multiple Linear Regression for High Efficiency Video Intra Coding
Codec Prediction
HEVC Intra: [CNNIF] [Xu VCIP’17] [Song VCIP’17] [IPCNN] [IPFCN] [CNNAC] [Li TCSVT’18] [AP-CNN] [MIP]
HEVC Inter: [Zhang VCIP’17] [NNIP]