Review: MIP — Multiple Linear Regression Intra Prediction (HEVC Prediction)

0.4% BD-Rate Reduction Compared to the Conventional HEVC HM-16.0

Sik-Ho Tsang
4 min read · Apr 26, 2020

In this story, Multiple linear regression Intra Prediction (MIP), by University of Missouri Kansas City, University of Science and Technology of China, and Tencent America, is reviewed. I read this because I work on video coding research.

Linear regression is one of the most basic topics to learn before deep neural networks. In this paper, the authors show how to formulate a video coding problem, specifically intra prediction, as a linear regression problem. This paper was published in 2019 ICASSP. (Sik-Ho Tsang @ Medium)

  1. HEVC Intra Prediction
  2. Proposed Approach
  3. Experimental Results

1. HEVC Intra Prediction

35 HEVC Intra Predictions
  • HEVC inherits the block-based scheme from previous video coding frameworks, the main difference being the Coding Tree Unit (CTU) concept.
  • HEVC intra coding is based on spatial interpolation of samples from previously decoded image blocks.
  • It supports 35 intra prediction modes: planar, DC, and 33 angular prediction modes, as shown above.
  • (To know more about video coding and intra prediction, please feel free to read Sections 1 & 2 in IPCNN.)

2. Proposed Approach

Framework of proposed Multiple linear regression Intra Prediction (MIP) scheme
  • The reference pixels and the best intra prediction are flattened and concatenated into the input X, which is fed into the Multiple Linear Regression (MLR) model.
  • The ground truth Y is the N×N block.
  • Therefore the loss function is:
  • A separate model is trained for each of the Planar and DC modes due to their special texture characteristics.
  • Since the neighboring angular modes share a lot of similarities, 3 adjacent angular modes are combined into a single MIP angular mode to avoid the singularity:
  • where n is the HEVC intra prediction mode index and m is the MIP mode index. The 33 angular modes grouped in threes give 11 MIP angular modes; together with Planar and DC, there are 13 MIP modes in total.
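
The formulation above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the loss and the mode-grouping formula were shown as images in the original post, so the ordinary least-squares objective and the `(n - 2) // 3 + 2` grouping below are assumptions chosen only to be consistent with the text (3 adjacent angular modes per MIP mode, 13 MIP modes in total). The function names, the 4N+1 reference-sample layout, and the random toy data are also hypothetical.

```python
import numpy as np

def mip_mode_index(n: int) -> int:
    """Map an HEVC intra mode n (0..34) to an assumed MIP mode index m (0..12)."""
    if not 0 <= n <= 34:
        raise ValueError("HEVC intra mode must be in 0..34")
    if n < 2:
        return n                  # Planar (0) and DC (1) keep their own models
    return (n - 2) // 3 + 2       # assumed grouping of 3 adjacent angular modes

def build_input(ref_pixels, best_pred):
    """Flatten and concatenate the reference pixels and the best intra prediction."""
    return np.concatenate([np.ravel(ref_pixels), np.ravel(best_pred)])

def fit_mlr(X, Y):
    """Fit W in Y ≈ X @ W by least squares (assumed loss: ||Y - X W||^2)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict_block(x, W, n):
    """Apply a fitted model to one input vector and reshape the result to N×N."""
    return (x @ W).reshape(n, n)

# Toy data only: random inputs and a random "true" linear map, no real video.
rng = np.random.default_rng(0)
N = 4
num_ref = 4 * N + 1               # assumed layout: 2N left, 2N above, 1 corner
dim = num_ref + N * N             # reference pixels + flattened N×N prediction
X = rng.normal(size=(500, dim))   # one row per training block
W_true = rng.normal(size=(dim, N * N))
Y = X @ W_true                    # flattened ground-truth blocks
W = fit_mlr(X, Y)
block = predict_block(X[0], W, N)
```

With one such `W` per MIP mode, a block coded with HEVC mode `n` would use the model selected by `mip_mode_index(n)`.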

(The network is quite similar to IPFCN. But here only one layer is used, with multiple models, whereas IPFCN uses a single model with more than one hidden layer.)

Proposed MIP in HEVC Encoder
  • An additional bit flag is encoded to indicate whether MIP is selected.
  • Therefore, for each block, either conventional intra prediction mode or proposed MIP can be selected according to Rate Distortion (RD) cost.
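
The per-block decision described above can be illustrated with a small Python sketch. It assumes the usual Lagrangian cost J = D + λR with the sum of squared differences as distortion; the function names, the bit counts, and the value of λ are hypothetical, and charging exactly one bit for the flag is a simplification of what an entropy coder would actually spend.

```python
import numpy as np

def rd_cost(original, prediction, bits, lam):
    """Lagrangian RD cost J = D + lambda * R, with SSD as the distortion D."""
    diff = original.astype(np.int64) - prediction.astype(np.int64)
    return float(np.sum(diff * diff)) + lam * bits

def choose_prediction(original, hevc_pred, mip_pred, hevc_bits, mip_bits, lam):
    """Return (use_mip, cost); the flag bit is charged to both candidates."""
    flag_bits = 1  # simplification: the MIP on/off flag costs one bit either way
    j_hevc = rd_cost(original, hevc_pred, hevc_bits + flag_bits, lam)
    j_mip = rd_cost(original, mip_pred, mip_bits + flag_bits, lam)
    return j_mip < j_hevc, min(j_hevc, j_mip)

# Made-up 4×4 example: the MIP candidate is closer to the original block.
original = np.full((4, 4), 100)
hevc_pred = np.full((4, 4), 104)
mip_pred = np.full((4, 4), 101)
use_mip, cost = choose_prediction(original, hevc_pred, mip_pred,
                                  hevc_bits=10, mip_bits=10, lam=5)
```

A real encoder would measure distortion on the reconstructed block after transform and quantization rather than on the raw prediction; the sketch only shows the selection logic.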

3. Experimental Results

3.1. Settings

  • HM-16.0 is used.
  • All-intra configuration is adopted.
  • Quantization parameters (QPs): 22, 27, 32, 37. A higher QP gives a lower bitrate and lower quality, and vice versa.
  • The DIV2K dataset of high-quality 2K-resolution images is used to generate the training dataset. DIV2K contains 800 training images, 100 validation images and 100 testing images covering a wide range of contents.
  • A separate set of MIP models is trained for each QP and block size combination.
  • There are 13 transformations in each set, one per MIP mode.
  • Therefore, there are 4×4×13 = 208 MIP models in total (4 QPs × 4 block sizes × 13 MIP modes).
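
The model count above can be checked by enumerating every (QP, block size, MIP mode) combination. The QPs come from the settings; the concrete block sizes below are an assumption, since this review only implies that there are four of them.

```python
from itertools import product

qps = [22, 27, 32, 37]           # from the settings above
block_sizes = [4, 8, 16, 32]     # assumption: four sizes, exact values not stated here
mip_modes = range(13)            # Planar, DC, and 11 grouped angular modes

# One model per (QP, block size, MIP mode) combination.
model_keys = list(product(qps, block_sizes, mip_modes))
print(len(model_keys))  # 208
```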

3.2. Results

BD-rate (%) of the proposed approach compared with the conventional HEVC HM-16.0
  • An average BD-Rate of -0.4%, -0.6%, and -0.8% is achieved for the Luma and the two Chroma components, respectively, by the proposed method (negative values mean bitrate savings).
  • The largest gain, -0.9% BD-Rate, is observed on Traffic from Class A, while only -0.2% BD-Rate is achieved on Class C.
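
For readers unfamiliar with the metric, BD-Rate is commonly computed by fitting a cubic polynomial to log-rate as a function of PSNR for each codec and comparing the average of the two fits over the shared PSNR range. The sketch below follows that common recipe; it is not taken from the paper, the function name is hypothetical, and the rate/PSNR points are made up (they are not results from the paper).

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of test vs. anchor at equal PSNR."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    # Cubic fit of log-rate as a function of PSNR for each codec.
    p_a = np.polyfit(psnr_anchor, log_ra, 3)
    p_t = np.polyfit(psnr_test, log_rt, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Made-up points, one per QP: the test codec needs 1% fewer bits at the same PSNR,
# so the result should be close to -1 (a saving).
psnr = [32.0, 35.0, 38.0, 41.0]
rate_anchor = [1000.0, 2000.0, 4000.0, 8000.0]
rate_test = [0.99 * r for r in rate_anchor]
example = bd_rate(rate_anchor, psnr, rate_test, psnr)
```

Four rate points per sequence, one for each of the QPs 22, 27, 32 and 37, are what such a calculation would typically take as input.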

During the days of coronavirus, I hope to write 30 stories in this month to give myself a small challenge. This is the 27th story in this month. Thanks for visiting my story…
