Review: Xu VCIP’17 — CNN Based Rate Distortion Modeling (HEVC Intra Prediction)

U-Net-Like Network Structure, Modeling Rate and Distortion Without Pre-Encoding

Sik-Ho Tsang
4 min read · Apr 21, 2020

In this story, CNN-Based Rate-Distortion Modeling for H.265/HEVC (Xu VCIP’17), by Wuhan University, is reviewed. A CNN is used to predict the rate and distortion without any encoding, for rate control purposes. I read this because I work on video coding research. This is a paper in 2017 VCIP. (Sik-Ho Tsang @ Medium)

Outline

  1. Rate Distortion (RD)
  2. Network Architecture
  3. Experimental Results

1. Rate Distortion (RD)

  • To compress/encode a video with different bitrates, a quantization parameter (QP) should be tuned.
  • Lower QP, higher bitrate (rate), higher video quality, i.e. smaller distortion.
  • Higher QP, lower bitrate (rate), lower video quality, i.e. larger distortion.
  • If we can control the rate and distortion, we can have a constant bitrate for stable streaming/transmission, or a constant video quality for viewing.
  • However, without encoding, we do not know the actual rate and distortion.
  • In this paper, CNN is used to predict the rate and distortion without any encoding.
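With such predictions in hand, rate control essentially becomes a lookup: pick the QP whose predicted rate fits the bitrate budget. A minimal sketch of this idea, with all function names and numbers purely illustrative (not from the paper):

```python
# Hypothetical sketch: once a model predicts the rate for each candidate QP
# (without encoding), rate control reduces to a table lookup.

def pick_qp(predicted_rates, target_rate):
    """Return the lowest QP (highest quality) whose predicted rate
    does not exceed the target bitrate."""
    # Lower QP -> higher rate and higher quality, so scan QPs in ascending order.
    for qp in sorted(predicted_rates):
        if predicted_rates[qp] <= target_rate:
            return qp
    return max(predicted_rates)  # fall back to the coarsest QP

# Illustrative predicted rates (kbps) for one frame at several QPs.
rates = {22: 5200.0, 27: 2600.0, 32: 1300.0, 37: 700.0}
print(pick_qp(rates, target_rate=1500.0))  # -> 32
```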

2. Network Architecture

Network Architecture
  • There are 2 paths for the network.
  • Lower (Left) path (Distortion Prediction): predicts the SSIM (Structural SIMilarity) map, where higher SSIM means lower distortion, and vice versa.
  • Upper (Right) path (Rate Prediction): predicts the rate.

2.1. Distortion (D) Prediction

  • The network takes the original images as input and outputs the SSIM maps.
  • All the convolutional layers in the network are designed with stride 1×1.
  • Two max pooling layers with size 2×2 and stride 2×2 are placed at two different stages to extract information better.
  • Correspondingly, two upsampling layers are added to compensate for the size reduction after pooling.
  • A skip-connection strategy is used to aggregate multi-level features, and then a final convolutional layer fuses all the features and determines the size of the output.
  • Thus, the network is similar to U-Net, a fully convolutional network (FCN).
  • The loss function is the MSE between the predicted and actual SSIM maps.
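The structure described above (stride-1 convolutions, two 2×2 pooling stages, matching upsampling, skip connections, and a final fusion convolution) can be sketched roughly as follows in PyTorch. Channel counts, kernel sizes, and depths are my own assumptions for illustration, not values from the paper:

```python
import torch
import torch.nn as nn

class SSIMMapNet(nn.Module):
    """U-Net-like FCN sketch: two pooling stages, two upsampling stages,
    skip connections, and a final 1-channel SSIM-map output.
    All layer widths are illustrative guesses, not from the paper."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, stride=1, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=1, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2, 2)          # 2x2 pooling with stride 2
        self.up = nn.Upsample(scale_factor=2)   # compensates for the pooling
        self.dec2 = nn.Sequential(nn.Conv2d(64 + 32, 32, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv2d(32 + 16, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, 1, 3, padding=1)  # fuses features, sets output size

    def forward(self, x):
        e1 = self.enc1(x)                 # full resolution
        e2 = self.enc2(self.pool(e1))     # 1/2 resolution
        e3 = self.enc3(self.pool(e2))     # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up(e3), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))  # skip connection
        return self.out(d1)               # predicted SSIM map, same H x W as input

net = SSIMMapNet()
y = net(torch.randn(1, 1, 64, 64))
print(y.shape)  # torch.Size([1, 1, 64, 64])
```

Training such a sketch would then minimize `nn.functional.mse_loss(predicted_map, actual_map)`, matching the MSE loss on the SSIM map described above.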

2.2. Rate (R) Prediction

  • Since the rate, which indicates the resource consumption after compression, is a scalar, a different network is designed to predict it.
  • Rate values for several different QPs are combined into one fixed-size vector, which serves as the output of the network, with the original images as input.
  • The former layers of this network are the same as the left part for predicting SSIM.
  • Several convolutional layers and fully connected layers are added to extract information better.
  • The loss function measures the error between the predicted and actual rate vectors.
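Following that description (a convolutional front end, extra convolutional and fully connected layers, and a fixed-size rate vector as output), the rate branch might be sketched like this. The number of QP points and all layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

NUM_QPS = 4  # assumed number of QP points in the output rate vector

class RateNet(nn.Module):
    """Sketch of the rate branch: convolutional layers, then fully connected
    layers that regress a fixed-size vector of rates, one entry per QP.
    All sizes are illustrative, not from the paper."""
    def __init__(self, num_qps=NUM_QPS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),      # fixed spatial size before the FC layers
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, num_qps),      # one predicted rate per QP
        )

    def forward(self, x):
        return self.regressor(self.features(x))

net = RateNet()
rates = net(torch.randn(1, 1, 64, 64))
print(rates.shape)  # torch.Size([1, 4])
```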

3. Experimental Results

3.1. SSIM Map Prediction

SSIM Map Prediction
  • The first row indicates the original images in the luminance channel.
  • The second row indicates the actual SSIM maps with QP 35.
  • The third row indicates the predicted SSIM maps with QP 35. (All the SSIM maps are squared for visibility.)
  • The predicted SSIM maps are quite similar to the actual ones.
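For reference, an SSIM map of the kind the network learns to predict can be computed directly from an original/distorted image pair. Below is a simplified blockwise stand-in (non-overlapping windows, standard SSIM constants); the paper's exact map granularity is not specified here, so window size and all test images are assumptions:

```python
import numpy as np

def ssim_map(x, y, win=8, L=255.0):
    """Blockwise SSIM map between two luminance images, using
    non-overlapping win x win blocks and the standard SSIM constants.
    A simplified stand-in for the per-block SSIM maps in the paper."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    H, W = x.shape[0] // win, x.shape[1] // win
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            a = x[i*win:(i+1)*win, j*win:(j+1)*win]
            b = y[i*win:(i+1)*win, j*win:(j+1)*win]
            mu_a, mu_b = a.mean(), b.mean()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            out[i, j] = ((2*mu_a*mu_b + C1) * (2*cov + C2)) / \
                        ((mu_a**2 + mu_b**2 + C1) * (a.var() + b.var() + C2))
    return out

img = np.random.rand(64, 64) * 255            # stand-in luminance image
noisy = img + np.random.randn(64, 64) * 5     # stand-in for compression distortion
m = ssim_map(img, noisy)
print(m.shape)  # (8, 8)
vis = m ** 2    # squared for visibility, as done for the maps in the paper
```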

3.2. Prediction Error of SSIM and Rate

Prediction Error of SSIM and Rate
  • The above table shows the prediction error between the predicted ones and the actual ones under different QPs.
  • Most prediction results are acceptable.
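Such prediction errors are typically reported as a relative error between the predicted and actual values; a minimal illustration (all numbers made up, not from the paper's tables):

```python
def relative_error(predicted, actual):
    """Relative prediction error, as commonly reported for rate/SSIM prediction."""
    return abs(predicted - actual) / actual

# Illustrative numbers only: a predicted rate of 1250 kbps vs. an actual 1300 kbps.
print(f"{relative_error(1250.0, 1300.0):.1%}")  # -> 3.8%
```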
SSIM and Rate for Different Images Under QP 34
  • SSIM and rate of different images under QP 34 are shown above. Again, the predicted ones are quite similar to the actual ones.

3.3. SOTA Comparison

SOTA Comparison
  • The R-SSIM relationship is built for the proposed approach and compared with [26] and [27].
  • The approaches in [26] and [27] fit the actual rate and SSIM data, which means multi-pass encoding is necessary. (They need to encode at least once to obtain the SSIM and rate.)
  • The R-SSIM curve from the proposed CNN approach is much closer to the actual one.

During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. This is the 17th story of the month. Thanks for visiting my story…

