# Review: Xu VCIP’17 — CNN Based Rate Distortion Modeling (HEVC Intra Prediction)

## U-Net-Like Network Structure, Model the Rate and Distortion Without Pre-Encoding

In this story, **CNN-Based Rate-Distortion Modeling for H.265/HEVC** (**Xu VCIP’17**), by Wuhan University, is reviewed. In this paper, **a CNN is used to predict the rate and distortion without any encoding, for rate control purposes.** I read this because I work on video coding research. This is a paper in **2017 VCIP**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Rate Distortion (RD)**
2. **Network Architecture**
3. **Experimental Results**

**1. Rate Distortion (RD)**

- To compress/encode a video with different bitrates, a quantization parameter (QP) should be tuned.
- Lower QP, higher bitrate (rate), higher video quality, i.e. smaller distortion.
- Higher QP, lower bitrate (rate), lower video quality, i.e. larger distortion.
- If we can control the rate and distortion, we can have a constant bitrate for stable streaming/transmission, or a constant video quality for viewing.
- However, without encoding, we do not know the actual rate and distortion.
- In this paper, **a CNN is used to predict the rate and distortion without any encoding.**
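The QP trade-off described in the list above can be sketched with a toy model. The exponential form below loosely mirrors HEVC, where the quantization step size doubles every 6 QP; the function names and constants are illustrative assumptions, not from the paper:

```python
# Toy illustration of the QP trade-off: lower QP -> higher rate and
# lower distortion; higher QP -> lower rate and higher distortion.
# The constant 10000.0 and the 2^(-QP/6) form are made up for illustration.

def toy_rate(qp: int, c: float = 10000.0) -> float:
    """Rough bitrate (kbps) for a given QP: lower QP -> higher rate."""
    return c * 2 ** (-qp / 6)

def toy_distortion(qp: int) -> float:
    """Rough distortion proxy: higher QP -> larger distortion."""
    return 2 ** (qp / 6)

for qp in (22, 27, 32, 37):
    print(qp, round(toy_rate(qp), 1), round(toy_distortion(qp), 2))
```

Note that, without a model like this (or an actual encode), the encoder cannot know which QP hits a target bitrate, which is exactly the gap the paper's CNN fills.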

**2. Network Architecture**

- There are 2 paths for the network.
- **Lower (Left) path (Distortion Prediction)**: predicts the SSIM (Structural SIMilarity) map, where higher SSIM means lower distortion, and vice versa.
- **Upper (Right) path (Rate Prediction)**: predicts the rate.

## 2.1. **Distortion (D) Prediction**

- The network takes the original image as input and outputs the SSIM map.
- All the convolutional layers in the network are designed with stride 1×1.
- Two max pooling layers with size 2×2 and stride 2×2 are constructed in two different stages to extract information better.
- Correspondingly, two upsampling layers are added to compensate for the size reduction after pooling.
- A skip-connection strategy is used to aggregate multi-level features, and a final convolutional layer combines all the features and determines the size of the output.
- Thus, the network is similar to U-Net, using a fully convolutional network (FCN).
- The loss function is the MSE of the predicted SSIM map against the actual one.
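The equation image is not reproduced here; since the paper states the loss is the MSE over the SSIM map, it can be written as below, where $\hat{s}_i$ and $s_i$ (the predicted and actual SSIM values at pixel $i$, with $N$ pixels in total) are my own notation:

```latex
L_D = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{s}_i - s_i\right)^2
```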

## 2.2. Rate (R) Prediction

- As the rate, which indicates the resource consumption after compression, is a scalar, a different network is designed to predict it.
- Rate values at several different QPs are combined into one fixed-size vector, which is used as the output of the network, with the original image as input.
- The former layers of this network are the same as the left part for predicting SSIM.
- Several convolutional layers and fully connected layers are added to extract information better.
- The loss function is again an MSE loss, over the rate vector.
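Since the rate branch regresses a fixed-size vector (one rate per QP), its MSE loss can be sketched as below. The QP set and all numbers are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical rates (kbps) at a fixed set of QPs (e.g. four QPs).
# The actual QP set and rate values are not from the paper.
actual_rates    = np.array([800.0, 420.0, 230.0, 120.0])  # from a real encode
predicted_rates = np.array([780.0, 450.0, 220.0, 130.0])  # network output

# MSE loss over the rate vector, analogous to the SSIM-map MSE loss.
mse = float(np.mean((predicted_rates - actual_rates) ** 2))
print(mse)
```

At inference time, one forward pass yields the predicted rates for all QPs in the set at once, so no encoding pass is needed.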

**3. Experimental Results**

## 3.1. SSIM Map Prediction

- The first row shows the original images in the luminance channel.
- The second row indicates the actual SSIM maps with QP 35.
- The third row indicates the predicted SSIM maps with QP 35. (All the SSIM maps are squared for visibility.)
- The predicted SSIM maps are quite similar to the actual ones.

## 3.2. Prediction Error of SSIM and Rate

- The above table shows the prediction error between the predicted ones and the actual ones under different QPs.
- Most prediction results are acceptable.

- SSIM and rate of different images under QP 34 are shown above. Again, the predicted ones are quite similar to the actual ones.

## 3.3. SOTA Comparison

- The R-SSIM relationship is built for the proposed approach and compared with [26] and [27].
- The approaches in [26] and [27] use the actual rate and SSIM data to fit the relationship, which means multi-pass encoding is necessary. (They need to encode at least once to obtain the SSIM and rate.)
- The proposed CNN approach is much closer to the actual curve.

During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. This is the 17th story. Thanks for visiting my story…

# Reference

[2017 VCIP] [Xu VCIP’17]

CNN-Based Rate-Distortion Modeling for H.265/HEVC

# Codec Prediction

[CNNIF] [Xu VCIP’17] [IPCNN] [IPFCN] [NNIP] [Li TCSVT’18]