Reading: H-LSTM — Hierarchical LSTM for Fast H.264 to HEVC Transcoding (Fast Codec Prediction)

59.60% Time Reduction With Only 1.158% Increase in BD-BR (BD-Rate)

6 min read · May 26, 2020

In this story, “Fast H.264 to HEVC Transcoding: A Deep Learning Method” (H-LSTM), by Beihang University, is presented. I read this because I work on video coding research. This paper extends its conference version (i.e. Wei VCIP’17), which I presented last time, in three ways:

First, a large-scale H.264 to HEVC transcoding database is built.
Second, the correlation between the HEVC CTU partition and H.264 features, and both temporal and spatial-temporal similarities of the CTU partition across video frames, are analyzed.
Third, a deep learning architecture of a hierarchical long short-term memory (H-LSTM) network is proposed to predict the CTU partition of HEVC.

This is a 2019 TCSVT paper, and TCSVT has a high impact factor of 4.046. I will mainly describe what is new compared with Wei VCIP’17 in this story, so it is better to read Wei VCIP’17 first. (Sik-Ho Tsang @ Medium)

It is also a TMM featured article in the month of July 2019!!!
https://signalprocessingsociety.org/publications-resources/ieee-transactions-multimedia/fast-h264-hevc-transcoding-deep-learning-method

Outline

A Large-Scale H.264 to HEVC Transcoding (HHT) Database
H.264 Feature Analysis
Proposed H-LSTM
Experimental Results

1. A Large-Scale H.264 to HEVC Transcoding (HHT) Database

The HHT database contains the CTU partition data of 93 raw sequences compressed by inter-mode HEVC at Quantization Parameter (QP) = 22, 27, 32 and 37.
The above table shows that the resolutions of those raw video sequences are diverse, ranging from 352×240 to 2048×1080.
There are in total 33,042 frames in the HHT database.
The raw video sequences were encoded by the H.264 reference software JM 19.0 with the default configuration file of encoder_baseline.cfg at four QPs = {22, 27, 32, 37}. As a result, 372 compressed H.264 video streams were obtained.
Subsequently, all 372 H.264 video streams were decoded.
Four common features of H.264, including MV, residual, MB partition and bit allocation, were extracted from the H.264 video streams for the database.
The decoded streams were encoded by the HEVC reference software HM 16.0.
HEVC encoding uses the default configuration file encoder_low_delay_P_main.cfg at QP = {22, 27, 32, 37}.
During HEVC encoding, the HEVC features of CU, PU and TU partitions were obtained for the database and are treated as the ground truth.

Finally, the HHT database contains a total of 268,640,788 CU samples, including 36.09% splitting samples and 63.91% non-splitting samples.
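As a quick sanity check on the numbers above, here is a small sketch that recomputes the stream count and the approximate number of splitting and non-splitting samples. The variable names are my own and only the figures quoted in this section are used. Since the ratio is rounded to two decimals, the per-class counts are approximations.

```python
# Figures quoted in the text above.
NUM_SEQUENCES = 93
QPS = [22, 27, 32, 37]
TOTAL_CU_SAMPLES = 268_640_788
SPLIT_RATIO = 0.3609  # 36.09% splitting samples (rounded in the source)

# Each raw sequence is encoded once per QP.
num_streams = NUM_SEQUENCES * len(QPS)

# Approximate class sizes, limited by the rounding of the ratio.
approx_split = round(TOTAL_CU_SAMPLES * SPLIT_RATIO)
approx_non_split = TOTAL_CU_SAMPLES - approx_split

print("H.264 streams:", num_streams)
print("approx. splitting samples:", approx_split)
print("approx. non-splitting samples:", approx_non_split)
```

The stream count matches the 372 streams mentioned above, and the classes are clearly imbalanced, with roughly 97 million splitting samples against roughly 172 million non-splitting ones.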

2. H.264 Feature Analysis

2.1. Overall Correlation Coefficient (CC)

**CC between H.264 features and CTU partition**

The statistical values of correlation coefficient (CC) between H.264 features and CTU partition can be found in the above figure.
Baseline means the CC between H.264 features and randomly generated HEVC CTU partition pattern.
The CC values of the H.264 features are much higher than the baseline.
Also, the CC values of 64×64 and 32×32 CUs are larger than those of 16×16 CUs.
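To make the comparison against the baseline concrete, here is a minimal sketch of a Pearson correlation coefficient between a feature and the CU split labels, plus a baseline computed against randomly generated labels. I am assuming a plain Pearson CC here, since the exact form used in the paper is not given in this story. The data is synthetic and the names `pearson_cc`, `feature` and `split_labels` are hypothetical.

```python
import math
import random

def pearson_cc(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # CC is undefined for a constant input; report 0 here
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)

# Synthetic stand-in data (NOT from the HHT database):
# 1 = CU is split, 0 = CU is not split.
split_labels = [rng.randint(0, 1) for _ in range(2000)]
# A toy "H.264 feature" that is noisy but related to the split label.
feature = [label + rng.gauss(0.0, 1.0) for label in split_labels]
# Baseline: the same feature against a randomly generated partition.
random_labels = [rng.randint(0, 1) for _ in range(2000)]

cc_feature = pearson_cc(feature, split_labels)
cc_baseline = pearson_cc(feature, random_labels)
print("CC(feature, true partition):  ", round(cc_feature, 3))
print("CC(feature, random partition):", round(cc_baseline, 3))
```

In this toy setup the CC against the true labels is clearly positive while the CC against random labels stays near zero, which is the kind of gap the figure reports for the real H.264 features.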

The blocks that have the same CTU partition as the previous reference frame are drawn in blue.

2.2. Temporal Similarity

**The CC values of the CTU partition between two frames at various distances**

The CC values of the CTU partition between two frames at various distances are obtained, ranging from 1 group of pictures (GOP) to 25 GOPs.
The CC values are obtained from the co-located units from two frames.
The CTU partition is correlated across HEVC video frames, and the correlation decays as the distance between the two frames increases. Thus, the CTU partition of HEVC in previous frames can be used to predict the current CTU partition.

2.3. Spatial-Temporal Similarity

The CC values are obtained between the CU at one frame and the eight neighboring CUs at the previous frame.
Similarly, the CC curves for such spatial-temporal similarity are plotted against the increasing distance between two frames.
The CC values are all above 0.4 for the first GOP.
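The co-located and eight-neighbor measurements in Sections 2.2 and 2.3 can be sketched as one routine that correlates each unit of the current frame with a unit of a previous frame at a given offset. This is only an illustration under my own assumptions: each frame is a 2D grid of split labels, the CC is a plain Pearson CC, and the grids are synthetic. Because the toy grid is random, only the co-located CC is meaningful here; real video would also show correlation with the neighbors.

```python
import math
import random

def pearson_cc(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

# The eight spatial neighbors around the co-located position (0, 0).
NEIGHBOR_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)]

def cc_with_offset(prev_frame, cur_frame, dy, dx):
    """CC between cur_frame[y][x] and prev_frame[y + dy][x + dx]."""
    rows, cols = len(cur_frame), len(cur_frame[0])
    cur_vals, prev_vals = [], []
    for y in range(rows):
        for x in range(cols):
            py, px = y + dy, x + dx
            if 0 <= py < rows and 0 <= px < cols:  # skip out-of-frame pairs
                cur_vals.append(cur_frame[y][x])
                prev_vals.append(prev_frame[py][px])
    return pearson_cc(cur_vals, prev_vals)

# Synthetic split maps (NOT real data): the current frame copies the
# previous one and flips about 10% of the labels.
rng = random.Random(1)
prev_frame = [[rng.randint(0, 1) for _ in range(40)] for _ in range(40)]
cur_frame = [[1 - v if rng.random() < 0.1 else v for v in row]
             for row in prev_frame]

colocated_cc = cc_with_offset(prev_frame, cur_frame, 0, 0)
neighbor_ccs = [cc_with_offset(prev_frame, cur_frame, dy, dx)
                for dy, dx in NEIGHBOR_OFFSETS]
print("co-located CC:", round(colocated_cc, 3))
```

Repeating the same measurement for frame pairs that are further apart would give one point per distance, which is how curves like the ones in the figures can be built.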

3. Proposed H-LSTM

3.1. Overall Architecture

This part is very similar to the conference version, except that MV is now also used as an H.264 feature. Thus, the feature vector is different as well.
Also, the figure is drawn much more nicely here.
(For more details, please read Wei VCIP’17.)

3.2. Bi-Threshold Decision Scheme

In the test stage, a bi-threshold decision scheme is proposed, which does not appear in Wei VCIP’17.
If the output probability from the network > Threshold 1, split.
If the output probability from the network < Threshold 2, not split.
Otherwise, if the output probability is between Threshold 1 and Threshold 2, the conventional full RDO (Rate Distortion Optimization) is performed.
The threshold pairs used for 64×64, 32×32 and 16×16 are [0.35,0.65], [0.3,0.7] and [0.2,0.8] respectively.
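The bi-threshold rule can be sketched as a small function. This is my own minimal illustration, not the authors' code. I am assuming that in each pair the larger value is Threshold 1 (for split) and the smaller value is Threshold 2 (for not split). The names `THRESHOLDS` and `decide_cu` are hypothetical.

```python
# Threshold pairs (lower, upper) per CU size, as quoted in the text.
# Assumption: upper = Threshold 1 (split), lower = Threshold 2 (not split).
THRESHOLDS = {
    64: (0.35, 0.65),
    32: (0.3, 0.7),
    16: (0.2, 0.8),
}

def decide_cu(split_probability, cu_size):
    """Map the network output probability to one of three actions."""
    lower, upper = THRESHOLDS[cu_size]
    if split_probability > upper:
        return "split"       # confident: split without RDO search
    if split_probability < lower:
        return "not_split"   # confident: stop splitting without RDO search
    return "full_rdo"        # uncertain: fall back to conventional full RDO

for prob, size in [(0.9, 64), (0.5, 32), (0.1, 16), (0.75, 16)]:
    print(size, prob, decide_cu(prob, size))
```

Note that the uncertain zone gets wider for smaller CUs (0.30 wide at 64×64 but 0.60 wide at 16×16), so under this reading more 16×16 decisions fall back to full RDO.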

4. Experimental Results

4.1. BD-Rate

4.1.1 Low Delay P

H-LSTM obtains 59.60% time reduction with only 1.158% BD-BR (BD-rate) increase.

4.1.2 Random Access

A BD-BR (BD-rate) increase of 1.528% is obtained, which is the lowest among the compared approaches.
An average time reduction of 55.40% is obtained, which is the largest among the compared approaches.
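For readers new to this metric, here is a sketch of how a time reduction percentage is commonly computed in fast coding papers. I am assuming the usual definition, (T_original − T_fast) / T_original × 100%. The exact formula used in the paper is not stated in this story, and the timings below are made-up numbers chosen only to reproduce a 59.60% figure.

```python
def time_reduction_percent(t_original, t_fast):
    """Relative saving of the fast transcoder versus the original one."""
    if t_original <= 0:
        raise ValueError("t_original must be positive")
    return (t_original - t_fast) / t_original * 100.0

# Hypothetical timings in seconds (NOT measurements from the paper).
t_original = 1000.0  # original transcoder with full RDO
t_fast = 404.0       # fast transcoder, including model inference time

print(round(time_reduction_percent(t_original, t_fast), 2))
```

A value of 59.60% therefore means the fast transcoder needs only about 40% of the original transcoding time.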

4.2. DMOS

15 non-expert subjects took part in the subjective test.
The average DMOS by H-LSTM (ours) is close to the original transcoder.

4.3. Prediction Accuracy

Without the bi-threshold scheme, 82.5% accuracy is obtained.
With the bi-threshold scheme, an even higher accuracy of 91.7% is obtained.

4.4. Time Analysis

The running time of the H-LSTM model is less than 2% of the original transcoding time.
The H-LSTM model consumes 0.54% and 0.53% of the original transcoding time for 2560×1600 and 1920×1080, respectively.

4.5. Contribution of Features

The features of MV, MB partition, bit allocation and residual reduce transcoding complexity by 44.52%, 57.77%, 43.03% and 51.92%, respectively.
Meanwhile, the BDPSNR results of single features are all above −0.10 dB.
These results show that each feature contributes to H.264 to HEVC transcoding.

During the days of coronavirus, the challenges of writing 30 and 35 stories again for this month have been accomplished. Let me challenge 40 stories!! This is the 37th story in this month. Thanks for visiting my story.

References

[2019 TCSVT] [H-LSTM]
Fast H.264 to HEVC Transcoding: A Deep Learning Method

Codec Fast Prediction

H.264 to HEVC [Wei VCIP’17] [H-LSTM]
HEVC [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Katayama ICICT’18]
VVC [Jin VCIP’17] [Jin PCM’17] [Wang ICIP’18] [Pooling-Variable CNN]