Reading: CNN-SENet — Fast Depth Intra Coding (Fast 3D-HEVC)

20.9% Encoding Time Reduction Without Significant Quality Loss

Sik-Ho Tsang
3 min read · Jul 30, 2020
3D-HEVC (DIBR: Depth Image Based Rendering)

In this story, Fast Depth Intra Coding based on Layer-Classification and CNN for 3D-HEVC (CNN-SENet), by Beijing University of Technology, is presented. I read this because I work on video coding research. The paper is only one page long, so it does not contain many details. In this paper:

  • A convolutional neural network (CNN) scheme based on layer-classification for fast depth intra coding is designed to determine the smoothest depth map.
  • Then, a CNN network incorporating SENet (CNN-SENet) structure is designed and trained.
  • Finally, the layer-classification model and the CNN-SENet network are combined to predict the coding unit (CU) partition for the depth map.

This is a paper in 2020 DCC. (Sik-Ho Tsang @ Medium)

Outline

  • There are 3 modules as mentioned in the paper.
  1. Module 1: Layer-Classification Model
  2. Module 2: Network Incorporating SENet
  3. Module 3: CTU Partition Decision Unit

1. Module 1: Layer-Classification Model

  • The input is a 64×64-pixel block, which is preprocessed by mean removal and down-sampling.
  • The first hidden layer (C1 layer) is a convolution layer with 16 feature maps.
  • The second hidden layer (C2 layer) is a convolution layer with 24 feature maps of 8×8 size.
  • The third hidden layer (C3 layer) is a convolution layer with 32 feature maps of 4×4 size.
  • Then, the last two hidden layers are fully connected layers.
  • When training the CNN, features after the two fully connected layers are randomly dropped out with probabilities of 50% and 20%, respectively.
  • Last, the output gives 16 prediction probabilities for the CTU partition decision in Module 3 (a rough sketch of this network follows below).
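
To make Module 1 concrete, here is a minimal PyTorch sketch of a network with this layer layout. Only the feature-map counts (16, 24, 32), the two dropout rates (50%, 20%), the 16-probability output, and the mean-removal/down-sampling preprocessing come from the paper as summarized above; the down-sampling factor, kernel sizes, strides, and fully-connected widths are assumptions, since the one-page paper does not state them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def preprocess(block64):
    """Mean removal + down-sampling of a 64x64 depth block.
    The 4x down-sampling factor (64x64 -> 16x16) is an assumption."""
    block64 = block64 - block64.mean(dim=(2, 3), keepdim=True)  # mean removal
    return F.avg_pool2d(block64, kernel_size=4)                 # 64x64 -> 16x16

class LayerClassificationCNN(nn.Module):
    """Sketch of the Module-1 CNN (hyper-parameters partly assumed)."""
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 16, 3, padding=1)              # C1: 16 feature maps
        self.c2 = nn.Conv2d(16, 24, 3, stride=2, padding=1)   # C2: 24 maps, 8x8
        self.c3 = nn.Conv2d(24, 32, 3, stride=2, padding=1)   # C3: 32 maps, 4x4
        self.fc1 = nn.Linear(32 * 4 * 4, 128)                 # FC widths assumed
        self.fc2 = nn.Linear(128, 64)
        self.out = nn.Linear(64, 16)                          # 16 prediction probabilities
        self.drop1 = nn.Dropout(0.5)                          # 50% dropout after 1st FC
        self.drop2 = nn.Dropout(0.2)                          # 20% dropout after 2nd FC

    def forward(self, x):                                     # x: (N, 1, 16, 16) after preprocess()
        x = F.relu(self.c1(x))
        x = F.relu(self.c2(x))
        x = F.relu(self.c3(x))                                # C3 is where the SE branch of Module 2 attaches
        x = x.flatten(1)
        x = self.drop1(F.relu(self.fc1(x)))
        x = self.drop2(F.relu(self.fc2(x)))
        return torch.sigmoid(self.out(x))                     # 16 probabilities in [0, 1]
```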

2. Module 2: Network Incorporating SENet

  • Module 2 is the network structure incorporating SENet.
  • The branch off C3 is the SENet structure.
  • In SENet, global average pooling is first applied to C3; this is called the squeeze process.
  • After that, the output goes through two fully connected layers, referred to as the excitation process.
  • Finally, a sigmoid limits the output to the range [0, 1], and this value is used as a per-channel scale on the 32 channels of C3, which then serve as the input data of the next level.
  • By controlling the scale, SENet enhances important features and weakens unimportant ones, making the extracted features more targeted (a minimal sketch of the SE block follows below).
  • (Please feel free to read SENet if interested.)
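
For reference, here is a minimal sketch of such an SE branch on the 32-channel C3 maps, assuming a standard squeeze-and-excitation block as described above; the reduction ratio of the two excitation FC layers is not given in the paper and is an assumption here.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation on the 32-channel C3 feature maps."""
    def __init__(self, channels=32, reduction=4):             # reduction ratio assumed
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)                # squeeze: global average pooling
        self.excite = nn.Sequential(                          # excitation: two FC layers
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                     # scale limited to [0, 1]
        )

    def forward(self, c3):                                    # c3: (N, 32, 4, 4)
        n, c, _, _ = c3.shape
        scale = self.excite(self.squeeze(c3).view(n, c))      # per-channel scale
        return c3 * scale.view(n, c, 1, 1)                    # re-weighted C3 fed to the next level
```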

3. Module 3: CTU Partition Decision Unit

  • Module 3 is a CTU partition decision unit, which decides the category and uses the 16 CU prediction probabilities.
  • First, the 16 outputs from Module 1 are recorded in an output matrix.
  • Then, different calculation methods are used to compute the partition probability of each CU. (The paper gives no details about these calculation methods; a purely hypothetical example is sketched after this list.)
  • The experimental results show that the proposed method reduces encoding time by 20.9% without significant loss in 3D video quality.
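
Because the paper does not describe the calculation methods, the snippet below is purely hypothetical: it simply assumes the 16 probabilities map to the 4×4 grid of 16×16 sub-blocks inside a 64×64 CTU and thresholds each one to decide whether that sub-block should be split further.

```python
def decide_split(probs, thr=0.5):
    """Hypothetical CTU partition decision (the actual method is not given in the paper).
    probs: list of 16 probabilities from the CNN-SENet output, read row by row."""
    grid = [probs[r * 4:(r + 1) * 4] for r in range(4)]       # 4x4 grid of 16x16 sub-blocks
    return [[p > thr for p in row] for row in grid]           # True -> split this sub-block further
```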

This is the 27th story in this month.
