Reading: CNN-SENet — Fast Depth Intra Coding (Fast 3D-HEVC)
20.9% Encoding Time Reduction Without Any Significant Loss
In this story, Fast Depth Intra Coding based on Layer-classification and CNN for 3D-HEVC (CNN-SENet), by Beijing University of Technology, is presented. I read this because I work on video coding research. The paper is only one page long, so it does not give many details. In this paper:
- A convolutional neural network (CNN) scheme based on layer-classification for fast depth intra coding is designed to determine the smoothest depth map.
- Then, a CNN network incorporating SENet (CNN-SENet) structure is designed and trained.
- Finally, the layer-classification model and the CNN-SENet network are combined to predict the partition of all coding units (CUs) in the depth map.
This is a paper in 2020 DCC. (Sik-Ho Tsang @ Medium)
Outline
- There are 3 modules as mentioned in the paper.
- Module 1: Layer-Classification Model
- Module 2: Network Incorporating SENet
- Module 3: CTU Partition Decision Unit
1. Module 1: Layer-Classification Model
- The input is a 64×64-pixel block, preprocessed by mean removal and down-sampling.
- The first hidden layer (C1 layer) is a convolution layer with 16 feature maps.
- The second hidden layer (C2 layer) is a convolution layer with 24 feature maps of 8×8 size.
- The third hidden layer (C3 layer) is a convolution layer with 32 feature maps of 4×4 size.
- The last two hidden layers are fully connected layers.
- When training the CNN, features after the two fully connected layers are randomly dropped out with probabilities of 50% and 20%, respectively.
- Finally, the output layer gives 16 prediction probabilities, which are passed to Module 3 for the CTU partition decision.
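Only two steps of Module 1 can be written down without knowing the trained weights: the input preprocessing and the dropout applied to the fully connected features. The snippet below is a minimal sketch of both, assuming a down-sampling factor of 4 (64×64 to 16×16) done by block averaging, and an inverted-dropout formulation. None of these choices is stated in the paper, and `preprocess_block` and `dropout` are hypothetical names.

```python
import random
from typing import List, Optional

Block = List[List[float]]


def preprocess_block(block: Block, factor: int = 4) -> Block:
    """Mean removal followed by average down-sampling.

    ASSUMPTION: factor=4 (64x64 -> 16x16) and block averaging are illustrative
    choices; the paper does not state how the down-sampling is done.
    """
    n = len(block)
    if n == 0 or any(len(row) != n for row in block):
        raise ValueError("expected a non-empty square block")
    if n % factor != 0:
        raise ValueError("block size must be divisible by factor")

    # Mean removal: subtract the block's average depth value from every pixel.
    mean = sum(sum(row) for row in block) / (n * n)
    centered = [[v - mean for v in row] for row in block]

    # Down-sampling: replace each non-overlapping factor x factor tile by its average.
    m = n // factor
    out: Block = []
    for i in range(m):
        out_row = []
        for j in range(m):
            tile_sum = sum(
                centered[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            )
            out_row.append(tile_sum / (factor * factor))
        out.append(out_row)
    return out


def dropout(features: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """ASSUMED inverted dropout: during training, zero each feature with
    probability p and rescale the survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(features)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in features]


if __name__ == "__main__":
    # Hypothetical depth block: left half at depth 10, right half at depth 0.
    depth = [[10.0] * 32 + [0.0] * 32 for _ in range(64)]
    small = preprocess_block(depth)  # 16x16: left half 5.0, right half -5.0
    fc1 = dropout([0.3] * 8, p=0.5, rng=random.Random(0))  # after 1st FC layer
    fc2 = dropout([0.3] * 8, p=0.2, rng=random.Random(0))  # after 2nd FC layer
    print(len(small), fc1, fc2)
```

Mean removal makes a block's values relative to its own average depth, so a flat region maps to all zeros regardless of its absolute depth. At inference time, `training=False` turns dropout into the identity.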
2. Module 2: Network Incorporating SENet
- Module 2 is the network that incorporates the SENet structure.
- The SENet structure is attached as a branch of the C3 layer.
- In this branch, global average pooling is first applied to each channel of C3; this is the Squeeze step.
- The pooled vector then passes through two fully connected layers; this is the Excitation step.
- Finally, a sigmoid limits each output to the range [0, 1]. These values act as scale factors: each of the 32 channels of C3 is multiplied by its own scale, and the result is the input to the next layer.
- By controlling the scales, SENet enhances important features and suppresses unimportant ones, so the extracted features become more focused.
- (Please feel free to read SENet if interested.)
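The Squeeze, Excitation and Scale steps can be sketched in plain Python. This is a minimal sketch, not the authors' network: the reduction ratio r = 4 (32 to 8 to 32 units), the ReLU between the two fully connected layers (borrowed from the original SENet design), the random weights, and the helper names `se_block`, `dense` and `random_matrix` are all assumptions.

```python
import math
import random
from typing import List, Tuple

FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][col]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function; output lies in [0, 1].
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(w: Matrix, b: List[float], v: List[float]) -> List[float]:
    # Fully connected layer: w has one row of weights per output unit.
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def se_block(x: FeatureMaps, w1: Matrix, b1: List[float],
             w2: Matrix, b2: List[float]) -> Tuple[FeatureMaps, List[float]]:
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid (ReLU is an assumption from SENet).
    h = [max(0.0, v) for v in dense(w1, b1, z)]
    s = [sigmoid(v) for v in dense(w2, b2, h)]
    # Scale: multiply every value of channel c by its scale s[c].
    y = [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(x, s)]
    return y, s


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Hypothetical small random weights standing in for trained parameters.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    channels, size, r = 32, 4, 4  # C3: 32 maps of 4x4; r = 4 is assumed
    hidden = channels // r
    c3 = [[[rng.random() for _ in range(size)] for _ in range(size)]
          for _ in range(channels)]
    w1, b1 = random_matrix(hidden, channels, rng), [0.0] * hidden
    w2, b2 = random_matrix(channels, hidden, rng), [0.0] * channels
    scaled, scales = se_block(c3, w1, b1, w2, b2)
    print(len(scaled), len(scales), min(scales), max(scales))
```

The output keeps the 32×4×4 shape of C3; only the relative weight of each channel changes, which is why the SE branch can be attached without altering the rest of the network.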
3. Module 3: CTU Partition Decision Unit
- Module 3 is a CTU partition decision unit, which makes its decision based on the category and the 16 CU prediction probabilities.
- First, the 16 outputs from Module 1 are stored in an output matrix.
- Then, different calculation methods are used to compute the partition probabilities of different CUs. (The paper gives no details about these calculation methods.)
- The experimental results show that the proposed method reduces encoding time by 20.9% without any significant loss in 3D video quality.
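Fast-coding papers usually report the saving as the relative drop in encoding time against an anchor encoder, ΔT = (T_anchor − T_proposed) / T_anchor × 100%, often averaged over the test sequences. This review does not say which anchor or averaging the authors used, so the helper below only illustrates that common definition; `time_saving`, `mean_time_saving` and the timing numbers are hypothetical.

```python
from typing import Sequence


def time_saving(t_anchor: float, t_proposed: float) -> float:
    """Percentage encoding-time reduction relative to the anchor encoder."""
    if t_anchor <= 0:
        raise ValueError("anchor time must be positive")
    return (t_anchor - t_proposed) / t_anchor * 100.0


def mean_time_saving(anchor_times: Sequence[float],
                     proposed_times: Sequence[float]) -> float:
    """Average of per-sequence savings (one common way to aggregate results)."""
    if len(anchor_times) != len(proposed_times) or not anchor_times:
        raise ValueError("need matching, non-empty timing lists")
    savings = [time_saving(a, p) for a, p in zip(anchor_times, proposed_times)]
    return sum(savings) / len(savings)


if __name__ == "__main__":
    # Hypothetical timings in seconds: 79.1 s instead of 100 s
    # corresponds to a 20.9% reduction.
    print(round(time_saving(100.0, 79.1), 1))  # 20.9
```

Averaging per-sequence percentages weights every sequence equally; summing the raw times first would instead let long sequences dominate, so the aggregation choice matters when comparing numbers across papers.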
This is the 27th story in this month.
Reference
[2020 DCC] [CNN-SENet]
Fast Depth Intra Coding based on Layer-classification and CNN for 3D-HEVC
Codec Fast Prediction
H.264 to HEVC [Wei VCIP’17] [H-LSTM]
HEVC [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Li ICME’17] [Katayama ICICT’18] [Chang DCC’18] [ETH-CNN & ETH-LSTM] [Zhang RCAR’19] [Kim TCSVT’19] [LFHI & LFSD & LFMD Using AK-CNN] [Yang AICAS’20] [H-FCN]
3D-HEVC [AQ-CNN] [CNN-SENet]
VP9 [H-FCN]
VVC [Jin VCIP’17] [Jin PCM’17] [Jin ACCESS’18] [Wang ICIP’18] [Galpin DCC’19] [Pooling-Variable CNN] [Amna JRTIP’20] [DeepQTMT]