# Review — Sun VCIP’20: Fully Neural Network Mode Based Intra Prediction of Variable Block Size (HEVC Intra)

## FCN for Small Blocks, CNN for Large Blocks, Outperforms IPFCN, PNNS and PS-RNN With Smaller Complexity

In this story, **Fully Neural Network Mode Based Intra Prediction of Variable Block Size** (Sun VCIP’20), by Waseda University, JST, PRESTO, and Zhejiang University, is reviewed. In this paper:

- For small blocks **4×4** and **8×8**, **fully connected networks** are used, while for large blocks **16×16** and **32×32**, **convolutional neural networks** are exploited.
- This is **the first work to explore a fully neural network modes (NM) based framework** for intra prediction.

This is a paper in **2020 VCIP**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Fully Connected (FC) Networks for Small Blocks 4×4 and 8×8**
2. **Convolutional Neural Networks (CNN) for Large Blocks 16×16 and 32×32**
3. **Coding Framework with Fully Neural Network Modes (NM)**
4. **Experimental Results**

# 1. **Fully Connected (FC) Networks for Small Blocks 4×4 and 8×8**

- First, **the neighboring reference blocks are flattened into a one-dimensional vector** with (4*N*+8)×8 nodes.
- After passing through **four FC layers**, **the one-dimensional vector is reshaped into a two-dimensional** *N*×*N* block.

- A heavy baseline model with 512 nodes is trained first, and then the number of nodes is repeatedly halved.
- When the number of nodes is reduced to 256 and 128, the coding loss is small.
- However, when it is further reduced to 64, there is an obvious coding loss of 0.21 dB.
- Thus, the number of nodes is set to 128.
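
To make the data flow concrete, here is a minimal sketch of the FC path in plain Python with random weights. Only the input size (4*N*+8)×8, the four FC layers, the 128 nodes, and the *N*×*N* output come from the text above. The layer layout (input → 128 → 128 → 128 → *N*²), the activation with a fixed negative slope, and the names `input_size`, `make_layers`, and `predict` are assumptions for illustration.

```python
import random

def input_size(n):
    """Reference-sample count for an n x n block, (4n + 8) * 8, as quoted above."""
    return (4 * n + 8) * 8

def make_layers(n, hidden=128, seed=0):
    """Four FC layers: input -> hidden -> hidden -> hidden -> n*n (layout assumed)."""
    rng = random.Random(seed)
    sizes = [input_size(n), hidden, hidden, hidden, n * n]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = fan_in ** -0.5
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers

def predict(layers, refs, n, slope=0.25):
    """Run the flattened references through the FC stack and reshape to n x n."""
    x = list(refs)
    for index, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if index < len(layers) - 1:
            # Activation with a fixed negative slope; the choice is an assumption.
            x = [v if v > 0 else slope * v for v in x]
    return [x[r * n:(r + 1) * n] for r in range(n)]

n = 4
refs = [0.5] * input_size(n)  # dummy flattened reference samples
block = predict(make_layers(n), refs, n)
print(input_size(4), input_size(8), len(block), len(block[0]))  # 192 320 4 4
```

With untrained weights the predicted values are meaningless. The point is only the shapes: 192 inputs for a 4×4 block and 320 for an 8×8 block.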

# 2. **Convolutional Neural Networks (CNN) for Large Blocks 16×16 and 32×32**

- To keep the spatial information, **the above three blocks** and **the left two blocks** are sent to **two separate convolutional paths.**
- For each path, down-sampling is conducted to obtain the latent information, which is then flattened into a one-dimensional vector.
- The two vectors are concatenated and then passed through an FC layer.
- The number of output nodes of the FC layer is 1/5 of the number of input nodes.
- Finally, the vector is reshaped to two dimensions and deconvolved to up-sample it to the original block size *N*×*N*.

- Four and five convolutional layers are used for 16×16 and 32×32, respectively.
- PReLU is used as the activation function.
- The number of filters *F* is set to 16 for both 16×16 and 32×32 as a trade-off between coding gain and complexity.
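
The 1/5 ratio of the FC layer is easier to see with a small shape walk-through. The sketch below assumes that the above path sees an *N*×3*N* region, that the left path sees a 2*N*×*N* region, and that each path down-samples by a factor of 2 a fixed number of times. The `downsamples` parameter and the function name `cnn_shapes` are hypothetical, since the review does not state the strides. Under these assumptions, five reference blocks go in and exactly one block-sized latent map comes out.

```python
from math import prod

def cnn_shapes(n, filters=16, downsamples=2):
    """Tensor sizes through the two-path CNN for an n x n block (layout assumed)."""
    scale = 2 ** downsamples
    above = (filters, n // scale, 3 * n // scale)   # three blocks side by side
    left = (filters, 2 * n // scale, n // scale)    # two blocks stacked
    fc_in = prod(above) + prod(left)                # concatenated flat vectors
    fc_out = fc_in // 5                             # 1/5 ratio from the text
    latent = (filters, n // scale, n // scale)      # reshaped before deconvolution
    return {"above": above, "left": left, "fc_in": fc_in,
            "fc_out": fc_out, "latent": latent, "output": (n, n)}

for n in (16, 32):
    shapes = cnn_shapes(n)
    # Under these assumptions the FC output exactly fills a one-block latent map.
    assert prod(shapes["latent"]) == shapes["fc_out"]
    print(n, shapes)
```

For 16×16 with *F* = 16 and two assumed down-sampling steps, the FC layer maps 1280 inputs to 256 outputs, which reshape to a 16×4×4 latent map before deconvolution.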

# 3. **Coding Framework with Fully Neural Network Modes (NM)**

**There are 35 NMs overall, and the best NM is selected.** (Thus, there should be 35 networks trained for each block size.)

The 35 conventional intra modes are abandoned. Thus, the authors state that this is "the first work to explore a fully neural network modes (NM) based framework" for intra prediction.

- First, several candidate modes are selected by sum of absolute transformed differences (SATD) cost. Eight candidates are picked for 4×4 and 8×8 blocks, while three candidates are chosen for the other block sizes.
- This is similar to the Rough Mode Decision (RMD) in the conventional HEVC intra prediction.
- In addition to the candidate modes selected by SATD, Most Probable Modes (MPMs) are also appended to the candidate mode list.
- (I think that the same strategy as the conventional one is used to derive MPMs.)
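
The candidate selection above can be sketched as follows. This is a simplified illustration, not the HM implementation: `satd` uses an unnormalized Hadamard transform, `select_candidates` takes one precomputed SATD cost per NM, and the MPM indices in the usage lines are made up.

```python
def hadamard(n):
    """Sylvester-construction Hadamard matrix for n a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def satd(orig, pred):
    """Sum of absolute Hadamard-transformed differences (no normalization)."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    h = hadamard(len(orig))
    transformed = matmul(matmul(h, diff), h)  # this H is symmetric, so H^T == H
    return sum(abs(v) for row in transformed for v in row)

def select_candidates(satd_costs, block_size, mpms):
    """Keep the lowest-SATD modes (8 for 4x4/8x8, else 3), then append the MPMs."""
    keep = 8 if block_size in (4, 8) else 3
    ranked = sorted(range(len(satd_costs)), key=lambda mode: satd_costs[mode])
    candidates = ranked[:keep]
    for mode in mpms:
        if mode not in candidates:
            candidates.append(mode)
    return candidates

flat = [[10] * 4 for _ in range(4)]
shifted = [[9] * 4 for _ in range(4)]
print(satd(flat, shifted))  # 16

costs = [float(mode) for mode in range(35)]  # toy costs: mode 0 is cheapest
print(select_candidates(costs, 16, mpms=[0, 10, 34]))  # [0, 1, 2, 10, 34]
```

The best mode would then be chosen from the surviving candidates by a more expensive cost check.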

- The New York city library is used as the training set. Each image is encoded with four QPs (22, 27, 32, 37), and the batch size *M* is 16.

A baseline model is first trained on the whole training set.

Then, for each mode, the samples encoded by that mode are extracted from the training set to form a smaller subset dedicated to that mode, which is used for fine-tuning.
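
The per-mode split can be sketched as a simple grouping step. The tuple layout `(mode, reference, target)` and the name `split_by_mode` are assumptions for illustration, and the toy samples are made up.

```python
from collections import defaultdict

def split_by_mode(samples):
    """Group (mode, reference, target) training samples by the mode that coded them."""
    subsets = defaultdict(list)
    for mode, reference, target in samples:
        subsets[mode].append((reference, target))
    return dict(subsets)

# Toy samples: (best mode index, flattened references, target block).
samples = [(0, [1, 2], [3]), (5, [4, 5], [6]), (0, [7, 8], [9])]
subsets = split_by_mode(samples)
for mode, subset in sorted(subsets.items()):
    # In the two-stage schedule, a copy of the baseline model would be fine-tuned here.
    print(f"mode {mode}: {len(subset)} fine-tuning samples")
```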

**MSE** with weight decay is used as the loss function.
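
The paper's exact equation is not reproduced in this review. A common form of an MSE loss with weight decay over a batch of *M* samples is given below as an assumed sketch. The symbols $f$, $x_i$, $y_i$, $\theta$, and $\lambda$ are illustrative and are not taken from the paper.

$$
L(\theta) = \frac{1}{M} \sum_{i=1}^{M} \left\lVert f(x_i; \theta) - y_i \right\rVert_2^2 + \lambda \left\lVert \theta \right\rVert_2^2
$$

Here $x_i$ would be the reference samples of the $i$-th block, $y_i$ the original block, $f(\cdot;\theta)$ the network with parameters $\theta$, and $\lambda$ the weight-decay coefficient.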

# 4. Experimental Results

## 4.1. BD-Rate

- HM-16.9 with the all intra configuration is used.
- On average, 3.55%, 3.03% and 3.27% BD-rate can be saved for Y, U and V, respectively, compared with the anchor.
- **Compared with IPFCN [4], a large BD-rate reduction is achieved for all three channels.**

- When using the proposed model, **the best coding gain is achieved at Class B and E among all the works: IPFCN [4], PNNS [5], PS-RNN [6].**

## 4.2. RD Comparison

- Bitrate is saved while a better PSNR is achieved compared with the anchor.
- (But full RD curves are not plotted?)

## 4.3. Computational Complexity

- The running time is measured on a CPU platform.
- **With the proposed model, the encoding and decoding complexities are 36× and 174× those of the anchor, respectively.**
- Compared with IPFCN [4], the encoding and decoding complexities are reduced by 60% and 24%, respectively.
- Compared with PNNS [5], the encoding and decoding complexities are reduced by 29% and 9%, respectively.
- Compared with PS-RNN [6], the decoding complexity is reduced by 16%.

## Reference

[2020 VCIP] [Sun VCIP’20]

Fully Neural Network Mode Based Intra Prediction of Variable Block Size

## Codec Intra Prediction

**JPEG** [MS-ROI] [Baig JVICU’17]

**JPEG-HDR** [Han VCIP’20]

**HEVC** [Xu VCIP’17] [Song VCIP’17] [Li VCIP’17] [Puri EUSIPCO’17] [IPCNN] [IPFCN] [HybridNN, Li ICIP’18] [Liu MMM’18] [CNNAC] [Li TCSVT’18] [Spatial RNN] [PS-RNN] [AP-CNN] [MIP] [Wang VCIP’19] [IntraNN] [CNNAC TCSVT’19] [CNN-CR] [CNNMC Yokoyama ICCE’20] [PNNS] [CNNCP] [Zhu TMM’20] [Sun VCIP’20] [Zhong ELECGJ’21]

**VVC** [CNNIF & CNNMC] [Brand PCS’19] [Bonnineau ICASSP’20] [Santamaria ICMEW’20] [Zhu TMM’20]