# Review: IPFCN — Intra Prediction Using Fully Connected Network (HEVC Intra Prediction)

## Deep Learning Based Intra Prediction, Outperforms HEVC Intra Prediction

In this paper, **Intra Prediction using Fully Connected Network (IPFCN)**, by Peking University and Microsoft Research Asia (MSRA), is briefly reviewed. I review this because I work on video coding research. The proposed IPFCN was first published in **2017 ICIP**. The authors then enhanced IPFCN into IPFCN-S and IPFCN-D, which were published in **2018 TIP**, where TIP has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)

# Outline

1. **Conventional HEVC Intra Prediction**
2. **IPFCN (2017 ICIP)**
3. **Enhanced IPFCN (2018 TIP)**

# 1. Conventional HEVC Intra Prediction

## 1.1. HEVC Video Coding

- A video is composed of a sequence of frames. In HEVC, each frame is encoded one by one.
- A frame is divided into non-overlapping blocks, called Coding Tree Units (CTUs). Each CTU has the size of 64×64. CTUs are encoded from top left to bottom right using raster scan order.

- For each CTU, quad-tree partitioning is applied to recursively divide the CTU into four smaller square coding units (CUs), from 64×64, 32×32, 16×16 down to 8×8. By comparing the rate-distortion cost at each CU level, different sizes of CUs are chosen to encode each CTU.
- (An 8×8 CU can be further divided into four 4×4 Prediction Units (PUs), but this is not the focus of this story.)
- Each CU is encoded by different approaches, such as intra prediction and inter prediction.
- In this paper, authors focus on intra prediction only.
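The quad-tree partitioning above can be sketched as a simple recursion. This is only a toy illustration: in a real encoder the split decision comes from comparing rate-distortion costs, which is abstracted here as a hypothetical `decide_split` callback.

```python
def split_cu(x, y, size, decide_split):
    """Recursively partition a CU; returns a list of (x, y, size) leaf CUs.

    `decide_split` is a hypothetical stand-in for the encoder's
    rate-distortion comparison at each CU level.
    """
    if size > 8 and decide_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):        # visit the four quadrants
            for dx in (0, half):
                cus += split_cu(x + dx, y + dy, half, decide_split)
        return cus
    return [(x, y, size)]           # 8x8 is the smallest CU here
```

For example, splitting only the 64×64 root yields four 32×32 CUs.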

## 1.2. 35 Intra Predictions in HEVC

- For each CU in intra prediction, there are 35 predictions as shown above.
- **Neighboring reference samples** are used to predict the current CU.
- **0: Planar**, predicts a smooth, gradual change within the CU.
- **1: DC**, fills the CU with the average value of the reference samples as the prediction.
- **2–34: Angular**, predicts the current CU along different angles.
- Some examples are shown at the right of the figure.
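As a minimal illustration of the DC mode, the block is filled with the rounded mean of the 2N reference samples. This sketch skips HEVC's boundary-smoothing filter on the first row/column, and `dc_predict` is a hypothetical helper name:

```python
import numpy as np

def dc_predict(top, left):
    """Simplified HEVC DC intra prediction for an N x N block.

    top:  N reference samples above the block
    left: N reference samples to the left of the block
    """
    n = len(top)
    # dcVal = (sum of 2N references + N) >> (log2(N) + 1), i.e. a rounded mean
    dc = (int(top.sum()) + int(left.sum()) + n) >> (int(np.log2(n)) + 1)
    return np.full((n, n), dc, dtype=np.int64)
```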

# 2. IPFCN (2017 **ICIP**)

## 2.1. Network Architecture

- The idea of IPFCN is to **input the neighboring reference samples (orange, L×L + 2N·L + 2N·L samples in total)** into the neural network and **output the N×N predicted samples**.
- The neural network is **composed of fully connected (FC) layers only**, also called a multi-layer perceptron (MLP).
- **PReLU (Parametric Rectified Linear Unit)** is used as the activation function.
- The **mean squared error (MSE) loss function** is used:
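A minimal NumPy sketch of such an MLP forward pass with PReLU and the MSE loss follows. The layer sizes, the fixed PReLU slope, and the initialization are illustrative assumptions, not the paper's exact configuration (in the paper the PReLU slope is learned):

```python
import numpy as np

def prelu(x, a=0.25):
    # PReLU: identity for positive inputs, slope `a` for negative ones
    return np.where(x > 0, x, a * x)

def ipfcn_forward(refs, weights, biases):
    """Map flattened reference samples to N*N predicted samples.

    Hidden FC layers use PReLU; the output layer is linear.
    """
    h = refs
    for W, b in zip(weights[:-1], biases[:-1]):
        h = prelu(h @ W + b)
    return h @ weights[-1] + biases[-1]

def mse_loss(pred, target):
    # Mean squared error between predicted and original samples
    return np.mean((pred - target) ** 2)
```

For an 8×8 block, a 3-layer, 128-dimensional network would map the flattened reference vector to 64 output samples.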

## 2.2. Validation

- All IPFCNs are trained from the same training data set which is extracted from Netflix sequences.
- The validation set consists of BasketballDrill, FourPeople, BQSquare, ParkScene, and Traffic.
- To determine **how many layers (how deep) to use**, the validation set is used:

- The 3-layer model outperforms the 2-layer model by a relatively large margin. However, deeper models cannot further improve the performance.
- The 8-layer model even has performance loss.
- To determine **how many dimensions to use**, the validation set is used:

- Finally, the **128-dimensional, 3-layer IPFCN** is used.
- A binary flag is transmitted to the decoder to indicate whether IPFCN or the conventional HEVC intra prediction is used.

## 2.3. Results

- The proposed IPFCN achieves an **average of 1.1% bitrate saving on the luma component**.
- The maximum saving is 3.3%, for the Tango sequence.
- At the same time, the **two chroma components both have 1.6% bitrate saving**.
- **For encoding and decoding time, the proposed method brings an additional 48% and 190% cost, respectively.** This mainly comes from the forward computation of IPFCN.
- The parameters are in floating-point precision, which is not computationally friendly for video coding.

# 3. Enhanced IPFCN (2018 TIP)

## 3.1. Major Differences from IPFCN (2017 **ICIP**)

- In contrast to the conference version, the MSE loss function is used with regularization to reduce overfitting:
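A sketch of such a regularized objective follows, assuming an L2 penalty on the FC weights. The exact regularization form and its weight `lam` are assumptions for illustration, not values taken from the paper:

```python
import numpy as np

def regularized_loss(pred, target, weights, lam=1e-4):
    """MSE plus an L2 weight penalty to reduce overfitting.

    `lam` (the regularization weight) is an assumed hyperparameter.
    """
    mse = np.mean((pred - target) ** 2)
    l2 = sum(float(np.sum(W ** 2)) for W in weights)
    return mse + lam * l2
```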

- Also, there are two IPFCN variants: **IPFCN-S** and **IPFCN-D**.
- **IPFCN-S: Single model**, just like the conference version.
- **IPFCN-D: Dual models**. The training data is classified into two groups. One group contains blocks with angular directions, namely **modes 2–34, for directional blocks**. The other group contains blocks with non-angular directions, namely **the DC and planar modes, for homogeneous blocks**. The two groups of training data have different attributes.
- By training with the two groups of data separately, more accurate models can be trained, and more bitrate reduction is expected.
- Thus, for IPFCN-D, one more bin is introduced to indicate the dual IPFCN models.
- In HEVC, the size of a CU varies from 8×8 to 64×64. Considering that the 64×64 CU is seldom chosen in intra coding, the proposed IPFCN is not enabled for 64×64 CUs.
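The grouping rule behind IPFCN-D can be expressed in one line (the function name is hypothetical; the mode-number convention is HEVC's):

```python
def ipfcn_d_group(best_hevc_mode):
    # HEVC modes 2-34 are angular -> directional group;
    # mode 0 (planar) and mode 1 (DC) -> homogeneous group
    return "directional" if 2 <= best_hevc_mode <= 34 else "homogeneous"
```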

## 3.2. Validation

- **Left**: It is found that the **4-layer model** has the lowest loss, so the 4-layer model is chosen.
- **Right**: As an example for 8×8 blocks, the result of the 1024-dimensional model gets very close to that of the 2048-dimensional model. For this reason, **1024 dimensions** are used.
- **The dimensions are set as 512, 1024, 1024, and 2048 for 4×4 to 32×32 block sizes, respectively.**
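The chosen configuration can be summarized as a simple lookup from block size (N) to hidden-layer width, with values taken from the sentence above:

```python
# Hidden-layer width per N x N block size in the enhanced IPFCN
HIDDEN_DIM = {4: 512, 8: 1024, 16: 1024, 32: 2048}
```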

## 3.3. Results

**3.3.1 BD-rate**

- On average, IPFCN-D outperforms IPFCN-S for all three Y, U and V components.

- For both models, larger BD-rate reduction can be obtained for large QPs.

**3.3.2 Complexity**

- The test is done with an Intel Xeon E7-4870 CPU.
- "**L**" means a **light model** using fewer dimensions within the network. **The dimensions are set as 64, 128, 128, and 256 for 4×4 to 32×32 block sizes, respectively.**
- The encoding time of IPFCN-S-L is about 3 times that of the HEVC anchor, and the decoding time is about 8 times.
- At the same time, IPFCN-S-L still achieves 2.3% bitrate reduction on average, and 3.2% bitrate reduction on 4K sequences.

**3.3.3 Some qualitative results**

- As can be observed, the network is clearly capable of producing more accurate predictions when handling these complex blocks. **Irregular shapes can also be predicted.**

**3.3.4. Some Analyses**

- IPFCN-D can reduce 4.7% prediction error for 8×8 block.
- Similarly, the prediction error of 16×16 block decreases by 4.4%.

- The percentages of IPFCN-D usage in the Rollercoaster sequence are 78%, 69%, and 78% for 8×8, 16×16, and 32×32 CUs, respectively.
- The other sequences also have remarkable numbers. These results verify the effectiveness of the proposed network.

# References

[2017 ICIP] [IPFCN] Intra Prediction Using Fully Connected Network for Video Coding

[2018 TIP] [IPFCN-S/IPFCN-D] Fully Connected Network-Based Intra Prediction for Image Coding
