Reading: PNNS — Prediction Neural Network Set (HEVC Intra Prediction)
1.46% to 5.20% BD-Rate Reduction, Outperforms IPFCN
In this story, Context-Adaptive Neural Network Based Prediction for Image Compression, by Sirocco, INRIA Rennes, is briefly presented. The proposed approach is called Prediction Neural Network Set (PNNS). In this paper:
- Fully connected neural networks give good performance for small block sizes.
- Convolutional neural networks provide better predictions in large blocks with complex textures.
This is a paper in 2020 TIP, which has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)
- Prediction Neural Network Set (PNNS) Formulation
- Fully Connected Network (FCN) Architecture
- Convolutional Neural Network (CNN) Architecture
- HEVC Implementation & Other Details
- Experimental Results
1. Prediction Neural Network Set (PNNS) Formulation
- Let X be a context containing decoded pixels above and to the left of a square image block Y of width m.
- X is transformed into a prediction Ŷ of Y via either a fully connected neural network fm, parametrized by θm, or a convolutional neural network gm, parametrized by Φm.
- The mean pixel intensity α is first computed over all the training images. During training, α is subtracted from each image block to be predicted and from its context. The subscript c, as in Xc, stands for centered.
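The mean-centering step above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' code: the function names are hypothetical, the toy pixel values are made up, and adding α back to the centered prediction is an assumption about how the final prediction is recovered.

```python
import numpy as np

def mean_intensity(training_images):
    """Mean pixel intensity alpha over all pixels of all training images."""
    total = sum(float(img.sum()) for img in training_images)
    count = sum(img.size for img in training_images)
    return total / count

def center(context, block, alpha):
    """Subtract alpha from the context X and from the block Y to be predicted."""
    return context - alpha, block - alpha

def decenter(prediction_c, alpha):
    """Assumption: add alpha back to a centered prediction to get pixel intensities."""
    return prediction_c + alpha

# Toy data with hypothetical values: two tiny "training images".
images = [np.full((4, 4), 100.0), np.full((2, 2), 200.0)]
alpha = mean_intensity(images)
```

Here α is weighted by pixel count, so a larger image contributes more to the mean than a smaller one.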
2. Fully Connected Network (FCN) Architecture
- It is composed of 4 fully connected layers.
- The first three layers use LeakyReLU with a slope of 0.1.
- The last layer has no activation.
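A forward pass through such a network can be sketched as follows. Only the layer count, the LeakyReLU slope of 0.1 and the missing activation on the last layer come from the description above. The hidden width, the weight initialization and the flattened context size of 5·m·m pixels are assumptions made for illustration.

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    """LeakyReLU: identity for positive inputs, slope * x otherwise."""
    return np.where(x > 0, x, slope * x)

def init_fcn(n_in, n_hidden, n_out, rng):
    """Four (weight, bias) pairs; sizes and initialization are hypothetical."""
    sizes = [n_in, n_hidden, n_hidden, n_hidden, n_out]
    return [(rng.normal(0.0, 0.01, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def fcn_predict(x_c, params, m):
    """Map a flattened, mean-centered context to an m x m centered prediction."""
    h = x_c.reshape(-1)
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:  # the first three layers use LeakyReLU
            h = leaky_relu(h)
    return h.reshape(m, m)       # the last layer has no activation
```

A usage example with a hypothetical block width of 4: `params = init_fcn(5 * 4 * 4, 64, 4 * 4, np.random.default_rng(0))`, then `fcn_predict(x_c, params, 4)` for a context vector `x_c` of length 80.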
3. Convolutional Neural Network (CNN) Architecture
- Xc is split into X0 and X1.
- X0 goes through a stack of convolutional layers and yields Z0.
- X1 goes through a stack of convolutional layers and yields Z1.
- Z0 and Z1 are merged spatially using an affine combination.
- Finally, the merged Z goes through a stack of transposed convolutional layers and yields the predicted image block.
- LeakyReLU with slope of 0.1 is used.
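The merge step is the least obvious part of this pipeline, so here is one possible reading of it, sketched in NumPy. It is an assumption, not the paper's exact layer: each feature map of Z0 is flattened and concatenated with the matching feature map of Z1, and a single affine map (shared across feature maps) produces the merged map. The function name, the shapes and the weight sharing are all hypothetical.

```python
import numpy as np

def affine_merge(z0, z1, weight, bias, out_shape):
    """Merge two stacks of feature maps, one feature map at a time.

    z0: (C, H0, W0) and z1: (C, H1, W1) feature stacks.
    weight: (H0*W0 + H1*W1, Hz*Wz), bias: (Hz*Wz,), out_shape: (Hz, Wz).
    Returns the merged stack Z with shape (C, Hz, Wz).
    """
    c = z0.shape[0]
    # Flatten each map spatially and place the two branches side by side.
    v = np.concatenate([z0.reshape(c, -1), z1.reshape(c, -1)], axis=1)
    # Affine combination of all spatial positions from both branches.
    z = v @ weight + bias
    return z.reshape(c, *out_shape)
```

In this reading, the merged Z would then be fed to the stack of transposed convolutional layers.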
4. HEVC Implementation & Other Details
- When the block width m > 8, CNN is used. Otherwise, FCN is used.
- It is found that the CNN can extract 2D local image structures, but this is less critical for small block sizes.
- A separate model is trained for each block width m.
- When some pixels are not yet decoded, or are unavailable (e.g., at an image boundary), they are masked with the learned mean pixel intensity, for better training.
- Random context masking is used during training.
- The Euclidean distance between the prediction and the original block is minimized, with regularization by the L2 norm of the weights.
- “PNNS Switch” is proposed, in which an additional flag indicates whether a conventional intra mode or PNNS is used.
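The masking and the training objective described above can be sketched as below. This is a rough illustration under stated assumptions: hiding a random number of rightmost columns is just one hypothetical way to do random context masking, the regularization weight `lam` is a made-up name, and squaring the L2 norm of the weights is an assumption about the exact form of the penalty.

```python
import numpy as np

def mask_unavailable(context, available, alpha):
    """Replace unavailable context pixels with the mean pixel intensity alpha."""
    return np.where(available, context, alpha)

def random_column_mask(shape, rng):
    """Hypothetical random mask: hide a random number of rightmost columns."""
    h, w = shape
    n_hidden = int(rng.integers(0, w + 1))  # between 0 and w columns hidden
    available = np.ones(shape, dtype=bool)
    if n_hidden > 0:
        available[:, w - n_hidden:] = False
    return available

def objective(y_c, y_hat_c, weights, lam):
    """Euclidean distance plus lam times the squared L2 norm of the weights."""
    distance = np.linalg.norm(y_c - y_hat_c)
    penalty = sum(float(np.sum(w ** 2)) for w in weights)
    return distance + lam * penalty
```

Note that once α is subtracted from the context, a pixel masked with α becomes 0, so masked positions carry no signal into the network.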
5. Experimental Results
- HM-16.15 is used.
- “PNNS Switch” obtains a 3.28% BD-rate reduction, outperforming IPFCN-S.
5.2. Computation Time Ratio
- Complexity similar to that of IPFCN-S is obtained.
There is a LARGE number of ablation studies used to choose the right combination and parameters!! If interested, please feel free to read the paper.
This is the 5th story this month!