Reading: Galpin DCC’19 — CNN-Based Driving of Block Partitioning (VVC Intra Prediction)
Using ResNet Blocks, Outperforms Liu ISCAS’16 and ETH-CNN. Speed-Up of ×2 Without BD-rate Loss, or Speed-Up Above ×4 With a Loss Below 1% in BD-rate
In this story, “CNN-based driving of block partitioning for intra slices encoding” (Galpin DCC’19), by Technicolor, is presented. I read this because I work on video coding research. In this paper:
- A CNN is used for fast CU partitioning in JEM, the exploration software for future video coding (FVC) beyond HEVC. This exploration work later evolved into VVC.
This is a paper in 2019 DCC. (Sik-Ho Tsang @ Medium)
Outline
- Overall Framework
- CNN Network Architecture
- Probable Split Selection
- Experimental Results
1. Overall Framework
1.1. CNN-based analysis
- In the first step, each input 65×65 patch is analyzed by a CNN-based texture analyzer. The output is a vector of probabilities, one associated with each elementary boundary that separates elementary sub-blocks:
- The above figure illustrates the mapping between elementary boundary locations and the vector of probabilities. Since the elementary blocks are 4×4, the vector contains n = 480 probability values.
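The count n = 480 can be checked from the geometry: the 64×64 area holds a 16×16 grid of 4×4 elementary blocks, giving 15 internal boundary rows and 15 internal boundary columns, each made of 16 four-pixel segments. A minimal sketch (the index layout and the interpretation of the extra pixel in the 65×65 input are my assumptions, not the paper's exact layout):

```python
# Sketch: count the elementary boundary segments inside a 64x64 patch
# partitioned into 4x4 elementary blocks. The 65x65 input presumably
# adds a 1-pixel causal border of reference samples (my assumption).

PATCH = 64              # patch side in pixels
BLOCK = 4               # elementary block side
GRID = PATCH // BLOCK   # 16 elementary blocks per side

# Horizontal boundaries: 15 internal rows, each split into 16 segments.
n_horizontal = (GRID - 1) * GRID   # 240
# Vertical boundaries: 15 internal columns, each split into 16 segments.
n_vertical = GRID * (GRID - 1)     # 240

n = n_horizontal + n_vertical
print(n)  # 480
```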
1.2. Probable split selection
- The second step takes as input the probability of each elementary boundary and outputs a first set of splitting modes among all possible options: no split, QT, BT (vertical, horizontal), and ABT (top, bottom, left, right).
- In other words, based on the probability vector, it decides which kinds of splitting will be evaluated.
1.3. Encoder constraints and speed-ups
- The encoder itself has some predefined rules and settings for the splitting.
- The third step selects the final set of splitting modes to be checked by classical RDO, depending on the first set provided by step 2, the contextual split constraints and the encoder speed-ups.
- (I will not focus on this part.)
2. CNN Network Architecture
- The CNN is loosely based on a small ResNet, composed of a set of convolutional layers with several skip connections. The main differences from a small ResNet (i.e. the 18-layer version with 2 building blocks) are:
- The adaptation of the number of filters (e.g. up to 48 filters instead of 512) and layers (13 layers instead of 18),
- The absence of batch norm, average pooling, and the double fully connected layers,
- The pooling at the end of each scaling block,
- The absence of stride during convolution,
- The QStep, normalized between 0 and 1, is also fed into the network as an input.
- Finally, the model contains ∼ 225k parameters (compared to ∼ 10M for ResNet-18).
- DIV2K is used for training: 10M patches, encoded with QPs ranging from 19 to 41.
- A batch size of 256 is used.
- The L2 norm is used as the loss function, with an additional regularization term on the convolution weights ck.
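As a rough illustration of this objective (an L2 prediction loss plus L2 regularization on the convolution weights ck), here is a hedged sketch; the helper name l2_loss and the hyperparameter lambda_reg are my own, not from the paper:

```python
# Sketch of the training objective as described: a squared-error (L2) loss
# on the predicted boundary-probability vector, plus an L2 penalty on the
# convolution weights c_k. lambda_reg is an assumed hyperparameter; the
# paper's exact regularization weight is not reproduced here.

def l2_loss(pred, target, conv_weights, lambda_reg=1e-4):
    data_term = sum((p - t) ** 2 for p, t in zip(pred, target))
    reg_term = lambda_reg * sum(w * w for ws in conv_weights for w in ws)
    return data_term + reg_term

# Toy usage with 4-dimensional vectors (the real vectors are 480-dimensional):
loss = l2_loss([0.9, 0.1, 0.5, 0.0], [1.0, 0.0, 0.5, 0.0], [[0.2, -0.1]])
print(round(loss, 6))  # 0.020005
```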
3. Probable Split Selection
- (a): The 480-dimensional probability vector is obtained from the CNN.
- (b): The vector is mapped back onto the CU.
- (c): At each possible split location, the average of the elementary boundary probabilities along each half-split boundary is computed.
- These values are then compared to pre-determined thresholds for each split choice. A decision whether or not to explore each choice is finally made.
- (d): An example where, after this step (i.e. before the third step), QT, BT horizontal, and ABT bottom will be considered by the classical RDO.
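The threshold comparison in (c)–(d) can be sketched as follows. The mode names, probability values, and the uniform 0.5 threshold are illustrative assumptions; the paper determines a separate threshold per split choice:

```python
# Sketch of the probable-split-selection step: for each candidate split of a
# CU, the averaged boundary probability along that split's boundary is compared
# to a per-split threshold to decide whether classical RDO should explore it.
# All numbers below are made up for illustration.

def select_splits(boundary_avg_prob, thresholds):
    """Return the set of split modes whose averaged boundary probability
    reaches that mode's threshold, so classical RDO will explore them."""
    return {mode for mode, p in boundary_avg_prob.items()
            if p >= thresholds[mode]}

# Averaged boundary probabilities for one CU (illustrative values):
probs = {
    "QT": 0.70,
    "BT_HOR": 0.60, "BT_VER": 0.10,
    "ABT_TOP": 0.20, "ABT_BOTTOM": 0.55,
    "ABT_LEFT": 0.05, "ABT_RIGHT": 0.15,
}
# One threshold per split choice (here uniform, for simplicity):
thr = {m: 0.5 for m in probs}

print(sorted(select_splits(probs, thr)))  # ['ABT_BOTTOM', 'BT_HOR', 'QT']
```

With these made-up numbers the outcome matches the kind of decision in the example above: only QT, BT horizontal, and ABT bottom are forwarded to the RDO.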
4. Experimental Results
- JEM7 is used.
- The proposed CNN obtains a 54% encoding time reduction with only a 0.005% BD-rate loss, outperforming two CNN-based approaches, Liu ISCAS’16 [2] and ETH-CNN [3].
- (But a fair comparison is difficult since they use different encoder versions, unless the authors re-implemented them in JEM7.)
- (There are a lot of results related to encoder constraints and speed-ups, please feel free to read the paper if interested.)
This is the 20th story in this month!
Reference
[2019 DCC] [Galpin DCC’19]
CNN-based driving of block partitioning for intra slices encoding
Codec Fast Prediction
H.264 to HEVC [Wei VCIP’17] [H-LSTM]
HEVC [Yu ICIP’15 / Liu ISCAS’16 / Liu TIP’16] [Laude PCS’16] [Li ICME’17] [Katayama ICICT’18] [Chang DCC’18] [ETH-CNN & ETH-LSTM] [Zhang RCAR’19] [Kim TCVST’19] [LFHI & LFSD & LFMD Using AK-CNN]
3D-HEVC [AQ-CNN]
VVC [Jin VCIP’17] [Jin PCM’17] [Wang ICIP’18] [Galpin DCC’19] [Pooling-Variable CNN]