Reading: Galpin DCC’19 — CNN-Based Driving of Block Partitioning (VVC Intra Prediction)

Using ResNet-Style Blocks, Outperforms Liu ISCAS’16 and ETH-CNN: ×2 Speed-Up Without BD-Rate Loss, or a Speed-Up Above ×4 With a BD-Rate Loss Below 1%

In this story, “CNN-based driving of block partitioning for intra slices encoding” (Galpin DCC’19), by Technicolor, is presented. I read this because I work on video coding research. In this paper:

  • A CNN is used for fast CU partitioning in JEM, the exploration software for Future Video Coding (FVC) beyond HEVC; the FVC effort later became VVC.

This is a paper in 2019 DCC. (Sik-Ho Tsang @ Medium)


  1. Overall Framework
  2. CNN Network Architecture
  3. Probable Split Selection
  4. Experimental Results

1. Overall Framework

Overall Framework

1.1. CNN-based analysis

  • In the first step, each input 65×65 patch is analyzed by a CNN-based texture analyzer. The output is a vector of probabilities, one associated with each elementary boundary separating elementary sub-blocks:
Mapping of the boundary locations onto an n × 1 vector.
  • The above figure illustrates the mapping between elementary boundary locations and the vector of probabilities. With elementary blocks of size 4 × 4, a 64×64 CU contains a 16×16 grid of blocks, giving 2 × 15 × 16 internal boundaries, so the vector contains n = 480 probability values.
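The mapping can be sketched as follows. This is a hypothetical indexing (horizontal boundaries first, then vertical); the paper only states that the vector has n = 480 entries, so the exact ordering is an assumption:

```python
# Hypothetical sketch of the boundary-to-vector mapping, assuming a
# 64x64 CU tiled into 4x4 elementary blocks (a 16x16 grid of blocks).
BLOCK_GRID = 16  # 64 / 4 elementary blocks per side

def boundary_index(orientation, row, col):
    """Map an elementary boundary to an index in the n x 1 vector.

    orientation: 'h' for the horizontal boundary below block (row, col),
                 'v' for the vertical boundary right of block (row, col).
    """
    if orientation == 'h':
        # 15 internal horizontal edge rows x 16 block columns = 240 entries
        assert 0 <= row < BLOCK_GRID - 1 and 0 <= col < BLOCK_GRID
        return row * BLOCK_GRID + col
    else:
        # offset past the 240 horizontal boundaries;
        # 16 block rows x 15 internal vertical edge columns = 240 entries
        assert 0 <= row < BLOCK_GRID and 0 <= col < BLOCK_GRID - 1
        return 240 + row * (BLOCK_GRID - 1) + col

n = 2 * (BLOCK_GRID - 1) * BLOCK_GRID
print(n)  # 480, matching the paper
```

The last vertical boundary, `boundary_index('v', 15, 14)`, maps to index 479, so the 480-dimensional vector is filled exactly.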

1.2. Probable split selection

  • The second step takes as input the probability of each elementary boundary and outputs a first set of splitting modes among all possible options: no split, QT, BT (vertical, horizontal), and ABT (top, bottom, left, right).
  • i.e. based on the probability vector, it decides which kinds of split will be evaluated.

1.3. Encoder constraints and speed-ups

  • The encoder itself has some predefined rules and settings for the splitting.
  • The third step selects the final set of splitting modes to be checked by classical RDO, depending on the first set provided by step 2, the contextual split constraints and the encoder speed-ups.
  • (I will not focus on this part.)

2. CNN Network Architecture

CNN layout for boundary prediction
  • The CNN is loosely based on a small ResNet, composed of a set of convolutional layers with several skip connections. The main differences from a small ResNet (i.e. the 18-layer version with 2-layer building blocks) are:
  • The adaptation of the number of filters (e.g. up to 48 filters instead of 512) and layers (13 layers instead of 18),
  • The absence of batch norm, average pooling and double fully connected layers,
  • The pooling at the end of each scaling block,
  • The absence of stride during convolution,
  • The QStep is normalized between 0 and 1 and fed into the network as an additional input.
  • Finally, the model contains ∼ 225k parameters (compared to ∼ 10M for ResNet-18).
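A minimal numpy sketch of the kind of building block described above: two 3×3 convolutions with a skip connection, no batch norm, no stride, and pooling at the end of the scaling block. Filter counts and the exact layer layout are not reproduced here, so treat this as an illustration of the structure, not the paper's network:

```python
import numpy as np

def conv2d(x, w):
    """3x3 'same' convolution, stride 1 (the paper uses no stride).
    x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_out, c_in, kh, kw = w.shape
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for co in range(c_out):
        for ci in range(c_in):
            for i in range(kh):
                for j in range(kw):
                    out[co] += w[co, ci, i, j] * xp[ci, i:i + h, j:j + wd]
    return out

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    """Two convolutions with a skip connection; no batch norm, as in the paper."""
    return relu(x + conv2d(relu(conv2d(x, w1)), w2))

def pool2x2(x):
    """2x2 max pooling at the end of each scaling block (per the paper)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))
```

With up to 48 filters and 13 layers instead of 512 filters and 18 layers, blocks like this keep the model at roughly 225k parameters versus ResNet-18's ~10M.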
Dataset for Training
  • DIV2K is used for training: 10M patches, encoded with QPs ranging from 19 to 41.
  • Batch size of 256 is used.
  • The L2 norm is used as the loss function, with a regularization term on the convolution weights ck.
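The paper's exact formula is not reproduced here, but a plausible form consistent with the description (an L2 data term plus L2 regularization of the convolution weights ck) is:

```latex
\mathcal{L} = \left\lVert \hat{\mathbf{y}} - \mathbf{y} \right\rVert_2^2
            + \lambda \sum_{k} \left\lVert c_k \right\rVert_2^2
```

where \(\hat{\mathbf{y}}\) is the predicted 480-dimensional boundary-probability vector, \(\mathbf{y}\) the ground truth from exhaustive encoding, and \(\lambda\) a regularization weight (an assumed symbol, not from the paper).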

3. Probable Split Selection

  • (a): The 480-dimensional probability vector is obtained from the CNN.
  • (b): It is mapped back onto the boundary locations of the CU.
  • (c): At each location of possible splits, the average value of elementary boundaries corresponding to each half-split boundary is computed.
  • These values are then compared to pre-determined thresholds for each split choice. A decision to explore or not each choice is finally made.
  • (d): shows an example where, at this step, i.e. before performing the third step, QT, BT horizontal, and ABT bottom would be considered by the classical RDO.
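The steps above can be sketched as follows, assuming the 480-vector has been reshaped into a 15×16 map of horizontal-boundary probabilities and a 16×15 map of vertical-boundary probabilities. The split names and threshold values are illustrative; the paper tunes a threshold per split choice:

```python
import numpy as np

GRID = 16  # 16x16 grid of 4x4 elementary blocks in a 64x64 CU (assumption)

def split_candidates(h_prob, v_prob, thresholds):
    """Select the first set of splits to explore.

    h_prob: (15, 16) probabilities of horizontal elementary boundaries.
    v_prob: (16, 15) probabilities of vertical elementary boundaries.
    For each split choice, average the elementary-boundary probabilities
    lying on that split's boundary, then compare to its threshold.
    """
    mid = GRID // 2
    scores = {
        # BT horizontal: full-width boundary at mid-height
        'bt_h': h_prob[mid - 1, :].mean(),
        # BT vertical: full-height boundary at mid-width
        'bt_v': v_prob[:, mid - 1].mean(),
        # QT: both mid boundaries together
        'qt': 0.5 * (h_prob[mid - 1, :].mean() + v_prob[:, mid - 1].mean()),
        # ABT: boundaries at one quarter / three quarters of the CU
        'abt_top': h_prob[mid // 2 - 1, :].mean(),
        'abt_bottom': h_prob[mid + mid // 2 - 1, :].mean(),
        'abt_left': v_prob[:, mid // 2 - 1].mean(),
        'abt_right': v_prob[:, mid + mid // 2 - 1].mean(),
    }
    # explore a split only if its averaged probability clears the threshold
    return {s: p for s, p in scores.items() if p >= thresholds.get(s, 0.5)}
```

For instance, a strong boundary response along the CU's mid-height row would put `bt_h` (and, if the vertical mid boundary also responds, `qt`) into the candidate set passed on to the third step.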

4. Experimental Results

BD-Rate and Time Saving
  • JEM7 is used.
  • The proposed CNN obtains a 54% time reduction with only 0.005% BD-rate loss, which outperforms two CNN-based methods, Liu ISCAS’16 [2] and ETH-CNN [3].
  • (But a fair comparison is difficult since the methods use different encoder versions, unless the authors re-implemented them in JEM7.)
  • (There are many more results on the encoder constraints and speed-ups; please feel free to read the paper if interested.)

This is the 20th story in this month!

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn:, My Paper Reading List:
