Sik-Ho Tsang

May 15, 2020

Reading: IntraNN — CNNs for Video Intra Prediction Using Cross-component Adaptation (HEVC Intra)

2% BD-Rate Reduction for Luma and 1.5% BD-Rate Reduction for Chroma


  1. Network Architecture
  2. Some Training Details
  3. HEVC Implementation
  4. Experimental Results

1. Network Architecture

C2 Network Architecture

1.1. Architecture Variants

  • The authors tried three architecture variants: C0, C1 and C2.
  • C0: Only fully connected (FC) layers are used, i.e. no convolutions.
  • C1: A single 4×4 convolutional layer precedes the FC layers.
  • C2: As shown above, two convolutional layers, 3×3 then 2×2, before FC layers. As no padding is used, the number of reference lines limits the possible kernel sizes and combinations.
  • Leaky ReLU is used in every layer except the final output layer.
  • The luma network is shown as the upper branch.
  • The chroma network is shown as the lower branch, where the two chroma components are concatenated before being fed into the network. This is called cross-component (CRCO) prediction.
  • Notably, the network input is an L-shaped region rather than a rectangular block.
BD-Rate (%) for Architecture Variants
  • C2 achieves the largest coding gain, a 2.01% BD-rate reduction.
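
To see why the lack of padding restricts the kernel choices, here is a minimal sketch of how unpadded (valid) convolutions shrink the reference area. The value of 4 reference lines is an assumption made for illustration: it is the smallest count that both the C1 stack (one 4×4 kernel) and the C2 stack (3×3 then 2×2) reduce to exactly one line. The function name is hypothetical.

```python
def lines_after_valid_convs(num_lines, kernel_sizes):
    """Return how many lines remain after a stack of unpadded, stride-1 convolutions.

    Each k x k kernel without padding removes k - 1 lines.
    Raises ValueError if a kernel is larger than what is left.
    """
    remaining = num_lines
    for k in kernel_sizes:
        if k > remaining:
            raise ValueError(
                f"a {k}x{k} kernel does not fit into {remaining} remaining line(s)"
            )
        remaining = remaining - k + 1
    return remaining


# Assumption for illustration only: 4 reference lines.
REFERENCE_LINES = 4

c1_lines = lines_after_valid_convs(REFERENCE_LINES, [4])     # C1: one 4x4 layer
c2_lines = lines_after_valid_convs(REFERENCE_LINES, [3, 2])  # C2: 3x3 then 2x2
print(c1_lines, c2_lines)
```

Under this assumption, both stacks use up the available lines completely, and a deeper stack such as two 3×3 layers would no longer fit.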

2. Some Training Details

  • All networks are trained on samples from 104 sequences with varying resolutions. 11 additional videos are used as a validation set.
  • The authors provide supplementary material listing the sequences they use.
  • Horizontal and vertical flipping are applied for data augmentation.
  • The channel-wise mean of the reference area is subtracted from both the reference and the prediction area.
  • Some samples with low variance are excluded from training.
  • Two loss functions are tried: L1 and SATD. SATD is the sum of absolute transformed differences: the residual is Hadamard-transformed and the absolute values of the resulting coefficients are summed. Simply put, the Hadamard transform is a simplified transform that is commonly used in video coding to get a preliminary estimate of the cost of the transformed residual.
Loss Functions Variants Using C0 Architecture
  • It is found that the SATD loss outperforms the L1 loss.
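
The difference between the two losses can be sketched in a few lines of NumPy. This is only an illustration of the SATD definition, not the authors' training code: the function names are hypothetical, the Hadamard matrix is built with the standard Sylvester construction, and the result is left unnormalised (codecs typically apply a scaling factor, which is omitted here). An actual training loss would need the same computation in a framework with automatic differentiation.

```python
import numpy as np


def hadamard(n):
    """Build an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        # Sylvester construction: double the size at every step.
        h = np.block([[h, h], [h, -h]])
    return h


def sad(pred, target):
    """Sum of absolute differences (the L1 loss without averaging)."""
    residual = np.asarray(target, dtype=np.int64) - np.asarray(pred, dtype=np.int64)
    return int(np.abs(residual).sum())


def satd(pred, target):
    """Sum of absolute Hadamard-transformed differences (unnormalised)."""
    residual = np.asarray(target, dtype=np.int64) - np.asarray(pred, dtype=np.int64)
    n = residual.shape[0]
    if residual.shape != (n, n):
        raise ValueError("expected a square block")
    h = hadamard(n)
    coeffs = h @ residual @ h.T  # 2-D Hadamard transform of the residual
    return int(np.abs(coeffs).sum())


pred = np.zeros((4, 4), dtype=np.int64)

flat = np.ones((4, 4), dtype=np.int64)      # constant error over the whole block
impulse = np.zeros((4, 4), dtype=np.int64)  # error in a single sample
impulse[0, 0] = 1

print(sad(pred, flat), satd(pred, flat))
print(sad(pred, impulse), satd(pred, impulse))
```

A flat residual collapses into a single transform coefficient, whereas a single-sample error spreads over all 16 coefficients, so SATD penalises the second case far more than SAD does, relative to the first.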

3. HEVC Implementation

  • HM-16.9 is used.
  • The above CNN-based prediction is treated as a new mode, the 36th intra prediction mode, called IntraNN.
  • (To know more about intra prediction, please feel free to read Sections 1 & 2 in IPCNN.)
  • The most probable mode list was extended to hold a fourth option by sending an additional bit when the third list position would have been chosen otherwise.
  • There are two ways to place IntraNN in the intra prediction mode candidate list: END and UP.
  • END: It is always put in the last position. Thus, this END signaling version should cause less overhead when the IntraNN mode is not chosen.
  • UP: It is placed directly behind the modes used in the neighborhood. This UP version provides lower signaling costs for the IntraNN mode when in use.
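
The two placements can be illustrated with a simplified sketch. It is not the HM-16.9 derivation: the list-filling rule (neighbouring modes first, then planar, DC and vertical), the mode index 35 for IntraNN, the handling of neighbours that themselves use IntraNN, and all function names are assumptions made for illustration. The bit counts cover only the list index, following the description above that the former third position receives one extra bit.

```python
PLANAR, DC, VERTICAL = 0, 1, 26   # HEVC intra mode indices
INTRA_NN = 35                     # hypothetical index for the new 36th mode


def build_mpm_list(left_mode, above_mode, placement):
    """Build a 4-entry candidate list with IntraNN placed by 'UP' or 'END'."""
    # Distinct conventional modes used by the neighbours (assumption: a
    # neighbour coded with IntraNN contributes nothing here).
    neighbours = []
    for mode in (left_mode, above_mode):
        if mode != INTRA_NN and mode not in neighbours:
            neighbours.append(mode)

    # Default modes used to fill the remaining slots (simplified rule).
    fillers = [m for m in (PLANAR, DC, VERTICAL) if m not in neighbours]

    if placement == "UP":
        # Directly behind the modes used in the neighbourhood.
        candidates = neighbours + [INTRA_NN] + fillers
    elif placement == "END":
        # Always in the last position.
        candidates = (neighbours + fillers)[:3] + [INTRA_NN]
    else:
        raise ValueError("placement must be 'UP' or 'END'")
    return candidates[:4]


def mpm_index_bits(index):
    """Bits for the list index with the codes 0, 10, 110, 111."""
    return (1, 2, 3, 3)[index]


for placement in ("END", "UP"):
    mpm = build_mpm_list(left_mode=10, above_mode=10, placement=placement)
    print(placement, mpm, "IntraNN index bits:", mpm_index_bits(mpm.index(INTRA_NN)))
```

In this sketch, when both neighbours use the same mode, UP moves IntraNN to the second position and shortens its index code, while END leaves the conventional candidates in their usual order.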

4. Experimental Results

BD-rate (%) compared to HEVC for the different signaling modes and architectures
  • As shown above, in most cases the UP variant gives slightly better BD-rate gains than the END variant, especially for the luma channel.
  • The cross-component (CRCO) version outperforms the version without network-based chroma prediction on every channel and sequence, with average BD-rate differences of -0.6% on the luma channel and -0.97% and -0.95% on the two chroma channels.

