Reading: FRCNN — Fractional-Pixel Reference Generation CNN (HEVC Inter Prediction)

VRCNN-Like Network, Outperforms CNNIF, 3.9%, 2.7%, and 1.3% Bits Saving Compared with HEVC, Under LDP, LDB, and RA configurations, Respectively

  • FRCNN is designed to generate the fractional pixels based on the integer-pel pixels.
  • The most important thing for these interpolation problems is: how to find the ground-truth samples for training while the fractional samples are not really available in the original videos.


  1. FRCNN: Network Architecture
  2. Sample Collection
  3. Training & HEVC Implementation
  4. Experimental Results

1. FRCNN: Network Architecture

1.1. Input & Output

FRCNN: Input & Output
  • Integer-Pel Pixels (Left): The integer-pel pixels are Ai,j. Interpolation is performed to obtain ai,j to ni,j, which are the sub-pel pixels, including half-pel and quarter-pel pixels.
  • Traditional Methods (Top-Right): All half-pel and quarter-pel pixels are predicted together from the reference pixels (red), i.e. the integer-pel pixels.
  • FRCNN (Bottom-Right): The input is the reference pixels (red), i.e. the integer-pel pixels. The CNN outputs a predicted block for each sub-pel position.
  • In HEVC, with quarter-pel precision, the interpolated block is 4 times larger in both the x and y directions, i.e. 16× larger than the reference block. One of these 16 positions is the integer position itself, so 15 FRCNN models are used, one per sub-pel position.
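
To make the count of 15 concrete, here is a minimal sketch that enumerates the quarter-pel offsets. The function name and the (frac_x, frac_y) tuple layout are my own illustrative choices, not something taken from the paper.

```python
from itertools import product

# Quarter-pel precision: each MV component has a fractional part of
# 0, 1, 2 or 3 quarters of a pixel.
QUARTER_STEPS = range(4)


def fractional_positions():
    """List the (frac_x, frac_y) offsets in quarter-pel units.

    The integer position (0, 0) is excluded because it needs no
    interpolation. The tuple layout is an assumption for illustration.
    """
    return [
        (fx, fy)
        for fy, fx in product(QUARTER_STEPS, repeat=2)
        if (fx, fy) != (0, 0)
    ]


positions = fractional_positions()
print(len(positions))  # 4 * 4 - 1 = 15
```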

1.2. Network Architecture

FRCNN: Network Architecture
  • FRCNN actually uses the VRCNN network architecture. (Please feel free to read VRCNN if interested.)
  • (Also, I cannot fin

2. Sample Collection

Sample Collection

2.1. FRCNN-U for Uni-Directional Prediction

  • The current block extracted from the original video sequence is marked as the target/label Yi, as shown above.
  • Using the coded fractional-pixel MV, the "referenced fractional block" in the reference picture is located, shown by the yellow dashed line.
  • The corresponding integer block is then found by moving the referenced fractional block towards the top-left until it reaches the nearest integer pixels, shown by the purple line.
  • Next, the corresponding integer block is padded in four directions (up, down, left, right) by a specific width.
  • The padding width is determined by the effective kernel size of the FRCNN model. The padded block, shown by the red line, is extracted from the reconstructed video sequence and marked as the input Xi.
  • Since HEVC enables quarter-pixel MV precision, all the training samples are divided into 15 sets according to the fractional MV position, and each set is used to train an individual model.
  • When generating training data for FRCNN-U we adopt the Low-Delay P configuration.
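
The steps above can be sketched as follows. This is only an illustration under my own assumptions: MVs are stored in quarter-pel units, "moving towards the top-left" is a floor division, `pad` is a hypothetical value standing in for the width derived from the effective kernel size, and picture-boundary handling is ignored. The function names are made up for this sketch.

```python
def split_mv(mv_x, mv_y):
    """Split a quarter-pel MV into an integer-pel part and a fractional part.

    Floor division moves towards the top-left (also for negative MVs),
    so the fractional part is always in 0..3.
    """
    return (mv_x // 4, mv_y // 4), (mv_x % 4, mv_y % 4)


def training_input_window(block_x, block_y, width, height, mv_x, mv_y, pad):
    """Return the padded input window (left, top, w, h) and the fractional MV.

    The window is what would be cropped from the reconstructed reference
    picture as Xi; the fractional MV decides which of the 15 sample sets
    the (Xi, Yi) pair belongs to.
    """
    (int_x, int_y), frac = split_mv(mv_x, mv_y)
    left = block_x + int_x - pad
    top = block_y + int_y - pad
    return (left, top, width + 2 * pad, height + 2 * pad), frac


# Example: an 8x8 block at (64, 32) with MV (-5, 6) in quarter-pel units
# and a hypothetical padding width of 4 pixels.
window, frac = training_input_window(64, 32, 8, 8, -5, 6, pad=4)
print(window, frac)
```

Samples whose fractional part is (0, 0) point at an integer position and would not go into any of the 15 sets.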

2.2. FRCNN-B for Bi-Directional Prediction

  • Similar to FRCNN-U, but only the blocks coded with bi-directional prediction, where the second MV is fractional, are selected.
  • When generating training data for FRCNN-B we adopt the Low-Delay B configuration.

3. Training & HEVC Implementation

  • FRCNN-U models are used if the PU is coded with uni-directional mode.
  • FRCNN-U and FRCNN-B are used simultaneously (FRCNN-U for list-0 and FRCNN-B for list-1) if the PU is coded with bi-directional mode.
  • Block-based fractional reference filter type (FRFT) selection: An additional CU-level flag is added so that DCTIF (DCT-based Interpolation Filter) and FRCNN compete, and the one with the minimum rate-distortion (RD) cost is selected.
  • FRFT Merge: When a CU is coded with merge 2N×2N mode, its FRFT is also merged rather than decided by RD cost.
  • Different FRCNN models are used for different QPs.
  • Thus, in total there are 2 × 4 × 15 = 120 models (2 types: FRCNN-U and FRCNN-B; 4 QPs: 22, 27, 32, 37; 15 fractional MV positions).
  • For the training data, we use only one video sequence, namely BlowingBubbles, which is a common test sequence in HEVC.
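
As a rough sketch of how the selection rules in this section fit together, the snippet below enumerates the 120 model keys, picks a key for a given PU, and chooses the FRFT for a CU. All names, the key layout, and the tie-breaking rule (DCTIF wins on equal cost) are my assumptions for illustration; the actual encoder integration is not shown in the paper summary above.

```python
from itertools import product

PREDICTION_TYPES = ("U", "B")   # FRCNN-U and FRCNN-B
TRAINED_QPS = (22, 27, 32, 37)
FRACTIONAL_POSITIONS = [
    (fx, fy) for fy, fx in product(range(4), repeat=2) if (fx, fy) != (0, 0)
]


def all_model_keys():
    """Enumerate every (type, QP, fractional position) combination."""
    return [
        (t, qp, pos)
        for t in PREDICTION_TYPES
        for qp in TRAINED_QPS
        for pos in FRACTIONAL_POSITIONS
    ]


def model_key(is_bi_predicted, ref_list, qp, frac):
    """Pick the model for one reference block of a PU.

    Uni-directional PUs use FRCNN-U; bi-directional PUs use FRCNN-U for
    list-0 and FRCNN-B for list-1.
    """
    model_type = "B" if (is_bi_predicted and ref_list == 1) else "U"
    return (model_type, qp, frac)


def choose_frft(rd_cost_dctif, rd_cost_frcnn, inherited=None):
    """Choose the filter type for a CU.

    A merge 2Nx2N CU passes the merged FRFT as `inherited` and skips the
    RD comparison; otherwise the cheaper option wins.
    """
    if inherited is not None:
        return inherited
    return "FRCNN" if rd_cost_frcnn < rd_cost_dctif else "DCTIF"


print(len(all_model_keys()))  # 2 * 4 * 15 = 120
```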

4. Experimental Results

4.1. BD-Rate

BD-Rate (%) on HEVC Test Sequences
  • 3.9%, 2.7%, and 1.3% BD-rate reduction is obtained compared with HEVC, under LDP, LDB, and RA configurations respectively.
BD-Rate (%) on HEVC Test Sequences, Only FRCNN-U Under LDB Configurations
  • When only FRCNN-U is used in the LDB configuration, only a 2.0% BD-rate reduction is obtained (versus 2.7% with both), which shows the importance of FRCNN-B.

4.2. Hitting Ratios

Hitting Ratio Under LDP Configuration
  • A certain number of CUs choose FRCNN rather than DCTIF.
  • Hitting ratio decreases when QP increases.

4.3. RD Curves

RD Curves
  • The RD curves show that FRCNN is more useful at high bitrates than at low bitrates, which is consistent with the hitting ratio results.

4.4. Visualization

  • Pink CUs use FRCNN while blue CUs use DCTIF.
  • FRCNN tends to be selected in richly textured regions, such as water and clothes.

4.5. Comparison with Prior Art CNNIF [12]

BD-Rate by CNNIF [12] Under LDP Configuration
  • Compared with CNNIF, which only obtains a 0.5% BD-rate reduction, FRCNN obtains a 3.9% BD-rate reduction, making it much more powerful than CNNIF.


