Reading: AQ-CNN — Adaptive QP Convolutional Neural Network (Fast 3D-HEVC Prediction)

LeNet-Like Architecture, 69.4% Average Time Reduction With Negligible Bitrate Increase on Synthesized Views

3D-HEVC (DIBR: Depth Image Based Rendering)

In this story, Adaptive QP Convolutional Neural Network (AQ-CNN), by Huazhong University of Science and Technology, is briefly presented. Since this is a 1-page conference paper, there are not many details. I read this because I work on video coding research.

3D-HEVC is one of the extensions of HEVC to support 3D video. With the color/texture videos and depth maps of both the left and right views, any intermediate virtual view in between can be synthesized, so that autostereoscopic or free-viewpoint 3D video can be supported. Thus, it is called multiview video plus depth (MVD) coding. Recently, 3D video technology has also been incorporated (with enhancements and changes, of course) into the MPEG-I project to support 3DoF and 6DoF videos, which will be completed in the coming years.
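To make the MVD idea a bit more concrete, below is a toy forward-warping sketch in Python/NumPy. It is not taken from the paper: the linear disparity model, the max_disp parameter, and warping only a single view are simplifying assumptions; real DIBR also warps the second view, blends both, and fills disocclusion holes.

    import numpy as np

    def warp_to_virtual_view(texture, depth, alpha, max_disp=32):
        """Toy forward warp of one reference view toward a virtual viewpoint.

        texture : (H, W, 3) color image of the reference (e.g. left) view
        depth   : (H, W) depth map scaled to [0, 1], larger value = closer
        alpha   : virtual view position between this view (0) and the other view (1)
        max_disp: assumed maximum disparity (in pixels) between the two views
        """
        H, W, _ = texture.shape
        out = np.zeros_like(texture)
        # Closer pixels (larger depth value) get a larger horizontal shift.
        disparity = np.round(alpha * max_disp * depth).astype(int)
        xs = np.arange(W)
        for y in range(H):
            new_x = xs + disparity[y]
            valid = (new_x >= 0) & (new_x < W)
            out[y, new_x[valid]] = texture[y, xs[valid]]  # forward warping
        return out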

This is a paper in 2019 DCC (Data Compression Conference). (Sik-Ho Tsang @ Medium)

Outline

  1. AQ-CNN: Network Architecture
  2. Experimental Results

1. AQ-CNN: Network Architecture

  • A LeNet-like architecture of a binary classifier is used.
  • The output label is either SPLIT or NOT SPLIT.
  • AQ-CNN consists of two convolutional layers, two pooling layers, two fully connected layers and one softmax layer.
  • Three separate models are trained for different CU sizes, with the same structure but different parameters.
  • Considering that QP has a great influence on the CU partition, QP is concatenated to the two fully connected layers to better predict the CU splitting labels (a rough sketch is given after this list).
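To make the architecture description above concrete, here is a minimal PyTorch-style sketch of such a LeNet-like split classifier. Only the layer counts and the QP concatenation follow the description in the paper; the channel widths, kernel sizes, hidden size, QP handling and the CU sizes (64/32/16) are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class AQCNN(nn.Module):
        """Sketch of a LeNet-like binary CU-split classifier with QP injected
        into both fully connected layers. Hyper-parameters are assumed."""

        def __init__(self, cu_size=32):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5, padding=2),   # conv 1
                nn.ReLU(),
                nn.MaxPool2d(2),                             # pool 1
                nn.Conv2d(6, 16, kernel_size=5, padding=2),  # conv 2
                nn.ReLU(),
                nn.MaxPool2d(2),                             # pool 2
            )
            feat_dim = 16 * (cu_size // 4) * (cu_size // 4)
            # QP is concatenated to the input of both fully connected layers.
            self.fc1 = nn.Linear(feat_dim + 1, 64)
            self.fc2 = nn.Linear(64 + 1, 2)                  # SPLIT / NOT SPLIT

        def forward(self, cu, qp):
            # cu: (N, 1, cu_size, cu_size) depth CU samples, qp: (N, 1)
            x = self.features(cu).flatten(1)
            x = torch.relu(self.fc1(torch.cat([x, qp], dim=1)))
            x = self.fc2(torch.cat([x, qp], dim=1))
            return torch.softmax(x, dim=1)   # probabilities of NOT SPLIT / SPLIT

    # One model per CU size, same structure but different parameters
    # (the sizes 64/32/16 are assumed here).
    models = {s: AQCNN(cu_size=s) for s in (64, 32, 16)}

During depth intra coding, the model matching the current CU size would predict SPLIT or NOT SPLIT, so the encoder can skip the exhaustive recursive rate-distortion search for that CU, which is where the encoding time saving comes from.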

2. Experimental Results

  • HTM-16.2, the 3D-HEVC reference software, is used.
  • Compared with HTM-16.2, the proposed algorithm achieves an average time reduction of 69.4% for depth intra coding, with a negligible BD-rate increase on the synthesized views under the all-intra configuration.

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
