Review — Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network (3D-HEVC Intra)

The Encoding Time is Reduced by 39.56% on Average

Sik-Ho Tsang
5 min read · May 14, 2021
Depth Map Example

In this story, Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network (Liu VCIP'20) is briefly reviewed, since I need to review a manuscript for an IEEE Transactions journal. In this paper:

  • First, the Holistically Nested Edge Detection (HED) network is used for edge detection.
  • Then, the Otsu method is used to divide the output of the HED into a foreground region and a background region.
  • Finally, the CU size and the candidate list of intra modes are determined according to the region of the coding tree unit (CTU).

This is a paper in 2020 VCIP. (Sik-Ho Tsang @ Medium)

Outline

  1. Brief Introduction in HEVC Depth Coding
  2. Proposed Fast Approach
  3. Experimental Results

1. Brief Introduction in HEVC Depth Coding

Intra Prediction Mode
  • In HEVC depth coding, besides the 35 intra prediction modes of HEVC (0–34), there are also 2 Depth Modeling Modes (DMMs, modes 35–36).
  • The coding time of the depth map is about 4 times that of the texture map, accounting for more than 80% of the total coding time.
  • In addition, the probability of a DMM being selected as the best mode is only 0.85%.

2. Proposed Fast Approach

Proposed Fast Approach

2.1. HED Network for Edge Detection

HED Network
  • First, the Holistically Nested Edge Detection (HED) network (2015 ICCV) is used for edge detection.
  • The input is the depth map.
  • In this network, a convolutional neural network (CNN) is applied at 5 different scales (scales 1 to 5).
  • Then, a fusion module is used to fuse the multi-scale features.
  • Finally, a probabilistic edge map is generated as the output.

A coding unit (CU) containing edges tends to use more complicated modes and a smaller CU size, whereas a CU without edges tends to use a smoother mode, such as DC or Planar, and a larger CU size.

  • (In this paper, the HED network is used to detect edges. HED is a 2015 ICCV paper with over 200 citations; other edge detection approaches have also been proposed since.)
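Conceptually, the multi-scale pipeline above can be sketched with simple image gradients standing in for the CNN side outputs. This is a minimal numpy illustration of the "side outputs at several scales + fusion" idea, not the actual HED architecture; all function names here are mine:

```python
import numpy as np

def gradient_edges(img):
    """Finite-difference edge magnitude: a crude stand-in for one CNN side output."""
    gx = np.zeros_like(img, dtype=np.float64)
    gy = np.zeros_like(img, dtype=np.float64)
    gx[:, :-1] = np.abs(np.diff(img, axis=1))
    gy[:-1, :] = np.abs(np.diff(img, axis=0))
    return gx + gy

def hed_style_edge_map(depth, num_scales=3):
    """Compute edge maps at several scales and fuse them into a single
    probabilistic edge map, mimicking HED's side outputs + fusion module."""
    h, w = depth.shape
    side_outputs = []
    for s in range(num_scales):
        f = 2 ** s
        small = depth[::f, ::f].astype(np.float64)                    # downsample (scale s)
        up = np.kron(gradient_edges(small), np.ones((f, f)))[:h, :w]  # naive upsampling
        side_outputs.append(up)
    fused = np.mean(side_outputs, axis=0)  # fusion ~ learned weighted sum in real HED
    return fused / (fused.max() + 1e-8)    # normalise to [0, 1]
```

In the real network the side outputs come from VGG-style convolution stages and the fusion weights are learned, but the overall flow (multi-scale maps, upsample, fuse, normalise) is the same.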

2.2. Otsu Method for Region Division

Left: Depth Map, Right: Binarized Region Division Map
  • The Otsu method is used to divide the output of the HED into a foreground region and a background region.
  • Otsu is a common method to determine an adaptive threshold.
  • It is used to binarize the edge detection map into a more easily processed edge map.
  • W0 represents the proportion of non-edge pixels in the whole image, and their average gray level is U0; W1 represents the proportion of edge pixels in the whole image, and their average gray level is U1.
  • The average gray level of the whole image is U = W0·U0 + W1·U1.
  • The inter-class variance is σn² = W0(U0 − U)² + W1(U1 − U)² = W0·W1·(U0 − U1)².
  • Each candidate threshold k over the entire pixel value range of the image is tried in turn, and the corresponding inter-class variance is computed.
  • The k value corresponding to the maximum inter-class variance is the optimal threshold T.
  • When the value of a pixel is greater than or equal to the threshold T, it is judged as foreground region; otherwise it is judged as background region.
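The exhaustive search described above is straightforward to implement; here is a minimal numpy sketch (the function name is mine):

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustive Otsu search: return the k maximizing the inter-class
    variance W0*W1*(U0 - U1)^2 over all candidate thresholds."""
    prob = np.bincount(gray.ravel(), minlength=256) / gray.size
    levels = np.arange(256, dtype=np.float64)
    best_k, best_var = 0, -1.0
    for k in range(1, 256):
        w0 = prob[:k].sum()                       # proportion of pixels below k
        w1 = 1.0 - w0                             # proportion of pixels >= k
        if w0 == 0.0 or w1 == 0.0:
            continue                              # one class empty: skip
        u0 = (levels[:k] * prob[:k]).sum() / w0   # mean gray level below k
        u1 = (levels[k:] * prob[k:]).sum() / w1   # mean gray level >= k
        var = w0 * w1 * (u0 - u1) ** 2            # inter-class variance
        if var > best_var:
            best_var, best_k = var, k
    return best_k
```

Pixels with value ≥ T are then labeled as the foreground (edge) region, the rest as background.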

2.3. CU Size and Mode Decision

CU Size and Mode Decision
  • Step 1: When the PU is 64×64 and located in the edge region, only the angular modes are traversed to find the optimal mode. Otherwise, perform Step 2.
  • Step 2: Perform rough mode decision and most probable mode derivation, and only the non-angular modes are traversed to find the optimal mode.
  • Step 3: When the PU is not 64×64 and is located in the edge region, the 35 HEVC modes are skipped and only the DMM modes are traversed to find the best partition mode as the optimal mode. Otherwise, perform Step 4.
  • Step 4: Perform rough mode decision and most probable mode derivation, and only the non-angular and non-DMM modes are traversed to find the optimal mode.
  • Step 5: The RD costs of all candidate modes in the candidate mode list are calculated to find the optimal mode.
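Steps 1–4 boil down to a small lookup from (PU size, edge/non-edge region) to the candidate mode families to traverse before the final RD check in Step 5. A sketch, with illustrative labels that are not from the paper:

```python
def candidate_mode_families(pu_size, in_edge_region):
    """Map a PU's size and region to the intra mode families to traverse,
    following Steps 1-4 of the proposed decision (labels are illustrative)."""
    if pu_size == 64:
        # DMMs are not applied to 64x64 PUs in 3D-HEVC, so only HEVC modes remain.
        if in_edge_region:
            return ["angular"]       # Step 1: angular modes only
        return ["planar", "dc"]      # Step 2: non-angular modes only
    if in_edge_region:
        return ["dmm"]               # Step 3: skip the 35 HEVC modes
    return ["planar", "dc"]          # Step 4: non-angular, non-DMM modes
```

The full RD cost comparison of Step 5 is then run only over this reduced candidate list, which is where the encoding time saving comes from.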

3. Experimental Results

Comparison with [11]
  • HTM-16.0 with the intra-main configuration is used.
  • GPU is disabled during testing.
  • V/T: BD-Rate of coded texture views over total bitrate.
  • S/T: BD-Rate of synthesized views over total bitrate.
  • ΔT: The time saving.
  • Compared with the original HTM, the average BD-Rate loss is 0.31% for the coded views and 1.22% for the synthesized views, while the encoding time is reduced by 39.56% on average.
  • The proposed method outperforms [11], which is a 2015 ICASSP paper.
