Review — Toward Achieving Robust Low-Level and High-Level Scene Parsing

EFCN: Aggregating Contexts Using a Convolutional Context Network (CCN)

Semantic segmentation demands robust low-level as well as high-level parsing. EFCN outperforms FCN in both low-level detail preservation and high-level smoothing/recognition.
  • It is found that the parsing performance of a “skip” network can be noticeably improved by modifying the parameterization of its skip layers.
  • Thus, a “dense skip” architecture is introduced to retain a rich set of low-level information, and a Convolutional Context Network (CCN) is proposed on top of the “dense skip” layers to aggregate contexts for high-level feature maps.
  • Combining the two forms the Enhanced Fully Convolutional Network (EFCN).

Outline

  1. Parameterization of Skip Layers, Dense Skip Layers, & CCN
  2. Enhanced Fully Convolutional Network (EFCN)
  3. Results

1. Parameterization of Skip Layers, Dense Skip Layers, & CCN

1.1. Skip Layers

Left Figure: Skip layers of segmentation networks. Right Figure: Parameterizations of skip layers. Table: Comparisons of “Dilation” and “Skip” Using VGG-16 on ADE20K.
  • Both the “Dilation” and “Skip” approaches retain useful information that improves parsing performance. However, the “Dilation” network is significantly slower than the “Skip” network.
  • “Our Skip” is proposed, where a 2-layer convolutional block with batch normalization (BN) is used to parameterize each skip layer (i.e., Conv+BN+ReLU+Conv); see the sketch below.
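As a rough illustration of this parameterization, here is a minimal PyTorch sketch of one skip layer. The kernel sizes, channel widths, and the `SkipLayer` name are assumptions for the example, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SkipLayer(nn.Module):
    """Sketch of the "Our Skip" parameterization: Conv+BN+ReLU+Conv,
    projecting an encoder feature map to a common channel width before
    fusion. Kernel sizes and widths are illustrative assumptions."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.block(x)

# Example: project a VGG-16 POOL4 feature map (512 channels) to 256 channels.
skip = SkipLayer(512, 256)
feat = torch.randn(1, 512, 32, 32)
print(skip(feat).shape)  # torch.Size([1, 256, 32, 32])
```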

1.2. Dense Skip Layers

Comparison Between “Sparse Skip” and “Dense Skip” Using VGG-16 on ADE20K.
  • The “dense skip” network adds a skip layer for each intermediate feature map after POOL3, which helps to aggregate multi-scale contexts; a fusion sketch follows below. (More details in 1.3.)
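To make the fusion concrete, here is a hedged sketch of how densely tapped feature maps could be projected and merged. The summation-based fusion, bilinear upsampling, and the `DenseSkipFusion` name are assumptions for illustration; the paper’s exact fusion may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSkipFusion(nn.Module):
    """Sketch of "dense skip": attach a (Conv+BN+ReLU+Conv) skip layer to
    every tapped encoder feature map and fuse them at the resolution of
    the highest-resolution map. Fusion-by-summation is an assumption."""
    def __init__(self, in_channels_list, fused_channels=256):
        super().__init__()
        self.skips = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, fused_channels, 3, padding=1),
                nn.BatchNorm2d(fused_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(fused_channels, fused_channels, 3, padding=1),
            )
            for c in in_channels_list
        )

    def forward(self, feats):
        # Upsample every projected map to the first (largest) map's size.
        target = feats[0].shape[-2:]
        out = 0
        for skip, f in zip(self.skips, feats):
            out = out + F.interpolate(skip(f), size=target,
                                      mode="bilinear", align_corners=False)
        return out

# Example: fuse POOL3/POOL4/POOL5-like maps at stride-8 resolution.
f3, f4, f5 = (torch.randn(1, 256, 28, 28),
              torch.randn(1, 512, 14, 14),
              torch.randn(1, 512, 7, 7))
fuse = DenseSkipFusion([256, 512, 512])
print(fuse([f3, f4, f5]).shape)  # torch.Size([1, 256, 28, 28])
```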

1.3. Convolutional Context Network (CCN)

The Convolutional Context Network (CCN) is a convolutional network with dense skip layers.
  • As shown above, several conv blocks are chained to progressively expand the contextual view of the feature maps; a minimal sketch follows.
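The sketch below shows one plausible reading of this design: a chain of conv blocks whose receptive field grows with each block, with every intermediate output tapped by a dense skip and summed into the aggregated context map. The depth, width, and summation are assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class CCN(nn.Module):
    """Sketch of a Convolutional Context Network: chained conv blocks
    progressively enlarge the receptive field, and each intermediate
    map is tapped by a dense skip and summed into the output.
    num_blocks and channel width are illustrative assumptions."""
    def __init__(self, channels=512, num_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_blocks)
        )

    def forward(self, x):
        out = x              # dense skip from the input itself
        for block in self.blocks:
            x = block(x)     # each 3x3 conv enlarges the receptive field
            out = out + x    # dense skip from each intermediate map
        return out

# Example: aggregate context on a stride-32 feature map.
ccn = CCN(512)
print(ccn(torch.randn(1, 512, 7, 7)).shape)  # torch.Size([1, 512, 7, 7])
```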
Applying Different Context Aggregation (CA) Modules to the “Our Skip” Architecture.
  • Segmentation networks with CA modules outperform the baseline FCN.

2. Enhanced Fully Convolutional Network (EFCN)

Network architecture of EFCN-xs.
  • EFCN-xs is shown above.
  • First, the “dense skip” architecture is used in EFCN to retain and incorporate low-level information from the pre-trained CNN, which enhances low-level visual understanding (e.g., boundary localization).
  • Moreover, CCN is introduced to aggregate context for the high-level feature maps, which benefits high-level visual parsing.
  • EFCN-4s and EFCN-2s can be inferred directly from the architecture shown above; a composition sketch follows this list.
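Putting the two pieces together, here is a hedged composition sketch that reuses the `DenseSkipFusion` and `CCN` sketches above on a VGG-16-BN backbone. The backbone slicing, channel widths, and stride-8 fusion (loosely EFCN-8s-like) are assumptions for illustration, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class EFCNSketch(nn.Module):
    """Illustrative composition of EFCN's two ingredients: CCN on the
    deepest feature map (high-level context) plus dense skip fusion of
    intermediate maps (low-level detail). Reuses the DenseSkipFusion
    and CCN sketches defined above; all sizes are assumptions."""
    def __init__(self, num_classes, fused_channels=256):
        super().__init__()
        # The paper starts from a pre-trained VGG-16; weights=None here
        # only avoids a download for the sketch.
        vgg = torchvision.models.vgg16_bn(weights=None).features
        self.stage3 = vgg[:24]    # through POOL3 (256 ch, stride 8)
        self.stage4 = vgg[24:34]  # through POOL4 (512 ch, stride 16)
        self.stage5 = vgg[34:]    # through POOL5 (512 ch, stride 32)
        self.ccn = CCN(512)       # context aggregation on the deepest map
        self.fuse = DenseSkipFusion([256, 512, 512], fused_channels)
        self.classifier = nn.Conv2d(fused_channels, num_classes, 1)

    def forward(self, x):
        f3 = self.stage3(x)
        f4 = self.stage4(f3)
        f5 = self.ccn(self.stage5(f4))
        fused = self.fuse([f3, f4, f5])   # fused at stride 8
        logits = self.classifier(fused)
        return F.interpolate(logits, size=x.shape[-2:],
                             mode="bilinear", align_corners=False)

# Example: 150-class parsing as in ADE20K.
model = EFCNSketch(num_classes=150)
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 150, 224, 224])
```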
Comparisons of Different Networks on ADE20K.

3. Results

3.1. Ablation Studies

Different EFCNs on ADE20K.
EFCN-8s Ablation

3.2. ADE20K

SOTA Comparisons on ADE20K.
  • Although EFCN slightly lags behind PSPNet in accuracy, it is faster and requires much less memory than PSPNet.
Qualitative ablation analysis of the proposed segmentation network, EFCN. Images are from the ADE20K dataset.
  • The proposed “dense skip” architecture helps retain detailed spatial information.

3.3. Other Segmentation Benchmarks

SOTA Comparisons on Pascal Context
SOTA Comparisons on SUN RGB-D
SOTA Comparisons on Pascal VOC 2012
