Reading: C3 — Concentrated-Comprehensive Convolution (Semantic Segmentation)

Compared with ESPNet, ERFNet, DRN and ENet, similar or improved mIOU is achieved with smaller model sizes and fewer FLOPs

Oct 11, 2020

In this story, Concentrated-Comprehensive Convolution (C3), by Seoul National University and CLOVA AI Research, Naver Corp., is briefly presented. In this paper:

A new block called Concentrated-Comprehensive Convolution (C3) is proposed, which applies asymmetric convolutions before the depth-wise separable dilated convolution to compensate for the information loss caused by dilated convolution.
C3 is applied to ESPNet and achieves about 2% better performance while halving the number of parameters and reducing the number of FLOPs by 35% compared with the original ESPNet.

This is a 2019 arXiv paper. (Sik-Ho Tsang @ Medium)

Outline

Concentrated-Comprehensive Convolution (C3)
C3 Module
Experimental Results

1. Concentrated-Comprehensive Convolution (C3)

**Upper: Conventional, Bottom: Concentrated-Comprehensive Convolution (C3)**

In the C3 block, the complexity is reduced by using two depth-wise asymmetric convolutions instead of a regular depth-wise convolution.
Also, PReLU and batch normalization are inserted between the asymmetric filters.
After that, the cross-channel operation is performed with a 1×1 point-wise convolution.

In summary, the C3 block combines both advantages of the depth-wise separable convolution and the dilated convolution.
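To make the complexity argument concrete, here is a minimal weight-counting sketch. It is not taken from the paper: the function names are hypothetical, the concentration kernel length `k_conc` is a free parameter chosen for illustration, and biases, batch normalization and PReLU parameters are ignored.

```python
def standard_dilated_params(c_in, c_out, k=3):
    """Weights of a regular k x k convolution (dilation adds no weights)."""
    return k * k * c_in * c_out


def ds_dilated_params(c_in, c_out, k=3):
    """Depth-wise k x k dilated conv followed by a 1x1 point-wise conv."""
    return k * k * c_in + c_in * c_out


def c3_block_params(c_in, c_out, k_conc, k=3):
    """Assumed C3 layout: two depth-wise asymmetric convs (k_conc x 1 and
    1 x k_conc), then a depth-wise k x k dilated conv and a 1x1 point-wise conv."""
    concentration = 2 * k_conc * c_in  # two 1-D depth-wise filters per channel
    comprehensive = k * k * c_in + c_in * c_out
    return concentration + comprehensive


if __name__ == "__main__":
    c, k_conc = 64, 5  # illustrative values, not from the paper
    print("standard dilated conv :", standard_dilated_params(c, c))
    print("depth-wise separable  :", ds_dilated_params(c, c))
    print("C3 block              :", c3_block_params(c, c, k_conc))
    # Asymmetric pair vs. one square depth-wise filter of the same extent
    print("asymmetric vs square  :", 2 * k_conc * c, "vs", k_conc * k_conc * c)
```

Under these assumptions the concentration stage adds only a small overhead on top of the depth-wise separable dilated convolution, and the asymmetric pair costs 2·k weights per channel instead of k² for a square depth-wise filter.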

2. C3 Module

**Network structure of C3 and ESP module**

In the ESP module of ESPNet, the feature maps are added one by one in a hierarchical way, i.e. hierarchical feature fusion (HFF), before concatenation.
In the C3 module, the feature maps are simply concatenated directly.
Also, a dilation rate of 1 is not used in the C3 module.
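The difference between the two fusion schemes can be sketched with a toy example. This is only an illustration: each "feature map" is a flat list of numbers, the function names are hypothetical, and the exact set of branches that ESPNet sums may differ from this simple running sum.

```python
from itertools import accumulate


def hff_then_concat(branches):
    """HFF-style fusion: each branch is added to the running sum of the
    previous branches, and the fused maps are then concatenated."""
    fused = accumulate(branches, lambda acc, b: [x + y for x, y in zip(acc, b)])
    return [v for fmap in fused for v in fmap]


def direct_concat(branches):
    """C3-module-style fusion: branch outputs are concatenated as they are."""
    return [v for fmap in branches for v in fmap]


if __name__ == "__main__":
    # Three toy branch outputs, e.g. from three different dilation rates
    branches = [[1, 2], [10, 20], [100, 200]]
    print(hff_then_concat(branches))  # running sums, then concatenation
    print(direct_concat(branches))    # plain concatenation
```

Direct concatenation skips the extra element-wise additions, at the cost of not mixing information across branches before the next layer.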

3. Experimental Results

3.1. Ablation Study

**Ablation Study on Cityscapes Test Set**

(2)-(5): A naive use of the depth-wise separable architecture brings a significant performance degradation (about 3 to 5%), and even the HFF module cannot fully resolve the degradation in (2).
(3)-(5): It can be concluded that the concentration stage is critical for resolving the accuracy drop caused by the depth-wise separable dilated convolution.
(4): With more layers, mIOU increases.
(5): With a wider network (more channels) as well, mIOU improves further.
(6): Using C3 with RC3, mIOU improves considerably.
(7): Using C3, the highest mIOU is obtained.

3.2. SOTA Comparison

The C3 module is easily applied to DRN, ENet, ERFNet and ESPNet.
With the C3 module, smaller model sizes and fewer FLOPs are obtained with similar or improved mIOU.
Both C3Net1 and C3Net2 use ESPNet as a baseline but with different dilation rates d in the C3 module: d = {2, 4, 8, 16} and d = {2, 3, 7, 13}, respectively.
C3Net2 outperforms C3Net1 by about 1% with fewer parameters, which suggests that the dilation rates should be coprime.
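The coprimality of the two dilation-rate sets quoted above can be checked in a few lines. This is a small sketch; reading "coprime" as pairwise coprime is an assumption, and the helper names are hypothetical.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def pairwise_coprime(rates):
    """True if every pair of dilation rates has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(rates, 2))


def common_divisor(rates):
    """Greatest common divisor shared by all rates."""
    return reduce(gcd, rates)


if __name__ == "__main__":
    c3net1 = [2, 4, 8, 16]
    c3net2 = [2, 3, 7, 13]
    for name, rates in (("C3Net1", c3net1), ("C3Net2", c3net2)):
        print(name, rates, "pairwise coprime:", pairwise_coprime(rates),
              "| common divisor:", common_divisor(rates))
```

The C3Net1 rates all share the factor 2, so their sampling grids overlap, whereas the C3Net2 rates share no common factor.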

3.3. Visualization

DS-ESPNet shows the gridding effect, while C3Net1 removes it.
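One way to see where gridding comes from is to count, in 1-D, which input positions a dilated kernel actually reads. The sketch below is an illustration rather than the paper's analysis: the helper names are hypothetical, and the dense "concentration" kernel of length 2d - 1 is an assumed size chosen so that it fills the gaps between dilated taps.

```python
def dilated_offsets(d, k=3):
    """Input offsets read by a k-tap 1-D kernel with dilation rate d."""
    half = k // 2
    return {i * d for i in range(-half, half + 1)}


def dense_offsets(k):
    """Input offsets read by an ordinary (undilated) k-tap 1-D kernel."""
    half = k // 2
    return set(range(-half, half + 1))


def compose(first, second):
    """Offsets seen when one layer is applied on top of another."""
    return {a + b for a in first for b in second}


def coverage(offsets):
    """Fraction of positions inside the receptive span that are actually read."""
    span = max(offsets) - min(offsets) + 1
    return len(offsets) / span


if __name__ == "__main__":
    d = 4
    alone = dilated_offsets(d)
    stacked = compose(dilated_offsets(d), dilated_offsets(d))
    concentrated = compose(dense_offsets(2 * d - 1), dilated_offsets(d))
    print("dilated only        :", coverage(alone))
    print("two dilated layers  :", coverage(stacked))
    print("dense, then dilated :", coverage(concentrated))
```

With a dilated kernel alone, most positions inside the receptive span are never read, and stacking layers with the same rate does not help. Placing a dense kernel in front fills those holes, which matches the intuition behind applying a concentration stage before the dilated convolution.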

Reference

[2019 arXiv] [C3]
C3: Concentrated-Comprehensive Convolution and its application to semantic segmentation

Semantic Segmentation

[FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [DPN] [ENet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [ERFNet] [GCN] [PSPNet] [DeepLabv3] [ESPNet] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [DeepLabv3+] [C3] [DRRN Zhang JNCA’20]
