Reading: C3 — Concentrated-Comprehensive Convolution (Semantic Segmentation)
Similar or Improved mIOU Compared to ESPNet, ERFNet, DRN & ENet, with Smaller Model Sizes and Fewer FLOPs
In this story, Concentrated-Comprehensive Convolution (C3), by Seoul National University and CLOVA AI Research, Naver Corp., is briefly reviewed. In this paper:
- A new block called Concentrated-Comprehensive Convolution (C3) is proposed. It applies asymmetric convolutions before the depth-wise separable dilated convolution to compensate for the information loss caused by dilated convolution.
- C3 is applied to ESPNet and achieves about 2% better performance while halving the number of parameters and reducing the number of FLOPs by 35% compared with the original ESPNet.
This is a 2019 arXiv paper. (Sik-Ho Tsang @ Medium)
Outline
- Concentrated-Comprehensive Convolution (C3)
- C3 Module
- Experimental Results
1. Concentrated-Comprehensive Convolution (C3)
- The complexity is further reduced by using two depth-wise asymmetric convolutions instead of a regular depth-wise convolution.
- Also, non-linearity (PReLU and batch normalization) is inserted between the two asymmetric filters.
- After that, the cross-channel operation is executed with a 1×1 point-wise convolution.
In summary, the C3 block combines the advantages of both the depth-wise separable convolution and the dilated convolution.
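To get a feel for the savings, the weight counts of the three designs can be compared with a few lines of arithmetic. This is only a rough sketch, not the authors' code: the function names are made up for illustration, biases and the batch normalization / PReLU parameters are ignored, and the concentration kernel size of 2d-1 for dilation rate d is an assumption that is not stated in this story.

```python
# Weight counts only; biases, batch-norm and PReLU parameters are ignored.
# All function names here are hypothetical helpers for illustration.

def standard_dilated_params(c_in, c_out, k=3):
    """Regular k x k convolution. Dilation does not change the weight count."""
    return k * k * c_in * c_out

def ds_dilated_params(c_in, c_out, k=3):
    """Depth-wise k x k dilated conv followed by a 1x1 point-wise conv."""
    return k * k * c_in + c_in * c_out

def c3_block_params(c_in, c_out, d, k=3, asymmetric=True):
    """Concentration stage + depth-wise dilated conv + 1x1 point-wise conv."""
    m = 2 * d - 1  # ASSUMPTION: concentration kernel size for dilation rate d
    if asymmetric:
        concentration = 2 * m * c_in   # an m x 1 and a 1 x m filter per channel
    else:
        concentration = m * m * c_in   # one regular m x m filter per channel
    return concentration + ds_dilated_params(c_in, c_out, k)

if __name__ == "__main__":
    c_in, c_out, d = 64, 64, 4
    print("standard dilated conv :", standard_dilated_params(c_in, c_out))
    print("depth-wise separable  :", ds_dilated_params(c_in, c_out))
    print("C3, regular depth-wise:", c3_block_params(c_in, c_out, d, asymmetric=False))
    print("C3, asymmetric        :", c3_block_params(c_in, c_out, d))
```

With 64 input and output channels and d = 4, the asymmetric concentration stage adds only 2 × 7 × 64 = 896 weights on top of the depth-wise separable block, versus 7 × 7 × 64 = 3136 for a regular depth-wise filter of the same (assumed) size.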
2. C3 Module
- In the ESPNet module, the feature maps are added one by one in a hierarchical way, i.e. hierarchical feature fusion (HFF), before concatenation.
- In the C3 module, the feature maps are simply concatenated directly.
- Also, dilation rate 1 is excluded in the C3 module.
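The difference between the two fusion schemes can be illustrated with a toy example. This is a simplified sketch under stated assumptions: each branch output is a flat list of numbers rather than a real feature map, every branch takes part in the running sum (the actual ESPNet wiring may differ in detail), and the function names are hypothetical.

```python
from itertools import accumulate

def add_maps(a, b):
    """Element-wise sum of two equally sized toy feature maps."""
    return [x + y for x, y in zip(a, b)]

def hff_then_concat(branches):
    """HFF-style fusion: each branch is added to the running sum of the
    previous branches, then all fused maps are concatenated."""
    fused = list(accumulate(branches, add_maps))
    return [v for fmap in fused for v in fmap]

def direct_concat(branches):
    """C3-module-style fusion: branch outputs are concatenated as they are."""
    return [v for fmap in branches for v in fmap]

if __name__ == "__main__":
    # Three toy branch outputs, e.g. from three different dilation rates.
    branches = [[1, 2], [10, 20], [100, 200]]
    print(hff_then_concat(branches))  # [1, 2, 11, 22, 111, 222]
    print(direct_concat(branches))    # [1, 2, 10, 20, 100, 200]
```

Both schemes produce an output of the same size; direct concatenation just skips the extra additions.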
3. Experimental Results
3.1. Ablation Study
- (2)-(5): A naive use of the depthwise separable architecture degrades performance significantly (by about 3 to 5%), and even the HFF module could not fully resolve the performance degradation in (2).
- (3)-(5): It can be concluded that the concentration stage is critical for resolving the accuracy drop caused by depthwise separable dilated convolution.
- (4): With more layers, mIOU increases.
- (5): With a wider network, i.e. more channels, mIOU improves further.
- (6): Using C3 but with RC3, mIOU improves considerably.
- (7): Using C3, the highest mIOU is obtained.
3.2. SOTA Comparison
- The C3 module is easily applied to DRN, ENet, ERFNet and ESPNet.
- With the C3 module, smaller model sizes and fewer FLOPs are obtained with similar or improved mIOU.
- Both C3Net1 and C3Net2 use ESPNet as a baseline but with different dilation rates d in the C3 module: d = {2, 4, 8, 16} and d = {2, 3, 7, 13}, respectively.
- C3Net2 outperforms C3Net1 by about 1% with fewer parameters, which suggests that the dilation rates should be coprime.
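The coprimality of the two dilation-rate sets above is easy to check. The snippet below is a small illustrative sketch (the helper names are hypothetical) that lists every pair of rates sharing a common factor greater than 1.

```python
from itertools import combinations
from math import gcd

def shared_factor_pairs(rates):
    """Return all pairs of dilation rates whose gcd is greater than 1."""
    return [(a, b) for a, b in combinations(rates, 2) if gcd(a, b) > 1]

def pairwise_coprime(rates):
    """True if no two dilation rates share a common factor."""
    return not shared_factor_pairs(rates)

if __name__ == "__main__":
    for name, rates in [("C3Net1", [2, 4, 8, 16]), ("C3Net2", [2, 3, 7, 13])]:
        print(name, rates, "pairwise coprime:", pairwise_coprime(rates))
```

Every pair in {2, 4, 8, 16} shares the factor 2, while no pair in {2, 3, 7, 13} shares any factor.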
3.3. Visualization
- DS-ESPNet exhibits the gridding effect while C3Net1 removes it.
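The gridding effect can be reproduced in one dimension by tracking which input offsets reach a single output position. The following is a minimal sketch, not the paper's analysis: it assumes a 3-tap dilated kernel and a concentration filter of size 2d-1 (an assumption not stated in this story), and all function names are hypothetical.

```python
def dilated_offsets(d, taps=3):
    """Input offsets sampled by a 1-D dilated kernel with dilation rate d."""
    half = taps // 2
    return {i * d for i in range(-half, half + 1)}

def concentration_offsets(d):
    """Offsets of a dense filter of size 2d-1 (ASSUMED concentration size)."""
    return set(range(-(d - 1), d))

def compose(first, second):
    """Offsets seen after applying two filters one after the other."""
    return {a + b for a in first for b in second}

def is_contiguous(offsets):
    """True if the offsets cover an interval with no holes."""
    return sorted(offsets) == list(range(min(offsets), max(offsets) + 1))

if __name__ == "__main__":
    d = 4
    dilated = dilated_offsets(d)
    stacked = compose(dilated, dilated)
    with_concentration = compose(concentration_offsets(d), dilated)
    print(sorted(dilated), is_contiguous(dilated))
    print(sorted(stacked), is_contiguous(stacked))
    print(sorted(with_concentration), is_contiguous(with_concentration))
```

For d = 4, the dilated kernel alone only sees offsets -4, 0 and 4, and stacking two such kernels still leaves holes. With the dense filter applied first, every offset from -7 to 7 contributes, which is one way to picture why the concentration stage helps against gridding.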
Reference
[2019 arXiv] [C3]
C3: Concentrated-Comprehensive Convolution and its application to semantic segmentation
Semantic Segmentation
[FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [DPN] [ENet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [ERFNet] [GCN] [PSPNet] [DeepLabv3] [ESPNet] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [DeepLabv3+] [C3] [DRRN Zhang JNCA’20]