Reading: CSFN & CSFN-M — Channel Splitting and Fusion Network (Super Resolution)

CSFN Outperforms CARN, IDN, MemNet & VDSR; CSFN-M Outperforms CARN-M, SRFBN-S & DRRN

Sik-Ho Tsang
5 min read · Jul 26, 2020

In this story, Low Complexity Single Image Super-Resolution with Channel Splitting and Fusion Network (CSFN), by Nanjing University, is presented. In this paper:

  • A low complexity solution based on channel splitting and fusion network (CSFN) is proposed.
  • Channel splitting and channel fusion are used to enhance feature maps and make full use of valuable information.
  • Multiple residual channel splitting and fusion blocks (CSFB) are cascaded to continuously extract more important information for reconstruction.
  • To further minimize redundant parameters and improve efficiency, group and recursive convolutional layer strategies are adopted in CSFB to form a lightweight block called CSFB-M. (M stands for Mobile)

This is a paper in 2020 ICASSP. (Sik-Ho Tsang @ Medium)

Outline

  1. CSFN: Network Architecture
  2. Residual Channel Splitting and Fusion Block (CSFB)
  3. Lightweight Version: CSFB-M
  4. Experimental Results

1. CSFN: Network Architecture

CSFN: Network Architecture
  • CSFN is divided into three parts: shallow feature extraction block (FEBlock), residual channel splitting and fusion blocks (CSFBlocks) and upscale block (UPBlock).
  • FEBlock: one convolutional layer is used to extract features from the LR image. F0 is the shallow features extracted by FEBlock.
  • CSFBlocks: consist of multiple CSFBs and a bottom convolution layer. With n cascaded CSFBs, the output of CSFBlocks can be expressed as: F_CSF = f_t(H_CSF,n(… H_CSF,1(F_0) …)),
  • where f_t is the bottom convolution function and H_CSF,i represents the operation of the i-th CSFB.
  • UPBlock: performs the LR-to-HR transformation and reconstructs the HR image.
  • ESPCN is used in the upscale module: one Conv-PixelShuffle structure for scales ×2 and ×3, and two Conv-PixelShuffle modules for scale ×4. (A minimal sketch of the whole pipeline follows this list.)
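
To make the three-part layout concrete, below is a minimal PyTorch sketch of the FEBlock → CSFBlocks → UPBlock pipeline. This is my own reading of the paper, not the authors' code: the make_block factory is a placeholder for the CSFB described in Section 2, and since the excerpt does not state whether there is a global skip connection around the CSFBlocks, none is added here.

    import torch
    import torch.nn as nn

    class CSFN(nn.Module):
        def __init__(self, scale=2, n_feats=64, n_blocks=10, make_block=None):
            super().__init__()
            if make_block is None:
                # Stand-in for the CSFB of Section 2 (placeholder only)
                make_block = lambda: nn.Conv2d(n_feats, n_feats, 3, padding=1)
            # FEBlock: one convolution extracts the shallow features F0 from the LR image
            self.fe = nn.Conv2d(3, n_feats, 3, padding=1)
            # CSFBlocks: cascaded CSFBs followed by the bottom convolution f_t
            self.blocks = nn.ModuleList([make_block() for _ in range(n_blocks)])
            self.bottom = nn.Conv2d(n_feats, n_feats, 3, padding=1)
            # UPBlock: ESPCN-style Conv-PixelShuffle (one module for x2/x3, two for x4)
            if scale == 4:
                up = [nn.Conv2d(n_feats, n_feats * 4, 3, padding=1), nn.PixelShuffle(2),
                      nn.Conv2d(n_feats, n_feats * 4, 3, padding=1), nn.PixelShuffle(2)]
            else:
                up = [nn.Conv2d(n_feats, n_feats * scale ** 2, 3, padding=1), nn.PixelShuffle(scale)]
            up.append(nn.Conv2d(n_feats, 3, 3, padding=1))
            self.up = nn.Sequential(*up)

        def forward(self, x):
            f = self.fe(x)               # F0
            for block in self.blocks:    # H_CSF,1 ... H_CSF,n
                f = block(f)
            f = self.bottom(f)           # bottom convolution f_t
            return self.up(f)            # reconstruct the HR image

For example, CSFN(scale=2)(torch.randn(1, 3, 48, 48)) returns a 1×3×96×96 tensor.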

2. Residual Channel Splitting and Fusion Block (CSFB)

Residual Channel Splitting and Fusion Block (CSFB)
  • The proposed CSFB can be roughly divided into two pipelines. The left pipeline is a feature fusion module based on channel split (CSFF) and the right one is a global feature extraction (GFE) module.
  • The remaining part of CSFB is a skip connection used for residual learning.

2.1. Feature Fusion module based on Channel Split (CSFF)

  • CSFF module mainly focuses on the imbalance of channel information.
  • An asymmetric channel split is adopted to divide the feature into two parts.
  • Fi-1 is split into two parts which contain s and c0-s channels respectively, where s is less than c0/2 in the network.
  • After convolution, the dimensions of the left and right branches in CSFF are ca and cb, subject to ca + cb = c0. The additional restriction ca > s guides the network to extract more information.
  • Then the two feature maps are fused by concatenation and a 1×1 convolution, as sketched below.
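
Here is a sketch of my reading of the CSFF pipeline: an asymmetric split into s and c0−s channels, one 3×3 convolution per branch producing ca and cb channels, then concatenation and a 1×1 fusion convolution back to c0. The ReLU placement and the groups argument (used later for CSFB-M) are my assumptions.

    import torch
    import torch.nn as nn

    class CSFF(nn.Module):
        """Feature fusion based on an asymmetric channel split (sketch)."""
        def __init__(self, c0=64, s=16, ca=32, cb=32, groups=1):
            super().__init__()
            assert ca + cb == c0 and ca > s and s < c0 // 2   # constraints from the paper
            self.s = s
            self.conv_a = nn.Conv2d(s, ca, 3, padding=1, groups=groups)       # left branch: s -> ca
            self.conv_b = nn.Conv2d(c0 - s, cb, 3, padding=1, groups=groups)  # right branch: (c0 - s) -> cb
            self.fuse = nn.Conv2d(ca + cb, c0, 1)                             # 1x1 fusion back to c0
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            xa, xb = torch.split(x, [self.s, x.size(1) - self.s], dim=1)      # asymmetric split
            fa = self.relu(self.conv_a(xa))
            fb = self.relu(self.conv_b(xb))
            return self.fuse(torch.cat([fa, fb], dim=1))                      # concat + 1x1 conv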

2.2. Global Feature Extraction (GFE)

  • The main purpose of the GFE module (the right pipeline) is to compensate for the CSFF module, since the CSFF module only uses partial channel information independently, which leads to incomplete global information.
  • A channel compression and expansion unit is used to extract features and promote channel information fusion, while also reducing the number of parameters.
  • The output dimensions of the three Conv-ReLU layers are denoted as cn, cn, and c0 respectively (cn < c0), as in the sketch below.
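
Below is a sketch of the GFE compression-expansion unit (three Conv-ReLU layers with output dimensions cn, cn and c0) and of one way CSFF, GFE and the residual path could be assembled into a CSFB, reusing the CSFF sketch above. The paper excerpt does not spell out how the two pipeline outputs are merged, so the element-wise sum is an assumption; the groups argument is only needed for CSFB-M later.

    import torch.nn as nn

    class GFE(nn.Module):
        """Global feature extraction: channel compression then expansion (c0 -> cn -> cn -> c0)."""
        def __init__(self, c0=64, cn=16, groups=1):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(c0, cn, 3, padding=1, groups=groups), nn.ReLU(inplace=True),  # compress to cn
                nn.Conv2d(cn, cn, 3, padding=1, groups=groups), nn.ReLU(inplace=True),  # keep cn
                nn.Conv2d(cn, c0, 3, padding=1, groups=groups), nn.ReLU(inplace=True),  # expand back to c0
            )

        def forward(self, x):
            return self.body(x)

    class CSFB(nn.Module):
        """Residual channel splitting and fusion block: CSFF pipeline + GFE pipeline + identity skip."""
        def __init__(self, c0=64, s=16, ca=32, cb=32, cn=16):
            super().__init__()
            self.csff = CSFF(c0, s, ca, cb)
            self.gfe = GFE(c0, cn)

        def forward(self, x):
            # Merging the two pipelines by summation is an assumption; x is the residual path.
            return self.csff(x) + self.gfe(x) + x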

3. Lightweight Version: CSFB-M

CSFB-M
  • To further reduce parameters and enhance feature maps, a more lightweight network (CSFN-M) is proposed, which replaces CSFB with CSFB-M.
  • Group convolution, introduced in AlexNet and later used in ResNeXt, is adopted in CSFF and GFE.
  • The block is applied recursively so that the feature maps are enhanced t times, as sketched below.
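
A sketch of how CSFB-M could look under this description, reusing the CSFF and GFE sketches above: grouped convolutions (groups = 2) inside both pipelines, and the whole block applied recursively t times with shared weights. The merge rule is again the assumption used in the CSFB sketch.

    import torch.nn as nn

    class CSFB_M(nn.Module):
        """Lightweight CSFB: grouped convolutions plus recursive (weight-shared) application."""
        def __init__(self, c0=64, s=16, ca=32, cb=32, cn=16, groups=2, t=3):
            super().__init__()
            self.t = t
            self.csff = CSFF(c0, s, ca, cb, groups=groups)  # grouped 3x3 convs in both split branches
            self.gfe = GFE(c0, cn, groups=groups)           # grouped convs in compression/expansion

        def forward(self, x):
            out = x
            for _ in range(self.t):                         # same weights reused t times
                out = self.csff(out) + self.gfe(out) + out  # assumed merge, as in the CSFB sketch
            return out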

4. Experimental Results

4.1. Settings

  • 3×3 convolutional layers are used except for the specified 1×1 convolutional layers. The number of channels in Fi is set to 64.
  • 10 CSFBs are used in the proposed CSFN and 5 for CSFN-M.
  • In each CSFB, c0, s, ca, cb and cn are set to 64, 16, 32, 32 and 16 respectively.
  • For the proposed CSFB-M, the same settings as CSFB are used, but the number of groups is set to 2 and t is set to 3 for each CSFB-M.
  • L1 loss is used.
  • DIV2K 800 training images are used for training. (A minimal training-step sketch follows this list.)
  • (There are experiments for network analysis, but I think the results are quite close…)
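
Wiring the reported settings together, here is a minimal training-step sketch that reuses the CSFN and CSFB sketches above. Only the block counts, the channel settings and the L1 loss come from the paper; the optimizer, learning rate, batch size and patch size are my assumptions.

    import torch
    import torch.nn as nn

    # CSFN: 10 CSFBs with c0=64, s=16, ca=32, cb=32, cn=16 (CSFN-M: 5 CSFB-Ms, groups=2, t=3)
    model = CSFN(scale=2, n_feats=64, n_blocks=10,
                 make_block=lambda: CSFB(c0=64, s=16, ca=32, cb=32, cn=16))
    criterion = nn.L1Loss()                                      # L1 loss, as in the paper
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # optimizer and lr are assumptions

    lr_patch = torch.randn(16, 3, 48, 48)                        # stand-in DIV2K LR/HR patches
    hr_patch = torch.randn(16, 3, 96, 96)

    sr = model(lr_patch)
    loss = criterion(sr, hr_patch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()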

4.2. SOTA Comparison

Average PSNR/SSIM for scales ×2, ×3 and ×4. Red/blue text: best/second-best; underlined text: best result below 500K parameters.
  • For a fair comparison, heavy networks such as EDSR, DBPN, RDN and RCAN are excluded.
  • CSFN achieves better performance with fewer parameters and FLOPs than CARN; note that CARN uses larger patch sizes at training time and a multi-scale training strategy to improve its final results.
  • Of course, CSFN also outperforms IDN, MemNet and VDSR.
  • For a more lightweight model, CSFN-M achieves higher performance than DRRN and CARN-M, and similar or better performance than SRFBN-S, while SRFBN-S has larger FLOPs.
Qualitative comparisons
  • img076: The proposed CSFN and CSFN-M rebuild the glass edge more accurately, while other models only smooth the area.
  • barbara: Other models generate the wrong texture, whereas CSFN predicts the texture of this spotted cloth correctly, and CSFN-M can correctly predict partial structures with fewer parameters.
  • img092: All other methods infer the wrong black line, but CSFN can make full use of the information in low-resolution images to accurately estimate the direction of the line.

This is the 24th story in this month.
