Reading: CSFN & CSFN-M — Channel Splitting and Fusion Network (Super Resolution)

CSFN Outperforms CARN, IDN, MemNet & VDSR; CSFN-M Outperforms CARN-M, SRFBN-S & DRRN

Sik-Ho Tsang
5 min read · Jul 26, 2020

In this story, Low Complexity Single Image Super-Resolution with Channel Splitting and Fusion Network (CSFN), by Nanjing University, is presented. In this paper:

  • A low complexity solution based on channel splitting and fusion network (CSFN) is proposed.
  • Channel splitting and channel fusion are used to enhance feature maps and make full use of valuable information.
  • Multiple residual channel splitting and fusion blocks (CSFB) are cascaded to continuously extract more important information for reconstruction.
  • To further minimize redundant parameters and improve efficiency, group and recursive convolutional layer strategies are adopted in CSFB to form a lightweight block called CSFB-M. (M stands for Mobile)

This is a paper in 2020 ICASSP. (Sik-Ho Tsang @ Medium)

Outline

  1. CSFN: Network Architecture
  2. Residual Channel Splitting and Fusion Block (CSFB)
  3. Lightweight Version: CSFB-M
  4. Experimental Results

1. CSFN: Network Architecture

CSFN: Network Architecture
  • CSFN is divided into three parts: shallow feature extraction block (FEBlock), residual channel splitting and fusion blocks (CSFBlocks) and upscale block (UPBlock).
  • FEBlock: one convolutional layer is used to extract features from the LR image. F0 is the shallow features extracted by FEBlock.
  • CSFBlocks: consist of multiple CSFBs and a bottom convolution layer. With n cascaded CSFBs, the output of CSFBlocks can be expressed as: F_CSF = f_t(H_CSF,n(… H_CSF,1(F_0) …)),
  • where f_t is the bottom convolution function and H_CSF,i represents the operation of the i-th CSFB.
  • UPBlock: performs the LR-to-HR transformation and reconstructs the HR image.
  • ESPCN is used in the upscale module: one Conv-PixelShuffle structure for scales ×2 and ×3, and two Conv-PixelShuffle modules for scale ×4. (A minimal sketch of the whole pipeline follows this list.)
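
To make the three-part layout concrete, below is a minimal PyTorch sketch of the FEBlock → CSFBlocks → UPBlock pipeline. This is my own reading of the paper, not the authors' code: the make_block factory is a placeholder for the CSFB described in Section 2, and since the excerpt does not state whether there is a global skip connection around the CSFBlocks, none is added here.

    import torch
    import torch.nn as nn

    class CSFN(nn.Module):
        def __init__(self, scale=2, n_feats=64, n_blocks=10, make_block=None):
            super().__init__()
            if make_block is None:
                # Stand-in for the CSFB of Section 2 (placeholder only)
                make_block = lambda: nn.Conv2d(n_feats, n_feats, 3, padding=1)
            # FEBlock: one convolution extracts the shallow features F0 from the LR image
            self.fe = nn.Conv2d(3, n_feats, 3, padding=1)
            # CSFBlocks: cascaded CSFBs followed by the bottom convolution f_t
            self.blocks = nn.ModuleList([make_block() for _ in range(n_blocks)])
            self.bottom = nn.Conv2d(n_feats, n_feats, 3, padding=1)
            # UPBlock: ESPCN-style Conv-PixelShuffle (one module for x2/x3, two for x4)
            if scale == 4:
                up = [nn.Conv2d(n_feats, n_feats * 4, 3, padding=1), nn.PixelShuffle(2),
                      nn.Conv2d(n_feats, n_feats * 4, 3, padding=1), nn.PixelShuffle(2)]
            else:
                up = [nn.Conv2d(n_feats, n_feats * scale ** 2, 3, padding=1), nn.PixelShuffle(scale)]
            up.append(nn.Conv2d(n_feats, 3, 3, padding=1))
            self.up = nn.Sequential(*up)

        def forward(self, x):
            f = self.fe(x)               # F0
            for block in self.blocks:    # H_CSF,1 ... H_CSF,n
                f = block(f)
            f = self.bottom(f)           # bottom convolution f_t
            return self.up(f)            # reconstruct the HR image

For example, CSFN(scale=2)(torch.randn(1, 3, 48, 48)) returns a 1×3×96×96 tensor.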

2. Residual Channel Splitting and Fusion Block (CSFB)

Residual Channel Splitting and Fusion Block (CSFB)
  • The proposed CSFB can be roughly divided into two pipelines. The left pipeline is a feature fusion module based on channel split (CSFF) and the right one is a global feature extraction (GFE) module.
  • The remaining part of CSFB is a skip connection used for residual learning.

2.1. Feature Fusion module based on Channel Split (CSFF)

  • CSFF module mainly focuses on the imbalance of channel information.
  • An asymmetric channel split is adopted to divide the feature into two parts.
  • Fi-1 is split into two parts which contain s and c0-s channels respectively, where s is less than c0/2 in the network.
  • After convolution, the dimensions of the left and right branches in CSFF are ca and cb, subject to ca + cb = c0. The additional restriction ca > s guides the network to extract more information.
  • Then the two feature maps are fused by concatenation and a 1×1 convolution, as sketched below.
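
Here is a sketch of my reading of the CSFF pipeline: an asymmetric split into s and c0−s channels, one 3×3 convolution per branch producing ca and cb channels, then concatenation and a 1×1 fusion convolution back to c0. The ReLU placement and the groups argument (used later for CSFB-M) are my assumptions.

    import torch
    import torch.nn as nn

    class CSFF(nn.Module):
        """Feature fusion based on an asymmetric channel split (sketch)."""
        def __init__(self, c0=64, s=16, ca=32, cb=32, groups=1):
            super().__init__()
            assert ca + cb == c0 and ca > s and s < c0 // 2   # constraints from the paper
            self.s = s
            self.conv_a = nn.Conv2d(s, ca, 3, padding=1, groups=groups)       # left branch: s -> ca
            self.conv_b = nn.Conv2d(c0 - s, cb, 3, padding=1, groups=groups)  # right branch: (c0 - s) -> cb
            self.fuse = nn.Conv2d(ca + cb, c0, 1)                             # 1x1 fusion back to c0
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            xa, xb = torch.split(x, [self.s, x.size(1) - self.s], dim=1)      # asymmetric split
            fa = self.relu(self.conv_a(xa))
            fb = self.relu(self.conv_b(xb))
            return self.fuse(torch.cat([fa, fb], dim=1))                      # concat + 1x1 conv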

2.2. Global Feature Extraction (GFE)

  • The main purpose of the GFE module (the right pipeline) is to compensate for the CSFF module, since the CSFF module only uses partial channel information independently, which leads to incomplete global information.
  • A channel compression and expansion unit is used to extract features and promote channel information fusion, while also reducing the number of parameters.
  • The output dimensions of the three Conv-ReLU layers are denoted as cn, cn, and c0 respectively (cn < c0), as in the sketch below.
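
Below is a sketch of the GFE compression-expansion unit (three Conv-ReLU layers with output dimensions cn, cn and c0) and of one way CSFF, GFE and the residual path could be assembled into a CSFB, reusing the CSFF sketch above. The paper excerpt does not spell out how the two pipeline outputs are merged, so the element-wise sum is an assumption; the groups argument is only needed for CSFB-M later.

    import torch.nn as nn

    class GFE(nn.Module):
        """Global feature extraction: channel compression then expansion (c0 -> cn -> cn -> c0)."""
        def __init__(self, c0=64, cn=16, groups=1):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(c0, cn, 3, padding=1, groups=groups), nn.ReLU(inplace=True),  # compress to cn
                nn.Conv2d(cn, cn, 3, padding=1, groups=groups), nn.ReLU(inplace=True),  # keep cn
                nn.Conv2d(cn, c0, 3, padding=1, groups=groups), nn.ReLU(inplace=True),  # expand back to c0
            )

        def forward(self, x):
            return self.body(x)

    class CSFB(nn.Module):
        """Residual channel splitting and fusion block: CSFF pipeline + GFE pipeline + identity skip."""
        def __init__(self, c0=64, s=16, ca=32, cb=32, cn=16):
            super().__init__()
            self.csff = CSFF(c0, s, ca, cb)
            self.gfe = GFE(c0, cn)

        def forward(self, x):
            # Merging the two pipelines by summation is an assumption; x is the residual path.
            return self.csff(x) + self.gfe(x) + x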

3. Lightweight Version: CSFB-M

CSFB-M
  • To further reduce parameters and enhance feature maps, a more lightweight network (CSFN-M) is proposed, which replaces CSFB with CSFB-M.
  • Group convolution, introduced in AlexNet and later used in ResNeXt, is adopted in CSFF and GFE.
  • The block is applied recursively so that the feature maps are enhanced t times, as sketched below.
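
A sketch of how CSFB-M could look under this description, reusing the CSFF and GFE sketches above: grouped convolutions (groups = 2) inside both pipelines, and the whole block applied recursively t times with shared weights. The merge rule is again the assumption used in the CSFB sketch.

    import torch.nn as nn

    class CSFB_M(nn.Module):
        """Lightweight CSFB: grouped convolutions plus recursive (weight-shared) application."""
        def __init__(self, c0=64, s=16, ca=32, cb=32, cn=16, groups=2, t=3):
            super().__init__()
            self.t = t
            self.csff = CSFF(c0, s, ca, cb, groups=groups)  # grouped 3x3 convs in both split branches
            self.gfe = GFE(c0, cn, groups=groups)           # grouped convs in compression/expansion

        def forward(self, x):
            out = x
            for _ in range(self.t):                         # same weights reused t times
                out = self.csff(out) + self.gfe(out) + out  # assumed merge, as in the CSFB sketch
            return out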

4. Experimental Results

4.1. Settings

  • 3×3 convolutional layers are used except for the specified 1×1 convolutional layers. The number of channels in Fi is set to 64.
  • 10 CSFBs are used in the proposed CSFN and 5 for CSFN-M.
  • In each CSFB, c0, s, ca, cb and cn are set to 64, 16, 32, 32 and 16 respectively.
  • For the proposed CSFB-M, the same settings as CSFB are used, but the number of groups is set to 2 and t is set to 3 for each CSFB-M.
  • L1 loss is used.
  • DIV2K 800 training images are used for training. (A minimal training-step sketch follows this list.)
  • (There are experiments for network analysis, but I think the results are quite close…)
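
Wiring the reported settings together, here is a minimal training-step sketch that reuses the CSFN and CSFB sketches above. Only the block counts, the channel settings and the L1 loss come from the paper; the optimizer, learning rate, batch size and patch size are my assumptions.

    import torch
    import torch.nn as nn

    # CSFN: 10 CSFBs with c0=64, s=16, ca=32, cb=32, cn=16 (CSFN-M: 5 CSFB-Ms, groups=2, t=3)
    model = CSFN(scale=2, n_feats=64, n_blocks=10,
                 make_block=lambda: CSFB(c0=64, s=16, ca=32, cb=32, cn=16))
    criterion = nn.L1Loss()                                      # L1 loss, as in the paper
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # optimizer and lr are assumptions

    lr_patch = torch.randn(16, 3, 48, 48)                        # stand-in DIV2K LR/HR patches
    hr_patch = torch.randn(16, 3, 96, 96)

    sr = model(lr_patch)
    loss = criterion(sr, hr_patch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()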

4.2. SOTA Comparison

Average PSNR/SSIM for scales ×2, ×3 and ×4. Red/blue text: best/second-best; underlined text: best result below 500K parameters.
  • For a fair comparison, heavy networks such as EDSR, DBPN, RDN and RCAN are excluded.
  • CSFN achieves better performance with fewer parameters and FLOPs than CARN; note that CARN uses larger patch sizes at training time and a multi-scale training strategy to improve its final results.
  • Of course, CSFN also outperforms IDN, MemNet and VDSR.
  • For a more lightweight model, CSFN-M achieves higher performance than DRRN and CARN-M, and similar or better performance than SRFBN-S, while SRFBN-S has larger FLOPs.
Qualitative comparisons
  • img076: The proposed CSFN and CSFN-M rebuild the glass edge more accurately, while other models only smooth the area.
  • barbara: Other models generate the wrong texture, whereas CSFN predicts the texture of this spotted cloth correctly, and CSFN-M can correctly predict partial structures with fewer parameters.
  • img092: All other methods infer the wrong black line, but CSFN can make full use of the information in low-resolution images to accurately estimate the direction of the line.

This is the 24th story in this month.
