Reading: CARN — Cascading Residual Network (Super Resolution)

Outperforms SRDenseNet, MemNet, SelNet, CNF, DRRN, DRCN, LapSRN, VDSR, FSRCNN and SRCNN

Sik-Ho Tsang
6 min read · Jul 18, 2020

In this story, “Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network” (CARN), by Ajou University, is presented. In this paper:

  • A cascading mechanism is designed upon a residual network, at both the local and the global level, to incorporate the features from multiple layers; the resulting network is called CARN.
  • A lightweight variant, CARN-Mobile (CARN-M), is presented, which uses efficient residual blocks and a recursive network architecture.

This is a paper in 2018 ECCV with more than 100 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. CARN: Network Architecture
  2. Efficient Cascading Residual Network
  3. Comparison to SRDenseNet and MemNet
  4. Model Analysis
  5. SOTA Comparison

1. CARN: Network Architecture

  • The main difference between CARN and ResNet is the presence of local and global cascading modules.
  • Figure (b) graphically depicts how the global cascading occurs: the outputs of the intermediary layers are cascaded into the higher layers and finally converge on a single 1×1 convolution layer (see the sketch after this list).
  • The intermediary layers are implemented as cascading blocks, which host local cascading connections themselves.
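
As a concrete reading aid, here is a minimal PyTorch-style sketch of the global cascading pattern. This is not the authors' code: the class name GlobalCascading, the channel count, the number of blocks, and the plain 3×3 convolutions standing in for the cascading blocks are all assumptions for brevity.

```python
import torch
import torch.nn as nn

class GlobalCascading(nn.Module):
    """Minimal sketch of global cascading (illustrative, not the authors' code).
    Each intermediary block's output is concatenated with all features gathered
    so far and fused by a 1x1 convolution before entering the next block."""
    def __init__(self, channels=64, num_blocks=3):
        super().__init__()
        # Plain 3x3 convolutions stand in for the cascading blocks described below.
        self.blocks = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_blocks)]
        )
        # The i-th fusion layer sees the input plus (i + 1) block outputs.
        self.fuse = nn.ModuleList(
            [nn.Conv2d(channels * (i + 2), channels, 1) for i in range(num_blocks)]
        )

    def forward(self, x):
        features, out = [x], x
        for block, fuse in zip(self.blocks, self.fuse):
            features.append(block(out))
            out = fuse(torch.cat(features, dim=1))  # converge via 1x1 convolution
        return out
```
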
Block Structures
  • The main difference between CARN and ResNet lies in the cascading mechanism.
  • CARN has global cascading connections, represented as the blue arrows in the figure, each of which is followed by a 1×1 convolution layer. Cascading on both the local and global levels has two advantages (a block-level sketch follows this list):
  1. The model incorporates features from multiple layers, which allows learning multi-level representations.
  2. Multi-level cascading connection behaves as multi-level shortcut connections that quickly propagate information from lower to higher layers (and vice-versa, in case of back-propagation).
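
Below is a minimal sketch of a cascading block with local cascading, mirroring the global pattern sketched above at the block level. The class names (ResidualBlock, CascadingBlock), the number of residual units, and the channel width are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Plain residual block (two 3x3 convolutions plus an identity shortcut)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv2(torch.relu(self.conv1(x))) + x)

class CascadingBlock(nn.Module):
    """Local cascading inside one block: every residual unit's output is
    concatenated with the features seen so far and fused by a 1x1 convolution."""
    def __init__(self, channels=64, num_units=3):
        super().__init__()
        self.units = nn.ModuleList([ResidualBlock(channels) for _ in range(num_units)])
        self.fuse = nn.ModuleList(
            [nn.Conv2d(channels * (i + 2), channels, 1) for i in range(num_units)]
        )

    def forward(self, x):
        features, out = [x], x
        for unit, fuse in zip(self.units, self.fuse):
            features.append(unit(out))
            out = fuse(torch.cat(features, dim=1))  # 1x1 fusion after every unit
        return out
```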

2. Efficient Cascading Residual Network

2.1. Group Convolution

  • An efficient residual (residual-E) block is proposed, following an approach similar to MobileNet but using group convolution instead of depthwise convolution.
  • The residual-E block consists of two 3×3 group convolutions and one pointwise (1×1) convolution, as shown in figure (b) above (a sketch follows this list).
  • Let K be the kernel size and Cin, Cout be the number of input and output channels. Because we retain the feature resolution of the input and output by padding, we can denote F to be both the input and output feature size. Then, the cost of a plain residual block is: 2×(K×K×Cin×Cout×F×F).
  • Let G be the group size. Then, the cost of a residual-E block, which consists of two group convolutions and one pointwise convolution, is: 2×(K×K×Cin×(Cout/G)×F×F) + Cin×Cout×F×F.
  • By changing the plain residual block to the efficient residual block, the computation is reduced by the ratio: [2×(K×K×Cin×(Cout/G)×F×F) + Cin×Cout×F×F] / [2×(K×K×Cin×Cout×F×F)] = 1/G + 1/(2K²).
  • Because the model uses a kernel of size 3×3 for all group convolutions and the number of channels is fixed at 64, using an efficient residual block instead of a standard residual block reduces the computation by a factor of 1.8 up to 14, depending on the group size.
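
A hedged sketch of the residual-E block, with a quick parameter count to check the reduction ratio above. The group size of 4 and the 64-channel width follow the text; the class name ResidualE, the activation placement, and the helper conv_weight_count are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ResidualE(nn.Module):
    """Sketch of the efficient residual (residual-E) block: two 3x3 group
    convolutions followed by one 1x1 pointwise convolution, plus the identity
    shortcut. Activations and their placement are assumptions."""
    def __init__(self, channels=64, groups=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),  # pointwise fusion across groups
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)

def conv_weight_count(module):
    # Count only convolution weights (biases excluded) to mirror the cost formula.
    return sum(p.numel() for name, p in module.named_parameters() if "weight" in name)

plain = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.Conv2d(64, 64, 3, padding=1))
print(conv_weight_count(plain))                  # 73728 = 2 * 3*3*64*64
print(conv_weight_count(ResidualE(64, 4).body))  # 22528 = 2 * 3*3*64*64/4 + 64*64
# 73728 / 22528 ~= 3.3, matching 1 / (1/G + 1/(2K^2)) for G = 4 and K = 3.
```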

2.2. Recursive Network

  • To further reduce the parameters, the recursive network is applied.
  • This approach reduces the number of parameters by up to a factor of three (see the sketch below).
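
A minimal sketch of the recursive (weight-sharing) idea: one block instance, here a stand-in built from a group convolution and a pointwise convolution, is applied repeatedly, so the unrolled stages share a single set of parameters. The block contents, the name shared_block, and the number of steps are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# One shared block instance (a stand-in for a cascading block) is applied at
# every stage, so the unrolled stages use a single set of weights.
shared_block = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, groups=4),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 1),
)

def recursive_forward(x, steps=3):
    for _ in range(steps):
        x = shared_block(x)  # same parameters reused at every step
    return x

x = torch.randn(1, 64, 32, 32)
print(recursive_forward(x).shape)  # torch.Size([1, 64, 32, 32])
```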

3. Comparison to SRDenseNet and MemNet

3.1. Comparison to SRDenseNet

  1. CARN uses global cascading, which is more general than the skip connection. In SRDenseNet, all levels of features are combined at the end of the final dense block, while the global cascading scheme of CARN connects all blocks, behaving as a multi-level skip connection.
  2. SRDenseNet preserves local information via concatenation operations inside each dense block, while CARN gathers it progressively with 1×1 convolution layers. The use of additional 1×1 convolution layers results in higher representation power.

3.2. Comparison to MemNet

  1. In MemNet, the output features of each recursive unit are concatenated at the end of the network and then fused with a 1×1 convolution. On the other hand, CARN fuses the features at every possible point in the local block, which boosts the representation power via the additional convolution layers and nonlinearity.
  2. MemNet takes upsampled images as input, so its number of multi-adds is larger than that of CARN. The input to CARN is an LR image, which is upsampled only at the end of the network to achieve computational efficiency (see the sketch after this list).
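
A minimal sketch of such post-upsampling: the network operates on LR-sized feature maps and enlarges them only at the very end, assuming a sub-pixel convolution (PixelShuffle) tail. The class name UpsampleTail, the scale factor, the channel count, and the layer choices are illustrative, not necessarily CARN's exact upsampler.

```python
import torch
import torch.nn as nn

class UpsampleTail(nn.Module):
    """Post-upsampling tail: work on LR-sized feature maps, then enlarge at the
    end with a sub-pixel convolution (PixelShuffle). Illustrative only."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                # rearranges channels into space
            nn.Conv2d(channels, 3, 3, padding=1),  # project to an RGB image
        )

    def forward(self, features):
        return self.body(features)

lr_features = torch.randn(1, 64, 32, 32)
print(UpsampleTail()(lr_features).shape)  # torch.Size([1, 3, 128, 128])
```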

4. Model Analysis

4.1. Cascading Modules

Effects of the global and local cascading modules measured on the Set14 ×4 dataset
  • CARN-NL represents CARN without local cascading.
  • CARN-NG represents CARN without global cascading.
  • Because of the 1×1 convolution layer, the overall number of parameters is increased by up to 10% for CARN variants.
  • The model with only global cascading (CARN-NL) shows better performance than the baseline because the global cascading mechanism effectively carries mid- to high-level frequency signals from shallow to deep layers.
  • Using only local cascading blocks (CARN-NG) harms the performance. As discussed in Pre-Activation ResNet, multiplicative manipulations such as 1×1 convolution on the shortcut connection can hamper information propagation.
  • With both local and global cascading, CARN obtains better performance.

4.2. Efficiency Trade-off

Set14 with ×4 scale
  • The blue line represents the model that does not use the recursive scheme and the orange line is the model that uses recursive cascading block.
  • All efficient models perform worse than CARN, which achieves 28.70 dB PSNR, but the numbers of parameters and operations are decreased dramatically.
  • For example, the G64 shows a five-times reduction in both parameters and operations.
  • Finally, the group size is set to four in the efficient residual block, and this is combined with the recursive network scheme to form the CARN-M model.

5. SOTA Comparison

Trade-off between performance vs. number of operations and parameters on Set14 ×4 dataset.
Quantitative results of deep learning-based SR algorithms. Red/blue text: best/second-best.
  • The 291-image set by Yang et al., the Berkeley Segmentation Dataset, and the DIV2K dataset are used for training.
  • L1 loss is used for training (see the sketch after this list).
  • The CARN model outperforms all state-of-the-art models that have fewer than 5M parameters. In particular, CARN has a number of parameters similar to DRCN, SelNet and SRDenseNet, but outperforms all three models.
  • MDSR achieves better performance than CARN, which is not surprising because MDSR has 8M parameters, nearly six times more than CARN.
  • Although CARN-M has more parameters than SRCNN or DRRN, its size is tolerable in real-world scenarios: the sizes of SRCNN and CARN-M are 200KB and 1.6MB respectively, both acceptable on recent mobile devices.
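
As referenced above, a minimal sketch of the L1 training objective. The toy model, optimizer settings, and patch sizes are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Toy model and data: placeholders only, not the paper's architecture or setup.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 3, 3, padding=1))
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

lr_patch = torch.randn(16, 3, 32, 32)  # batch of low-resolution patches
hr_patch = torch.randn(16, 3, 32, 32)  # targets (same size for this toy model)

optimizer.zero_grad()
loss = criterion(model(lr_patch), hr_patch)  # L1 loss between prediction and target
loss.backward()
optimizer.step()
```
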
Visual qualitative comparison on ×4 scale datasets.
  • CARN works better than the others and accurately reconstructs not only stripes and line patterns, but also complex objects such as hands and street lamps.

This is the 13th story this month.
