Reading: BT-SRN — Balanced Two-Stage Residual Networks (Super Resolution)

Balanced LR and HR Stages With PConv Block, Outperforms VDSR

Sik-Ho Tsang
5 min read · Jul 21, 2020

In this story, Balanced Two-Stage Residual Networks for Image Super-Resolution (BT-SRN), by the University of Illinois and Texas A&M University, is briefly presented. In this paper:

  • A balanced two-stage structure, together with a lightweight two-layer PConv residual block design, is proposed to achieve very promising results in terms of both accuracy and speed.
  • On the New Trends in Image Restoration and Enhancement workshop and challenge on image super-resolution (NTIRE SR 2017), the final model with only 10 residual blocks ranked among the best in terms of not only accuracy (6th among 20 final teams) but also speed (2nd among the top 6 teams in terms of accuracy).

This is a paper in 2017 CVPRW with 30 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. BT-SRN: Network Architecture
  2. Ablation Study
  3. SOTA Comparison

1. BT-SRN: Network Architecture

BT-SRN: Network Architecture
  • BT-SRN, as shown above (and sketched in code after this list), mainly contains two stages: a low resolution (LR) stage and a high resolution (HR) stage.
  • In the low and high resolution stages, residual networks with 6 and 4 residual blocks respectively are deployed.
  • The two stages are connected by up-sampling layers (yellow).
  • MSE is used as the loss function.
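
To make the two-stage layout concrete, here is a minimal PyTorch sketch. The placeholder ResBlock, the channel width, and the ×2 scale are illustrative assumptions rather than the paper's exact settings; the PConv block from Section 1.2 can be swapped in for the placeholder.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Generic two-convolution residual block used as a placeholder;
    the paper's PConv variant is sketched in Section 1.2."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class BTSRN(nn.Module):
    def __init__(self, n_lr=6, n_hr=4, channels=64, scale=2, block=ResBlock):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)  # LR feature extraction
        self.lr_stage = nn.Sequential(*[block(channels) for _ in range(n_lr)])
        # "Yellow" upsampling: element-wise sum of nearest-neighbor
        # upsampling and a learned deconvolution (see Section 1.1)
        self.up_nn = nn.Upsample(scale_factor=scale, mode='nearest')
        self.up_deconv = nn.ConvTranspose2d(channels, channels, scale, stride=scale)
        self.hr_stage = nn.Sequential(*[block(channels) for _ in range(n_hr)])
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)  # HR reconstruction

    def forward(self, x):
        f = self.lr_stage(self.head(x))         # cheap refinement in LR space
        f = self.up_nn(f) + self.up_deconv(f)   # move features to HR space
        return self.tail(self.hr_stage(f))      # refinement in HR space

model = BTSRN()
sr = model(torch.randn(1, 3, 32, 32))           # -> torch.Size([1, 3, 64, 64])
loss = nn.MSELoss()(sr, torch.randn_like(sr))   # MSE training loss
```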

Compared with VDSR, BT-SRN takes the low resolution image as input and reduces computational redundancy.

Compared with ESPCN, SRGAN and EnhanceNet, BT-SRN performs better refinement in the high resolution space and yields fewer artifacts.

1.1. Upsampling (Yellow)

  • For the up-sampling layers, the element-wise sum of nearest-neighbor up-sampling and deconvolution is employed, as sketched below.
  • The skip connections in the up-sampling layers are realized by the nearest-neighbor up-sampled feature maps.
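
A minimal PyTorch sketch of this upsampling design (channel count and kernel size are illustrative assumptions): the parameter-free nearest-neighbor path acts as the skip connection, so the deconvolution only has to learn a residual correction on top of it.

```python
import torch
import torch.nn as nn

class SumUpsample(nn.Module):
    """Element-wise sum of nearest-neighbor upsampling and deconvolution.
    The nearest-neighbor path has no parameters and serves as the skip
    connection; the deconvolution learns the residual correction."""
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.nn_up = nn.Upsample(scale_factor=scale, mode='nearest')
        self.deconv = nn.ConvTranspose2d(channels, channels,
                                         kernel_size=scale, stride=scale)
    def forward(self, x):
        return self.nn_up(x) + self.deconv(x)   # element-wise sum

up = SumUpsample()
print(up(torch.randn(1, 64, 24, 24)).shape)     # torch.Size([1, 64, 48, 48])
```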

1.2. Residual Blocks

Residual Block Variants
  • Multiple settings of the residual blocks were investigated, as shown above: residual blocks in PixelCNN, gated convolution blocks in the gated PixelCNN, gated convolution blocks in PixelCNN++, and the proposed projected convolution (PConv) structure. (Sketches of variants (b) and (d) follow this list.)
  • (a) PixelCNN: There are 1×1, 3×3, and 1×1 convolutions in the main branch.
  • (b) Gated Conv in PixelCNN: The gated convolutional structure was proposed in PixelCNN for better performance. After the first 3×3 convolutional layer, the channels are split into two branches; hyperbolic tangent and sigmoid activations are applied to the two branches respectively, followed by element-wise multiplication.
  • (c) Gated Conv in PixelCNN++: The hyperbolic tangent branch in (b) is replaced by an identity mapping.
  • (d) The Proposed PConv Block: A simple and efficient residual block structure called projected convolution (PConv), which uses a 1×1 convolution as a feature map projection to reduce the input size of the 3×3 convolution. The proposed block achieves a good trade-off between accuracy and speed.
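
Below are minimal PyTorch sketches of variants (b) and (d). The activation placement and exact channel widths are assumptions for illustration; the 128→64 projection follows the challenge model described in Section 3.1.

```python
import torch
import torch.nn as nn

class PConvBlock(nn.Module):
    """Sketch of the proposed PConv residual block (variant (d)): a 1x1
    convolution projects the features down before the expensive 3x3
    convolution. The 128->64 widths follow the challenge model in
    Section 3.1; the ReLU placement is an assumption."""
    def __init__(self, channels=128, proj=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, proj, 1),             # cheap channel projection
            nn.ReLU(inplace=True),
            nn.Conv2d(proj, channels, 3, padding=1))  # 3x3 conv on the slim feature map
    def forward(self, x):
        return x + self.body(x)                       # residual connection

class GatedBlock(nn.Module):
    """Sketch of the gated variant (b): after a convolution, the channels
    are split into two branches, passed through tanh and sigmoid, and
    recombined by element-wise multiplication."""
    def __init__(self, channels=128):
        super().__init__()
        self.conv = nn.Conv2d(channels, 2 * channels, 3, padding=1)
    def forward(self, x):
        a, b = self.conv(x).chunk(2, dim=1)           # split into two branches
        return x + torch.tanh(a) * torch.sigmoid(b)   # gate, then add residual
```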

1.3. Batch Normalization

  • Batch normalization is not suitable for the super-resolution task: super-resolution is a regression task whose target outputs are highly correlated with the inputs' first-order statistics, which batch normalization normalizes away. Thus, it is not used.

2. Ablation Study

Number of Residual Blocks in HR stage
  • Fixing the low resolution stage to 6 blocks, the results show that 4 blocks in the high resolution stage are adequate for image refinement.
Number of Residual Blocks in LR stage
  • Fixing the high resolution stage to 4 blocks, the results show that the more blocks (fewer than 10) in the low resolution stage, the better the performance. Networks with 6 low resolution blocks are a good compromise between accuracy and speed.
PSNR Against Epochs With Different Number of Residual Blocks at LR and HR Stages
  • Fixing the total number of blocks in the low and high resolution stages to 10, the results show that networks with 7 and 3 blocks for the low and high resolution stages respectively achieve the best performance.
Study of Residual Block Variants
  • Compared to the residual block in PixelCNN, the proposed PConv blocks achieve better performance while requiring the same training time and less memory.
  • Although the gated convolution in PixelCNN++ has better performance, it needs nearly double the time and memory for training compared to the proposed residual blocks.

3. SOTA Comparison

3.1. NTIRE 2017 Super Resolution Challenge

Final Results in NTIRE 2017 Super Resolution Challenge
  • BTSRN, with 6 and 4 residual blocks in the low and high resolution stages respectively, is used. The proposed PConv blocks are employed, with 128 channels as input and 64 channels after the 1×1 convolution layer. (A configuration sketch follows this list.)
  • The networks are trained on the training and validation sets, 900 images in total, and evaluated on the 100-image testing set.
(a) output using bicubic interpolation, (b) absolute difference summed over RGB channels between bicubic interpolation’s output and the ground truth, (c) output using the proposed method, (d) absolute difference summed over RGB channels between the proposed method’s output and the ground truth, (e) ground truth.
  • BTSRN improves a lot over bicubic interpolation, producing much sharper results in general and significantly smoother results on image edges.
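
As a usage note, the challenge configuration above maps onto the earlier sketches roughly as follows (reusing the hypothetical BTSRN and PConvBlock classes from Sections 1 and 1.2):

```python
# Hypothetical assembly of the challenge model from the sketches above:
# 6 LR blocks, 4 HR blocks, PConv with 128 input channels projected to 64.
model = BTSRN(n_lr=6, n_hr=4, channels=128,
              block=lambda c: PConvBlock(channels=c, proj=64))
```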

3.2. Benchmarks

Benchmark results in PSNR on Luminance
  • Compared with the state-of-the-art VDSR approach, BTSRN achieves significant improvements in PSNR on all the benchmarks.
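
For reference, PSNR on luminance is typically computed as below. The BT.601 luma weights and the [0, 255] value range are common benchmark conventions, assumed here rather than stated in the paper.

```python
import numpy as np

def psnr_y(sr, hr):
    """PSNR on the Y (luminance) channel for HxWx3 RGB arrays in [0, 255]."""
    def luma(img):  # BT.601 luma weights (common convention, assumed here)
        return 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    mse = np.mean((luma(sr.astype(np.float64)) - luma(hr.astype(np.float64))) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```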

This is the 18th story in this month.
