Reading: MSRN — Multi-Scale Residual Network (Super Resolution)
On Par with EDSR but with Far Fewer Parameters; Outperforms LapSRN, DRCN, VDSR, ESPCN, FSRCNN, and SRCNN
In this story, Multi-scale Residual Network for Image Super-Resolution (MSRN), by East China Normal University and Jiangxi Normal University, is presented. In this paper:
- A novel Multi-Scale Residual Network (MSRN) is proposed to fully exploit the image features.
- A Multi-Scale Residual Block (MSRB) is designed with convolution kernels of different sizes to adaptively detect image features at different scales.
- Furthermore, the outputs of each MSRB are used as hierarchical features for global feature fusion.
This is a paper in 2018 ECCV with over 90 citations. (Sik-Ho Tsang @ Medium)
Outline
- MSRN: Network Architecture
- Multi-Scale Residual Block (MSRB)
- Hierarchical Feature Fusion Structure (HFFS)
- Image Reconstruction
- SOTA Comparison
- Further Study
1. MSRN: Network Architecture
- MSRN can be divided into two parts: the feature extraction module and the image reconstruction module.
- The feature extraction module is composed of two structures: multi-scale residual block (MSRB) and hierarchical feature fusion structure (HFFS).
- The image reconstruction module uses the PixelShuffle layer, introduced in ESPCN, for upsampling.
2. Multi-Scale Residual Block (MSRB)
- MSRB has two parts: multi-scale features fusion and local residual learning.
2.1. Multi-Scale Features Fusion
- A two-bypass network is constructed, and each bypass uses a different convolutional kernel size.
- The information in the two bypasses is shared with each other, so that image features can be detected at different scales.
- The input and output of the first convolutional layer have M feature maps.
- The second convolutional layer has 2M feature maps. All of these feature maps are concatenated and sent to a 1×1 convolutional layer, which reduces the number of feature maps back to M.
2.2. Local Residual Learning
- Residual learning is adopted in each MSRB; a sketch of the complete block follows.
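The following is a minimal PyTorch sketch of one MSRB, following the channel sizes described above (M in the first stage, 2M in the second, and a 4M → M bottleneck); the class and variable names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class MSRB(nn.Module):
    """Multi-Scale Residual Block: a 3x3 bypass and a 5x5 bypass with
    cross-bypass sharing, a 1x1 fusion conv, and a local residual connection."""
    def __init__(self, channels=64):
        super().__init__()
        m = channels
        # Stage 1: input and output both have M feature maps.
        self.conv3_1 = nn.Conv2d(m, m, kernel_size=3, padding=1)
        self.conv5_1 = nn.Conv2d(m, m, kernel_size=5, padding=2)
        # Stage 2: each bypass sees both stage-1 outputs (2M in, 2M out).
        self.conv3_2 = nn.Conv2d(2 * m, 2 * m, kernel_size=3, padding=1)
        self.conv5_2 = nn.Conv2d(2 * m, 2 * m, kernel_size=5, padding=2)
        # 1x1 conv reduces the concatenated 4M maps back to M.
        self.fuse = nn.Conv2d(4 * m, m, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s1 = self.relu(self.conv3_1(x))
        p1 = self.relu(self.conv5_1(x))
        s2 = self.relu(self.conv3_2(torch.cat([s1, p1], dim=1)))
        p2 = self.relu(self.conv5_2(torch.cat([p1, s1], dim=1)))
        out = self.fuse(torch.cat([s2, p2], dim=1))
        return out + x  # local residual learning
```

Note that each bypass's second-stage convolution sees the concatenation of both first-stage outputs, which is how information is shared between the two scales.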
3. Hierarchical Feature Fusion Structure (HFFS)
- All the outputs of the MSRBs are sent to the end of the network for reconstruction.
- If simple skip connections were used to fuse all the MSRB outputs, too much redundant information would be generated aimlessly.
- Also, as the depth grows, the spatial expression ability of the network gradually decreases while the semantic expression ability gradually increases.
- A bottleneck layer is introduced, which is essentially a convolutional layer with a 1×1 kernel:
F_LR = w × [M0, M1, M2, …, MN] + b
- where M0 is the output of the first convolutional layer, Mi (i ≠ 0) represents the output of the i-th MSRB, [M0, M1, M2, …, MN] denotes the concatenation operation, and w and b are the weight and bias of the 1×1 convolution.
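A minimal PyTorch sketch of HFFS is given below, reusing the MSRB class from the sketch above; the block count and fusion layout follow the description in this section, while the names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HFFS(nn.Module):
    """Runs N MSRBs sequentially, keeps every intermediate output as a
    hierarchical feature, and fuses them all with a 1x1 bottleneck conv."""
    def __init__(self, channels=64, n_blocks=8):
        super().__init__()
        self.blocks = nn.ModuleList([MSRB(channels) for _ in range(n_blocks)])
        # (N + 1) * M concatenated maps (M0 plus N MSRB outputs) -> M maps.
        self.bottleneck = nn.Conv2d((n_blocks + 1) * channels, channels, kernel_size=1)

    def forward(self, m0):
        feats = [m0]          # M0: output of the first convolutional layer
        x = m0
        for block in self.blocks:
            x = block(x)
            feats.append(x)   # Mi: output of the i-th MSRB
        return self.bottleneck(torch.cat(feats, dim=1))
```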
4. Image Reconstruction
- A, B, C: Earlier designs use either the PixelShuffle layer from ESPCN or a deconvolutional layer to upsample the LR image.
- MSRN (Ours): The reconstruction module also uses PixelShuffle, but it is simple and neat enough to be migrated to any upscaling factor with minor adjustments: for different upscaling factors, only the value of M changes, and that change is negligible.
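As a rough illustration, here is a minimal PyTorch sketch of such a PixelShuffle-based reconstruction module; the exact layer widths and names are assumptions for the sketch, not the paper's code.

```python
import torch.nn as nn

class Reconstruction(nn.Module):
    """Conv expands the channels by scale^2, PixelShuffle rearranges them into
    a scale-times larger image, and a final conv maps to 3 RGB channels."""
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            # Only this expanded width changes with the upscaling factor.
            nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.body(x)
```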
5. SOTA Comparison
- DIV2K is used for training.
- The training data is augmented in three ways: (1) scaling (2) rotation (3) flipping.
- In each training batch, 16 LR patches of size 64×64 are randomly extracted, and one epoch consists of 1,000 iterations of back-propagation (a sketch of this sampling and augmentation is given after this list).
- The final model has 8 multi-scale residual blocks (MSRB, N = 8) and the output of each MSRB has 64 feature maps.
- MSRN outperforms SOTA approaches such as SRCNN, ESPCN, FSRCNN, VDSR, DRCN, and LapSRN by a large margin on different upscaling factors and test datasets, but is slightly worse than EDSR.
- However, the EDSR model is much larger.
- MSRN can reconstruct realistic images with sharp edges.
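Below is a minimal sketch of the patch sampling and augmentation described above, assuming NumPy HWC image pairs; the specific flip and rotation choices are common practice and only illustrative (the scaling augmentation is omitted for brevity).

```python
import random
import numpy as np

def augment(lr, hr):
    """Randomly flip and rotate an aligned LR/HR pair in the same way."""
    if random.random() < 0.5:              # horizontal flipping
        lr, hr = lr[:, ::-1], hr[:, ::-1]
    k = random.randint(0, 3)               # rotation by 0/90/180/270 degrees
    lr, hr = np.rot90(lr, k), np.rot90(hr, k)
    return lr, hr

def random_lr_patch(lr, hr, scale, patch=64):
    """Extract a random 64x64 LR patch and the matching HR patch.
    Assumes the LR image is at least `patch` pixels on each side."""
    h, w = lr.shape[:2]
    y, x = random.randrange(h - patch + 1), random.randrange(w - patch + 1)
    lr_p = lr[y:y + patch, x:x + patch]
    hr_p = hr[y * scale:(y + patch) * scale, x * scale:(x + patch) * scale]
    return lr_p, hr_p
```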
6. Further Study
6.1. Benefit of MSRB
- The MSRB module is superior to other modules, i.e., the residual block in ResNet and the dense block in DenseNet, at all upscaling factors.
- For the residual block and the dense block, the activations are sparse (most values are zero, shown in black in the visualization), and some activation maps may be all zero, which indicates dead filters.
- MSRB produces more valid activation maps, which further proves the effectiveness of the structure.
6.2. Benefit of Increasing The Number of MSRB
- MSRN's performance improves rapidly as the number of MSRBs grows.
- Finally, 8 MSRBs are used; the result is close to EDSR's, while the number of model parameters is only about one-seventh of EDSR's (a quick way to check this is sketched below).
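For reference, a generic PyTorch one-liner for counting parameters (not from the paper's code) makes such size comparisons easy:

```python
def count_params(model):
    """Total number of trainable parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g., comparing count_params(msrn) with count_params(edsr) should show the
# roughly 1:7 ratio reported in the paper (msrn/edsr are hypothetical instances).
```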
6.3. Other Tasks
- MSRN achieves promising results on other low-level computer vision tasks.
This is the 19th story this month.
Reference
[2018 ECCV] [MSRN]
Multi-scale Residual Network for Image Super-Resolution
Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DnCNN] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [MWCNN] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [CNF] [BT-SRN] [EDSR & MDSR] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [CARN] [IDN] [ZSSR] [MSRN] [SR+STN] [SRFBN]