Review: FSRCNN (Super Resolution)

Sik-Ho Tsang
Towards Data Science
6 min read · Oct 27, 2018


This time, FSRCNN, by CUHK, is reviewed. In this paper, a real-time super-resolution approach is proposed. Fast Super-Resolution Convolutional Neural Network (FSRCNN) was published in 2016 ECCV and had nearly 300 citations at the time of writing. (Sik-Ho Tsang @ Medium)

FSRCNN is a relatively shallow network, which makes it easier to study the effect of each component. It is also faster and reconstructs better image quality than the previous SRCNN, as shown in the figure below.

From SRCNN to FSRCNN

Comparing SRCNN with FSRCNN-s (a small-model-size version of FSRCNN), FSRCNN-s achieves better PSNR (image quality) and a much shorter running time, reaching 43.5 fps.

Comparing SRCNN-Ex (an improved SRCNN) with FSRCNN, FSRCNN achieves better PSNR (image quality) and a much shorter running time, reaching 16.4 fps.

So, let’s see how it can achieve this.

What Are Covered

  1. Brief Review of SRCNN
  2. FSRCNN Network Architecture
  3. Explanation of 1×1 Convolution Used in Shrinking and Expanding
  4. Explanation of Multiple 3×3 Convolutions in Non-Linear Mapping
  5. Ablation Study
  6. Results

Network Architecture: SRCNN (Top) and FSRCNN (Bottom)

The above figure shows the network architectures of SRCNN and FSRCNN. In the figure, Conv(f, n, c) denotes a convolution with filter size f×f, n filters, and c input channels.

1. Brief Review of SRCNN

In SRCNN, the steps are as follows:

  1. Bicubic interpolation is done first to upsample the LR image to the desired resolution.
  2. Then 9×9, 1×1, and 5×5 convolutions are performed to improve the image quality. The 1×1 conv was claimed to perform a non-linear mapping between the low-resolution (LR) image vectors and the high-resolution (HR) image vectors.
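
To make these two steps concrete, here is a minimal PyTorch sketch of the 9-1-5 SRCNN pipeline. It is not the authors' code: the 64/32 filter counts, the single-channel (Y) input, and the padding choices are my assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    """Minimal sketch of the 9-1-5 SRCNN pipeline on a bicubic-upsampled input."""
    def __init__(self, n1=64, n2=32):
        super().__init__()
        self.conv1 = nn.Conv2d(1, n1, 9, padding=4)   # patch extraction (9x9)
        self.conv2 = nn.Conv2d(n1, n2, 1)             # non-linear mapping (1x1)
        self.conv3 = nn.Conv2d(n2, 1, 5, padding=2)   # reconstruction (5x5)

    def forward(self, lr, scale=3):
        # Step 1: bicubic interpolation up to the desired HR resolution
        x = F.interpolate(lr, scale_factor=scale, mode='bicubic', align_corners=False)
        # Step 2: 9x9, 1x1, 5x5 convolutions refine the upsampled image
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        return self.conv3(x)

lr = torch.randn(1, 1, 32, 32)    # single-channel (Y) LR patch
print(SRCNN()(lr).shape)          # torch.Size([1, 1, 96, 96]) for scale=3
```

Note that every convolution here operates on the already-upsampled, HR-sized image, which is exactly why the complexity below depends on the HR image size.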

The computational complexity of SRCNN is linearly proportional to the size of the HR image, S_HR. The larger the HR image, the higher the complexity.
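
Following the paper's notation (f_i and n_i are the filter size and number of filters of the i-th layer), the complexity is roughly:

O( (f1²·n1 + n1·f2²·n2 + n2·f3²) × S_HR )

Each term is the per-pixel cost of one of the three convolutions, and all of them are paid at HR resolution.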

2. FSRCNN Network Architecture

In FSRCNN, as shown in the figure, 5 main steps with more convolutions are involved:

  1. Feature Extraction: The bicubic interpolation in the previous SRCNN is replaced by a 5×5 conv.
  2. Shrinking: A 1×1 conv reduces the number of feature maps from d to s, where s << d.
  3. Non-Linear Mapping: Multiple 3×3 conv layers replace a single wide one.
  4. Expanding: A 1×1 conv increases the number of feature maps from s back to d.
  5. Deconvolution: 9×9 filters are used to reconstruct the HR image.

The overall structure above is called FSRCNN(d, s, m). Its computational complexity is linearly proportional to the size of the LR image, S_LR, which is much lower than that of SRCNN.
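
Tying each term to one of the five layers above (25d for the 5×5 feature extraction, sd for shrinking, 9ms² for the m 3×3 mapping layers, ds for expanding, and 81d for the 9×9 deconvolution), the complexity is roughly:

O( (25d + sd + 9ms² + ds + 81d) × S_LR )

Since d, s, and m are small constants, the whole cost scales with the LR image size rather than the HR image size.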

PReLU is used as the activation function. PReLU is the parametric variant of leaky ReLU, which is claimed to be better than ReLU. (If interested, please also read my PReLU review.)
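
Putting the five steps and the PReLU activations together, a minimal PyTorch sketch of FSRCNN(d, s, m) could look like the following. This is not the authors' implementation; the padding choices and the deconvolution stride/output_padding (set from the upscaling factor) are my assumptions.

```python
import torch
import torch.nn as nn

class FSRCNN(nn.Module):
    """Minimal sketch of FSRCNN(d, s, m) for a single-channel (Y) LR input."""
    def __init__(self, d=56, s=12, m=4, scale=3):
        super().__init__()
        # 1. Feature extraction: 5x5 conv on the LR image, d feature maps
        self.feature = nn.Sequential(nn.Conv2d(1, d, 5, padding=2), nn.PReLU(d))
        # 2. Shrinking: 1x1 conv reduces feature maps from d to s
        self.shrink = nn.Sequential(nn.Conv2d(d, s, 1), nn.PReLU(s))
        # 3. Non-linear mapping: m stacked 3x3 convs, all with s feature maps
        mapping = []
        for _ in range(m):
            mapping += [nn.Conv2d(s, s, 3, padding=1), nn.PReLU(s)]
        self.mapping = nn.Sequential(*mapping)
        # 4. Expanding: 1x1 conv restores feature maps from s back to d
        self.expand = nn.Sequential(nn.Conv2d(s, d, 1), nn.PReLU(d))
        # 5. Deconvolution: 9x9 transposed conv upsamples to the HR size
        self.deconv = nn.ConvTranspose2d(d, 1, 9, stride=scale,
                                         padding=4, output_padding=scale - 1)

    def forward(self, x):
        x = self.feature(x)
        x = self.shrink(x)
        x = self.mapping(x)
        x = self.expand(x)
        return self.deconv(x)

lr = torch.randn(1, 1, 32, 32)                     # LR patch
print(FSRCNN(d=56, s=12, m=4, scale=3)(lr).shape)  # torch.Size([1, 1, 96, 96])
```

Unlike the SRCNN sketch earlier, all the convolutions here run at LR resolution, and only the final deconvolution produces the HR-sized output.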

The cost function is just the standard mean squared error (MSE) between the network output and the ground-truth HR image.
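
Writing Y_i for the i-th LR input, X_i for its ground-truth HR image, F for the network with parameters θ, and n for the number of training samples (my notation), this is:

MSE(θ) = (1/n) × Σ_{i=1..n} || F(Y_i; θ) − X_i ||²

and training minimizes MSE(θ) over θ.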

3. Explanation of 1×1 Convolution Used in Shrinking and Expanding

Suppose we need to perform a 5×5 convolution without the use of a 1×1 convolution, as below:

Without the Use of 1×1 Convolution

Number of operations = (14×14×48)×(5×5×480) = 112.9M

With the use of 1×1 convolution:

With the Use of 1×1 Convolution

Number of operations for 1×1 = (14×14×16)×(1×1×480) = 1.5M
Number of operations for 5×5 = (14×14×48)×(5×5×16) = 3.8M

Total number of operations = 1.5M + 3.8M = 5.3M
which is much, much smaller than 112.9M!
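
These counts can be reproduced with a one-line formula: output positions × output channels × (k × k × input channels) multiplications. The 14×14×480 input, 48 output channels, and 16-channel bottleneck are the illustrative sizes used in the figures above, not values taken from FSRCNN itself.

```python
def conv_ops(out_h, out_w, out_c, k, in_c):
    # multiplications for a k x k convolution producing an out_h x out_w x out_c output
    return out_h * out_w * out_c * (k * k * in_c)

direct     = conv_ops(14, 14, 48, 5, 480)   # 5x5 conv applied directly: 112,896,000 (~112.9M)
bottleneck = conv_ops(14, 14, 16, 1, 480)   # 1x1 conv shrinking 480 -> 16: 1,505,280 (~1.5M)
conv_5x5   = conv_ops(14, 14, 48, 5, 16)    # 5x5 conv on 16 channels:      3,763,200 (~3.8M)

print(direct, bottleneck + conv_5x5)        # 112896000 vs 5268480 (~5.3M)
```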

Network-In-Network (NIN) suggested that the 1×1 conv introduces more non-linearity and improves performance, while GoogLeNet suggested that the 1×1 conv helps to reduce the model size while maintaining performance. (If interested, please read my GoogLeNet review.)

Thus, 1×1 convs are used in between two convolutions to reduce the number of connections (parameters). With fewer parameters, fewer multiplication and addition operations are needed, which speeds up the network. That's why FSRCNN is faster than SRCNN.

4. Explanation of Multiple 3×3 Convolutions in Non-Linear Mapping

Two layers of 3×3 filters already cover the 5×5 area

By using 2 layers of 3×3 filters, the network already covers a 5×5 area with fewer parameters, as in the above figure.

By using 1 layer of 5×5 filters: number of parameters = 5×5 = 25
By using 2 layers of 3×3 filters: number of parameters = 3×3 + 3×3 = 18
The number of parameters is reduced by 28%.

With fewer parameters to learn, convergence is faster and overfitting is reduced.
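
The same trade-off holds for deeper stacks: m stacked 3×3 layers (stride 1) have a receptive field of (2m + 1)×(2m + 1) but only 9m weights per input/output channel pair, versus (2m + 1)² weights for a single filter covering the same area. A small sketch to check the numbers:

```python
def stacked_3x3(m):
    # receptive field grows by 2 with each extra 3x3 layer (stride 1)
    receptive_field = 2 * m + 1
    params_stacked = 9 * m                  # m layers of 3x3 weights (per channel pair)
    params_single = receptive_field ** 2    # one filter covering the same area
    return receptive_field, params_stacked, params_single

print(stacked_3x3(2))  # (5, 18, 25): two 3x3 layers cover 5x5 with 28% fewer weights
```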

The same idea has been used in VGGNet. (If interested, please read my VGGNet review.)

5. Ablation Study

Ablation Study of Each Step
  • SRCNN-Ex: A better version of SRCNN, with 57,184 parameters.
  • Transition State 1: A deconv layer is used; with 58,976 parameters, a higher PSNR is obtained.
  • Transition State 2: More convs are used in the middle; with 17,088 parameters, an even higher PSNR is obtained.
  • FSRCNN(56, 12, 4): Smaller filter sizes and fewer filters; with 12,464 parameters, an even higher PSNR is obtained. The improvement comes from having fewer parameters to train, which makes convergence easier.

This shows that each component contributes to the improvement.

Study of m, s, d

With higher m (m = 4), a higher PSNR is obtained.

With m = 4, d = 56, and s = 12, the best trade-off between HR image quality (33.16 dB) and model complexity (12,464 parameters) is obtained.

Finally, we get FSRCNN: FSRCNN(56, 12, 4).
And a smaller version, FSRCNN-s: FSRCNN(32, 5, 1).

6. Results

  • The network is trained from scratch on the 91-image dataset under upscaling factor 3; then only the deconvolution layer is fine-tuned, with the General-100 dataset added, under upscaling factors 2 and 4.
  • Data augmentation is applied with scaling factors of 0.9, 0.8, 0.7, and 0.6, and rotations of 90, 180, and 270 degrees.

All models trained on the 91-image dataset.
FSRCNN and FSRCNN-s trained on the 91-image and General-100 datasets.

From the results above, FSRCNN and FSRCNN-s work well for upscaling factors 2 and 3. But for upscaling factor 4, FSRCNN and FSRCNN-s are slightly worse than SCN.

Lenna image with upscaling factor 3
Butterfly image with upscaling factor 3

From the above figures, we can see that FSRCNN produces a much clearer image.

In this paper, with such a shallow network, we can learn a lot about the effect of each component or technique, such as the 1×1 convolution and the multiple 3×3 convolutions.

References

  1. [2016 ECCV] [FSRCNN]
    Accelerating the Super-Resolution Convolutional Neural Network

My Reviews

[SRCNN] [PReLU-Net] [GoogLeNet] [VGGNet]
