Reading: SRFBN — Super-Resolution Feedback Network (Super Resolution)

Outperforms DBPN & D-DBPN, RDN, SRMD & SRMDNF, EDSR, IRCNN, MemNet, DRRN, VDSR & SRCNN

Sik-Ho Tsang
Jul 19, 2020
The Principle of SRFBN

This story reviews Feedback Network for Image Super-Resolution (SRFBN), by Sichuan University, University of California, University of British Columbia, and Incheon National University. In this paper:

  • A feedback block is designed to handle the feedback connections and to generate powerful high-level representations.
  • The proposed SRFBN comes with a strong early reconstruction ability and can create the final high-resolution image step by step.
  • In addition, a curriculum learning strategy is introduced to make the network well suited to more complicated tasks, where the low-resolution images are corrupted by multiple types of degradation.

The principle of the feedback scheme is that the information in a coarse SR image can guide the reconstruction of a better SR image from the LR image.

This is a 2019 CVPR paper with about 100 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. SRFBN: Network Architecture
  2. Feedback Block (FB)
  3. Curriculum Learning Strategy
  4. Ablation Study
  5. SOTA Comparison

1. SRFBN: Network Architecture

SRFBN: Network Architecture
  • The sub-network placed in each iteration t contains three parts: an LR feature extraction block (LRFB), a feedback block (FB) and a reconstruction block (RB).
  • The weights of each block are shared across time.
  • Conv(s, n) and Deconv(s, n) denote a convolutional layer and a deconvolutional layer respectively, where s is the size of the filter and n is the number of filters.
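The unrolling described above can be sketched in plain Python. This is only a structural sketch, not the authors' code: `run_srfbn` and its four callables are hypothetical stand-ins for the real blocks, and initialising the hidden state with the shallow features is an assumption made here so that the first iteration has something to feed back.

```python
def run_srfbn(lr_image, T, lrfb, fb, rb, upsample):
    """Unroll one shared sub-network for T iterations and return all T SR outputs.

    lrfb, fb, rb and upsample are hypothetical callables standing in for the
    LR feature extraction block, feedback block, reconstruction block and
    upsample kernel. The same callables are reused at every iteration, which
    is what weight sharing across time means here.
    """
    shallow = lrfb(lr_image)    # shallow features; identical at every iteration
    base = upsample(lr_image)   # upsampled LR image, added to every output
    hidden = shallow            # assumption: initial hidden state = shallow features
    outputs = []
    for _ in range(T):
        hidden = fb(hidden, shallow)     # feedback: previous FB output re-enters the FB
        residual = rb(hidden)            # residual image for this iteration
        outputs.append(residual + base)  # SR image for this iteration
    return outputs


if __name__ == "__main__":
    # Toy scalar stand-ins, only to show the call pattern.
    sr_images = run_srfbn(
        1, T=3,
        lrfb=lambda x: 2 * x,
        fb=lambda h, s: h + s,
        rb=lambda h: h - 3,
        upsample=lambda x: x,
    )
    print(sr_images)
```

With real tensors the callables would be network modules, but the control flow (one set of weights, T passes, T outputs) stays the same.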

1.1. LRFB

  • The LR feature extraction block (LRFB) consists of Conv(3, 4m) followed by Conv(1, m), where m denotes the base number of filters. It extracts the shallow features F_in^t from the LR input.

1.2. FB

  • The feedback block (FB) at the t-th iteration receives the hidden state from the previous iteration, F_out^(t-1), through a feedback connection, together with the shallow features F_in^t. F_out^t denotes the output of the FB.

1.3. RB

  • The reconstruction block (RB) uses Deconv(k, m) to upscale the LR features F_out^t to HR ones and Conv(3, c_out) to generate a residual image I_Res^t, where c_out is the number of image channels (1 or 3).
  • The output image I_SR^t at the t-th iteration is obtained by adding the residual image I_Res^t to the upsampled LR image f_UP(I_LR).
  • Here f_UP denotes the operation of an upsample kernel. The choice of the upsample kernel is arbitrary; a bilinear upsample kernel is used here.
  • After T iterations, T SR images in total are obtained: (I_SR^1, I_SR^2, …, I_SR^T).
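In symbols, one iteration of Sections 1.1 to 1.3 can be summarized as below. The function names f_LRFB, f_FB and f_RB are labels assumed here for the three blocks; the remaining symbols follow the text above.

```latex
\begin{aligned}
F_{in}^{t}  &= f_{LRFB}(I_{LR}) \\
F_{out}^{t} &= f_{FB}\left(F_{out}^{t-1},\, F_{in}^{t}\right) \\
I_{Res}^{t} &= f_{RB}\left(F_{out}^{t}\right) \\
I_{SR}^{t}  &= I_{Res}^{t} + f_{UP}(I_{LR})
\end{aligned}
```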

1.4. Other Details

  • PReLU is used.
  • Different values of k are used in Conv(k, m) and Deconv(k, m) for different scale factors, to perform the down- and up-sampling operations.
  • For ×2, k=6. For ×3, k=7. For ×4, k=8.
  • The input patch size also differs for each scale factor, as shown above.
  • Training runs for 200 epochs with a batch size of 16.
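To see why k grows with the scale factor, the standard output-size formulas for strided convolution and transposed convolution can be checked by hand. In the sketch below, only the k values come from the text; a stride equal to the scale factor, a padding of 2, and the LR size of 40 are assumptions chosen for illustration.

```python
def deconv_out(size, k, stride, padding):
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + k


def conv_out(size, k, stride, padding):
    """Output size of a strided convolution (no dilation)."""
    return (size + 2 * padding - k) // stride + 1


K_PER_SCALE = {2: 6, 3: 7, 4: 8}  # scale factor -> k, as listed in the text
PADDING = 2                       # assumption, not stated in the text
LR_SIZE = 40                      # illustrative LR feature size

if __name__ == "__main__":
    for scale, k in K_PER_SCALE.items():
        hr = deconv_out(LR_SIZE, k, stride=scale, padding=PADDING)
        lr = conv_out(hr, k, stride=scale, padding=PADDING)
        print(f"x{scale}: k={k}, LR {LR_SIZE} -> HR {hr} -> LR {lr}")
```

Under these assumptions each Deconv(k, m) enlarges the feature map by exactly the scale factor, and the matching Conv(k, m) maps it back to the original LR size.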

2. Feedback Block (FB)

Feedback Block (FB)
  • The FB contains G projection groups arranged sequentially, with dense skip connections among them. Each projection group, which can project HR features to LR ones, mainly consists of an upsample operation and a downsample operation.
  • At the beginning of the FB, F_in^t and F_out^(t-1) are concatenated and compressed by Conv(1, m), so that the input features F_in^t are refined by the feedback information F_out^(t-1). This produces the refined input features L_0^t.
  • C_0 refers to this initial compression operation and [ ] to the concatenation operation.
  • The upsample operation in the g-th projection group is C↑_g, which is a Deconv(k, m).
  • The downsample operation in the g-th projection group is C↓_g, which is a Conv(k, m).
  • Except for the first projection group, Conv(1, m) is added before C↑_g and C↓_g for parameter and computation efficiency.
  • The feature fusion (green arrows in the figure) combines the LR features generated by the projection groups to produce the output of the FB, F_out^t.
  • The fusion function C_FF is a Conv(1, m).
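Written out, the feedback block described above could read as follows. H_g^t and L_g^t are names assumed here for the HR and LR features produced by the g-th projection group, and the dense skip connections are taken to mean that each operation receives the concatenation of all earlier features at the other resolution.

```latex
\begin{aligned}
L_{0}^{t}   &= C_{0}\left(\left[F_{out}^{t-1},\, F_{in}^{t}\right]\right) \\
H_{g}^{t}   &= C_{g}^{\uparrow}\left(\left[L_{0}^{t}, L_{1}^{t}, \dots, L_{g-1}^{t}\right]\right) \\
L_{g}^{t}   &= C_{g}^{\downarrow}\left(\left[H_{1}^{t}, H_{2}^{t}, \dots, H_{g}^{t}\right]\right) \\
F_{out}^{t} &= C_{FF}\left(\left[L_{1}^{t}, L_{2}^{t}, \dots, L_{G}^{t}\right]\right)
\end{aligned}
```

For g > 1, the Conv(1, m) mentioned above would sit between the concatenation and C↑_g or C↓_g, compressing the concatenated features back to m channels.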

3. Curriculum Learning Strategy

  • For a single degradation model, all T targets are the same.
  • For a complex degradation model, the T targets are ordered by the difficulty of the task, from easy to hard, to enforce a curriculum.
  • L1 loss is used.
  • Each of the T outputs contributes equally to the loss.
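A minimal sketch of such a multi-output loss is given below, in the form loss = (1/T) · Σ_t W^t · L1(target^t, output^t). The per-iteration weight W^t is a symbol assumed here (all ones for equal contribution), `srfbn_loss` is a hypothetical name, and images are flat lists of pixel values to keep the example dependency-free. The L1 term is taken as a mean absolute error.

```python
def l1(a, b):
    """Mean absolute error between two equally sized flat lists of pixel values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def srfbn_loss(sr_outputs, targets, weights=None):
    """Average the weighted L1 loss over the T outputs of the unrolled network.

    sr_outputs and targets are lists of T images; weights defaults to all ones,
    i.e. every iteration contributes equally.
    """
    T = len(sr_outputs)
    if weights is None:
        weights = [1.0] * T
    total = sum(w * l1(sr, hr) for w, sr, hr in zip(weights, sr_outputs, targets))
    return total / T


if __name__ == "__main__":
    hr = [2.0, 2.0]
    outputs = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # toy outputs improving over time
    print(srfbn_loss(outputs, [hr] * 3))
```

For a single degradation model the `targets` list simply repeats the same image T times; for a curriculum it holds a different target per iteration.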

4. Ablation Study

4.1. Settings

  • DIV2K and Flickr2K are used as the training data.
  • BI: Bicubic downsampling of the HR images.
  • BD: The HR images are blurred with a 7×7 Gaussian kernel (standard deviation 1.6) and then downsampled.
  • DN: Bicubic downsampling followed by additive Gaussian noise with a noise level of 30.
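For concreteness, the blur kernel of the BD model can be built as below. This is a sketch assuming an isotropic Gaussian sampled at integer pixel offsets and normalized to sum to one; `gaussian_kernel` is a hypothetical helper, and the downsampling step that follows the blur is omitted.

```python
import math


def gaussian_kernel(size=7, sigma=1.6):
    """Return a size x size Gaussian kernel (list of rows) that sums to 1."""
    half = size // 2
    offsets = range(-half, half + 1)
    weights = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in offsets]
        for y in offsets
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


if __name__ == "__main__":
    kernel = gaussian_kernel()
    for row in kernel:
        print(" ".join(f"{w:.4f}" for w in row))
```

Convolving an HR image with this kernel and then subsampling it would give one plausible realisation of the BD degradation.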

4.2. Study of T and G

Study of T and G
  • The number of iterations is denoted as T and the number of projection groups in the feedback block is denoted as G.
  • The base number of filters m is set to 32.
  • (a): With G fixed to 6, adding feedback connections (T>1) significantly improves the reconstruction performance over the network without feedback connections (T=1).
  • As T continues to increase, the reconstruction quality keeps rising.
  • (b): With T fixed to 4, a larger G leads to higher accuracy, owing to the stronger representational ability of deeper networks.

4.3. Feedback vs Feedforward

The impact of feedback on Set5 with scale factor ×4
  • SRFBN-L (T=4, G=6) is used for analysis.
  • By simply disconnecting the loss from all iterations except the last one, the network can no longer reroute a notion of the output to low-level representations and degenerates into a feedforward network (although it still retains its recurrent property), denoted as SRFBN-L-FF.
  • SRFBN-L and SRFBN-L-FF both have four iterations and produce four HR outputs.
  • SRFBN-L outperforms SRFBN-L-FF at every iteration, from which we conclude that the feedback network is capable of producing high-quality early predictions, in contrast to the feedforward network.

4.4. Curriculum Learning

The investigation of curriculum learning (CL) on BD and DN degradation models with scale factor ×4. The average PSNR values are evaluated on Set5
  • For experiments with the BD degradation model, blurred HR images (chosen empirically) are provided as targets at the first two iterations and the original HR images at the remaining two iterations.
  • For experiments with the DN degradation model, noisy HR images are used at the first two iterations and HR images without noise at the last two iterations.
  • As shown above, the curriculum learning strategy helps the proposed SRFBN handle the BD and DN degradation models in both training settings.
  • Also, fine-tuning a network pretrained on the BI degradation model leads to higher PSNR values than training from scratch.
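The target schedule described above amounts to a per-iteration list of targets. A small sketch, with `curriculum_targets` as a hypothetical helper and strings standing in for images:

```python
def curriculum_targets(easy_target, hard_target, T=4, easy_steps=2):
    """Return one target per iteration: the easier target first, then the harder one."""
    return [easy_target if t < easy_steps else hard_target for t in range(T)]


if __name__ == "__main__":
    # BD model: blurred HR targets for the first two iterations, then the original HR.
    print(curriculum_targets("blurred_hr", "hr"))
    # DN model: noisy HR targets for the first two iterations, then the clean HR.
    print(curriculum_targets("noisy_hr", "hr"))
```

Setting `easy_steps=0` recovers the setting without a curriculum, where every iteration is trained against the same target.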

5. SOTA Comparison

5.1. Network Parameters

PSNR Against network parameters
  • The SRFBN with a larger base number of filters (m=64), which is derived from the SRFBN-L, is implemented for comparison.
  • A self-ensemble method is also used to further improve the performance of SRFBN; this variant is denoted as SRFBN+.
  • A lightweight network SRFBN-S (T=4, G=3, m=32) is also used.
  • SRFBN-S achieves the best SR results among the networks with fewer than 1000K parameters, such as DBPN-S, MemNet, DRRN, VDSR and SRCNN.
  • Compared with the networks with a large number of parameters, such as D-DBPN and EDSR, the proposed SRFBN and SRFBN+ achieve competitive results while needing only 35% and 7% of the parameters of D-DBPN and EDSR, respectively.
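The post does not spell out how the self-ensemble works. A common form, assumed here, is the geometric self-ensemble: run the model on the eight flip/rotation variants of the input, undo each transform on the output, and average. The sketch below uses nested lists as images and a hypothetical `model` callable.

```python
def rot_cw(img):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rot_ccw(img):
    """Rotate a 2D list by 90 degrees counter-clockwise (inverse of rot_cw)."""
    return [list(row) for row in zip(*img)][::-1]


def hflip(img):
    """Flip a 2D list horizontally (its own inverse)."""
    return [row[::-1] for row in img]


def self_ensemble(model, img):
    """Average the model outputs over 8 flip/rotation variants of the input."""
    outputs = []
    for flip in (False, True):
        for k in range(4):
            x = hflip(img) if flip else [row[:] for row in img]
            for _ in range(k):
                x = rot_cw(x)
            y = model(x)
            for _ in range(k):       # undo the rotations first ...
                y = rot_ccw(y)
            if flip:                 # ... then undo the flip
                y = hflip(y)
            outputs.append(y)
    h, w = len(outputs[0]), len(outputs[0][0])
    return [
        [sum(out[i][j] for out in outputs) / len(outputs) for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    def biased_model(x):
        """Toy model with a position-dependent error on the top-left pixel."""
        y = [row[:] for row in x]
        y[0][0] += 8
        return y

    print(self_ensemble(biased_model, [[1, 2], [3, 4]]))
```

The toy model shows the intent: an error tied to one image position is spread evenly by the averaging instead of being concentrated in one spot.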

5.2. Results with BI degradation model

Average PSNR/SSIM values for scale factors ×2, ×3 and ×4 with BI degradation model. The best performance is shown in red and the second best performance is shown in blue.
  • SRFBN can outperform almost all comparative methods.
  • Compared with SRFBN, EDSR uses far more filters (256 vs. 64), and D-DBPN employs more training images (DIV2K + Flickr2K + ImageNet vs. DIV2K + Flickr2K). Nevertheless, SRFBN achieves competitive results against both.
  • In addition, SRFBN+ outperforms almost all comparative methods, such as D-DBPN, EDSR, MemNet, DRRN, VDSR and SRCNN.
Visual results of BI degradation model with scale factor ×4.
  • For the upper image, the texture direction of the SR images from all comparative methods is wrong. However, SRFBN makes full use of the high-level information to carry out a self-correcting process, so a more faithful SR image is obtained.
  • For the lower one, DRRN and MemNet even split the ‘M’ letter. VDSR, EDSR and D-DBPN fail to recover a clear image. The proposed SRFBN produces a clear image which is very close to the ground truth.

5.3. Results with BD and DN degradation model

Average PSNR/SSIM values for scale factors ×2, ×3 and ×4 with BD and DN degradation model. The best performance is shown in red and the second best performance is shown in blue.
  • For the BD and DN degradation models, SRFBN is trained with the curriculum learning strategy and fine-tuned from the network trained on the BI degradation model, using DIV2K.
  • SRFBN and SRFBN+ achieve the best results on almost all quantitative measures compared with other state-of-the-art methods, such as RDN, SRMD & SRMDNF, IRCNN, MemNet, VDSR and SRCNN.
Visual results of BD and DN degradation model with scale factor ×3.
  • The first set of images shows the results obtained from BD degradation model. The second set of images shows the results from DN degradation model.
  • SRFBN could alleviate the distortions and generate more accurate details in SR images.

This is the 16th story in this month.
