Reading: SRFBN — Super-Resolution Feedback Network (Super Resolution)
Outperforms DBPN & D-DBPN, RDN, SRMD & SRMDNF, EDSR, IRCNN, MemNet, DRRN, VDSR & SRCNN
In this story, Feedback Network for Image Super-Resolution (SRFBN), by Sichuan University, University of California, University of British Columbia, and Incheon National University, is presented. In the paper:
- A feedback block is designed to handle the feedback connections and to generate powerful high-level representations.
- The proposed SRFBN comes with a strong early reconstruction ability and can create the final high-resolution image step by step.
- In addition, a curriculum learning strategy is introduced to make the network well suited to more complicated tasks, where the low-resolution images are corrupted by multiple types of degradation.
The principle of the feedback scheme is that the information in a coarse SR image can help the network reconstruct a better SR image from the LR image.
This is a paper in 2019 CVPR with about 100 citations. (Sik-Ho Tsang @ Medium)
Outline
- SRFBN: Network Architecture
- Feedback Block (FB)
- Curriculum Learning Strategy
- Ablation Study
- SOTA Comparison
1. SRFBN: Network Architecture
- The sub-network placed in each iteration t contains three parts: an LR feature extraction block (LRFB), a feedback block (FB) and a reconstruction block (RB).
- The weights of each block are shared across time.
- Conv(s, n) and Deconv(s, n) denote a convolutional layer and a deconvolutional layer respectively, where s is the filter size and n is the number of filters.
1.1. LRFB
- The LR feature extraction block (LRFB) consists of Conv(3, 4m) followed by Conv(1, m), where m denotes the base number of filters. It maps the LR input I_LR to the shallow features F_in^t: F_in^t = f_LRFB(I_LR)
1.2. FB
- The feedback block (FB) at the t-th iteration receives the hidden state from the previous iteration, F_out^{t-1}, through a feedback connection, together with the shallow features F_in^t. F_out^t represents the output of the FB: F_out^t = f_FB(F_out^{t-1}, F_in^t)
1.3. RB
- The reconstruction block (RB) uses Deconv(k, m) to upscale the LR features F_out^t to HR ones and Conv(3, c_out) to generate a residual image I_Res^t, where c_out is the number of image channels (1 or 3): I_Res^t = f_RB(F_out^t)
- The output image I_SR^t at the t-th iteration is obtained by: I_SR^t = I_Res^t + f_UP(I_LR)
- where f_UP denotes the operation of an upsample kernel. The choice of the upsample kernel is arbitrary. A bilinear upsample kernel is used here.
- After T iterations, T SR images in total are obtained (I_SR^1, I_SR^2, …, I_SR^T).
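The recurrence described in Sections 1.1 to 1.3 can be sketched in a few lines of Python. This is only a structural sketch, not the authors' implementation: `lrfb`, `fb`, `rb`, `upsample` and `add` are hypothetical callables standing in for the three blocks, the upsample kernel and element-wise addition, and using the shallow features as the initial hidden state is an assumption that the text above does not state.

```python
from typing import Any, Callable, List


def srfbn_unroll(
    i_lr: Any,
    lrfb: Callable[[Any], Any],
    fb: Callable[[Any, Any], Any],
    rb: Callable[[Any], Any],
    upsample: Callable[[Any], Any],
    add: Callable[[Any, Any], Any],
    num_steps: int,
) -> List[Any]:
    """Unroll the network for num_steps iterations and return every SR output."""
    up = upsample(i_lr)  # f_UP(I_LR), identical at every iteration
    f_in = lrfb(i_lr)    # shallow features; the same at every t since weights are shared
    f_out = f_in         # assumption: the initial hidden state equals the shallow features
    outputs = []
    for _ in range(num_steps):
        f_out = fb(f_out, f_in)         # feedback connection: previous output re-enters the FB
        i_res = rb(f_out)               # residual image from the reconstruction block
        outputs.append(add(i_res, up))  # I_SR^t = I_Res^t + f_UP(I_LR)
    return outputs


# Toy usage with scalars in place of tensors, just to show the data flow.
toy_outputs = srfbn_unroll(
    1.0,
    lrfb=lambda x: 2.0 * x,
    fb=lambda h, f: 0.5 * (h + f) + 1.0,
    rb=lambda f: 0.1 * f,
    upsample=lambda x: x,
    add=lambda a, b: a + b,
    num_steps=4,
)
print(len(toy_outputs))
```

Because the same callables are reused at every step, the sketch also reflects the weight sharing across time: only the hidden state changes between iterations.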
1.4. Other Details
- PReLU is used.
- Various k are used in Conv(k, m) and Deconv(k, m) for different scale factors to perform up- and down-sampling operations.
- For ×2, k=6. For ×3, k=7. For ×4, k=8.
- The input patch size also differs for each scale factor.
- The network is trained for 200 epochs with a batch size of 16.
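To see why the kernel size k grows with the scale factor, the standard output-size formulas for strided and transposed convolutions can be checked directly. In this sketch the kernel sizes come from the list above, while the stride (set equal to the scale factor) and the padding of 2 are assumptions; the feature-map size of 40 is a hypothetical value chosen for illustration.

```python
def deconv_out_size(size: int, k: int, stride: int, padding: int) -> int:
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + k


def conv_out_size(size: int, k: int, stride: int, padding: int) -> int:
    """Output size of a strided convolution (dilation 1)."""
    return (size + 2 * padding - k) // stride + 1


# Kernel sizes are taken from the text; stride = scale and padding = 2 are assumptions.
KERNEL_FOR_SCALE = {2: 6, 3: 7, 4: 8}
PADDING = 2

for scale, k in KERNEL_FOR_SCALE.items():
    lr = 40  # hypothetical LR feature-map size
    hr = deconv_out_size(lr, k, stride=scale, padding=PADDING)
    back = conv_out_size(hr, k, stride=scale, padding=PADDING)
    print(f"x{scale}: k={k}, LR {lr} -> HR {hr} -> LR {back}")
```

Under these assumptions, Deconv(k, m) enlarges the feature map by exactly the scale factor and Conv(k, m) maps it back to the original LR size.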
2. Feedback Block (FB)
- The FB contains G projection groups arranged sequentially, with dense skip connections among them. Each projection group, which can project HR features to LR ones, mainly consists of an upsample operation and a downsample operation.
- At the beginning of the FB, F_in^t and F_out^{t-1} are concatenated and compressed by Conv(1, m), so that the input features F_in^t are refined by the feedback information F_out^{t-1}, producing the refined input features L_0^t: L_0^t = C_0([F_out^{t-1}, F_in^t])
- where C_0 refers to the initial compression operation and [] is the concatenation operation.
- Then, the upsample operation in the g-th projection group produces the HR features H_g^t: H_g^t = C↑_g([L_0^t, L_1^t, …, L_{g-1}^t])
- where C↑_g is the Deconv(k, m) at the g-th projection group.
- And the downsample operation produces the LR features L_g^t: L_g^t = C↓_g([H_1^t, H_2^t, …, H_g^t])
- where C↓_g is the Conv(k, m) at the g-th projection group.
- Except for the first projection group, Conv(1, m) is added before C↑_g and C↓_g for parameter and computation efficiency.
- Feature fusion (green arrows in the figure) combines the LR features generated by the projection groups to produce the output of the FB: F_out^t = C_FF([L_1^t, L_2^t, …, L_G^t])
- where C_FF represents the function of Conv(1, m).
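The dense connections inside the FB determine how many channels each layer receives, which is also why the Conv(1, m) compression matters. The sketch below only does this channel bookkeeping. It assumes the connection pattern in which the upsample layer of group g sees the refined input plus all earlier LR features, the downsample layer sees all HR features produced so far, and the final fusion sees the G LR features from the projection groups; the function name and dictionary keys are hypothetical.

```python
def feedback_block_channels(m: int, num_groups: int) -> dict:
    """Input channel count seen by each layer of the FB under dense connections."""
    plan = {"C_0": 2 * m}  # hidden state and shallow features concatenated, compressed to m
    for g in range(1, num_groups + 1):
        plan[f"group_{g}"] = {
            "up_concat": g * m,    # refined input + (g - 1) earlier LR maps, m channels each
            "down_concat": g * m,  # g HR maps (including the one just produced), m channels each
            # Conv(1, m) compression is used in every group except the first
            "compress_1x1": g > 1,
        }
    plan["C_FF"] = num_groups * m  # G LR maps fused back to m channels by Conv(1, m)
    return plan


# Example with the ablation-study setting m=32, G=6.
plan = feedback_block_channels(m=32, num_groups=6)
print(plan["C_0"], plan["group_6"]["up_concat"], plan["C_FF"])
```

Without the 1×1 compression, the Deconv(k, m) and Conv(k, m) layers in later groups would have to operate on up to G·m input channels with large kernels, so the concatenated input is first reduced to m channels.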
3. Curriculum Learning Strategy
- For a single degradation model, all T target SR images are the same.
- For a complex degradation model, the target SR images are ordered by task difficulty, from easy to hard, to enforce a curriculum.
- L1 loss is used.
- Each output has equal contribution.
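One plausible way to write the objective described above is L(Θ) = (1/T) Σ_t W^t ‖I_HR^t − I_SR^t‖_1 with every weight W^t set to 1. The following is a minimal sketch of that loss on flattened images; taking the L1 term as a per-pixel mean and the function names are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def multi_output_l1(
    sr_outputs: List[Sequence[float]],
    targets: List[Sequence[float]],
    weights: Optional[List[float]] = None,
) -> float:
    """Average the per-iteration L1 losses over all T outputs."""
    num_steps = len(sr_outputs)
    if len(targets) != num_steps:
        raise ValueError("one target is needed per iteration")
    if weights is None:
        weights = [1.0] * num_steps  # equal contribution from every iteration
    total = sum(w * l1(sr, hr) for w, sr, hr in zip(weights, sr_outputs, targets))
    return total / num_steps


# Two iterations, two-pixel images: the first output is off by 1, the second is exact.
print(multi_output_l1([[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]))
```

Passing a different target for each iteration is what allows the curriculum: the early targets can be easier versions of the final HR image.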
4. Ablation Study
4.1. Settings
- DIV2K and Flickr2K are used as the training data.
- BI: Bicubic downsampling.
- BD: HR images are blurred with a 7×7 Gaussian kernel with standard deviation 1.6 and then downsampled.
- DN: Bicubic downsampling followed by adding Gaussian noise, with noise level of 30.
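For the BD setting, the 7×7 Gaussian kernel with standard deviation 1.6 can be built as in the sketch below. Sampling the Gaussian at integer pixel offsets and normalising the kernel to unit sum are assumptions about the discretisation, not details given above.

```python
import math
from typing import List


def gaussian_kernel(size: int = 7, sigma: float = 1.6) -> List[List[float]]:
    """Return a size x size Gaussian kernel normalised to sum to 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    raw = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


kernel = gaussian_kernel()
print(len(kernel), len(kernel[0]))
```

Convolving an HR image with this kernel and then downsampling it would give the BD input; the DN input instead adds Gaussian noise after bicubic downsampling.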
4.2. Study of T and G
- The number of iterations is denoted as T and the number of projection groups in the feedback block is denoted as G.
- The base number of filters m is set to 32.
- (a): With G fixed to 6, adding feedback connections significantly improves the reconstruction performance compared with the network without feedback connections (T=1).
- As T continues to increase, the reconstruction quality keeps rising.
- (b): With T fixed to 4, a larger G leads to higher accuracy, owing to the stronger representational ability of deeper networks.
4.3. Feedback vs Feedforward
- SRFBN-L (T=4, G=6) is used for analysis.
- When the loss is disconnected from all iterations except the last one, the network can no longer reroute a notion of the output to low-level representations, and it degenerates into a feedforward network (though it still retains its recurrent property). This variant is denoted as SRFBN-L-FF.
- SRFBN-L and SRFBN-L-FF both have four iterations with four HR outputs.
- SRFBN-L outperforms SRFBN-L-FF at every iteration, from which we conclude that the feedback network is capable of producing high-quality early predictions, unlike the feedforward network.
4.4. Curriculum Learning
- For experiments with the BD degradation model, blurred HR images are empirically provided as targets at the first two iterations and the original HR images at the remaining two iterations.
- For experiments with the DN degradation model, noisy HR images are used at the first two iterations and HR images without noise at the last two iterations.
- The curriculum learning strategy helps the proposed SRFBN handle the BD and DN degradation models under both circumstances, i.e. training from scratch and fine-tuning.
- Also, fine-tuning a network pretrained on the BI degradation model leads to higher PSNR values than training from scratch.
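The target schedule used in these experiments can be sketched as a small helper that returns one target per iteration. The function and parameter names are hypothetical; the defaults follow the setting above of four iterations with degraded targets at the first two.

```python
from typing import Any, List


def curriculum_targets(hr: Any, degraded_hr: Any, num_steps: int = 4, num_easy: int = 2) -> List[Any]:
    """Per-iteration targets: degraded HR for the first num_easy steps, clean HR afterwards."""
    if not 0 <= num_easy <= num_steps:
        raise ValueError("num_easy must lie between 0 and num_steps")
    return [degraded_hr] * num_easy + [hr] * (num_steps - num_easy)


# Strings stand in for images here, only to show the ordering of the targets.
print(curriculum_targets("HR", "blurred HR"))
```

Setting `num_easy=0` recovers the single-degradation case, where all T targets are the same HR image.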
5. SOTA Comparison
5.1. Network Parameters
- The SRFBN with a larger base number of filters (m=64), which is derived from the SRFBN-L, is implemented for comparison.
- A self-ensemble method is also used to further improve the performance of the SRFBN; the resulting model is denoted as SRFBN+.
- A lightweight network SRFBN-S (T=4, G=3, m=32) is also used.
- SRFBN-S achieves the best SR results among the networks with fewer than 1000K parameters, such as DBPN-S, MemNet, DRRN, VDSR and SRCNN.
- In comparison with networks with a large number of parameters, such as D-DBPN and EDSR, the proposed SRFBN and SRFBN+ achieve competitive results while needing only 35% and 7% of the parameters of D-DBPN and EDSR, respectively.
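The text does not say which self-ensemble variant is behind SRFBN+. The sketch below assumes the common geometric version, in which the input is flipped and rotated in 8 ways, each variant is super-resolved, the transform is undone on the output, and the 8 results are averaged. Images are plain 2-D lists and `model` is a hypothetical callable.

```python
from typing import Callable, List

Image = List[List[float]]


def rot90(img: Image) -> Image:
    """Rotate a 2-D list of rows by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rot(img: Image, k: int) -> Image:
    """Apply rot90 k times; a negative k rotates in the opposite direction."""
    for _ in range(k % 4):
        img = rot90(img)
    return img


def hflip(img: Image) -> Image:
    """Mirror every row left to right."""
    return [row[::-1] for row in img]


def self_ensemble(model: Callable[[Image], Image], lr: Image) -> Image:
    """Average the model outputs over 8 flip/rotation variants of the input."""
    outputs = []
    for flip in (False, True):
        for k in range(4):
            x = rot(hflip(lr) if flip else lr, k)  # transform the input
            y = rot(model(x), -k)                  # undo the rotation on the output
            outputs.append(hflip(y) if flip else y)  # then undo the flip
    n = len(outputs)
    height, width = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[i][j] for o in outputs) / n for j in range(width)] for i in range(height)]


# With an identity "model", the ensemble returns the input unchanged.
print(self_ensemble(lambda x: x, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

This costs 8 forward passes per image and adds no parameters, which is consistent with SRFBN and SRFBN+ being compared at the same model size.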
5.2. Results with BI degradation model
- SRFBN can outperform almost all comparative methods.
- Compared with SRFBN, EDSR uses many more filters (256 vs. 64), and D-DBPN employs more training images (DIV2K + Flickr2K + ImageNet vs. DIV2K + Flickr2K). Nevertheless, SRFBN achieves results competitive with theirs.
- In addition, SRFBN+ outperforms almost all comparative methods, such as D-DBPN, EDSR, MemNet, DRRN, VDSR and SRCNN.
- For the upper image, the texture direction of the SR images from all comparative methods is wrong. However, SRFBN makes full use of the high-level information to perform a self-correcting process, so a more faithful SR image can be obtained.
- For the lower one, DRRN and MemNet even split the ‘M’ letter. VDSR, EDSR and D-DBPN fail to recover the clear image. The proposed SRFBN produces a clear image which is very close to the ground truth.
5.3. Results with BD and DN degradation model
- SRFBN is trained using the curriculum learning strategy for the BD and DN degradation models, and fine-tuned from the model trained on the BI degradation model, using DIV2K.
- SRFBN and SRFBN+ achieve the best results on almost all quantitative measures compared with other state-of-the-art methods, such as RDN, SRMD & SRMDNF, IRCNN, MemNet, VDSR and SRCNN.
- The first set of images shows the results obtained from BD degradation model. The second set of images shows the results from DN degradation model.
- SRFBN could alleviate the distortions and generate more accurate details in SR images.
Reference
[2019 CVPR] [SRFBN]
Feedback Network for Image Super-Resolution
Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DnCNN] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [MWCNN] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [CNF] [EDSR & MDSR] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [CARN] [IDN] [SR+STN] [SRFBN]