# Reading: SRFBN — Super-Resolution Feedback Network (Super Resolution)

## Outperforms DBPN & D-DBPN, RDN, SRMD & SRMDNF, EDSR, IRCNN, MemNet, DRRN, VDSR & SRCNN

In this paper, **Feedback Network for Image Super-Resolution (SRFBN)**, by Sichuan University, University of California, University of British Columbia, and Incheon National University, is presented. In this work:

- **A feedback block** is designed to **handle the feedback connections and to generate powerful high-level representations.**
- The proposed SRFBN comes with a strong early reconstruction ability and can **create the final high-resolution image step by step.**
- In addition, **a curriculum learning strategy** is introduced to make the network well suited for more complicated tasks, where the low-resolution images are corrupted by multiple types of degradation.

The principle of the feedback scheme is that **the information of a coarse SR image can facilitate an LR image to reconstruct a better SR image.**

This is a paper in **2019 CVPR** with about **100 citations**. (Sik-Ho Tsang @ Medium)

# Outline

1. **SRFBN: Network Architecture**
2. **Feedback Block (FB)**
3. **Curriculum Learning Strategy**
4. **Ablation Study**
5. **SOTA Comparison**

# 1. **SRFBN: Network Architecture**

- The sub-network placed in each iteration *t* contains **three parts**: **an LR feature extraction block (LRFB), a feedback block (FB) and a reconstruction block (RB).**
- The weights of each block are shared across time.
- Conv(*s*, *n*) and Deconv(*s*, *n*) denote a convolutional layer and a deconvolutional layer respectively, where *s* is the size of the filter and *n* is the number of filters.

## 1.1. LRFB

**The LR feature extraction block (LRFB)** consists of Conv(3, 4*m*) and Conv(1, *m*), where *m* denotes the base number of filters:
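As a rough illustration (not the authors' released code; module and argument names here are assumptions), the LRFB can be sketched in PyTorch as two convolutions with PReLU activations:

```python
import torch.nn as nn

class LRFB(nn.Module):
    """LR feature extraction block: Conv(3, 4m) followed by Conv(1, m)."""
    def __init__(self, in_channels=3, m=32):
        super().__init__()
        self.conv3 = nn.Conv2d(in_channels, 4 * m, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(4 * m, m, kernel_size=1)
        self.act = nn.PReLU()  # PReLU is the activation used in the paper

    def forward(self, lr_image):
        # Produces the shallow features F^t_in that are fed to the feedback block
        return self.act(self.conv1(self.act(self.conv3(lr_image))))
```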

## 1.2. FB

**The feedback block (FB)** at the *t*-th iteration receives the hidden state from the previous iteration, *F^{t-1}_out*, through a feedback connection, together with the shallow features *F^t_in*. *F^t_out* represents the output of the FB:
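Written out (a reconstruction of the relation that was shown as an image in the original post, so read it as a paraphrase of the paper's equation):

```latex
F_{out}^{t} = f_{FB}\left(F_{out}^{t-1}, F_{in}^{t}\right)
```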

## 1.3. RB

**The reconstruction block (RB)** uses Deconv(*k*, *m*) to upscale the LR features *F^t_out* to HR ones and Conv(3, *c_out*) to generate a residual image *I^t_Res*, where *c_out* (1 or 3) is the number of image channels:

**The output image** *I^t_SR* at the *t*-th iteration can be obtained by:

- where *f_UP* denotes the operation of an upsample kernel. The choice of the upsample kernel is arbitrary; a bilinear upsample kernel is used here.
- After *T* iterations, in total *T* SR images are obtained (*I^1_SR*, *I^2_SR*, …, *I^T_SR*).
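A minimal PyTorch sketch of the reconstruction block and the residual output at iteration *t* (module names, stride and padding are assumptions; ×4 is used as the example scale with *k* = 8):

```python
import torch.nn as nn
import torch.nn.functional as F

class RB(nn.Module):
    """Reconstruction block: Deconv(k, m) to upscale, Conv(3, c_out) for the residual."""
    def __init__(self, m=32, c_out=3, k=8, scale=4):
        super().__init__()
        # padding is assumed to be chosen so the deconv upscales by exactly `scale`
        self.deconv = nn.ConvTranspose2d(m, m, kernel_size=k, stride=scale, padding=2)
        self.conv = nn.Conv2d(m, c_out, kernel_size=3, padding=1)
        self.act = nn.PReLU()
        self.scale = scale

    def forward(self, f_out, lr_image):
        residual = self.conv(self.act(self.deconv(f_out)))               # I^t_Res
        upsampled = F.interpolate(lr_image, scale_factor=self.scale,
                                  mode='bilinear', align_corners=False)  # f_UP(I_LR)
        return residual + upsampled                                      # I^t_SR
```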

## 1.4. Other Details

- PReLU is used.
- Various *k* are used in Conv(*k*, *m*) and Deconv(*k*, *m*) for different scale factors to perform the up- and down-sampling operations.
- For ×2, *k* = 6. For ×3, *k* = 7. For ×4, *k* = 8.
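The matching strides and paddings are not spelled out in this summary; the mapping below is an assumption, chosen so that Deconv(*k*, *m*) upscales and Conv(*k*, *m*) downscales by exactly the scale factor:

```python
# scale factor -> settings for the up-/down-sampling layers (padding is assumed)
UP_DOWN_SETTINGS = {
    2: dict(k=6, stride=2, padding=2),
    3: dict(k=7, stride=3, padding=2),
    4: dict(k=8, stride=4, padding=2),
}
```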

- Input patch sizes differ for different scale factors, as shown above.
- 200 epochs are trained with a batch size of 16.

# 2. **Feedback Block (FB)**

- **The FB contains** *G* **projection groups** arranged sequentially, with dense skip connections among them. Each projection group, which can project HR features to LR ones, mainly includes an **upsample** operation and a **downsample** operation.
- At the beginning of the FB, *F^t_in* and *F^{t-1}_out* are concatenated and compressed by Conv(1, *m*) to refine the input features *F^t_in* with the feedback information *F^{t-1}_out*, **producing the refined input features** *L^t_0*:

- where *C_0* refers to the initial compression operation and [·] denotes the concatenation operation.
- Then, the **upsample** operation is:

- where *C↑_g* is the Deconv(*k*, *m*) at the *g*-th projection group.
- And the **downsample** operation is:

- where *C↓_g* is the Conv(*k*, *m*) at the *g*-th projection group.
- Except for the first projection group, Conv(1, *m*) is added before *C↑_g* and *C↓_g* for parameter and computation efficiency.
- **The feature fusion (green arrows in the figure) fuses the LR features generated by the projection groups to generate the output of the FB:**

- where *C_FF* represents the function of Conv(1, *m*).
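Putting the pieces above together, here is a condensed PyTorch sketch of the FB (the module names and the exact wiring of the dense connections are assumptions based on this description, not the released code):

```python
import torch
import torch.nn as nn

class FeedbackBlock(nn.Module):
    """G projection groups with dense skip connections (simplified sketch)."""
    def __init__(self, m=32, G=6, k=8, scale=4):
        super().__init__()
        self.compress_in = nn.Conv2d(2 * m, m, 1)   # C_0: fuses [F^{t-1}_out, F^t_in]
        self.up, self.down = nn.ModuleList(), nn.ModuleList()
        self.up_1x1, self.down_1x1 = nn.ModuleList(), nn.ModuleList()
        for g in range(G):
            if g > 0:  # Conv(1, m) before C_up/C_down (not used in the first group)
                self.up_1x1.append(nn.Conv2d((g + 1) * m, m, 1))
                self.down_1x1.append(nn.Conv2d((g + 1) * m, m, 1))
            self.up.append(nn.ConvTranspose2d(m, m, k, stride=scale, padding=2))  # C_up_g
            self.down.append(nn.Conv2d(m, m, k, stride=scale, padding=2))         # C_down_g
        self.compress_out = nn.Conv2d(G * m, m, 1)   # C_FF over [L^t_1, ..., L^t_G]
        self.act = nn.PReLU()

    def forward(self, f_in, f_out_prev):
        # L^t_0 = C_0([F^{t-1}_out, F^t_in])
        lr_feats = [self.act(self.compress_in(torch.cat([f_out_prev, f_in], dim=1)))]
        hr_feats = []
        for g in range(len(self.up)):
            lr = torch.cat(lr_feats, dim=1)
            if g > 0:
                lr = self.act(self.up_1x1[g - 1](lr))
            hr_feats.append(self.act(self.up[g](lr)))        # H^t_g: upsample
            hr = torch.cat(hr_feats, dim=1)
            if g > 0:
                hr = self.act(self.down_1x1[g - 1](hr))
            lr_feats.append(self.act(self.down[g](hr)))      # L^t_g: downsample
        # F^t_out = C_FF([L^t_1, ..., L^t_G])
        return self.act(self.compress_out(torch.cat(lr_feats[1:], dim=1)))
```

At the first iteration there is no feedback yet; the paper initializes the hidden state *F^0_out* with the shallow features *F^1_in*, so the same tensor is simply passed in twice.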

# 3. **Curriculum Learning Strategy**

- For the **single degradation** model, **all** *T* target SR images are the same.
- For the **complex degradation** model, **the target SR images are ordered based on the difficulty of the task, from easy to hard,** to enforce a curriculum.
- L1 loss is used.
- Each output has equal contribution.
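A minimal sketch of this loss (assuming the network returns the list of *T* SR outputs and that `hr_targets` holds the per-iteration target images chosen as described above):

```python
import torch.nn.functional as F

def srfbn_loss(sr_outputs, hr_targets):
    """Average L1 loss over the T iterations; every output contributes equally.

    For the single-degradation (BI) setting all entries of hr_targets are the same
    HR image; for complex degradations they are ordered from easy to hard.
    """
    assert len(sr_outputs) == len(hr_targets)
    losses = [F.l1_loss(sr, hr) for sr, hr in zip(sr_outputs, hr_targets)]
    return sum(losses) / len(losses)
```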

# 4. **Ablation Study**

## 4.1. Settings

- DIV2K and Flickr2K are used as the training data.
- **BI**: Bicubic downsampling.
- **BD**: Gaussian blur applied to the HR images, followed by downsampling. A 7×7 Gaussian kernel with standard deviation 1.6 is used for blurring.
- **DN**: Bicubic downsampling followed by adding Gaussian noise, with a noise level of 30.
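For intuition only, the three settings can be simulated roughly as below (a sketch, assuming PIL for bicubic resizing and SciPy for the Gaussian blur; the paper's exact degradation pipeline may differ in detail):

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

def degrade(hr, mode='BI', scale=4, noise_level=30):
    """hr: HxWx3 uint8 HR image. Returns the simulated LR image (uint8)."""
    img = hr.astype(np.float64)
    if mode == 'BD':
        # 7x7 Gaussian blur with sigma = 1.6 on the HR image before downsampling
        img = gaussian_filter(img, sigma=(1.6, 1.6, 0), truncate=3 / 1.6)
    lr = Image.fromarray(np.clip(img, 0, 255).astype(np.uint8))
    lr = np.array(lr.resize((hr.shape[1] // scale, hr.shape[0] // scale), Image.BICUBIC))
    if mode == 'DN':
        # additive Gaussian noise with noise level 30 (pixel values in 0-255)
        lr = np.clip(lr + np.random.normal(0, noise_level, lr.shape), 0, 255).astype(np.uint8)
    return lr
```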

## 4.2. Study of T and G

- The number of iterations is denoted as *T* and the number of projection groups in the feedback block is denoted as *G*.
- The base number of filters *m* is set to 32.
- **(a)**: By fixing *G* to 6, the reconstruction performance is significantly improved compared with the network without feedback connections (*T* = 1). **As *T* continues to increase, the reconstruction quality keeps rising.**
- **(b)**: By fixing *T* to 4, **larger *G* leads to higher accuracy due to the stronger representative ability** of deeper networks.

## 4.3. Feedback vs Feedforward

- **SRFBN-L** (*T* = 4, *G* = 6) is used for analysis.
- By simply **disconnecting the loss from all iterations except the last one**, the network can no longer reroute a notion of the output to low-level representations and thus **degenerates into a feedforward one** (while still retaining its recurrent structure), denoted as **SRFBN-L-FF** (see the sketch after this list).
- SRFBN-L and SRFBN-L-FF both have four iterations with four HR outputs.
- SRFBN-L outperforms SRFBN-L-FF at every iteration, from which it is concluded that the feedback network is capable of producing high-quality early predictions, in contrast to the feedforward network.
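In other words, SRFBN-L-FF keeps the same unrolled structure but is supervised only on the final output; relative to the loss sketched in Section 3, this amounts to something like (an assumed formulation):

```python
import torch.nn.functional as F

def feedforward_variant_loss(sr_outputs, hr_target):
    # Only the last of the T outputs is supervised; without losses on the earlier
    # iterations, the hidden state is no longer forced to carry an early notion
    # of the SR output back to the low-level representations.
    return F.l1_loss(sr_outputs[-1], hr_target)
```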

## 4.4. Curriculum Learning

- Empirically, **blurred HR images are provided at the first two iterations** and **original HR images are provided at the remaining two iterations** for experiments with the **BD** degradation model.
- For experiments with the **DN** degradation model, **noisy HR images are used at the first two iterations** and **HR images without noise are used at the last two iterations.**
- As shown above, **the curriculum learning strategy well assists the proposed SRFBN** in handling the BD and DN degradation models under both circumstances.
- Also, **fine-tuning a network pretrained on the BI degradation model leads to higher PSNR** values than training from scratch.

# 5. **SOTA Comparison**

## 5.1. Network Parameters

- The SRFBN with a larger base number of filters (*m* = 64), which is derived from **SRFBN-L**, is implemented for comparison.
- A self-ensemble method is also used to further improve the performance of the SRFBN; this variant is denoted as **SRFBN+** (a sketch is given after this list).
- A lightweight network, **SRFBN-S** (*T* = 4, *G* = 3, *m* = 32), is also used.
- **SRFBN-S achieves the best SR results among the networks with fewer than 1000K parameters, such as** DBPN-S, MemNet, DRRN, VDSR and SRCNN.
- In comparison with networks with a large number of parameters, such as D-DBPN and EDSR, the proposed **SRFBN and SRFBN+ achieve competitive results while needing only 35% and 7% of the parameters of D-DBPN and EDSR, respectively.**
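The self-ensemble here presumably follows the geometric self-ensemble popularized by EDSR (flip/rotate the input, run the model, undo the transform, and average); a hedged sketch, assuming a `model` that maps an LR tensor to an SR tensor:

```python
import torch

def self_ensemble(model, lr):
    """x8 geometric self-ensemble: average over horizontal flips and 90-degree rotations."""
    outputs = []
    for flip in (False, True):
        for k in range(4):
            x = torch.flip(lr, dims=[-1]) if flip else lr
            x = torch.rot90(x, k, dims=[-2, -1])
            y = model(x)
            # undo the transforms on the output before averaging
            y = torch.rot90(y, -k, dims=[-2, -1])
            if flip:
                y = torch.flip(y, dims=[-1])
            outputs.append(y)
    return torch.stack(outputs).mean(dim=0)
```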

## 5.2. Results with BI degradation model

- **SRFBN can outperform almost all comparative methods.**
- Compared with SRFBN, EDSR utilizes far more filters (256 vs. 64), and D-DBPN employs more training images (DIV2K + Flickr2K + ImageNet vs. DIV2K + Flickr2K). However, SRFBN earns competitive results in contrast to them.
- In addition, **SRFBN+ outperforms almost all comparative methods, such as D-DBPN, EDSR, MemNet, DRRN, VDSR and SRCNN.**

- For the upper image, **the texture direction of the SR images from all comparative methods is wrong.** However, **SRFBN makes full use of the high-level information to carry out a self-correcting process**, thus a more faithful SR image can be obtained.
- For the lower one, DRRN and MemNet even split the '*M*' letter. VDSR, EDSR and D-DBPN fail to recover a clear image. The proposed SRFBN produces a clear image which is very close to the ground truth.

## 5.3. Results with BD and DN degradation models

- SRFBN is trained using the **curriculum learning strategy for the BD and DN** degradation models, and **fine-tuned from the model trained on the BI degradation model** using DIV2K.
- **SRFBN and SRFBN+ achieve the best results on almost all quantitative metrics** compared with other state-of-the-art methods, such as RDN, SRMD & SRMDNF, IRCNN, MemNet, VDSR and SRCNN.

- The first set of images shows the results obtained with the BD degradation model. The second set shows the results with the DN degradation model.
- SRFBN alleviates the distortions and generates more accurate details in the SR images.

This is the 16th story this month.

## Reference

[2019 CVPR] [SRFBN]

Feedback Network for Image Super-Resolution

## Super Resolution

[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DnCNN] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [MWCNN] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [CNF] [EDSR & MDSR] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [CARN] [IDN] [SR+STN] [SRFBN]