Reading: PRLSR — Progressive Residual Learning for SISR (Super Resolution)

Low-Complexity Model, Outperforms OISR, SRFBN, CARN, SRMDNF, MemNet, LapSRN, DRCN, DRRN, VDSR & SRCNN

Trade-off comparison of the performances and the number of operations

In this story, A Fast and Accurate Super-Resolution Network Using Progressive Residual Learning (PRLSR), by Peking University, Shenzhen Graduate School, and Media Lab, Tencent, is presented. In this paper:

  • A progressive residual block (PRB) is designed to progressively downsample deep features, reducing redundancy and obtaining refined features.
  • A high-frequency preserving module is proposed to lower the detail loss caused by resolution reduction in PRB.
  • A residual learning-based architecture (RLA) with learnable weights is utilized to extract multilevel features and adaptively adjust the contributions of the residual mapping and the identity mapping in the residual structure, accelerating convergence.

This is a paper in 2020 ICASSP. (Sik-Ho Tsang @ Medium)

Outline

  1. PRLSR: Network Architecture
  2. Progressive Residual Block (PRB)
  3. Residual Learning-Based Architecture (RLA)
  4. High-Frequency Preserving (HFP) Module
  5. Experimental Results

1. PRLSR: Network Architecture

PRLSR: Network Architecture
  • PRLSR is made up of three parts, i.e., shallow feature extraction (the first layer), deep feature extraction, and reconstruction.
  • Shallow feature extraction: a convolution layer with kernel size 3×3 to obtain the shallow feature.
  • Deep feature extraction: consists of several PRBs and a fusion layer, which will be mentioned in the coming sections.
  • Reconstruction module: a global residual path f_up is designed by stacking a PixelShuffle layer (originating from ESPCN) and a convolution layer, taking the output of the shallow feature extraction as input.
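PixelShuffle itself is just a channel-to-space rearrangement; below is a minimal NumPy sketch of the operation (frameworks provide this as a ready-made layer, e.g. torch.nn.PixelShuffle, which is presumably what the actual implementation uses — shapes and values here are illustrative):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r), as in ESPCN."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)  # interleave into the upscaled map

feat = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)  # C*r^2 = 4, r = 2
up = pixel_shuffle(feat, 2)
print(up.shape)  # (1, 6, 6)
```

Each output 2×2 patch is filled from r² = 4 input channels at the same spatial position, so no learned parameters are involved in the upscaling itself.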

2. Progressive Residual Block (PRB)

Progressive Residual Block (PRB)
  • PRB has a U-Net structure that consists of 5 RLAs, 2 HFPs, and 2 fusion layers.
  • These RLAs progressively learn the SR information from LR features.
  • The spatial resolution of the feature map is halved twice through average-pooling layers in the first two RLAs, then doubled twice via bilinear interpolation in the last two RLAs.
  • To prevent losing the high-frequency information (HFI) caused by downsampling, HFP is designed to retain the HFI before resolution reduction.
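The resolution flow through one PRB can be sketched as follows. The RLA bodies are omitted, nearest-neighbour upsampling stands in for the paper's bilinear interpolation to keep the sketch dependency-free, and all shapes are illustrative:

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling with stride 2 on an (H, W) map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """2x upsampling (nearest-neighbour here; PRLSR uses bilinear)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Resolution flow through one PRB (RLA feature extraction omitted):
x = np.random.rand(48, 48)   # feature map entering the first RLA
d1 = avg_pool2(x)            # 24x24 — second RLA operates here
d2 = avg_pool2(d1)           # 12x12 — third (bottleneck) RLA
u1 = upsample2(d2)           # back to 24x24 — fourth RLA
u2 = upsample2(u1)           # back to 48x48 — fifth RLA
print(x.shape, d2.shape, u2.shape)
```

The downsampled middle stages are what cut the computational cost; the HFP modules (next sections) compensate for the detail lost at each `avg_pool2` step.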

3. Residual Learning-Based Architecture (RLA)

Residual Learning-Based Architecture (RLA)
  • An RL-based Architecture (RLA) is proposed as the basic feature extraction module in PRB for deepening the model.
  • RLA contains three residual units and a fusion layer.
  • A residual scaling with learnable weights (RSL) is designed to dynamically adjust the importance of identity mapping and residual mapping:

    r_m^i = λ_res^i · f(τ(r_m^(i−1))) + λ_x^i · r_m^(i−1)

  • where f and τ are the convolution layer and the activation function in the residual unit, respectively.
  • λ_res^i and λ_x^i are the learnable weight values of the two branches.
  • r_m^i is the output of the i-th residual unit in the m-th RLA.
  • To preserve certain spatial location information, a long skip connection and a fusion layer are utilized to make better use of the SR information of each residual unit in RLA.
  • In particular, the outputs of all residual units are concatenated together followed by the fusion layer to fully utilize multiple levels of features and match the dimensions of the input.
  • The whole RLA is formulated as:

    R_m = λ_f · F([r_m^1, r_m^2, r_m^3]) + λ_L · x_m

  • where F and [·] denote the fusion layer and concatenation, x_m is the input of the m-th RLA, and λ_f and λ_L are the learnable weight values of the fusion layer and the long skip connection.
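The weighted residual combination can be sketched numerically. A 1×1 convolution (a plain matrix multiply over channels) stands in for the paper's 3×3 convolution, and the weight values are arbitrary placeholders — in training they are learned jointly with the network:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rsl_unit(r_prev, conv, lam_res, lam_x):
    """One residual unit with learnable residual scaling (RSL):
    r_i = lam_res * f(tau(r_{i-1})) + lam_x * r_{i-1},
    where f is the convolution (a channel-mixing matrix stands in here)
    and tau is the activation."""
    return lam_res * conv @ relu(r_prev) + lam_x * r_prev

rng = np.random.default_rng(0)
conv = rng.standard_normal((32, 32)) * 0.1   # stand-in for the conv layer
r = rng.standard_normal(32)                  # feature vector at one pixel
out = rsl_unit(r, conv, lam_res=0.8, lam_x=1.0)
print(out.shape)  # (32,)
```

With lam_res = 0 the unit degenerates to pure identity mapping; the learnable scalars let the network move smoothly between the two extremes per unit.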

4. High-Frequency Preserving (HFP) Module

High-Frequency Preserving (HFP) Module
  • The proposed high-frequency preserving module (HFP) retains the HFI before the size of the feature maps is reduced.
  • Assuming the input feature map T_L has size C×H×W, where C, H, and W denote the channel number, height, and width, respectively, an average-pooling layer is first applied to T_L:

    T_S = avgpool_(k,s)(T_L)

  • where k and s denote the kernel size and stride, respectively, and k is set equal to s.
  • Then, T_S is upsampled with a scale factor equal to k to obtain a C×H×W tensor T_U.
  • Compared with the original T_L, T_U can be regarded as an expression of its average smoothness information.
  • The high-frequency information (HFI) ξ is then obtained as the residual:

    ξ = T_L − T_U
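A minimal sketch of the HFP computation on a single-channel map, assuming nearest-neighbour upsampling (the paper's exact upsampling operator may differ):

```python
import numpy as np

def hfp(t_l, k=2):
    """High-frequency preserving: xi = T_L - upsample(avgpool(T_L))."""
    h, w = t_l.shape
    t_s = t_l.reshape(h // k, k, w // k, k).mean(axis=(1, 3))  # avgpool, k = s
    t_u = t_s.repeat(k, axis=0).repeat(k, axis=1)              # back to H x W
    return t_l - t_u                                           # the HFI xi

t_l = np.array([[1.0, 3.0],
                [5.0, 7.0]])
xi = hfp(t_l, k=2)
print(xi)  # deviation from the local mean 4.0: [[-3. -1.] [ 1.  3.]]
```

Within each k×k window ξ sums to zero by construction: the pooled branch keeps the smooth average, and ξ keeps exactly what pooling discards, so it can be added back after the PRB restores the resolution.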

5. Experimental Results

5.1. Settings

  • 32 channels are used for each convolution layer and ReLU is used as the activation function in RLA.
  • k is set to 2.
  • The PRLSR has 3 PRBs by default and consequently 115 layers.
  • Batch normalization is not used.
  • DIV2K 800 training images are used for training.
  • L1 loss is used.
  • 16 patches of size 48×48 are randomly cropped from the LR images as input.

5.2. Model Analysis

PSNR results are all evaluated on Set5 with scaling factor ×2.
  • PRB dramatically reduces the computational cost (187.3G vs. 337.7G) with little degradation of PSNR (0.02dB) compared with PRB w/o downsampling.
  • HFP retains the HFI when the size of the feature maps is reduced and brings about 0.08dB of PSNR improvement.
  • With RSL, PRLSR achieves a 0.07dB improvement.
  • When combining RSL and HFP, PRLSR obtains the highest PSNR. To be specific, the gain over the baseline is 0.13dB.

5.3. Comparison with State-of-the-arts

Comparison of input and the number of layers between lightweight SR models.
Quantitative results of deep learning-based SR algorithms.
  • PRLSR achieves the best performance on each scale, surpassing the second-best method by 0.1375dB on scale ×2, 0.0625dB on scale ×3, and 0.08dB on scale ×4 on average over four datasets, outperforming OISR, SRFBN, CARN, SRMDNF, LapSRN, DRRN, VDSR & SRCNN.
  • The main reason is that PRLSR uses PRB with RLA to extract features and HFP to retain the HFI, which allows a very deep network (115 layers) compared with other methods (at most 52 layers).
  • For example, with an SR image size of 1280×720:
  • Compared with LapSRN, which has fewer parameters and Mult-Adds, PRLSR performs better by a large margin in PSNR and SSIM.
  • Compared with OISR-RK2-s, which has similar performance, PRLSR has fewer parameters and Mult-Adds (nearly one-half).
  • Especially on scale ×4, the proposed PRLSR's Mult-Adds is close to that of SRCNN, which has only three layers.
Visual qualitative comparison on ×4 scale dataset
  • Detail patches from two images are selected, and it can be seen that PRLSR reproduces high-frequency details better than the compared state-of-the-art methods.

This is the 23rd story this month.
