Review: WDRN / WavResNet — Wavelet-based Deep Residual Learning Network (Image Denoising & Super Resolution)

Wavelet-Transformed Image as Input, Outperforms VDSR & DnCNN, Ranked Third in the NTIRE Competition

Sik-Ho Tsang
5 min read · Apr 29, 2020

In this story, “Beyond Deep Residual Learning for Image Restoration: Persistent Homology-Guided Manifold Simplification”, by Korea Advanced Institute of Science and Technology (KAIST), is briefly reviewed. The short form WDRN is used here because a 2018 JEI paper cites this work under that name, while the short form WavResNet is used in a 2018 CVPRW paper. WavResNet is also the name used in “Wavelet Domain Residual Network (WavResNet) for Low-Dose X-ray CT Reconstruction” (which shares the same last author), in which the network is very similar.

Instead of using the original image as input, the image is wavelet-transformed before being fed into the CNN. Because of this, the complexity is low and the inference time is fast. The method ranked 3rd in the NTIRE competition, and the paper appeared at 2017 CVPRW with over 70 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Network Architecture
  2. Experimental Results

1. Network Architecture

  • There are two architectures: the denoising architecture and the NTIRE SISR competition architecture.

1.1. Denoising Architecture

The Four Patches as Input After Wavelet Transform
  • The input and the clean label images are first decomposed into four subbands (i.e. LL, LH, HL, and HH) using the wavelet transform.
  • The wavelet residual images, which are now used as the new labels, are obtained by the difference between the input and the clean label images in the wavelet domain.
  • Then, the network is trained to learn a multi-input, multi-output functional relationship between these newly processed inputs and labels.
  • Four patches at the same locations in each wavelet subband are extracted and used for training. (A minimal sketch of this preprocessing is given right after this list.)
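Below is a minimal sketch of this wavelet-domain preprocessing, assuming PyWavelets and a single-level Haar transform (the wavelet type and the exact subband ordering are assumptions, not details taken from the paper).

```python
import numpy as np
import pywt  # PyWavelets

def to_subbands(img: np.ndarray) -> np.ndarray:
    """Decompose a 2-D image into its four wavelet subbands (LL, LH, HL, HH).

    Each subband has half the spatial size of the input, so the four
    subbands can be stacked as a 4-channel input for the network.
    """
    LL, (LH, HL, HH) = pywt.dwt2(img, "haar")
    return np.stack([LL, LH, HL, HH], axis=0)  # shape: (4, H/2, W/2)

# Wavelet residual labels: the difference between the noisy input and the
# clean label image, computed in the wavelet domain.
noisy = np.random.rand(64, 64)   # stand-in for a noisy input patch
clean = np.random.rand(64, 64)   # stand-in for the clean label patch
x = to_subbands(noisy)                        # 4-subband network input
y = to_subbands(noisy) - to_subbands(clean)   # 4-subband residual label
```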
Network Architecture
  • The network consists of five modules between the first and the last stages.
  • Each module has one bypass connection, three convolution layers, three batch normalizations, and three ReLU layers.
  • The first stage contains two layers: a convolution layer with ReLU, followed by another convolution layer with batch normalization and ReLU.
  • The last stage is composed of three layers: two layers with a convolution, batch normalization, and ReLU, and a final convolution layer.
  • The total number of convolution layers is 20. The convolution filter size is 3×3×320×320. (A rough sketch of this architecture is given after the list of advantages below.)
  • (Please read batch normalization from Inception-v2 and bypass connection from ResNet.)
  • Three advantages of using the wavelet transform:

1. The input feature space is mapped to another feature space that is easier to train on, which helps to reduce the network depth, i.e. the computational complexity.

2. The patch size can be reduced by half, which reduces the runtime of the network since the output size of every layer is also halved.

3. The minimum required receptive field size can be reduced.
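Putting the bullet points above together, here is a rough PyTorch sketch of the denoising network: a 2-layer first stage, five modules of three conv+BN+ReLU layers with a bypass connection each, and a 3-layer last stage, giving 20 convolution layers with 3×3 kernels and 320 channels. The padding, bias settings, and the 4-channel input/output (one channel per wavelet subband) are assumptions, not details confirmed by this review.

```python
import torch
import torch.nn as nn

def conv_bn_relu(ch):
    """3x3 convolution followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(ch),
        nn.ReLU(inplace=True),
    )

class BasicModule(nn.Module):
    """One module: three conv+BN+ReLU layers plus a bypass connection."""
    def __init__(self, ch):
        super().__init__()
        self.layers = nn.Sequential(*[conv_bn_relu(ch) for _ in range(3)])

    def forward(self, x):
        return x + self.layers(x)  # bypass (skip) connection

class WDRNDenoise(nn.Module):
    def __init__(self, ch=320, in_ch=4, num_modules=5):
        super().__init__()
        # First stage: conv+ReLU, then conv+BN+ReLU (2 convolution layers).
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )
        # Five basic modules (15 convolution layers).
        self.body = nn.Sequential(*[BasicModule(ch) for _ in range(num_modules)])
        # Last stage: two conv+BN+ReLU layers and a final conv (3 convolution layers).
        self.tail = nn.Sequential(
            conv_bn_relu(ch),
            conv_bn_relu(ch),
            nn.Conv2d(ch, in_ch, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # The output is the predicted residual in the wavelet domain.
        return self.tail(self.body(self.head(x)))

# Example: a batch of 4-subband inputs of size 32x32.
out = WDRNDenoise()(torch.randn(1, 4, 32, 32))
```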

1.2. NTIRE SISR (Single Image Super Resolution) Competition Architecture

NTIRE SISR Competition Architecture
  • These architectures are extended from the primary denoising architecture. Depending on the decimation scheme (bicubic ×2, ×3, ×4, and unknown ×2, ×3, ×4) used to generate the low-resolution dataset, three different architectures are implemented.
  • All three SISR architectures have 41 convolution layers.
  • To reconstruct the bicubic ×2 downsampled dataset, two long bypass connections are used across the six basic modules in the network, and the number of channels is 256. (A small sketch of this wiring is given after the result below.)
  • For the other datasets, the long bypass connection is not used and the number of channels is 320.
Average PSNR/SSIM on 50 validation data of DIV2K dataset
  • With the long bypass connection, the average PSNR/SSIM is improved.
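For the bicubic ×2 variant, here is a minimal sketch (reusing the BasicModule class from the denoising sketch above) of how the two long bypass connections could wrap the six basic modules. The exact placement of the long skips is not specified in this review, so this grouping into two halves is an assumption.

```python
import torch.nn as nn

class LongBypassBody(nn.Module):
    """Six basic modules split into two groups of three, each wrapped by a
    long bypass (skip) connection, in addition to the short bypass inside
    every BasicModule (defined in the earlier sketch)."""
    def __init__(self, ch=256):
        super().__init__()
        self.group1 = nn.Sequential(*[BasicModule(ch) for _ in range(3)])
        self.group2 = nn.Sequential(*[BasicModule(ch) for _ in range(3)])

    def forward(self, x):
        x = x + self.group1(x)  # first long bypass connection
        x = x + self.group2(x)  # second long bypass connection
        return x
```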

2. Experimental Results

2.1. Denoising

PSNR on “Set12” dataset in the Gaussian denoising task
SSIM on “Set12” dataset in the Gaussian denoising task
Average PSNR/SSIM for “BSD68” dataset in the Gaussian denoising task
  • The proposed network outperforms state-of-the-art denoising methods such as DnCNN in terms of PSNR and SSIM for all Set12 images and for the BSD68 dataset.
Visual Quality

2.2. NTIRE SISR (Single Image Super Resolution) Competition

PSNR/SSIM for various datasets in SISR tasks (Proposed-P is trained using the 291 dataset, and Proposed is trained using the RGB DIV2K dataset)
  • As shown above, the proposed network outperforms VDSR and DnCNN.
  • Compared to the 14–67 seconds of computation time required by the top-ranked groups, the proposed network took only 4–5 seconds per frame.
Performance comparison of SISR at scale factor ×4 with bicubic downsampling (Left: input, Center: restoration result, Right: label)
Performance comparison of SISR at scale factor ×4 with unknown downsampling (Left: input, Center: restoration result, Right: label)

During these days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. And this is the 32nd story this month. Thanks for visiting my story…

2 days left in this month. How about 35 stories within the month…?
