Review: MDesNet — Multichannel Densely Convolutional Network (Super Resolution)

Image Transformed Before Inputting into DenseNet-Like Network, Outperforms VDSR, DRCN, LapSRN, DRRN, WDRN, MWCNN

Sik-Ho Tsang
5 min read · Apr 30, 2020

In this story, Multichannel Densely Convolutional Network (MDesNet), by Wuhan University, and Wuhan Electric Power Design Institute, is reviewed.

  • The proposed MDesNet first decomposes the input image into intrinsic mode functions (IMFs) and a residue based on bidimensional empirical mode decomposition (BEMD).
  • Thus, relatively shallow subnetworks can be applied to each component, avoiding the vanishing gradient problem.

This is a paper in 2018 JEI (Journal of Electronic Imaging). (Sik-Ho Tsang @ Medium)

Outline

  1. Intrinsic Mode Functions (IMFs) and Residue Using Bidimensional Empirical Mode Decomposition (BEMD)
  2. MDesNet Network Architecture
  3. Experimental Results

1. Intrinsic Mode Functions (IMFs) and Residue Using Bidimensional Empirical Mode Decomposition (BEMD)

Empirical Mode Decomposition (EMD) (From https://en.wikipedia.org/wiki/Hilbert%E2%80%93Huang_transform)
  • EMD is a signal processing technique introduced by Huang et al. in 1998. EMD is used for 1D signals, whereas BEMD is its extension to 2D data.
  • First, BEMD is applied to decompose the input image into frequency components, called intrinsic mode functions (IMFs). (I am not going into details about EMD/BEMD, and would like to focus on the network itself.)
  • The decomposition is additive: Y = I1 + I2 + I3 + R, where Y is the original image, I1 to I3 are the IMFs, and R is the residue.
  • Each IMF includes features of different scales and frequencies. Since low- and high-frequency components have been decomposed into different IMFs, accurate features can be obtained from the IMFs and residue by applying relatively shallow subnetworks, avoiding the vanishing gradient problem.
  • I1 to I3 and R then go through their own subnetworks Di (a toy decomposition sketch is given at the end of this section).
Illustration of image decomposition based on BEMD: (a) original, (b) IMF1, (c) IMF2, (d) IMF3, and (e) residue.
  • We can see that the complexity of each IMF decreases, and finally the residue is close to a monotone plane.
  • The IMFs represent the characteristics of the image at different scales and reflect high- and low-frequency features, respectively.
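
To make the additive decomposition concrete, here is a minimal, runnable sketch. It does not implement real BEMD (there is no envelope sifting); it substitutes Gaussian smoothing as a hypothetical frequency-band analogue, purely to illustrate that the IMF-like components and the residue sum back exactly to the original image.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_frequency_decompose(y, sigmas=(1.0, 2.0, 4.0)):
    """Toy additive decomposition used only to illustrate Y = I1 + I2 + I3 + R.
    Each 'IMF' is the detail removed by progressively stronger Gaussian smoothing,
    and the residue is the final smooth layer. This is NOT BEMD (no envelope
    sifting), just a simple frequency-band analogue."""
    imfs, current = [], y.astype(np.float64)
    for sigma in sigmas:
        smooth = gaussian_filter(current, sigma)
        imfs.append(current - smooth)   # higher-frequency detail at this scale
        current = smooth
    return imfs, current                # current is now the low-frequency residue

y = np.random.rand(64, 64)              # stand-in for an input LR image
imfs, residue = toy_frequency_decompose(y)
# The decomposition is additive, as in the paper: Y = I1 + I2 + I3 + R
assert np.allclose(y, sum(imfs) + residue)
```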

2. MDesNet Network Architecture

MDesNet Network Architecture
Residual Learning Subnetwork D(F,L)
  • A subnetwork D(F, L) contains L convolution layers, each producing K feature maps (K is also known as the growth rate), a 1×1 convolution layer with F feature maps, and a deconvolution layer for upsampling.
  • Different from DenseNet, PReLU is used as the activation function.
  • Inspired by ResNet, the densely connected layers are combined with residual learning (a sketch is given below).
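
Below is a minimal PyTorch sketch of one such subnetwork D(F, L), assuming 3×3 convolutions, a single-channel input, and a transposed convolution for upsampling; the kernel sizes, deconvolution settings, and omission of the residual skip connection are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SubNetD(nn.Module):
    """Sketch of a subnetwork D(F, L): L densely connected 3x3 conv layers with
    growth rate K and PReLU activations, a 1x1 conv compressing to F feature maps,
    and a transposed conv for upsampling. Kernel sizes and deconv settings are
    assumed choices (valid for even scale factors), not taken from the paper."""

    def __init__(self, in_channels=1, F=64, L=4, K=32, scale=2):
        super().__init__()
        self.dense_layers = nn.ModuleList()
        channels = in_channels
        for _ in range(L):
            self.dense_layers.append(nn.Sequential(
                nn.Conv2d(channels, K, kernel_size=3, padding=1),
                nn.PReLU(),
            ))
            channels += K                     # dense connectivity: inputs grow by K per layer
        self.compress = nn.Conv2d(channels, F, kernel_size=1)
        self.upsample = nn.ConvTranspose2d(F, F, kernel_size=2 * scale,
                                           stride=scale, padding=scale // 2)

    def forward(self, x):
        features = [x]
        for layer in self.dense_layers:
            out = layer(torch.cat(features, dim=1))   # each layer sees all previous outputs
            features.append(out)
        fused = self.compress(torch.cat(features, dim=1))
        return self.upsample(fused)                   # upsampled F-channel feature maps

# Quick shape check: a 32x32 single-channel input gives 64x64 feature maps for scale 2.
feats = SubNetD(F=64, L=4, K=32, scale=2)(torch.randn(1, 1, 32, 32))
assert feats.shape == (1, 64, 64, 64)
```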
Reconstruction errors of different frequency components by convolutional networks of different depths
  • We can see from the above figure that the reconstruction errors of high-frequency components are much larger than those of low-frequency components.
  • The errors of high-frequency components also decrease more rapidly than those of low-frequency components as the network depth increases.
  • This means a larger network is more suitable for reconstructing higher-frequency components.

Therefore, subnetworks with more feature maps and more layers are used to extract features from higher-frequency components.

Relatively shallower subnetworks are used to extract features from lower-frequency components so as to reduce redundant computational burden.

  • Hence, a subnetwork D(F, L) with F = 128 and L = 8 is used to exploit features from IMF1, which contains the highest-frequency components.
  • Two relatively shallow subnetworks (F = 64 and L = 4) are used to exploit features from the other two IMFs, which contain lower-frequency components.
  • The residue is processed through a subnetwork with F = 32 and L = 2.
  • At last, the output feature maps from all subnetworks are concatenated together and passed through the feature fusion net to obtain the final output (a rough sketch of this layout is given below).
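
Reusing the SubNetD sketch from the previous section, a rough sketch of this multichannel layout with the sizes quoted above might look as follows; the feature fusion net here is a single assumed 3×3 convolution, not the paper's exact design.

```python
class MDesNetSketch(nn.Module):
    """Sketch of the multichannel layout: one SubNetD per BEMD component with the
    sizes quoted above (IMF1: F=128, L=8; IMF2 and IMF3: F=64, L=4; residue: F=32,
    L=2), followed by concatenation and a feature-fusion conv producing the output.
    The fusion net here is a single assumed 3x3 convolution."""

    def __init__(self, scale=2, K=32):
        super().__init__()
        self.net_imf1 = SubNetD(F=128, L=8, K=K, scale=scale)   # highest-frequency IMF
        self.net_imf2 = SubNetD(F=64, L=4, K=K, scale=scale)
        self.net_imf3 = SubNetD(F=64, L=4, K=K, scale=scale)
        self.net_res = SubNetD(F=32, L=2, K=K, scale=scale)     # near-monotone residue
        self.fusion = nn.Conv2d(128 + 64 + 64 + 32, 1, kernel_size=3, padding=1)

    def forward(self, imf1, imf2, imf3, residue):
        feats = torch.cat([self.net_imf1(imf1), self.net_imf2(imf2),
                           self.net_imf3(imf3), self.net_res(residue)], dim=1)
        return self.fusion(feats)    # fused feature maps -> reconstructed HR image
```

Each input here is one of the four single-channel BEMD components of the LR image, and the output is the super-resolved image at the chosen scale factor.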

3. Experimental Results

3.1. Datasets

  • The DIV2K training dataset, with 800 HR images, is used for training.
  • Set5, Set14, BSD100, and Urban100 are used for evaluation.
  • Three networks with different scaling factors (s = 2, 3, and 4) are trained.

3.2. Study of Residual Learning and Growth Rate

Convergence analysis on residual learning and nonresidual learning
  • The model with residual learning converges faster than the one without and performs better, obtaining a higher PSNR on BSD100.
Convergence analysis on different growth rates K
  • The model performs better as K increases. However, increasing K also increases the computational cost.
  • In this paper, K is set to 32 due to limited computational resources.

3.3. SOTA Comparison

Average PSNR/SSIM for scale factors 2×, 3×, and 4×
  • The method achieves the highest PSNR and SSIM values in almost all cases compared to the other methods such as VDSR, DRCN, LapSRN, DRRN, WDRN, and MWCNN.
Visual results on image102061 from BSD100 with scaling factor 4×. (a) original HR image, (b) bicubic, (c) VDSR, (d) DRCN, (e) LapSRN, (f) DRRN, (g) WDRN, (h) MWCNN, and (i) MDesNet.
Visual results on image030 from Urban100 with scaling factor 4×. (a) original HR image, (b) bicubic, (c) VDSR, (d) DRCN, (e) LapSRN, (f) DRRN, (g) WDRN, (h) MWCNN, and (i) MDesNet.
Visual results on image074 from Urban100 with scaling factor 4×. (a) original HR image, (b) bicubic, (c) VDSR, (d) DRCN, (e) LapSRN, (f) DRRN, (g) WDRN, (h) MWCNN, and (i) MDesNet.
Average running time on BSD100 dataset
  • MDesNet spends more time than the other deep learning-based methods because it takes a long time to extract the IMFs and residue from the input image using BEMD.

Recently, I have read 3 papers: WDRN / WavResNet, MWCNN, and this paper, MDesNet. By transforming the input image from the pixel domain to another domain that is more suitable for training, a shallower network can be applied to avoid gradient vanishing. If the transform is fast, inference for the whole network should also be faster. I believe this may be suitable for time-constrained tasks, power-constrained devices, or fast approaches.

During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. And this is the 34th story this month. Thanks for visiting my story…

A few hours are left in this month in my timezone. Can I reach 35 stories within this month…?
