Review: MDesNet — Multichannel Densely Convolutional Network (Super Resolution)
Image Transformed Before Inputting into DenseNet-Like Network, Outperforms VDSR, DRCN, LapSRN, DRRN, WDRN, MWCNN
In this story, Multichannel Densely Convolutional Network (MDesNet), by Wuhan University and Wuhan Electric Power Design Institute, is reviewed.
- The proposed MDesNet first decomposes the input image into intrinsic mode functions (IMFs) and residue based on bidimensional empirical mode decomposition.
- Thus, relatively shallow subnetworks can be applied to each component so as to avoid the vanishing gradient problem.
This is a paper in 2018 JEI. (Sik-Ho Tsang @ Medium)
Outline
- Intrinsic Mode Functions (IMFs) and Residue Using Bidimensional Empirical Mode Decomposition (BEMD)
- MDesNet Network Architecture
- Experimental Results
1. Intrinsic Mode Functions (IMFs) and Residue Using Bidimensional Empirical Mode Decomposition (BEMD)
- EMD is a signal processing technique invented by Huang in 1998. EMD operates on 1D signals, whereas BEMD extends it to 2D data such as images.
- First, BEMD is applied to decompose the input image into frequency components, called intrinsic mode functions (IMFs). (I am not going into details about EMD/BEMD, and would like to focus on the network itself.)
- Y = I1 + I2 + I3 + R, where Y is the original image, I1 to I3 are the IMFs, and R is the residue.
- Each IMF includes features of different scales and frequencies. Since low- and high-frequency components have been decomposed into different IMFs, accurate features can be obtained from the IMFs and residue by applying relatively shallow subnetworks, so as to avoid the vanishing gradient problem.
- I1 to I3 and R then each go through their own subnetwork:
- where Di is the i-th subnetwork.
- The above figure shows the illustration of image decomposition based on BEMD: (a) original, (b) IMF1, (c) IMF2, (d) IMF3, and (e) residue.
- We can see that the complexity of each IMF decreases in turn, and the residue is finally close to a monotone plane.
- The IMFs represent the characteristics of the image at different scales and reflect high- to low-frequency features, respectively.
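A quick sketch of the decomposition idea: the defining property used above is that the IMFs and residue sum back to the original image exactly. Below, repeated neighbor averaging serves as a simple low-pass stand-in for BEMD sifting (the real algorithm iteratively fits envelopes to local extrema, which is not reproduced here), but the exact-reconstruction property is the same.

```python
import numpy as np

def smooth(img):
    # 5-point neighbor average: a toy low-pass filter standing in for BEMD sifting
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] + p[1:-1, 1:-1]) / 5.0

def band_split(Y, passes=(1, 4, 16)):
    """Split Y into three detail bands (highest frequency first) and a residue
    so that Y == I1 + I2 + I3 + R holds exactly, mimicking the BEMD identity."""
    lows = [Y]
    for n in passes:
        low = lows[-1]
        for _ in range(n):           # more smoothing passes -> lower-frequency band
            low = smooth(low)
        lows.append(low)
    I1, I2, I3 = lows[0] - lows[1], lows[1] - lows[2], lows[2] - lows[3]
    R = lows[3]                      # residue: the smoothest, near-monotone component
    return I1, I2, I3, R

rng = np.random.default_rng(0)
Y = rng.standard_normal((32, 32))    # toy "image"
I1, I2, I3, R = band_split(Y)
assert np.allclose(I1 + I2 + I3 + R, Y)  # exact reconstruction, as in the BEMD equation
```

The choice of smoothing passes here is arbitrary; only the telescoping-sum structure (each band is a difference of successive low-pass versions) is what guarantees the reconstruction identity.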
2. MDesNet Network Architecture
- A subnetwork D(F, L) contains L convolution layers, each producing K feature maps (K is also known as the growth rate), a 1×1 convolution layer with F feature maps, and a deconvolution layer for upsampling.
- Different from DenseNet, PReLU is used as the activation function.
- Inspired by ResNet, the densely connected layers are combined with residual learning.
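The dense connectivity inside D(F, L) has simple channel bookkeeping: every layer sees the concatenation of all earlier feature maps and adds K new ones, so after L layers the feature stack has c0 + L·K channels before the 1×1 compression to F. A minimal numpy sketch (the "convolutions" here are toy 1×1 channel mixings, just to make the bookkeeping and PReLU concrete; this is not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def prelu(x, a=0.25):
    # PReLU: identity for positive inputs, slope a for negative (a is learnable in practice)
    return np.where(x > 0, x, a * x)

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in) -> pointwise channel mixing
    return np.tensordot(w, x, axes=([1], [0]))

def dense_block(x, K, L):
    """Dense connectivity: each layer consumes all previous feature maps
    and contributes K new maps, which are concatenated onto the stack."""
    feats = x
    for _ in range(L):
        w = rng.standard_normal((K, feats.shape[0])) * 0.01
        feats = np.concatenate([feats, prelu(conv1x1(feats, w))], axis=0)
    return feats

x = rng.standard_normal((16, 8, 8))  # 16 input channels, 8x8 spatial
y = dense_block(x, K=32, L=4)
print(y.shape[0])                    # 16 + 4*32 = 144 channels
```

The 1×1 convolution with F maps then compresses these 144 channels back down, and residual learning adds the (upsampled) input to the output.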
- We can see from the above figure that the reconstruction errors of high-frequency components are much larger than those of low-frequency components.
- The errors of high-frequency components also decrease more rapidly than those of low-frequency components as the network depth increases.
- That means a larger network is more suitable for reconstructing higher-frequency components.
- Therefore, subnetworks with more feature maps and more layers are used to extract features from higher-frequency components.
- Relatively shallower subnetworks are used to extract features from lower-frequency components, so as to avoid redundant computation.
- Hence, a subnetwork D(F, L) with F = 128 and L = 8 is used to exploit features from the IMF1, which contains the highest frequency components.
- Two relatively shallow subnetworks (F = 64 and L = 4) are used to exploit features from the other two IMFs, which contain lower-frequency components.
- The residue is processed through a subnetwork with F = 32 and L = 2.
- At last, the output feature maps from all subnetworks are concatenated together and passed through the feature fusion net to get the final outcome.
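Putting the per-channel configurations together: each subnetwork ends with a 1×1 convolution producing F maps, so (assuming the fusion net simply consumes the concatenation of those outputs, as described above) its input width is the sum of the four F values. A small sketch of that accounting:

```python
# Per-channel subnetwork configs (F feature maps, L layers) as given in the paper
configs = {
    "IMF1":    (128, 8),  # highest-frequency component -> largest subnetwork
    "IMF2":    (64, 4),
    "IMF3":    (64, 4),
    "residue": (32, 2),   # near-monotone component -> smallest subnetwork
}

# The fusion net sees the concatenation of each subnetwork's F output maps
fused_channels = sum(F for F, L in configs.values())
print(fused_channels)  # 128 + 64 + 64 + 32 = 288
```

Note how the channel budget tracks the error analysis above: the subnetwork for IMF1 is four times wider and four times deeper than the one for the residue.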
3. Experimental Results
3.1. Datasets
- The DIV2K training dataset, with 800 HR images, is used for training.
- Set5, Set14, BSD100, and Urban100 are used for evaluation.
- Three networks with different scaling factors (s = 2, 3, and 4) are trained.
3.2. Study of Residual Learning and Growth Rate
- The model with residual learning converges faster and performs better than the one without, obtaining higher PSNR on BSD100.
- The model performs better as K increases. However, a larger K also increases the computational cost.
- In this paper, K is set to 32 under the restriction of limited computational resources.
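The cost of growing K can be made concrete: in a dense block, layer l sees c0 + (l−1)·K input channels and emits K maps, so the 3×3-conv parameter count is a sum that is quadratic in K. A quick worked calculation (c0 = 64 input channels is an assumption for illustration, not a figure from the paper):

```python
def dense_params(c0, K, L):
    """3x3-conv parameter count of a dense block with growth rate K:
    layer l has c0 + (l-1)*K input channels and K output maps (biases ignored)."""
    return sum(9 * K * (c0 + (l - 1) * K) for l in range(1, L + 1))

for K in (16, 32, 64):
    # roughly quadruples each time K doubles: quadratic growth in K
    print(K, dense_params(c0=64, K=K, L=8))
```

This quadratic trade-off is why the paper caps K at 32 rather than pushing it higher.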
3.3. SOTA Comparison
- MDesNet spends more time than the other deep learning-based methods simply because extracting the IMFs and residue from the input image with BEMD is time-consuming.
Recently, I have read 3 papers: WDRN / WavResNet, MWCNN, and this paper, MDesNet. By transforming the input image from the pixel domain to another domain that is more suitable for training, a shallower network can be applied to avoid the vanishing gradient problem. If the transform is fast, the inference of the whole network should also be faster. I believe this may be suitable for time-constrained tasks, power-constrained devices, or fast approaches.
During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. And this is the 34th story this month. Thanks for visiting my story…
Few hours left for this month at my timezone. Can I reach 35 stories within this month…?
Reference
[2018 JEI] [MDesNet] Single Image Super Resolution by Multichannel Densely Connected Convolutional Network
Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DnCNN] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [MWCNN] [SRDenseNet] [SRGAN & SRResNet] [EDSR & MDSR] [MDesNet] [SR+STN]