Review: MWCNN — Multi-Level Wavelet-CNN for Image Restoration (Denoising & Super Resolution & JPEG Deblocking)

Wavelet Input to U-Net, Outperforms ARCNN, IRCNN, VDSR, DnCNN, RED-Net, SRResNet, LapSRN, DRRN, MemNet and WavResNet.

Sik-Ho Tsang
Apr 30, 2020


In this story, Multi-Level Wavelet-CNN for Image Restoration (MWCNN) is reviewed. The image is wavelet-transformed before being fed into the network. The wavelet transform is also used for downsampling, in place of convolution or max pooling. This is a 2018 CVPRW paper with more than 70 citations. (Sik-Ho Tsang @ Medium)


  1. From Wavelet to MWCNN
  2. Network Architecture
  3. Experimental Results

1. From Wavelet to MWCNN

1.1. Wavelet

Multi-level WPT architecture
  • In 2D discrete wavelet transform (DWT), four filters, i.e. fLL, fLH, fHL, and fHH, are used to convolve with an image x. The convolution results are then downsampled to obtain the four subband images x1, x2, x3, and x4.
  • Due to the biorthogonal property of DWT, the original image x can be accurately reconstructed by the inverse wavelet transform (IWT), i.e., x = IWT(x1, x2, x3, x4).
  • In multi-level wavelet packet transform (WPT), the subband images x1, x2, x3, and x4 are further processed with DWT to produce the decomposition results.
  • For two-level WPT, each subband image xi (i = 1, 2, 3, or 4) is decomposed into four subband images xi,1, xi,2, xi,3, and xi,4.
  • Recursively, the results of three or higher levels WPT can be attained.
  • The above Figure illustrates the decomposition and reconstruction of an image with WPT.
  • WPT is a special case of FCN without the nonlinearity layers.
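To make the decomposition and reconstruction above concrete, here is a minimal NumPy sketch. It assumes the Haar wavelet, which the text above does not specify, so the 2×2 filters, the subband ordering, and the helper names (haar_dwt2, haar_iwt2, wpt2, iwpt2) are illustrative choices rather than the paper's code.

```python
import numpy as np

def haar_dwt2(x):
    """One level of 2D Haar DWT on an (H, W) array with even H and W.
    Returns four half-resolution subbands (x1, x2, x3, x4)."""
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    x1 = (a + b + c + d) / 2.0  # low-pass in both directions
    x2 = (a - b + c - d) / 2.0  # difference across columns
    x3 = (a + b - c - d) / 2.0  # difference across rows
    x4 = (a - b - c + d) / 2.0  # diagonal difference
    return x1, x2, x3, x4

def haar_iwt2(x1, x2, x3, x4):
    """Inverse of haar_dwt2: rebuilds the (2H, 2W) image exactly."""
    h, w = x1.shape
    x = np.empty((2 * h, 2 * w), dtype=x1.dtype)
    x[0::2, 0::2] = (x1 + x2 + x3 + x4) / 2.0
    x[0::2, 1::2] = (x1 - x2 + x3 - x4) / 2.0
    x[1::2, 0::2] = (x1 + x2 - x3 - x4) / 2.0
    x[1::2, 1::2] = (x1 - x2 - x3 + x4) / 2.0
    return x

def wpt2(x, levels):
    """Multi-level WPT: every subband is decomposed again at each level."""
    bands = [x]
    for _ in range(levels):
        bands = [s for band in bands for s in haar_dwt2(band)]
    return bands

def iwpt2(bands):
    """Inverse WPT: merge groups of four subbands until one image remains."""
    while len(bands) > 1:
        bands = [haar_iwt2(*bands[i:i + 4]) for i in range(0, len(bands), 4)]
    return bands[0]

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
bands = wpt2(image, levels=2)
print(len(bands), bands[0].shape)           # 16 subbands of size (2, 2)
print(np.allclose(iwpt2(bands), image))     # reconstruction check
```

Every step is a fixed linear filter followed by downsampling, which is why WPT can be read as an FCN without nonlinearity layers.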

1.2. MWCNN

Multi-level wavelet-CNN architecture
  • WPT is extended to multi-level wavelet-CNN (MWCNN) by adding a CNN block between any two levels of DWTs.
  • After each level of transform, all the subband images are taken as the inputs to a CNN block to learn a compact representation as the inputs to the subsequent level of transform.
  • MWCNN is a generalization of multi-level WPT, and degrades to WPT when each CNN block becomes the identity mapping.
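The relationship between MWCNN and WPT can be sketched in a few lines of NumPy. This is only a toy illustration of the contracting path: the Haar filters, the channel-stacked layout, and the helper make_toy_block (a random 1×1 channel mix followed by ReLU) are assumptions standing in for the learned CNN blocks.

```python
import numpy as np

def dwt(x):
    """Haar DWT on a (C, H, W) array -> (4C, H/2, W/2), subbands stacked as channels."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    return np.concatenate([a + b + c + d, a - b + c - d,
                           a + b - c - d, a - b - c + d], axis=0) / 2.0

def multilevel(x, blocks):
    """Apply DWT, then a block, once per level (contracting path only)."""
    for block in blocks:
        x = block(dwt(x))
    return x

def identity(t):
    return t

def make_toy_block(in_ch, out_ch, rng):
    """Hypothetical stand-in for a CNN block: random 1x1 channel mixing + ReLU."""
    w = rng.standard_normal((out_ch, in_ch))
    def block(t):
        return np.maximum(np.tensordot(w, t, axes=1), 0.0)
    return block

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 8))

# With identity blocks, the pipeline is exactly two stacked DWTs, i.e. a 2-level WPT.
wpt_out = dwt(dwt(x))
id_out = multilevel(x, [identity, identity])
print(np.allclose(id_out, wpt_out))

# With non-identity blocks, each level can also compress the channel dimension.
toy = multilevel(x, [make_toy_block(4, 4, rng), make_toy_block(16, 8, rng)])
print(id_out.shape, toy.shape)
```

The identity case keeps all 16 subbands, while the toy blocks output a smaller set of channels, which mirrors the "compact representation" idea described above.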

2. Network Architecture

Multi-level wavelet-CNN architecture

2.1. Network

  • Each CNN block is a 4-layer FCN without pooling, and takes all the subband images as inputs. Note that the subband images after DWT are still dependent on one another, and ignoring this dependence may harm restoration performance.
  • Each layer of the CNN block is composed of convolution with 3×3 filters (Conv), batch normalization (BN), and rectified linear unit (ReLU) operations.
  • For the last layer of the last CNN block, Conv without BN and ReLU is adopted to predict the residual image.
  • A standard loss function is used for training.

2.2. Differences From U-Net

  • Generally, MWCNN modifies U-Net from three aspects:
  1. For downsampling and upsampling, maxpooling and up-convolution are used in conventional U-Net, while DWT and IWT are utilized in MWCNN.
  2. For MWCNN, downsampling increases the number of feature map channels. Except for the first one, the CNN blocks are deployed to reduce the feature map channels for a compact representation. For conventional U-Net, downsampling has no effect on the number of feature map channels, and the subsequent convolution layers are used to increase it.
  3. In MWCNN, element-wise summation is used to combine the feature maps from the contracting and expanding subnetworks, whereas conventional U-Net adopts concatenation.
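The shape bookkeeping behind these three differences can be checked with a small NumPy sketch. The Haar filters, the 16-channel 32×32 feature map, and the helper names are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling on (C, H, W): channels unchanged, spatial size halved."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def haar_dwt(x):
    """Haar DWT on (C, H, W): channels x4, spatial size halved."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    return np.concatenate([a + b + c + d, a - b + c - d,
                           a + b - c - d, a - b - c + d], axis=0) / 2.0

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 32, 32))

# Difference 1 and 2: how downsampling changes the tensor shape.
pooled = max_pool2(feat)   # U-Net style: (16, 16, 16), keeps 1/4 of the values
subbands = haar_dwt(feat)  # MWCNN style: (64, 16, 16), keeps every value
print(pooled.shape, subbands.shape, subbands.size == feat.size)

# Difference 3: how encoder and decoder features are merged.
enc = rng.standard_normal((16, 16, 16))
dec = rng.standard_normal((16, 16, 16))
summed = enc + dec                                  # MWCNN: channel count unchanged
concatenated = np.concatenate([enc, dec], axis=0)   # U-Net: channel count doubles
print(summed.shape, concatenated.shape)
```

Because the DWT output has as many values as its input, nothing is discarded at the downsampling step, and it is left to the following CNN block to decide how far to compress the channels.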

3. Experimental Results

3.1. Image Denoising

Average PSNR(dB)/SSIM results on datasets Set12, BSD68 and Urban100
  • MWCNN outperforms DnCNN, IRCNN, RED-Net and MemNet.
  • On BSD68, MWCNN outperforms DnCNN only slightly, by about 0.1 ~ 0.3dB in terms of PSNR.
  • As to other datasets, MWCNN generally achieves favorable performance when compared with the competing methods.
  • When the noise level is high (e.g., σ = 50), the average PSNR by MWCNN can be 0.5dB higher than that by DnCNN on Set12, and 1.2dB higher on Urban100.
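To put the PSNR gaps quoted above into perspective, here is a minimal sketch of how PSNR is computed and what a 0.5dB gain means in terms of mean squared error. It assumes 8-bit images with a peak value of 255 and unclipped Gaussian noise; the exact evaluation protocol of the paper may differ.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.uniform(0, 255, size=(256, 256))
noisy = clean + rng.normal(0, 50, size=clean.shape)  # sigma = 50, no clipping

# PSNR of the noisy input itself, before any denoising.
print(round(psnr(clean, noisy), 2))

# A gain of g dB corresponds to multiplying the MSE by 10 ** (-g / 10).
gain_db = 0.5
mse_ratio = 10 ** (-gain_db / 10)
print(round(mse_ratio, 3))  # about 0.891, i.e. roughly 11% lower MSE
```

Since PSNR is logarithmic, a gap of a few tenths of a dB between two strong denoisers still corresponds to a measurable reduction in squared error.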
Visual Quality

3.2. Single Image Super Resolution

Average PSNR(dB) / SSIM results on datasets Set5, Set14, BSD100 and Urban100
  • MWCNN outperforms VDSR, DnCNN, RED-Net, SRResNet, LapSRN, DRRN, MemNet and WavResNet.
  • Compared with VDSR, MWCNN achieves a notable PSNR gain of about 0.4dB on Set5 and Set14. On Urban100, MWCNN outperforms VDSR by about 0.9 ~ 1.4dB.
  • WavResNet slightly outperforms VDSR, but is still inferior to MWCNN.
  • Note that the network depth of SRResNet is 34, while that of MWCNN is 24. Moreover, SRResNet is trained with a much larger training set than MWCNN. Even so, when the scale factor is 4, MWCNN achieves slightly higher PSNR values on Set5 and BSD100, and is comparable to SRResNet on Set14.
Visual Quality

3.3. JPEG Deblocking

Average PSNR(dB) / SSIM results on datasets Classic5 and LIVE1
  • On Classic5 and LIVE1, the PSNR values of MWCNN can be 0.2 ~ 0.3dB higher than those of the second best method. MWCNN outperforms ARCNN, DnCNN and MemNet.

3.4. Run Time

Run Time in Seconds
  • The run time of MWCNN is far shorter than that of several state-of-the-art methods, including RED-Net, MemNet and DRRN. Note that these three methods also perform worse than MWCNN in terms of PSNR/SSIM metrics.

3.5. MWCNN Levels

Average PSNR (dB) and run time (in seconds) of MWCNNs with different levels on Gaussian denoising with the noise level of 50.
  • MWCNN-3 with 24-layer architecture performs much better than MWCNN-1 and MWCNN-2, while MWCNN-4 only performs negligibly better than MWCNN-3 in terms of the PSNR metric.
  • Moreover, the speed of MWCNN-3 is also moderate compared with other levels.

During the days of coronavirus, I hope to write 30 stories in this month to give myself a small challenge. And this is the 33rd story in this month. Thanks for visiting my story…

Last day of this month. How about 35 stories within this month…?


