Review: MWCNN — Multi-Level Wavelet-CNN for Image Restoration (Denoising & Super Resolution & JPEG Deblocking)
In this story, Multi-Level Wavelet-CNN for Image Restoration (MWCNN) is reviewed. The image is wavelet-transformed before being fed into the network, and the wavelet transform is also used for downsampling instead of convolution or max pooling. This is a paper in 2018 CVPRW with more than 70 citations. (Sik-Ho Tsang @ Medium)
- From Wavelet to MWCNN
- Network Architecture
- Experimental Results
1. From Wavelet to MWCNN
- In 2D discrete wavelet transform (DWT), four filters, i.e. fLL, fLH, fHL, and fHH, are used to convolve with an image x. The convolution results are then downsampled to obtain the four subband images x1, x2, x3, and x4.
- Due to the biorthogonal property of DWT, the original image x can be accurately reconstructed by the inverse wavelet transform (IWT), i.e., x = IWT(x1, x2, x3, x4).
- In multi-level wavelet packet transform (WPT), the subband images x1, x2, x3, and x4 are further processed with DWT to produce the decomposition results.
- For two-level WPT, each subband image xi (i = 1, 2, 3, or 4) is decomposed into four subband images xi,1, xi,2, xi,3, and xi,4.
- Recursively, the results of WPT with three or more levels can be obtained.
- The above Figure illustrates the decomposition and reconstruction of an image with WPT.
- WPT is a special case of FCN without the nonlinearity layers.
- WPT is extended to multi-level wavelet-CNN (MWCNN) by adding a CNN block between any two levels of DWTs.
- After each level of transform, all the subband images are taken as the inputs to a CNN block to learn a compact representation as the inputs to the subsequent level of transform.
- MWCNN is a generalization of multi-level WPT, and degrades to WPT when each CNN block becomes the identity mapping.
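The decomposition and reconstruction described above can be sketched with the Haar wavelet, the simplest choice for the four filters. This is only a minimal sketch under assumptions: the function names (`haar_dwt`, `haar_iwt`, `wpt2`, `iwpt2`) are illustrative, the sign and ordering convention of the detail subbands is one of several in use, and images are plain nested lists with even height and width.

```python
def haar_dwt(x):
    """One-level 2D Haar DWT of an image stored as a list of rows.

    Height and width are assumed even. Returns four half-size subbands
    (x1, x2, x3, x4); x1 is the low-pass band, the others hold details.
    """
    h, w = len(x) // 2, len(x[0]) // 2
    x1, x2, x3, x4 = ([[0.0] * w for _ in range(h)] for _ in range(4))
    for i in range(h):
        for j in range(w):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            x1[i][j] = (a + b + c + d) / 2
            x2[i][j] = (a - b + c - d) / 2
            x3[i][j] = (a + b - c - d) / 2
            x4[i][j] = (a - b - c + d) / 2
    return x1, x2, x3, x4


def haar_iwt(x1, x2, x3, x4):
    """Inverse of haar_dwt: rebuilds the full-size image from four subbands."""
    h, w = len(x1), len(x1[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            p, q, r, s = x1[i][j], x2[i][j], x3[i][j], x4[i][j]
            x[2 * i][2 * j] = (p + q + r + s) / 2
            x[2 * i][2 * j + 1] = (p - q + r - s) / 2
            x[2 * i + 1][2 * j] = (p + q - r - s) / 2
            x[2 * i + 1][2 * j + 1] = (p - q - r + s) / 2
    return x


def wpt2(x):
    """Two-level WPT: apply the DWT again to every first-level subband."""
    return [haar_dwt(sub) for sub in haar_dwt(x)]


def iwpt2(packets):
    """Inverse of wpt2: undo the second level, then the first."""
    return haar_iwt(*(haar_iwt(*group) for group in packets))


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
assert haar_iwt(*haar_dwt(image)) == image  # x = IWT(x1, x2, x3, x4)
assert iwpt2(wpt2(image)) == image          # two-level WPT is also invertible
```

Placing an identity function between the two `haar_dwt` calls in `wpt2` would leave the output unchanged, which mirrors the statement that MWCNN degrades to WPT when every CNN block is the identity mapping.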
2. Network Architecture
- Each CNN block is a 4-layer FCN without pooling, and takes all the subband images as inputs. It is noted that the subband images after DWT are still dependent, and ignoring this dependence may harm the restoration performance.
- Each layer of the CNN block is composed of convolution with 3×3 filters (Conv), batch normalization (BN), and rectified linear unit (ReLU) operations.
- For the last layer of the last CNN block, Conv without BN and ReLU is adopted to predict the residual image.
- Standard loss function is used:
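A minimal sketch of such an objective, assuming the squared-error residual loss common to DnCNN-style networks: the network output is compared with the difference between the degraded input and the ground truth. The function names, the 1/2 factor, and the per-image (rather than per-batch) normalization are assumptions for illustration, not necessarily the paper's exact formula.

```python
def residual_loss(predicted_residual, degraded, clean):
    """Half the sum of squared errors between the predicted residual and
    the true residual (degraded - clean). Images are lists of rows."""
    total = 0.0
    for p_row, y_row, x_row in zip(predicted_residual, degraded, clean):
        for p, y, x in zip(p_row, y_row, x_row):
            total += (p - (y - x)) ** 2
    return 0.5 * total


def restore(degraded, predicted_residual):
    """Restored image = degraded input minus the predicted residual."""
    return [[y - p for y, p in zip(y_row, p_row)]
            for y_row, p_row in zip(degraded, predicted_residual)]


degraded = [[3.0, 5.0]]
clean = [[1.0, 2.0]]
perfect = [[2.0, 3.0]]  # exactly degraded - clean
assert residual_loss(perfect, degraded, clean) == 0.0
assert restore(degraded, perfect) == clean
```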
2.2. Differences From U-Net
- Generally, MWCNN modifies U-Net from three aspects:
- For downsampling and upsampling, maxpooling and up-convolution are used in conventional U-Net, while DWT and IWT are utilized in MWCNN.
- For MWCNN, the downsampling increases the number of feature map channels. Except for the first one, the CNN blocks are deployed to reduce the feature map channels for a compact representation. For conventional U-Net, the downsampling has no effect on the number of feature map channels, and the subsequent convolution layers are used to increase it.
- In MWCNN, element-wise summation is used to combine the feature maps from the contracting and expanding subnetworks, whereas concatenation is adopted in conventional U-Net.
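The channel bookkeeping behind these three differences can be sketched as shape arithmetic. The helper names and the example shape (64 channels, 128×128) are hypothetical and chosen only for illustration; they are not the channel widths used in the paper.

```python
def dwt_down(shape):
    """DWT downsampling: each channel yields four half-size subbands."""
    c, h, w = shape
    return (4 * c, h // 2, w // 2)


def maxpool_down(shape):
    """2x2 max pooling: spatial size halves, channel count is unchanged."""
    c, h, w = shape
    return (c, h // 2, w // 2)


def merged_channels(c_skip, c_up, mode):
    """Channels after combining contracting and expanding feature maps."""
    if mode == "sum":      # MWCNN: element-wise summation
        if c_skip != c_up:
            raise ValueError("summation needs equal channel counts")
        return c_skip
    if mode == "concat":   # conventional U-Net: concatenation
        return c_skip + c_up
    raise ValueError("mode must be 'sum' or 'concat'")


def numel(shape):
    c, h, w = shape
    return c * h * w


feat = (64, 128, 128)  # hypothetical (channels, height, width)
assert dwt_down(feat) == (256, 64, 64)
assert maxpool_down(feat) == (64, 64, 64)
assert numel(dwt_down(feat)) == numel(feat)           # nothing is discarded
assert numel(maxpool_down(feat)) == numel(feat) // 4  # 3/4 of values dropped
```

The DWT keeps the total number of values fixed, which is why the CNN block that follows has to reduce the channel count, while summation keeps the merged feature map at the same width as either input.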
3. Experimental Results
3.1. Image Denoising
- MWCNN outperforms DnCNN, IRCNN, RED-Net and MemNet.
- MWCNN only slightly outperforms DnCNN by about 0.1 ~ 0.3dB in terms of PSNR on BSD68.
- As to other datasets, MWCNN generally achieves favorable performance when compared with the competing methods.
- When the noise level is high (e.g., σ = 50), the average PSNR by MWCNN can be 0.5dB higher than that by DnCNN on Set12, and 1.2dB higher on Urban100.
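To put these dB gaps in perspective, PSNR is a logarithmic function of the mean squared error (MSE), so a fixed gain in dB corresponds to a fixed multiplicative reduction in MSE. The sketch below uses the standard PSNR definition; the helper names and the 8-bit peak value of 255 are assumptions for illustration.

```python
import math


def psnr(mse, peak=255.0):
    """PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


print(round(mse_ratio_for_gain(0.5), 3))  # MSE factor for a 0.5 dB gain
print(round(mse_ratio_for_gain(1.2), 3))  # MSE factor for a 1.2 dB gain
```

Under this definition, a 0.5 dB gain corresponds to roughly 11% lower MSE, and a 1.2 dB gain to roughly 24% lower MSE.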
3.2. Single Image Super Resolution
- MWCNN outperforms VDSR, DnCNN, RED-Net, SRResNet, LapSRN, DRRN, MemNet and WavResNet.
- Compared with VDSR, MWCNN achieves a notable gain of about 0.4dB by PSNR on Set5 and Set14. On Urban100, MWCNN outperforms VDSR by about 0.9 ~ 1.4dB.
- WavResNet slightly outperforms VDSR, but is still inferior to MWCNN.
- It is noted that the network depth of SRResNet is 34, while that of MWCNN is 24. Moreover, SRResNet is trained with a much larger training set than MWCNN. Even so, when the scale factor is 4, MWCNN achieves slightly higher PSNR values on Set5 and BSD100, and is comparable to SRResNet on Set14.
3.3. JPEG Deblocking
3.4. Run Time
- The run time of MWCNN is far less than that of several state-of-the-art methods, including RED-Net, MemNet and DRRN. Note that these three methods also perform worse than MWCNN in terms of PSNR/SSIM metrics.
3.5. MWCNN Levels
- MWCNN-3 with 24-layer architecture performs much better than MWCNN-1 and MWCNN-2, while MWCNN-4 only performs negligibly better than MWCNN-3 in terms of the PSNR metric.
- Moreover, the speed of MWCNN-3 is also moderate compared with other levels.
During the days of coronavirus, I hope to write 30 stories in this month to give myself a small challenge. And this is the 33rd story in this month. Thanks for visiting my story…
Last day of this month. How about 35 stories within this month…?
[2018 CVPRW] [MWCNN]
Multi-level Wavelet-CNN for Image Restoration
JPEG: [ARCNN] [RED-Net] [DnCNN] [Li ICME’17] [MemNet] [MWCNN]
HEVC: [Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN] [MLSDRN]
VVC: [Lu CVPRW’19] [Wang APSIPA ASC’19]