Review: DnCNN — Residual Learning of Deep CNN for Image Denoising (Denoising & Super Resolution & JPEG Deblocking)

One Single Network Handles Three Tasks: Image Denoising, Single Image Super Resolution, and JPEG Deblocking

In this story, Denoising Convolutional Neural Network (DnCNN), by Harbin Institute of Technology, The Hong Kong Polytechnic University, Graz University of Technology, and Xi’an Jiaotong University, is reviewed. In this paper:

  • One single network is trained which can handle 3 tasks: Image Denoising, Single Image Super Resolution, and JPEG Deblocking.
  • Residual learning, which originated in ResNet, and batch normalization, which originated in Inception-v2, are used. With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers.

Outline

  1. DnCNN Network Architecture
  2. Experimental Results on Denoising
  3. Experimental Results on 3 Tasks Together

1. DnCNN Network Architecture

DnCNN Network Architecture
  • For Gaussian denoising with a certain noise level, the receptive field size of DnCNN is set to 35×35 with the corresponding depth of 17. For other general image denoising tasks, a larger receptive field is adopted by setting the depth to be 20.
  • The residual learning formulation is adopted to train a residual mapping:
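    x = y - R(y), where y = x + v is the noisy observation, x is the latent clean image, and R(y) ≈ v is the predicted noise. Thus, the network learns the residual R(y) rather than the clean image itself.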
  • To be specific, there are 3 types of layers.
  • (i) Conv+ReLU: For the first layer, 64 filters of size 3×3×c are used to generate 64 feature maps. c = 1 for gray image and c = 3 for color image.
  • (ii) Conv+BN+ReLU: for layers 2 to (D-1), 64 filters of size 3×3×64 are used, and batch normalization is added between convolution and ReLU.
  • (iii) Conv: for the last layer, c filters of size 3×3×64 are used to reconstruct the output.
  • A simple zero-padding strategy is applied before each convolution so that every feature map keeps the same size as the input image; it does not result in any boundary artifacts.
  • By incorporating convolution with ReLU, DnCNN can gradually separate image structure from the noisy observation through the hidden layers.
  • DnCNN is trained in an end-to-end fashion.
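The layer layout and the depth/receptive-field relation described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names are my own, and the weight count ignores biases and batch normalization parameters. It assumes stride-1 3×3 convolutions throughout, so each layer widens the receptive field by 2 pixels.

```python
def dncnn_layers(depth=17, channels=1, features=64):
    """Return the layer plan as (layer_type, in_channels, out_channels) tuples."""
    if depth < 3:
        raise ValueError("depth must be at least 3")
    layers = [("Conv+ReLU", channels, features)]                      # first layer
    layers += [("Conv+BN+ReLU", features, features)] * (depth - 2)    # layers 2..(D-1)
    layers.append(("Conv", features, channels))                       # last layer
    return layers


def receptive_field(depth, kernel=3):
    """Side length of the receptive field for `depth` stacked stride-1 convolutions."""
    return 1 + depth * (kernel - 1)


def conv_weight_count(layers, kernel=3):
    """Number of convolution weights (biases and BN parameters not counted)."""
    return sum(kernel * kernel * c_in * c_out for _, c_in, c_out in layers)


for depth in (17, 20):
    plan = dncnn_layers(depth=depth, channels=1)
    rf = receptive_field(depth)
    print(f"depth={depth}: {len(plan)} layers, receptive field {rf}x{rf}, "
          f"{conv_weight_count(plan)} conv weights")
```

With depth 17 this gives the 35×35 receptive field quoted above; with depth 20 the same formula gives 41×41.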
PSNR with/without Residual Learning (RL) and Batch Normalization (BN)
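To make the residual formulation x = y - R(y) concrete, here is a toy sketch on a 1-D "image". It is only an illustration under assumptions: the helper names are hypothetical, the "network" is replaced by an oracle that returns the true noise, and the loss is a per-sample squared error between R(y) and (y - x), which is my reading of the training objective rather than a quote from the paper.

```python
import random


def add_gaussian_noise(clean, sigma, rng):
    """Noisy observation y = x + v with v ~ N(0, sigma^2)."""
    return [x + rng.gauss(0.0, sigma) for x in clean]


def residual_target(noisy, clean):
    """What the network is asked to predict: v = y - x."""
    return [y - x for y, x in zip(noisy, clean)]


def denoise(noisy, predicted_residual):
    """Recover the clean estimate: x_hat = y - R(y)."""
    return [y - r for y, r in zip(noisy, predicted_residual)]


def residual_loss(predicted_residual, noisy, clean):
    """Half the squared error between R(y) and the true residual (y - x)."""
    target = residual_target(noisy, clean)
    return 0.5 * sum((r - t) ** 2 for r, t in zip(predicted_residual, target))


rng = random.Random(0)
clean = [10.0, 20.0, 30.0, 40.0]
noisy = add_gaussian_noise(clean, sigma=25.0, rng=rng)

oracle = residual_target(noisy, clean)     # stand-in for a perfectly trained R(y)
restored = denoise(noisy, oracle)
print("loss with oracle residual:", residual_loss(oracle, noisy, clean))
print("loss when predicting zero:", residual_loss([0.0] * 4, noisy, clean))
```

Predicting zero noise leaves the input untouched, so its loss is simply half the noise energy; a good R(y) drives the loss toward zero.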

2. Experimental Results on Denoising

2.1. Dataset & Training

  • For Gaussian denoising with either known or unknown noise level, the authors follow [19] and use 400 images of size 180×180 for training.
  • Three noise levels are considered, i.e., σ = 15, 25 and 50. The DnCNN model for Gaussian denoising with a known, specific noise level is referred to as DnCNN-S.
  • For blind Gaussian denoising, the noise level range is set to σ ∈ [0, 55]; the single DnCNN model for this task is referred to as DnCNN-B.
  • Two test datasets are used: one containing 68 natural images from the Berkeley segmentation dataset (BSD68), and the other containing 12 widely used test images.
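The difference between the DnCNN-S and DnCNN-B training setups comes down to how the noise level is chosen for each training pair. The sketch below illustrates this with hypothetical helper names and a tiny 4×4 patch; drawing σ uniformly from [0, 55] for the blind case is my assumption about the sampling scheme, and pixel values are taken to be on a 0 to 255 scale.

```python
import random


def sample_sigma(mode, rng, fixed_sigma=25.0, blind_range=(0.0, 55.0)):
    """Pick the noise level: fixed for DnCNN-S, random in a range for DnCNN-B."""
    if mode == "S":
        return fixed_sigma
    if mode == "B":
        low, high = blind_range
        return rng.uniform(low, high)   # assumption: uniform sampling
    raise ValueError("mode must be 'S' or 'B'")


def make_training_pair(clean_patch, mode, rng, fixed_sigma=25.0):
    """Return (noisy_patch, clean_patch, sigma) for one training sample."""
    sigma = sample_sigma(mode, rng, fixed_sigma=fixed_sigma)
    noisy = [[p + rng.gauss(0.0, sigma) for p in row] for row in clean_patch]
    return noisy, clean_patch, sigma


rng = random.Random(42)
patch = [[128.0] * 4 for _ in range(4)]           # hypothetical flat gray patch

_, _, sigma_s = make_training_pair(patch, "S", rng, fixed_sigma=50.0)
blind_sigmas = [make_training_pair(patch, "B", rng)[2] for _ in range(5)]
print("DnCNN-S sigma:", sigma_s)
print("DnCNN-B sigmas:", [round(s, 1) for s in blind_sigmas])
```

A DnCNN-S model would be trained three times (once per σ in 15, 25, 50), whereas DnCNN-B sees all noise levels in one training run.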
12 widely used testing images

2.2. Denoising Results

Average PSNRs on BSD68 Dataset
PSNRs on 12 widely used testing images
  • Specifically, DnCNN-S outperforms the competing methods by 0.2dB to 0.6dB on most of the images and fails to achieve the best results on only two images “House” and “Barbara”, which are dominated by repetitive structures.
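Since the comparisons in this section are all stated in dB of PSNR, a short sketch of the metric helps put a 0.2 to 0.6 dB gap in perspective. This assumes 8-bit images with a peak value of 255; the function names are my own.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks when PSNR improves by `gain_db`."""
    return 10.0 ** (-gain_db / 10.0)


for gain in (0.2, 0.6):
    reduction = (1.0 - mse_ratio_for_gain(gain)) * 100.0
    print(f"+{gain} dB PSNR corresponds to about {reduction:.1f}% lower MSE")
```

Because PSNR is logarithmic in the mean squared error, a gain of a few tenths of a dB corresponds to a reduction in MSE of several percent.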
Denoising results of one image from BSD68 with noise level 50.
Denoising results of the image “parrot” with noise level 50.
Color image denoising results of one image from the BSD68 dataset with noise level 35
Color image denoising results of one image from the BSD68 dataset with noise level 45
  • In addition, CDnCNN-B (the color version of DnCNN-B) can generate images with more details and sharper edges than CBM3D.
Gaussian denoising results of two real images by DnCNN-B and CDnCNN-B models
  • One can see that the models can recover visually pleasant results while preserving image details.
Average PSNR improvement over BM3D/CBM3D with respect to different noise levels by our DnCNN-B/CDnCNN-B model.
  • This experimental result demonstrates the feasibility of training a single DnCNN-B model for handling blind Gaussian denoising within a wide range of noise levels.

2.3. Run Time

Run Time in Seconds for Different Sizes of Images
  • Though DnCNN is slower than BM3D and TNRD on CPU, it remains very competitive once the image quality improvement is taken into account.
  • For the GPU time, the proposed DnCNN achieves very appealing computational efficiency, e.g., it can denoise an image of size 512×512 in 60ms with unknown noise level.

3. Experimental Results on 3 Tasks Together

  • A single model, DnCNN-3, is trained for three general image denoising tasks: blind Gaussian denoising, SISR, and JPEG image deblocking.
  • At the time of publication, no existing method had been reported to handle these three tasks with only a single model.
  • For Gaussian denoising, DnCNN-3 still outperforms the non-blind TNRD and BM3D, even though it is not given the noise level.
  • For SISR, it surpasses TNRD by a large margin and is on par with VDSR. (Indeed, the network is similar to VDSR.)
  • For JPEG image deblocking, DnCNN-3 outperforms AR-CNN by about 0.3dB in PSNR and has about 0.1dB PSNR gain over TNRD on all the quality factors.
Single image super-resolution results of “butterfly” from Set5 dataset with upscaling factor 3.
Single image super-resolution results of one image from Urban100 dataset with upscaling factor 4.
  • DnCNN-3 can produce visually pleasant output even when the input image is corrupted by several distortions with different levels in different regions.
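One way to see why a single residual network can serve all three tasks is that each degraded input can be written as y = x + v, with v being whatever the degradation added to or removed from the clean image. The toy sketch below illustrates that idea only; it is not the paper's data pipeline. The degradations are deliberately simple stand-ins that I chose so the example needs no image library: block averaging plus replication in place of bicubic down/upsampling, and coarse pixel quantization in place of real JPEG compression. The scale factors and quantization steps are illustrative values.

```python
import random


def gaussian_noise(img, sigma, rng):
    """Blind denoising input: add zero-mean Gaussian noise."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]


def down_up(img, factor):
    """Stand-in for SISR input: block-average, then replicate back to full size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, factor):
        for bx in range(0, w, factor):
            ys = range(by, min(by + factor, h))
            xs = range(bx, min(bx + factor, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def coarse_quantize(img, step):
    """Stand-in for compression artifacts: quantize pixel values (not real JPEG)."""
    return [[round(p / step) * step for p in row] for row in img]


def residual(degraded, clean):
    """Shared training target for every task: v = y - x."""
    return [[d - c for d, c in zip(dr, cr)] for dr, cr in zip(degraded, clean)]


def make_sample(clean, task, rng):
    """Return (network_input, residual_target) for one of the three tasks."""
    if task == "denoise":
        degraded = gaussian_noise(clean, rng.uniform(0.0, 55.0), rng)
    elif task == "sisr":
        degraded = down_up(clean, rng.choice([2, 3, 4]))
    elif task == "deblock":
        degraded = coarse_quantize(clean, rng.choice([8, 16, 32]))
    else:
        raise ValueError(f"unknown task: {task}")
    return degraded, residual(degraded, clean)


rng = random.Random(1)
clean = [[float((3 * x + 5 * y) % 256) for x in range(6)] for y in range(6)]
for task in ("denoise", "sisr", "deblock"):
    degraded, target = make_sample(clean, task, rng)
    energy = sum(v * v for row in target for v in row)
    print(f"{task}: input {len(degraded)}x{len(degraded[0])}, residual energy {energy:.1f}")
```

Note that the SISR input is brought back to the full image size before it enters the network, so all three tasks share the same input and output shape and the same residual target definition.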

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG