Review: DnCNN — Residual Learning of Deep CNN for Image Denoising (Denoising & Super Resolution & JPEG Deblocking)
One Single Network Handles Three Tasks: Image Denoising, Single Image Super Resolution, and JPEG Deblocking
In this story, Denoising Convolutional Neural Network (DnCNN), by Harbin Institute of Technology, The Hong Kong Polytechnic University, Graz University of Technology, and Xi’an Jiaotong University, is reviewed. In this paper:
- The network is able to handle Gaussian denoising with unknown noise level (i.e. blind Gaussian denoising).
- One single network is trained which can handle 3 tasks: Image Denoising, Single Image Super Resolution, and JPEG Deblocking.
- Residual learning, which originated in ResNet, and batch normalization, which originated in Inception-v2, are used. With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers.
This is a paper in 2017 TIP with over 1700 citations, where TIP has a high impact factor of 6.79. (Sik-Ho Tsang @ Medium)
Outline
- DnCNN Network Architecture
- Experimental Results on Denoising
- Experimental Results on 3 Tasks Together
1. DnCNN Network Architecture
- The size of the convolutional filters is set to 3×3 and all pooling layers are removed. Therefore, the receptive field of a DnCNN with depth d is (2d+1)×(2d+1).
- For Gaussian denoising with a certain noise level, the receptive field size of DnCNN is set to 35×35 with the corresponding depth of 17. For other general image denoising tasks, a larger receptive field is adopted by setting the depth to be 20.
- The residual learning formulation is adopted to train a residual mapping R(y) ≈ v, where v is the noise. The latent clean image is then recovered as x = y − R(y).
- To be specific, there are 3 types of layers.
- (i) Conv+ReLU: For the first layer, 64 filters of size 3×3×c are used to generate 64 feature maps. c = 1 for gray image and c = 3 for color image.
- (ii) Conv+BN+ReLU: for layers 2 to (D-1), 64 filters of size 3×3×64 are used, and batch normalization is added between convolution and ReLU.
- (iii) Conv: for the last layer, c filters of size 3×3×64 are used to reconstruct the output.
- A simple zero-padding strategy is used before convolution, which does not result in any boundary artifacts.
- By incorporating convolution with ReLU, DnCNN can gradually separate image structure from the noisy observation through the hidden layers.
- DnCNN is trained in an end-to-end fashion.
- With both Residual Learning (RL) and Batch Normalization (BN), the obtained PSNR is the highest.
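The two core design points above, the receptive-field arithmetic and the residual formulation, can be illustrated with a minimal NumPy sketch. Note that `R` below is an oracle stand-in that returns the true noise, not the trained 17-layer network; it only demonstrates the x = y − R(y) recovery step.

```python
import numpy as np

def receptive_field(depth):
    # A stack of `depth` 3x3 convolutions (stride 1, no pooling) grows the
    # receptive field by 2 pixels per layer: (2*depth + 1) on each side.
    return 2 * depth + 1

# Depth 17 for known-noise-level Gaussian denoising -> 35x35 receptive field;
# depth 20 for the general tasks -> 41x41.
print(receptive_field(17), receptive_field(20))  # 35 41

# Residual learning: the network predicts the noise R(y) ~ v, and the clean
# image is recovered as x = y - R(y).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(40, 40))         # latent clean patch
v = rng.normal(0.0, 25.0 / 255.0, size=x.shape)  # Gaussian noise, sigma = 25
y = x + v                                        # noisy observation

def R(y_noisy):
    # Oracle stand-in for the trained DnCNN: returns the exact residual.
    return y_noisy - x

x_hat = y - R(y)  # denoised estimate
print(np.allclose(x_hat, x))  # True for the oracle residual
```

With a learned `R`, the recovery is only approximate, but the subtraction step is identical; this is why the network's target during training is the noise rather than the clean image.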
2. Experimental Results
2.1. Dataset & Training
- For Gaussian denoising with either a known or an unknown noise level, 400 images of size 180×180 are used for training, following [19].
- Three noise levels are considered, i.e., σ = 15, 25 and 50. The DnCNN model for Gaussian denoising with a known specific noise level is referred to as DnCNN-S.
- For blind Gaussian denoising, the noise level is sampled from σ ∈ [0, 55]. The single DnCNN model for the blind Gaussian denoising task is referred to as DnCNN-B.
- Test dataset: one containing 68 natural images from Berkeley segmentation dataset (BSD68) and the other one containing 12 widely used testing images.
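The blind-denoising training setup above can be sketched as follows: each training pair is a noisy patch and its residual (the noise itself), with σ drawn uniformly from [0, 55]. This is a minimal sketch; the 50×50 patch size is an assumption for illustration, and a real pipeline would crop many patches per image.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blind_training_pair(clean_patch, sigma_range=(0.0, 55.0)):
    """Build one (noisy input, residual target) pair for DnCNN-B training.

    The noise level is drawn uniformly from sigma_range so that a single
    model is exposed to a wide range of noise levels during training.
    Pixel values are assumed to be in [0, 255].
    """
    sigma = rng.uniform(*sigma_range)
    noise = rng.normal(0.0, sigma, size=clean_patch.shape)
    noisy = clean_patch + noise
    # DnCNN regresses the residual, so the target is the noise itself.
    return noisy, noise, sigma

clean = rng.uniform(0, 255, size=(50, 50))  # stand-in for a clean patch
noisy, target, sigma = make_blind_training_pair(clean)
```

Because the target is always `noisy - clean`, the same loss applies regardless of which σ was sampled, which is what makes a single blind model trainable.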
2.2. Denoising Results
- As one can see, both DnCNN-S and DnCNN-B achieve better PSNR results than the competing methods.
- The proposed DnCNN-S yields the highest PSNR on most of the images.
- Specifically, DnCNN-S outperforms the competing methods by 0.2 dB to 0.6 dB on most of the images, and fails to achieve the best results on only two images, “House” and “Barbara”, which are dominated by repetitive structures.
- For grayscale image denoising, as shown in the examples above, DnCNN-S and DnCNN-B can not only recover sharp edges and fine details but also yield visually pleasant results in the smooth regions.
- For color image denoising, CBM3D generates false color artifacts in some regions whereas CDnCNN-B can recover images with more natural color. (CDnCNN means color version of DnCNN)
- In addition, CDnCNN-B can generate images with more details and sharper edges than CBM3D.
- DnCNN-B can also work well on real noisy images when the noise is additive white Gaussian-like.
- One can see that the models can recover visually pleasant results while preserving image details.
- DnCNN-B/CDnCNN-B models consistently outperform BM3D/CBM3D by a large margin on a wide range of noise levels.
- This experimental result demonstrates the feasibility of training a single DnCNN-B model for handling blind Gaussian denoising within a wide range of noise levels.
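All the comparisons above are reported in PSNR, so it is worth pinning down the metric. The sketch below is the standard PSNR definition for 8-bit images; for example, a uniform error of 16 gray levels gives an MSE of 256 and a PSNR of about 24.05 dB.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A constant error of 16 gray levels gives MSE = 16^2 = 256:
a = np.zeros((8, 8))
b = a + 16.0
print(round(psnr(a, b), 2))  # 24.05
```

Seen this way, the reported 0.2 dB to 0.6 dB gains correspond to a small but consistent reduction in mean squared error on every test image.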
2.3. Run Time
- DnCNN has a relatively high speed on CPU, and it is faster than two discriminative models, MLP and CSF.
- Though it is slower than BM3D and TNRD, by taking the image quality improvement into consideration, DnCNN is still very competitive in CPU implementation.
- For the GPU time, the proposed DnCNN achieves very appealing computational efficiency, e.g., it can denoise an image of size 512×512 in 60ms with unknown noise level.
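Per-image timings like the 60 ms figure are typically measured as wall-clock time around a single forward pass. The sketch below uses a trivial stub in place of the network (actual numbers depend entirely on the hardware and implementation, so no figure from the paper is reproduced here).

```python
import time
import numpy as np

def denoise_stub(image):
    # Placeholder for one forward pass of a denoiser such as DnCNN.
    return image.copy()

image = np.zeros((512, 512), dtype=np.float32)  # the paper's benchmark size

start = time.perf_counter()
output = denoise_stub(image)
elapsed_ms = (time.perf_counter() - start) * 1e3
print(f"denoised 512x512 in {elapsed_ms:.2f} ms")
```

For GPU timings, one would additionally need to synchronize the device before reading the clock, otherwise the measured time only covers kernel launch.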
3. Experimental Results on 3 Tasks Together
- Experiments are conducted on learning a single model for three general image denoising tasks: blind Gaussian denoising, SISR, and JPEG image deblocking.
- At that moment, none of the existing methods had been reported for handling these three tasks with only a single model.
- A single DnCNN-3 model is trained for the three different tasks.
- For Gaussian denoising, it still outperforms the non-blind TNRD and BM3D.
- For SISR, it surpasses TNRD by a large margin and is on par with VDSR. (Indeed, the network is similar to VDSR.)
- For JPEG image deblocking, DnCNN-3 outperforms AR-CNN by about 0.3dB in PSNR and has about 0.1dB PSNR gain over TNRD on all the quality factors.
- For SISR, it can be seen that both DnCNN-3 and VDSR can produce sharp edges and fine details, whereas TNRD tends to generate blurred edges and distorted lines.
- For JPEG deblocking, DnCNN-3 can recover the straight line whereas AR-CNN and TNRD are prone to generate distorted lines.
- The input image above is composed of noisy regions with noise levels 15 (upper left) and 25 (lower left), bicubically interpolated low-resolution images with upscaling factors 2 (upper middle) and 3 (lower middle), and JPEG images with quality factors 10 (upper right) and 30 (lower right).
- DnCNN-3 can produce visually pleasant output even when the input image is corrupted by several distortions with different levels in different regions.
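What makes a single DnCNN-3 model possible is that all three tasks are cast as the same problem: predict the residual between a degraded input and the clean image. The sketch below samples one of the three degradations per patch. Note the SISR and JPEG branches are crude NumPy stand-ins for illustration only; the paper uses real bicubic interpolation and a real JPEG codec, which require an image library.

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(clean, task):
    """Produce a degraded input for one of the three DnCNN-3 tasks."""
    if task == "denoise":
        # Blind Gaussian noise with sigma drawn from [0, 55].
        sigma = rng.uniform(0.0, 55.0)
        return clean + rng.normal(0.0, sigma, clean.shape)
    if task == "sisr":
        # Stand-in for bicubic down/upscaling by factor 2
        # (the paper uses true bicubic interpolation).
        small = clean[::2, ::2]
        return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
    if task == "jpeg":
        # Stand-in for JPEG compression: coarse value quantization
        # (a real pipeline would round-trip through a JPEG codec).
        step = 32.0
        return np.round(clean / step) * step
    raise ValueError(task)

clean = rng.uniform(0, 255, size=(50, 50))
pairs = {}
for task in ("denoise", "sisr", "jpeg"):
    degraded = degrade(clean, task)
    # Regardless of task, the training target is the residual.
    pairs[task] = (degraded, degraded - clean)
```

Since every branch yields the same (input, residual) shape and the same loss, the three tasks can simply be mixed in one training set, which is exactly the DnCNN-3 setup.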
During the days of coronavirus, I hope to write 30 stories in this month to give myself a small challenge. This is the 26th story in this month. Thanks for visiting my story…
Reference
[2017 TIP] [DnCNN]
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising
Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DnCNN] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet] [SRGAN & SRResNet] [EDSR & MDSR] [SR+STN]
Codec Filtering
JPEG: [ARCNN] [RED-Net] [DnCNN] [Li ICME’17]
HEVC:[Lin DCC’16] [IFCNN] [VRCNN] [DCAD] [MMS-net] [DRN] [Lee ICCE’18] [DS-CNN] [RHCNN] [VRCNN-ext] [S-CNN & C-CNN]
VVC: [Lu CVPRW’19] [Wang APSIPA ASC’19]