Review — Blind Image Blur Estimation via Deep Learning (Blur Classification)
In this story, Blind Image Blur Estimation via Deep Learning, by Nanjing University of Information Science and Technology, The University of Sheffield, and Northumbria University, is reviewed. In this paper:
- A learning-based method using a pre-trained deep neural network (DNN) and a general regression neural network (GRNN) is proposed to first classify the blur type and then estimate its parameters.
This is a 2016 TIP paper with over 100 citations; TIP has a high impact factor of 10.856. (Sik-Ho Tsang @ Medium)
- DNN & GRNN Framework
- Deep Neural Network (DNN)
- General Regression Neural Network (GRNN)
- Experimental Results
1. DNN & GRNN Framework
- DNN is the first stage for blur type classification, which has 3 output labels. B1, B2, and B3 are the features for Gaussian, motion, and defocus blur, respectively.
- GRNN is the second stage, for blur PSF parameter estimation, which has different output labels for each blur type. P1, P2, and P3 are the estimated parameters.
1.2. Blur Types
1.2.1. Gaussian Blur
- In many applications, such as satellite imaging, Gaussian blur can be used to model the PSF of atmospheric turbulence. The PSF is a 2-D Gaussian, proportional to exp(−(x1² + x2²) / (2σ²)) for pixel offsets x = (x1, x2) in R and zero elsewhere,
- where σ is the blur radius, to be estimated by GRNN, and R is the region of support. R is usually set as [−3σ, 3σ], because it contains 99.7% of the energy in a Gaussian function.
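To make the Gaussian PSF concrete, here is a minimal NumPy sketch of a kernel truncated to [−3σ, 3σ]. The function name `gaussian_psf` and the choice to normalize the discrete kernel so that it sums to 1 are my own assumptions, not details taken from the paper.

```python
import numpy as np

def gaussian_psf(sigma):
    """2-D Gaussian PSF truncated to [-3*sigma, 3*sigma] (hypothetical helper)."""
    half = int(np.ceil(3 * sigma))            # region of support R = [-3σ, 3σ]
    ax = np.arange(-half, half + 1)
    x1, x2 = np.meshgrid(ax, ax)
    kernel = np.exp(-(x1**2 + x2**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()              # assumption: discrete kernel sums to 1

psf = gaussian_psf(1.5)
print(psf.shape)  # (11, 11)
```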
1.2.2. Motion Blur
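- Another blur is caused by linear motion of the camera, which is called motion blur. The PSF takes the constant value 1/M along a line segment of length M centered at the origin, and is zero elsewhere,
- where M describes the length of motion in pixels and ω is the motion direction, given as its angle to the x axis. ω is to be estimated by GRNN.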
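A linear motion PSF can be sketched by rasterizing a line segment onto a pixel grid. The helper `motion_psf`, the dense sampling along the segment, and the convention that the angle is measured counterclockwise with image rows growing downward are all assumptions of this sketch rather than the paper's procedure.

```python
import numpy as np

def motion_psf(length, angle_deg):
    """Line-segment PSF of `length` pixels at angle `angle_deg` (hypothetical helper)."""
    half = (length - 1) / 2.0                 # segment spans [-half, half] around the center
    c = int(np.ceil(half))                    # index of the center pixel
    size = 2 * c + 1
    kernel = np.zeros((size, size))
    omega = np.deg2rad(angle_deg)
    t = np.linspace(-half, half, 4 * size)    # dense samples along the segment
    cols = np.rint(c + t * np.cos(omega)).astype(int)
    rows = np.rint(c - t * np.sin(omega)).astype(int)  # rows grow downward
    kernel[rows, cols] = 1.0                  # mark every pixel the segment touches
    return kernel / kernel.sum()              # equal weight per pixel, sums to 1

horizontal = motion_psf(9, 0)
vertical = motion_psf(9, 90)
print(horizontal.shape, np.count_nonzero(horizontal))  # (9, 9) 9
```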
1.2.3. Defocus Blur
- The third blur is the defocus blur, which can be modeled as a cylinder function: the PSF is constant, 1/(πr²), inside a disk of radius r and zero outside,
- where the blur radius r is proportional to the extent of defocusing, and is to be estimated by GRNN.
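The cylinder (disk) PSF is equally short to sketch. The name `defocus_psf` is hypothetical, and the discrete kernel is normalized by its pixel count, which only approximates the continuous 1/(πr²) value.

```python
import numpy as np

def defocus_psf(radius):
    """Disk-shaped PSF of the given radius in pixels (hypothetical helper)."""
    half = int(np.ceil(radius))
    ax = np.arange(-half, half + 1)
    x1, x2 = np.meshgrid(ax, ax)
    kernel = (x1**2 + x2**2 <= radius**2).astype(float)  # 1 inside the disk, 0 outside
    return kernel / kernel.sum()              # discrete stand-in for 1 / (pi * r^2)

disk = defocus_psf(3)
print(disk.shape)  # (7, 7)
```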
2. Deep Neural Network (DNN)
- Restricted Boltzmann Machine (RBM) is used to pretrain the Deep Neural Network (DNN).
(If interested, please read autoencoder for the use of RBM to pretrain DNN.)
- In brief, the DNN is first pretrained using RBMs, then the network is fine-tuned using a supervised learning approach.
- The input layer is trained in the first RBM as the visible layer. Then, a representation of the input blurred sample is obtained for further hidden layers.
- The next layer is trained as an RBM by greedy layer-wise information reconstruction. The training process of RBM is to update weights between two adjacent layers and the biases of each layer.
- Repeat the first and second steps until the parameters in all layers (visible and all hidden layers) are learned.
- In the supervised learning part, the above trained parameters are used for initializing the weights in the DNN.
- The optimization minimizes the cross-entropy loss −Σ_i p_i log p̂_i by backpropagating error derivatives, where p_i is the target label and p̂_i is the predicted probability for class i.
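The greedy layer-wise pretraining described above can be sketched with binary RBMs trained by one-step contrastive divergence (CD-1). CD-1, the learning rate, the epoch count, the random toy data, and the names `cd1_step` and `pretrain_stack` are assumptions made for illustration; this story does not state the paper's exact training settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One CD-1 update on a batch v0 of shape (batch, n_vis)."""
    h0_prob = sigmoid(v0 @ W + b_hid)                       # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1_prob = sigmoid(h0 @ W.T + b_vis)                     # reconstruction
    h1_prob = sigmoid(v1_prob @ W + b_hid)                  # negative phase
    n = v0.shape[0]
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid

def pretrain_stack(data, layer_sizes, epochs=5):
    """Train one RBM per layer; each RBM's hidden activations feed the next."""
    params, x = [], data
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((x.shape[1], n_hid))
        b_vis, b_hid = np.zeros(x.shape[1]), np.zeros(n_hid)
        for _ in range(epochs):
            W, b_vis, b_hid = cd1_step(x, W, b_vis, b_hid)
        params.append((W, b_hid))             # kept to initialize the DNN
        x = sigmoid(x @ W + b_hid)            # representation for the next RBM
    return params

toy = rng.random((64, 1024))                  # stand-in for flattened 32 x 32 samples
stack = pretrain_stack(toy, [500, 30, 10])
print([W.shape for W, _ in stack])  # [(1024, 500), (500, 30), (30, 10)]
```

The returned weights and hidden biases would then serve as the starting point for supervised fine-tuning.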
2.3. Some Details
- The output of this stage is 3 labels: the Gaussian blur, the motion blur and the defocus blur.
- The size of samples is 32 × 32.
- The input visible layer has 1024 nodes, and the output layer has 3 nodes.
- The whole architecture is: 1024 → 500 → 30 → 10 → 3.
- With the label information from DNN, the classified blur vectors will be used in the second stage (GRNN) for blur parameter estimation.
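As a rough illustration of these dimensions, the sketch below pushes a batch of 32 × 32 samples through a 1024 → 500 → 30 → 10 → 3 network and evaluates the cross-entropy loss. The sigmoid hidden units, softmax output, random weights, and 0-based labels (0, 1, 2 instead of 1, 2, 3) are assumptions of this sketch; in the paper the weights would come from RBM pretraining followed by fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [1024, 500, 30, 10, 3]
# Assumption: small random weights stand in for the pretrained ones.
weights = [0.01 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Return class probabilities for a batch x of shape (batch, 1024)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))              # sigmoid hidden layers
    logits = x @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)                 # softmax over 3 blur types

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

patches = rng.random((8, 32, 32))             # toy stand-in for blurred samples
labels = rng.integers(0, 3, size=8)           # 0 = Gaussian, 1 = motion, 2 = defocus
probs = forward(patches.reshape(8, -1))       # flatten 32 x 32 -> 1024
loss = cross_entropy(probs, labels)
print(probs.shape, loss)
```

With untrained weights the loss sits near log 3, the value for a uniform guess over three classes.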
3. General Regression Neural Network (GRNN)
- Once the classification part is completed, the blur type of the input patch is known.
- The general regression neural network is considered to be a generalization of both Radial Basis Function Networks (RBFN) and Probabilistic Neural Networks (PNN).
- It is composed of an input layer, a hidden layer, “unnormalized” output units, a summation unit, and normalized outputs.
- Assume that the training vectors can be represented as X and the training targets are Y.
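- In the pattern layer, each hidden unit corresponds to one training sample.
- From the pattern layer to the summation layer, each weight is the target for the corresponding training sample. The summation units compute the weighted sum Σ_i Y_i exp(−D_i² / (2σ²)) and the normalizer Σ_i exp(−D_i² / (2σ²)), and the output is their ratio,
- where σ is the spread parameter, and D_i² = (x − X_i)ᵀ(x − X_i) is the squared distance between the input x and the i-th training vector X_i.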
- (The paper covers the GRNN only briefly.)
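Since the GRNN has no iterative training, its prediction fits in a few lines. Below is a minimal sketch of the standard GRNN estimate (a kernel-weighted average of the training targets); the function name `grnn_predict` and the toy 1-D data are my own and do not come from the paper.

```python
import numpy as np

def grnn_predict(X_train, Y_train, x, spread):
    """Standard GRNN estimate for one input x (hypothetical helper)."""
    d2 = np.sum((X_train - x) ** 2, axis=1)       # squared distance to each training vector
    w = np.exp(-d2 / (2.0 * spread ** 2))         # pattern-layer activations
    return np.dot(w, Y_train) / np.sum(w)         # summation units, then normalization

X_train = np.array([[0.0], [1.0], [2.0]])         # toy training vectors
Y_train = np.array([0.0, 1.0, 2.0])               # toy targets, e.g. a blur parameter
estimate = grnn_predict(X_train, Y_train, np.array([1.0]), spread=0.5)
print(estimate)                                   # close to 1.0 by symmetry
```

A small spread makes the estimate follow the nearest training target, while a large spread pulls it toward the mean of all targets.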
4. Experimental Results
- Training Datasets: The Oxford image classification dataset and the Caltech 101 dataset are chosen as the training sets. 5000 images are randomly selected from each of them. The size of the training samples is 32 × 32.
- Each training sample has two labels: one is its blur type (the values are 1, 2, or 3) and the other one is its blur parameter.
- In total, there are 36000 training samples: 12000 are degraded by the Gaussian PSF, 12000 by the motion blur PSF, and the rest by the defocus PSF.
- Testing Datasets: the Berkeley segmentation dataset (200 images) and Pascal VOC 2007 (500 randomly selected images).
- 6000 testing samples are chosen from each of them according to the same procedure as the training set.
4.2. Classification Results
- The classification rate is used for evaluating the performance: CR = (Nc / Na) × 100%,
- where Nc is the number of correctly classified samples, and Na is the total number of samples.
The proposed method performs best among all the algorithms using automatically learned features.
4.3. Regression Results
- Deblurring is applied based on the parameters estimated by GRNN.
- Then, quantitative metrics, i.e. some Image Quality Assessment (IQA) approaches, are used to evaluate the deblurred image quality.
- (The paper does not mention what deblurring algorithm is used for the above table.)
The GRNN method achieves the best results among all the compared methods.
- Contrary to the quantitative results, it is obvious that the images deblurred by EPLL using the parameters estimated by the proposed GRNN have very competitive visual quality.
Indeed, I believe a conventional convolutional neural network (CNN) such as AlexNet can already work well for the blur classification task, and another CNN can be used to regress the parameters. Or a CNN can be designed to do both the classification and regression tasks at the same time. For deblurring, a CNN can be designed as well, but deblurring is out of scope in this paper.