Review — BDNet: Blur Detection Convolutional Neural Network (Blur Detection)

Fusing Blur Detection Results From Multiscale AlexNet-Like Networks to Obtain Better Results

8 min read · Dec 26, 2020

In this story, "Multiscale blur detection by learning discriminative deep features" (BDNet), by Tianjin University and Civil Aviation University of China, is reviewed. In this paper:

A simple yet effective 6-layer CNN model, with 5 layers for feature extraction and 1 for binary classification, is proposed, which can faithfully produce patch-level blur likelihood.
The network is applied at three coarse-to-fine scales. The multiscale blur likelihood maps are optimally fused to generate better blur detection.

This is a 2018 JNEUCOM paper with over 20 citations, where JNEUCOM is an Elsevier journal with a high impact factor of 4.438. (Sik-Ho Tsang @ Medium)

Outline

BDNet: Single Scale Deep Blur Detection
BDNet: Multiscale Deep Blur Detection
CBDNet: Compressed BDNet
Experimental Results

1. BDNet: Single Scale Deep Blur Detection

1.1. BDNet: Network Architecture

**BDNet: Network Architecture** (C: convolutional layer; P: max pooling layer; R: ReLU; F: fully connected layer; D: Dropout layer; S: softmax layer)

A six-layer CNN model for single-scale blur detection, similar to AlexNet, is designed.
The first convolutional layer has 96 filters of size 5×5 to extract low level features.
The second convolutional layer has 256 filters of size 5×5 to extract middle level features.
The third convolutional layer has 384 filters of size 3×3 and is responsible for high-level feature extraction.
Each convolutional layer is followed by a 2×2 max pooling layer.
The fourth and fifth layers are fully connected layers, each with 2048 neurons.
Dropout with a probability of 0.5 is used in layers 4 and 5 to avoid overfitting.
The last layer is a 2-way softmax layer for binary classification.
The details are as follows:
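Independently of the paper's own detail listing, the layer description above can be turned into a small bookkeeping sketch that tracks the feature size and parameter count per patch scale. The filter counts and kernel sizes come from the text; the 3-channel input, stride-1 "same"-padded convolutions, and stride-2 pooling are assumptions made only for illustration, so the printed numbers should not be read as the paper's reported figures.

```python
# Sketch only: padding and strides are NOT stated in the text above.
# Assumed here: stride-1 convolutions with "same" padding, a 3-channel input,
# and 2x2 max pooling with stride 2 (floor division on odd sizes).

CONV_LAYERS = [(96, 5), (256, 5), (384, 3)]  # (filters, kernel size) from the text
FC_UNITS = [2048, 2048]                       # layers 4 and 5
NUM_CLASSES = 2                               # 2-way softmax


def describe_bdnet(patch_size, in_channels=3):
    """Return (flattened feature length, total parameter count) for one patch scale."""
    size, channels, params = patch_size, in_channels, 0
    for filters, kernel in CONV_LAYERS:
        params += kernel * kernel * channels * filters + filters  # weights + biases
        channels = filters
        size //= 2  # "same" conv keeps the size; 2x2 pooling halves it
    flat = size * size * channels
    fan_in = flat
    for units in FC_UNITS + [NUM_CLASSES]:
        params += fan_in * units + units  # weights + biases
        fan_in = units
    return flat, params


for patch in (21, 35, 49):
    flat, params = describe_bdnet(patch)
    print(f"{patch}x{patch}: flattened features = {flat}, parameters = {params}")
```

Under these assumptions the first fully connected layer dominates the parameter count, which is why the three scales end up with different model sizes even though the layer widths are identical.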

1.2. Dataset Preparation

Shi’s dataset is used, which has 296 motion-blurred images and 704 out-of-focus blurred images. 80% of the images of each type are randomly selected as the training set, and the remaining 20% are used as the test set.
For each image in the training set, training patches are collected by sampling image patches at multiple patch scales (i.e. 21×21, 35×35 and 49×49) using a sliding window with a stride of 5 pixels.
A patch is labeled as positive (blurred) if more than 80% of its pixels are blurred. Otherwise, it is negative.
To increase the diversity of the training patches, patches are also sampled from the training images resized at ratios of 0.5 and 0.25.
The ratio of positive to negative training patches is restricted to 1. Finally, about 10, 5 and 4 million training patches are collected for patch scales of 21×21, 35×35 and 49×49, respectively.
80% of the samples at each scale are randomly selected to train the model. The remaining 20% are used for validation. The ratio of positive to negative samples is also fixed at 1.
Since the input size of the model differs at each scale, the networks are trained separately using stochastic gradient descent with a batch size of 128.
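The sliding-window sampling and the 80% labeling rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `mask` format (a list of rows with 1 for blurred pixels), the function name, and the return format are all hypothetical, and the class balancing and image resizing steps are left out.

```python
# Sketch only: `mask` is a hypothetical per-pixel ground truth (1 = blurred,
# 0 = sharp) given as a list of rows. Names and return format are illustrative.

def sample_patches(mask, patch_size, stride=5, blur_ratio=0.8):
    """Slide a patch_size x patch_size window and label each position.

    Returns a list of (top, left, label), where label is 1 (blurred) when more
    than `blur_ratio` of the pixels in the window are blurred, else 0.
    """
    height, width = len(mask), len(mask[0])
    samples = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            blurred = sum(
                sum(row[left:left + patch_size])
                for row in mask[top:top + patch_size]
            )
            label = 1 if blurred > blur_ratio * patch_size * patch_size else 0
            samples.append((top, left, label))
    return samples


# Toy 31x31 mask whose left 20 columns are blurred.
toy_mask = [[1 if x < 20 else 0 for x in range(31)] for _ in range(31)]
patches = sample_patches(toy_mask, patch_size=21)
print(patches)
```

In the toy mask, only the windows starting at column 0 contain enough blurred pixels (20 of 21 columns) to pass the 80% threshold, so a patch straddling the blur boundary is counted as negative.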

2. BDNet: Multiscale Deep Blur Detection

For a given image, we obtain the blur detection map D^s at the s-th scale.
An optimization model is built to estimate the blur probability B^s at each scale s. By vectorizing B^s and D^s to b^s and d^s, respectively, the energy function is:

where p is the pixel index. There are three terms.
The first term is the data term, which assigns a blur probability to pixel p.
The second term keeps the blur degree consistent within a neighborhood at the same scale.
The third term keeps the blur degree consistent across different scales.
And w^s_pq is the appearance similarity of two pixels p and q, defined as:

where f_p is the appearance of the pixel at position p.
Parameters α and β are set to 0.5.
The final blur map can be obtained by reshaping the optimal b̂^1 at the finest scale.
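For readers who want something concrete to hold on to, one energy with the three described terms could look like the sketch below. This is an assumed quadratic form written only from the verbal description, not a transcription of the paper's equations: the squared differences, the neighborhood symbol \mathcal{N}^s_p, the Gaussian similarity, and its bandwidth \sigma are all hypothetical choices, and the paper's exact norms and normalization may differ.

```latex
% Assumed form for illustration only; not copied from the paper.
% \mathcal{N}^s_p (neighborhood of p at scale s) and \sigma are hypothetical symbols.
E\big(\{b^s\}\big) =
    \sum_{s}\sum_{p} \big(b^s_p - d^s_p\big)^2
  + \alpha \sum_{s}\sum_{p}\sum_{q \in \mathcal{N}^s_p} w^s_{pq}\,\big(b^s_p - b^s_q\big)^2
  + \beta \sum_{s}\sum_{p} \big(b^s_p - b^{s+1}_p\big)^2,
\qquad
w^s_{pq} = \exp\!\left(-\frac{\lVert f_p - f_q \rVert^2}{2\sigma^2}\right)
```

In a form like this, the first sum pulls each estimate toward the network output, the second smooths similar-looking neighbors within a scale, and the third ties together the estimates of the same pixel at adjacent scales, with α and β weighting the two consistency terms.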

3. CBDNet: Compressed BDNet

**The feature maps in each layer can be classified into four classes, i.e., positive, negative, image-like and null.**

The classification is done manually, by observation.
The positive feature maps look like the final blur detection result: blurred regions have large values and sharp regions have small values.
The negative feature maps are the opposite of the positive ones.
The image-like feature maps have large values in both blurred and sharp regions.
The null feature maps differ from the first three types: they have very small values, or even all zeros, across the whole map.

**MAE and F1-measure of the blur detection results by removing different features in layers 1–5. “0” denotes keeping the features and “1” denotes removing the features.**

In brief, the image-like features are more effective in layers 1–3 and less useful in layers 4 and 5.
The null features have little effect in almost all layers.
Thus, the filters with null responses are removed, and the remaining filters in the original BDNet-s are sampled according to the ratios of their types.
The filter numbers from the first to the fifth layer are set to 64, 64, 64, 512 and 256, respectively, forming the compressed network called CBDNet-s. This new network is then fine-tuned until convergence.
(If interested in this part, please feel free to read the paper, which covers the feature analysis based on the feature map types at length.)
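To make "sampled according to their ratios" more tangible, here is a minimal sketch of one way to split a target filter budget across the non-null feature types in proportion to how many filters of each type the original layer has. The per-type counts in the example are made-up placeholders, and the largest-remainder rounding is an assumed choice; the paper's actual selection procedure is not reproduced here.

```python
# Sketch only: the per-class filter counts below are made-up placeholders, and
# largest-remainder rounding is an assumed way to "sample according to ratios".

def allocate_filters(class_counts, target):
    """Split `target` filters across the non-null classes in proportion to their counts."""
    kept = {name: n for name, n in class_counts.items() if name != "null"}
    total = sum(kept.values())
    exact = {name: target * n / total for name, n in kept.items()}
    quota = {name: int(share) for name, share in exact.items()}
    # Hand out the filters lost to rounding, largest fractional part first.
    leftovers = target - sum(quota.values())
    by_fraction = sorted(kept, key=lambda name: exact[name] - quota[name], reverse=True)
    for name in by_fraction[:leftovers]:
        quota[name] += 1
    return quota


# Hypothetical breakdown of the 96 first-layer filters of a BDNet-s.
layer1 = {"positive": 30, "negative": 20, "image-like": 26, "null": 20}
print(allocate_filters(layer1, target=64))
```

With these placeholder counts, the 64 first-layer filters of the compressed network would be drawn as 25 positive, 17 negative and 22 image-like filters, and none from the null group.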

4. Experimental Results

4.1. BDNet

**The comparison of the blur detection results**

From the figures, we clearly have three observations:

The scale ambiguity does exist in blur detection.
With the fine scale, the proposed model detects blur within a small extent. Thus, the detector misclassifies a smooth region as blurred when it appears in a non-blurred context. Conversely, a textured region within a blurred context may be misclassified as non-blurred.
With a large scale, the blur detection results are dilated. However, because a large region contains more context information, the blur detector is more robust to the above problem.

The proposed fused blur detection results are better than the results of the single-scale BDNet-s.

**The comparison of the MAE, Precision, Recall, F0.3-measure and F1-measure of different blur detection results**

From the table, even the proposed single-scale blur detection results are better than those of the state-of-the-art methods.
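For reference, the metrics named in the table caption can be computed along the following lines. This is a minimal sketch on flat lists of values: the 0.5 threshold and the reading of "F0.3" as an F-measure with β² = 0.3 (the usual convention in saliency detection) are assumptions, and the paper's exact evaluation protocol is not reproduced.

```python
# Sketch only: beta2 = 0.3 follows the common saliency-detection convention and
# is an assumption here, as is the fixed 0.5 binarization threshold.

def blur_metrics(pred, truth, threshold=0.5, beta2=0.3):
    """Compute MAE, precision, recall and F-measure for flat lists of values.

    pred  : predicted blur likelihoods in [0, 1]
    truth : ground-truth labels (1 = blurred, 0 = sharp)
    """
    mae = sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(1 for b, t in zip(binary, truth) if b == 1 and t == 1)
    precision = tp / sum(binary) if sum(binary) else 0.0
    recall = tp / sum(truth) if sum(truth) else 0.0
    denom = beta2 * precision + recall
    f_beta = (1 + beta2) * precision * recall / denom if denom else 0.0
    return mae, precision, recall, f_beta


pred = [0.9, 0.8, 0.6, 0.2, 0.1, 0.7]
truth = [1, 1, 0, 0, 0, 1]
mae, precision, recall, f03 = blur_metrics(pred, truth)
_, _, _, f1 = blur_metrics(pred, truth, beta2=1.0)
print(mae, precision, recall, f03, f1)
```

A smaller β² weights precision more heavily than recall, so the F0.3 value penalizes the single false positive in this toy example more than the F1 value does.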

4.2. CBDNet

**The comparison of randomly and deliberately sampled filters for CBDNet**

The compressed networks with deliberately sampled filters are clearly superior to the compressed networks with randomly sampled filters.
In addition, the sizes of BDNet-1,2,3 are 26M, 35M and 73.2M, respectively, while the sizes of CBDNet-1,2,3 are 1.2M, 1.6M and 3.2M, respectively.
The total size of CBDNet-1,2,3 is 6M, which is about 4% of the total size of BDNet-1,2,3. This makes the blur detection system easier to deploy on mobile phones or FPGA devices.
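The size comparison is easy to check by hand; the short calculation below simply adds up the quoted sizes and takes the ratio, which comes out at roughly 4.5% (quoted as 4% in the text).

```python
# Model sizes as quoted in the text (units as given there, "M").
bdnet_sizes = [26.0, 35.0, 73.2]
cbdnet_sizes = [1.2, 1.6, 3.2]

bdnet_total = sum(bdnet_sizes)
cbdnet_total = sum(cbdnet_sizes)
ratio = cbdnet_total / bdnet_total

print(f"BDNet total:  {bdnet_total:.1f}M")
print(f"CBDNet total: {cbdnet_total:.1f}M")
print(f"CBDNet / BDNet = {ratio:.1%}")
```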

4.3. Further Studies

**Influence of blur kernel size and noise level for different blur detectors**

10 images are randomly selected, and 7 blur kernel sizes {3, 7, 11, 15, 21, 25, 31} and 7 Gaussian noise levels in the range [0, 0.003] are separately applied to them for testing.
As shown in the above figure, two main observations can be made.

All compared blur detectors are quite stable as the blur kernel size increases and only exhibit an accuracy drop at the smallest blur kernel size of 3.
Low-level blur features are quite robust to noise, while the performance of the proposed method and Shi et al. [10] drops quickly as the noise level increases. One possible way to boost the noise robustness of the proposed method is to properly include noise in the training stage.

4.4. Running Time

The running time comparison of the different blur detectors on a 640×480 image.

BDNet-1,2,3 run on a TitanX GPU, but the fusion step of BDNet-F runs on a laptop with an i7 CPU and 16 GB RAM.
The fusion of three scales of blur detection maps for a 640×480 image takes 31.28 s.

4.5. Limitation Discussion

As mentioned above, the fusion of three scales of blur detection maps for a 640×480 image takes 31.28 s, which accounts for 69.25% of the total running time.
The slow fusion limits the use of this blur detection in fast image or video processing applications such as real-time video blur detection.
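The two quoted numbers are enough to back out the implied total running time and the time left for everything other than fusion. The figures below are derived from the quoted values by simple arithmetic, not taken from the paper's table.

```python
fusion_seconds = 31.28   # fusion time quoted above
fusion_share = 0.6925    # fusion share of the total running time quoted above

total_seconds = fusion_seconds / fusion_share
non_fusion_seconds = total_seconds - fusion_seconds

print(f"implied total running time: {total_seconds:.2f} s")
print(f"implied non-fusion time:    {non_fusion_seconds:.2f} s")
```

This gives an implied total of about 45.17 s per 640×480 image, of which about 13.89 s is spent outside the fusion step.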

As shown above, three blur detection networks are run separately to generate the multiscale blur detections, which is not efficient. Based on the above limitations, a larger network is required.

4.6. Blur-Aware Saliency Detection

**The comparison of saliency detection results**

Even the state-of-the-art salient object detectors [34–37] do not consider blur cues well.
A dataset, BAS500, is built from 500 out-of-focus and motion-blurred images captured by photographers and randomly selected from Flickr and Fengniao.
The salient objects are manually labeled in each image to facilitate quantitative comparison.
For each detector, the proposed blur detection map is embedded as an extra background prior. The improved saliency detectors are denoted by the suffix “_BA”.
As shown above, the blur-aware saliency detection can improve the detection accuracy when the backgrounds are complex and blurred.

**The quantitative comparison of the saliency detection result**

**The comparison of MAE and F0.3-measure of saliency object detection results**

The quantitative comparison on BAS500 is shown above, which verifies that saliency detection can benefit from reliable blur detection.
As shown in the table, comparable performance is obtained on MSRA1000.