Review — PM-Net: Accurate and Fast Blur Detection Using a Pyramid M-Shaped Deep Neural Network (Blur Detection)

Using a Pyramid M-Shaped Deep Neural Network, Outperforms Park CVPR’17 & Zeng TIP’19, etc.

Sik-Ho Tsang
8 min read · Jan 3, 2021
Examples of (a) a globally blurred image, (b) and (c) partially motion-blurred images, and (d) a partially defocused image.

In this story, Accurate and Fast Blur Detection Using a Pyramid M-Shaped Deep Neural Network, PM-Net, by University of Science and Technology of China, and Shijiazhuang Tiedao University, is reviewed.

There are challenging scenarios using handcrafted features:

  • Handcrafted features can hardly differentiate a blurred region from a sharp but homogeneous region.
  • Another challenge is that blur metrics based on handcrafted features have difficulty detecting pseudo-sharp backgrounds.

In this paper:

  • A novel multi-input multi-loss encoder-decoder network (M-shaped) is proposed to learn rich hierarchical representations related to blur.
  • Since blur degree is susceptible to scale, a pyramid ensemble model (PM-Net), consisting of M-shaped subnets at different scales and a unified fusion layer, is constructed.

This is a paper in 2019 IEEE Access, an open-access journal with a high impact factor of 3.745. (Sik-Ho Tsang @ Medium)

Outline

  1. M-Shaped Network
  2. Pyramid M-Shaped Network (PM-Net)
  3. Ablation Study
  4. Experimental Results

1. M-Shaped Network

M-Shaped Network

1.1. Overall Network Architecture

  • The proposed M-shaped network, serving as the backbone subnet of the PM-Net, is a multi-input multi-loss encoder-decoder network inspired by U-Net. A modified four-stage encoder-decoder architecture is used.
  • In the encoder path, 3×3 convolutional layers and 2×2 max pooling with a stride of two are used.
  • The convolutional layer consists of a convolution, batch norm and ReLU.
  • In the decoder path, 3×3 convolutional layers and 2×2 up-sampling with a stride of two are used.
  • In both paths, the higher stage has a coarser scale while the lower stage has a finer scale.
  • A modified skip connection is used, which is mentioned later.
  • The detailed configuration of each stage is shown below; a minimal code sketch of one encoder stage follows the table.
Detailed configuration of each stage
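To make the stage description concrete, here is a minimal PyTorch-style sketch of one encoder stage, assuming two 3×3 conv layers per stage; the layer count and channel widths are assumptions for illustration, not the paper's exact configuration (which is given in the table above).

```python
import torch.nn as nn

# Minimal sketch of one encoder stage of the M-shaped subnet (assumption:
# two 3x3 conv layers per stage; exact widths come from the paper's table).
def conv3x3_bn_relu(in_ch, out_ch):
    """A conv layer as described: 3x3 convolution + batch norm + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class EncoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            conv3x3_bn_relu(in_ch, out_ch),
            conv3x3_bn_relu(out_ch, out_ch),
        )
        # 2x2 max pooling with a stride of two halves the spatial resolution.
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        feat = self.convs(x)    # kept for the skip connection / decoder path
        down = self.pool(feat)  # passed to the next (coarser) stage
        return feat, down
```

The decoder stages mirror this structure, with 2×2 up-sampling in place of max pooling.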

1.2. Multi-Input Pyramid

  • To exploit coarse- and middle-scale information, max pooling is employed to sequentially down-sample the input image and construct a four-stage image pyramid with four decreasing scales {S, 1/2S, 1/4S, 1/8S}.
  • The input at each scale is passed through a 3×3 convolutional layer to produce feature maps, which are then concatenated to the corresponding feature maps of the encoder path.

The multi-input pyramid transfers more blur information from the coarse scale to the fine scale in the feature extraction procedure, integrating richer blur information; a minimal sketch is given below.
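A minimal sketch, assuming PyTorch; the channel width and exactly where the concatenation happens are assumptions based on the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiInputPyramid(nn.Module):
    """Down-samples the input image into four scales {S, 1/2S, 1/4S, 1/8S} via
    max pooling, passes each scale through a 3x3 conv, and concatenates the
    result to the encoder feature maps of the matching stage."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1) for _ in range(4)]
        )

    def forward(self, image, encoder_feats):
        # encoder_feats[k]: stage-k encoder feature maps at resolution S / 2**k.
        merged = []
        x = image
        for k, conv in enumerate(self.convs):
            if k > 0:
                x = F.max_pool2d(x, kernel_size=2, stride=2)  # 1/2S, 1/4S, 1/8S
            merged.append(torch.cat([conv(x), encoder_feats[k]], dim=1))
        return merged
```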

1.3. Multi-Loss Pyramid

  • The four stages of the decoder path render four probability maps Mk (k=1, 2, 3, 4) with different scales.
  • The class-balanced cross-entropy loss function is used.
  • The branch loss lk at stage k is defined accordingly (a reconstructed form is given after this list).
  • β is the weight balancing the losses from positive and negative samples. In this study, it is set to 0.5, which gives the best performance after a scan from 0.1 to 0.9.
  • The sum of branch losses is a weighted combination of the four branch losses,
  • where αk is the weight balancing the multi-level losses.
  • α1=0.2, α2=0.3, α3=0.3, and α4=0.2.
  • Higher weights are assigned to middle-level stages in order to better balance and merge hierarchical losses.
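Since the equations themselves are not reproduced here, the following is a reconstruction from the definitions above (Mk(i): predicted blur probability at pixel i at stage k; Y+ / Y−: blurred / sharp pixels in the ground truth); the paper's exact notation may differ:

```latex
% Class-balanced cross-entropy branch loss at stage k (reconstructed form)
l_k = -\beta \sum_{i \in Y_+} \log M_k(i) \;-\; (1 - \beta) \sum_{i \in Y_-} \log\bigl(1 - M_k(i)\bigr)

% Weighted sum of the four branch losses
l_s = \sum_{k=1}^{4} \alpha_k \, l_k
```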

1.4. New Skip Connection

  • A new skip module (marked by red dashed boxes) is established.
  • Two additional 3×3 convolutional layers are employed before encoder feature maps are concatenated to the corresponding decoder feature maps.

The new skip module is proposed to transfer more low-level morphologic features into the higher-level semantic space and to improve the feature-merging capability (a minimal sketch is given at the end of this subsection).

  • The last decoder layer outputs a blur probability map at the same resolution as the input of the M-shaped network.
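A minimal sketch of the skip module, assuming PyTorch and reusing the paper's conv-layer recipe (conv + batch norm + ReLU); the channel counts are assumptions:

```python
import torch
import torch.nn as nn

class SkipModule(nn.Module):
    """Two additional 3x3 conv layers refine the encoder feature maps before
    they are concatenated to the corresponding decoder feature maps."""
    def __init__(self, enc_ch, out_ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(enc_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, enc_feat, dec_feat):
        # Concatenate the refined encoder features with the decoder features.
        return torch.cat([self.refine(enc_feat), dec_feat], dim=1)
```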

2. Pyramid M-Shaped Network (PM-Net)

Upper Part: Pyramid M-Shaped Network (PM-Net), Lower Part: An M-Shaped Network
  • The PM-Net consists of a number of M-shaped subnets and a unified fusion layer.
  • Each subnet corresponds to one input scale; therefore, the multi-scale subnets also form a multi-model pyramid.
  • The blur probability maps obtained at the last decoder stage in every M-shape subnet are up-sampled to the same resolution as the raw input image.
  • Then, a unified 1×1 convolutional layer is used to merge these output probability maps.
  • The total loss of the entire PM-Net is the weighted sum of the subnet losses and the fusion loss (a reconstructed form is given after this list),
  • where wn is the weight balancing the different subnet losses (lsn), N is the number of subnets, and wl is the weight of the loss of the final fusion layer (lf).
  • It provides a comprehensive supervision from low morphologic level to high semantic level.
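The total-loss equation is not reproduced here; based on the description above, a reconstructed form is:

```latex
% Total loss of the PM-Net: weighted subnet losses plus the weighted fusion-layer loss
% (l_s^n: loss of subnet n, l_f: loss of the final 1x1 fusion layer)
L = \sum_{n=1}^{N} w_n \, l_s^n \;+\; w_l \, l_f
```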

3. Ablation Study

Effectiveness analysis of multi-input pyramid (first from the left), multi-loss pyramid (second from the left), and multi-model pyramid (right side) using F1-score.
  • For in-dataset experiments, the images and ground truths from BDD (Shi’s Dataset) are used, with 80% of the images for training and the remaining 20% for testing.
  • A validation set (10% of the training set) and an early-stopping method are used during training to avoid overfitting.
  • First, the PM-Net with a complete input pyramid achieves the highest F1-score.
  • Second, the PM-Net with a complete loss pyramid performs the best.
  • In PM^q-Net, q denotes the number of subnets. PM¹-Net uses the input image scale {S}. PM²-Net uses the input image scales {S, 1/2S}. PM³-Net uses the input image scales {S, 1/2S, 1/4S}. PM⁴-Net uses the input image scales {S, 1/2S, 1/4S, 1/8S}.
  • Third, PM³-Net exceeds PM²-Net and PM¹-Net in F1-score and performs almost the same as PM⁴-Net. PM³-Net is treated as the best model for a 640×512 input image.
Effectiveness analysis of individual M-shaped network
  • PM¹-Net is compared with original U-Net, as shown above.
  • Compared with U-Net, the M-shaped network increases the F1-score by 14.9% and reduces the MAE by 45.5%.
  • Evidently, the M-shaped architecture is very effective for blur detection.

4. Experimental Results

4.1. Qualitative Results

Defocus Blur Detection Results: (a) Inputs. (b) Results of Su et al. [13]. (c) Results of Shi et al. [14]. (d) Results of Javaran et al. [20]. (e) Results of Golestaneh et al. [36]. (f) Results of Shi et al. [35]. (g) Results of Park et al. [22]. (h) Results of Zeng et al. [25]. (i) Results of PM¹-Net. (j) Results of PM³-Net. (k) Ground truth.
  • Obviously, the proposed PM-Nets perform the best defocus blur detection and make accurate judgements on sharp but homogeneous regions (e.g., solid-colored coat, skin, fur, feather, etc.) and pseudo-sharp backgrounds (e.g., background light speckles, etc.).
  • They detect blur better than Park CVPR’17 [22] & Zeng TIP’19 [25], etc.
Motion Blur Detection Results: (a) Inputs. (b) Results of Liu et al. [12]. (c) Results of Shi et al. [14]. (d) Results of Su et al. [13]. (e) Results of Javaran et al. [20]. (f) Results of Golestaneh et al. [36]. (g) Results of PM¹-Net. (h) Results of PM³-Net. (i) Ground truth.
  • For motion blur detection, PM-Nets perform significantly better and closer to the ground truth than other SOTA approaches.
(a) Inputs. (b) Results of our method. (c) Ground Truth.
  • The yellow curves mark sharp but homogeneous regions such as feather, skin, etc.
(a) Inputs. (b) Results of our method. (c) Ground Truth.
  • We can clearly see that these anomalous regions including sharp but homogeneous regions (e.g. feather and skin marked by yellow curves) and pseudo-sharp backgrounds (e.g. light speckles marked by purple curves) are accurately distinguished by PM-Net.
(a) Inputs. (b) Results of our method. (c) Ground Truth.
  • It can be clearly observed that the blurred regions including motion-blurred foreground (marked by yellow ellipse) and defocused background (marked by red dotted circle) are accurately detected.

4.2. Quantitative Results

Performance comparisons among different blur detection methods on BDD. (Shi’s Dataset)
  • For defocus-only blur detection, PM¹-Net achieves an F1-score of 0.876 and an MAE of 0.102. PM³-Net achieves the highest F1-score of 0.893 and the smallest MAE of 0.095.
  • For both motion and defocus blur detection, PM¹-Net achieves an F1-score of 0.873 and an MAE of 0.104. PM³-Net achieves the highest F1-score of 0.884 and the smallest MAE of 0.096, outperforming Zeng TIP’19 [25], etc.
Comparison of precision-recall curves of the state-of-the-art methods on BDD. (Shi’s Dataset)
  • Precision-recall (P-R) curves are generated for the different methods by varying the threshold within the range [0, 255] to produce binary segmentations of the final blur maps (a minimal sketch of this protocol follows the list).
  • Evidently, the PM-Nets achieve the highest precision over the entire recall range from 0 to 1, outperforming Park CVPR’17 & Zeng TIP’19, etc.
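As a rough illustration of this evaluation protocol (not the authors' code), here is a minimal NumPy sketch that binarizes an 8-bit blur map at every threshold in [0, 255] and computes precision and recall against a binary ground-truth mask; the function name and array conventions are assumptions:

```python
import numpy as np

def pr_curve(blur_map, gt_mask):
    """blur_map: uint8 array in [0, 255]; gt_mask: boolean array (True = blurred)."""
    precisions, recalls = [], []
    for t in range(256):
        pred = blur_map >= t                      # binary segmentation at threshold t
        tp = np.logical_and(pred, gt_mask).sum()  # true positives
        precisions.append(tp / max(pred.sum(), 1))
        recalls.append(tp / max(gt_mask.sum(), 1))
    return np.array(precisions), np.array(recalls)
```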

4.3. Cross-Dataset Results

Performance comparisons for cross-dataset evaluation on the CDD dataset.
  • The 500 images in the challenging defocused dataset (CDD) are tested.
  • The 256×256 images in CDD are directly tested using the weights of a PM-Net pre-trained on BDD.
  • PM³-Net achieves the F1-score of 0.885 and MAE of 0.098.
  • Both the achieved F1-score and MAE on this challenging dataset are significantly superior to the results of the method in [24].
  • This demonstrates the superior generalization capacity of the proposed method.
(a) Input. (b) Golestaneh et al. [36]. (c) Shi et al. [35]. (d) Zhao et al. [24]. (e) PM1-Net. (f) PM3-Net. (g) Ground Truth.

4.4. Running Time

Runtime comparisons among different methods for a 640×512 defocused image.
  • PM-Net has a very fast detection speed at the millisecond level.
  • The runtime of the PM-Nets ranges from 27 ms to 61 ms for a single 640×512 image. In contrast, all other algorithms take much longer, from a few seconds to hundreds of seconds.
