Review — PM-Net: Accurate and Fast Blur Detection Using a Pyramid M-Shaped Deep Neural Network (Blur Detection)

Using a Pyramid M-Shaped Deep Neural Network, Outperforms Park CVPR’17 & Zeng TIP’19, etc.

Examples of (a) globally blurred image, (b) (c) partially motion-blurred images, and (d) partially defocused image.
  • Another challenge is that blur metrics based on handcrafted features have difficulty detecting pseudo-sharp backgrounds.
  • Since the blur degree is sensitive to scale, a pyramid ensemble model (PM-Net), consisting of M-shaped subnets at different scales and a unified fusion layer, is constructed.

Outline

  1. M-Shaped Network
  2. Pyramid M-Shaped Network (PM-Net)
  3. Ablation Study
  4. Experimental Results

1. M-Shaped Network

M-Shaped Network

1.1. Overall Network Architecture

  • The proposed M-shaped network, which serves as the backbone subnet of the PM-Net, is a multi-input multi-loss encoder-decoder network inspired by the U-Net. A modified four-stage encoder-decoder architecture is used.
  • In the encoder path, 3×3 convolutional layers and 2×2 max pooling with a stride of two are used.
  • The convolutional layer consists of a convolution, batch norm and ReLU.
  • In the decoder path, 3×3 convolutional layers and 2×2 up-sampling with a stride of two are used.
  • In both paths, the higher stage has a coarser scale while the lower stage has a finer scale.
  • A modified skip connection is used, which is mentioned later.
  • The detailed configuration of each stage is shown below:
Detailed configuration of each stage
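As a reading aid, here is a minimal PyTorch sketch of one encoder stage and one decoder stage following the description above. The channel widths and the bilinear up-sampling mode are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # "Convolutional layer" as defined above: 3x3 convolution + batch norm + ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class EncoderStage(nn.Module):
    """One encoder stage: conv block, then 2x2 max pooling with a stride of two."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = conv_block(in_ch, out_ch)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        f = self.conv(x)           # feature map kept for the skip connection
        return f, self.pool(f)     # pooled map feeds the next (coarser) stage

class DecoderStage(nn.Module):
    """One decoder stage: 2x2 up-sampling, concatenation with the skip features,
    then a conv block. in_ch must equal up-sampled channels + skip channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.conv = conv_block(in_ch, out_ch)

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)
        return self.conv(x)
```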

1.2. Multi-Input Pyramid

  • To exploit coarse and middle scale information, max pooling is employed to sequentially down-sample the input image, constructing a four-stage image pyramid with four decreasing scales {S, 1/2S, 1/4S, 1/8S}.
  • Each scale's input is passed through a 3×3 convolutional layer to produce feature maps, which are then concatenated to the corresponding feature maps of the encoder path.
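A sketch of the input pyramid under this description; the feature width feat_ch is an illustrative choice, not the paper's value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InputPyramid(nn.Module):
    """Builds the four-scale pyramid {S, 1/2S, 1/4S, 1/8S} by repeated max pooling
    and turns each scale into feature maps with its own 3x3 convolutional layer."""
    def __init__(self, in_ch=3, feat_ch=16):   # feat_ch is an illustrative choice
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1) for _ in range(4)]
        )

    def forward(self, x):
        feats = []
        for conv in self.convs:
            feats.append(conv(x))                          # per-scale feature maps
            x = F.max_pool2d(x, kernel_size=2, stride=2)   # next (coarser) scale
        return feats

# Each feats[k] is then concatenated to the encoder feature maps at stage k+1:
# fused = torch.cat([encoder_feats[k], feats[k]], dim=1)
```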

1.3. Multi-Loss Pyramid

  • The four stages of the decoder path render four probability maps Mk (k=1, 2, 3, 4) with different scales.
  • The class-balanced cross-entropy loss function is used.
  • The branch loss lk at stage k is the class-balanced cross-entropy between the stage-k probability map Mk and the correspondingly scaled ground truth: lk = −β Σ_{j∈Y+} log Pr(yj=1) − (1−β) Σ_{j∈Y−} log Pr(yj=0), where β is the class-balancing weight.
  • The sum of branch losses is L = α1·l1 + α2·l2 + α3·l3 + α4·l4, where α1=0.2, α2=0.3, α3=0.3, and α4=0.2.
  • Higher weights are assigned to the middle-level stages to better balance and merge the hierarchical losses, as sketched below.
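A hedged sketch of the loss pyramid; the exact definition of the balancing weight β is an assumption here, following the common class-balanced cross-entropy form:

```python
import torch

def class_balanced_bce(pred, target, eps=1e-6):
    """Class-balanced cross-entropy for one branch.
    pred: blur probability map in [0, 1]; target: binary ground truth map."""
    beta = 1.0 - target.mean()   # fraction of negative (assumed non-blurred) pixels
    loss = -(beta * target * torch.log(pred + eps)
             + (1.0 - beta) * (1.0 - target) * torch.log(1.0 - pred + eps))
    return loss.mean()

def m_net_loss(maps, targets, alphas=(0.2, 0.3, 0.3, 0.2)):
    """Weighted sum of the four branch losses: L = sum_k alpha_k * l_k.
    maps[k] and targets[k] share the spatial scale of decoder stage k."""
    return sum(a * class_balanced_bce(m, t)
               for a, m, t in zip(alphas, maps, targets))
```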

1.4. New Skip Connection

  • A new skip module (marked by red dashed boxes) is established.
  • Two additional 3×3 convolutional layers are employed before the encoder feature maps are concatenated to the corresponding decoder feature maps, as sketched below.
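A minimal sketch of this skip module; whether each extra convolution carries batch norm and ReLU is an assumption, following the conv-layer definition in Section 1.1:

```python
import torch
import torch.nn as nn

class SkipModule(nn.Module):
    """Modified skip connection: encoder features pass through two additional 3x3
    convolutional layers before being concatenated to the decoder features."""
    def __init__(self, ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, enc_feat, dec_feat):
        return torch.cat([self.refine(enc_feat), dec_feat], dim=1)
```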

2. Pyramid M-Shaped Network (PM-Net)

Upper Part: Pyramid M-Shaped Network (PM-Net), Lower Part: An M-Shaped Network
  • Each subnet corresponds to one input scale; the multi-scale subnets therefore also form a multi-model pyramid.
  • The blur probability maps obtained at the last decoder stage of every M-shaped subnet are up-sampled to the same resolution as the raw input image.
  • Then, a unified 1×1 convolutional layer is used to merge these output probability maps, as sketched after this list.
  • The total loss of the entire PM-Net is the sum of the subnet losses: L_PM = Σ_q L_q, where L_q is the multi-branch loss of the q-th M-shaped subnet.
  • This provides comprehensive supervision from the low morphologic level to the high semantic level.
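A sketch of the fusion step. Bilinear up-sampling and the final sigmoid are assumptions; the paper only states that a unified 1×1 convolutional layer merges the up-sampled maps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionLayer(nn.Module):
    """Merges the subnets' up-sampled blur probability maps with one unified
    1x1 convolutional layer."""
    def __init__(self, num_subnets=3):   # e.g. PM3-Net has three subnets
        super().__init__()
        self.fuse = nn.Conv2d(num_subnets, 1, kernel_size=1)

    def forward(self, subnet_maps, out_size):
        # Up-sample every single-channel map to the raw input resolution (H, W).
        ups = [F.interpolate(m, size=out_size, mode='bilinear', align_corners=False)
               for m in subnet_maps]
        return torch.sigmoid(self.fuse(torch.cat(ups, dim=1)))
```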

3. Ablation Study

Effectiveness analysis of the multi-input pyramid (left), the multi-loss pyramid (middle), and the multi-model pyramid (right) using F1-score.
  • A validation set (10% of the training set) and an early-stopping method are used during training to avoid overfitting (a minimal early-stopping sketch follows this list).
  • First, the PM-Net with a complete input pyramid achieves the highest F1-score.
  • Second, the PM-Net with a complete loss pyramid performs the best.
  • PM^q-Net denotes a PM-Net with q subnets: PM¹-Net uses input image scale {S}; PM²-Net uses {S, 1/2S}; PM³-Net uses {S, 1/2S, 1/4S}; PM⁴-Net uses {S, 1/2S, 1/4S, 1/8S}.
  • Third, PM³-Net exceeds PM²-Net and PM¹-Net in F1-score and performs almost the same as PM⁴-Net. PM³-Net is treated as the best model for a 640×512 input image.
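For the training protocol above, a minimal early-stopping loop might look like this; the patience value and the choice of validation F1 as the monitored metric are assumptions, and train_fn / eval_f1_fn are hypothetical helpers:

```python
def train_with_early_stopping(model, train_fn, eval_f1_fn,
                              max_epochs=200, patience=10):
    """Stop when validation F1 has not improved for `patience` epochs."""
    best_f1, best_state, stale = 0.0, None, 0
    for epoch in range(max_epochs):
        train_fn(model)             # one epoch over 90% of the training set
        f1 = eval_f1_fn(model)      # F1-score on the held-out 10% split
        if f1 > best_f1:
            best_f1, stale = f1, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            stale += 1
            if stale >= patience:   # no recent improvement: stop and roll back
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_f1
```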
Effectiveness analysis of individual M-shaped network
  • The M-shaped network improves the F1-score by 14.9% and reduces the MAE by 45.5% compared with the U-Net.
  • Evidently, the M-shaped architecture is very effective for blur detection.

4. Experimental Results

4.1. Qualitative Results

Defocus Blur Detection Results: (a) Inputs. (b) Results of Su et al. [13]. (c) Results of Shi et al. [14]. (d) Results of Javaran et al. [20]. (e) Results of Golestaneh et al. [36]. (f) Results of Shi et al. [35]. (g) Results of Park et al. [22]. (h) Results of Zeng et al. [25]. (i) Results of PM¹-Net. (j) Results of PM³-Net. (k) Ground truth.
  • PM-Net produces better detection results than Park CVPR’17 [22] & Zeng TIP’19 [25], etc.
Motion Blur Detection Results: (a) Inputs. (b) Results of Liu et al. [12]. (c) Results of Shi et al. [14]. (d) Results of Su et al. [13]. (e) Results of Javaran et al. [20]. (f) Results of Golestaneh et al. [36]. (g) Results of PM¹-Net. (h) Results of PM³-Net. (i) Ground truth.
More qualitative results: (a) Inputs. (b) Results of our method. (c) Ground truth.

4.2. Quantitative Results

Performance comparisons among different blur detection methods on BDD. (Shi’s Dataset)
  • For both motion and defocus blur detection, PM¹-Net achieves an F1-score of 0.873 and an MAE of 0.104, while PM³-Net achieves the highest F1-score of 0.884 and the smallest MAE of 0.096, outperforming Zeng TIP’19 [25], etc.
Comparison of precision-recall curves of the state-of-the-art methods on BDD. (Shi’s Dataset)
  • Evidently, PM-Nets achieve the highest precision over the whole recall range from 0 to 1, outperforming Park CVPR’17 [22] & Zeng TIP’19 [25], etc.

4.3. Cross-Dataset Results

Performance comparisons for cross-dataset evaluation on the CDD dataset.
  • The 256×256 images in CDD are tested directly using the weights of a PM-Net pre-trained on BDD.
  • PM³-Net achieves the F1-score of 0.885 and MAE of 0.098.
  • Both the achieved F1-score and MAE on this challenging dataset are significantly superior to the results of the method in [24].
  • This demonstrates the superior generalization capacity of the proposed method (how the two reported metrics can be computed is sketched below).
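For reference, a sketch of how the two reported metrics can be computed on a single prediction; the 0.5 binarization threshold is an assumption, since many papers report the best F1 over a threshold sweep:

```python
import numpy as np

def f1_and_mae(pred, gt, thresh=0.5):
    """pred: predicted blur probability map in [0, 1]; gt: binary ground truth.
    Returns (F1-score of the binarized map, MAE of the raw probability map)."""
    mae = np.abs(pred - gt).mean()
    binary = pred >= thresh
    tp = np.logical_and(binary, gt == 1).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt == 1).sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return f1, mae
```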
(a) Input. (b) Golestaneh et al. [36]. (c) Shi et al. [35]. (d) Zhao et al. [24]. (e) PM¹-Net. (f) PM³-Net. (g) Ground Truth.

4.4. Running Time

Runtime comparisons among different methods for a 640×512 defocused image.
  • The runtime of PM-Nets ranges from 27 ms to 61 ms for a single 640×512 image. In contrast, all other algorithms take much longer, from a few seconds to hundreds of seconds.
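A hedged sketch of how such per-image latency is typically measured; the warm-up and CUDA synchronization steps matter for fair timing, and the model is assumed to already be on the GPU:

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, runs=100):
    """Average forward-pass time (ms) for a single 640x512 image on the GPU."""
    model.eval()
    x = torch.randn(1, 3, 512, 640, device='cuda')  # N, C, H, W for a 640x512 image
    for _ in range(10):              # warm-up so one-off initialization is not timed
        model(x)
    torch.cuda.synchronize()         # finish pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000.0
```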
