Review — PM-Net: Accurate and Fast Blur Detection Using a Pyramid M-Shaped Deep Neural Network (Blur Detection)
Using a Pyramid M-Shaped Deep Neural Network, Outperforms Park CVPR’17 & Zeng TIP’19, etc.
In this story, Accurate and Fast Blur Detection Using a Pyramid M-Shaped Deep Neural Network (PM-Net), by University of Science and Technology of China and Shijiazhuang Tiedao University, is reviewed.
There are challenging scenarios when using handcrafted features:
- Handcrafted features can hardly differentiate a blurred region from a sharp but homogeneous region.
- Another challenge is that blur metrics based on handcrafted features have difficulty detecting pseudo-sharp backgrounds.
In this paper:
- A novel multi-input multi-loss encoder-decoder network (M-shaped) is proposed to learn rich hierarchical representations related to blur.
- Since blur degree is sensitive to scale, a pyramid ensemble model (PM-Net), consisting of M-shaped subnets at different scales and a unified fusion layer, is constructed.
This is a paper in 2019 IEEE ACCESS, an open-access journal with a high impact factor of 3.745. (Sik-Ho Tsang @ Medium)
Outline
- M-Shaped Network
- Pyramid M-Shaped Network (PM-Net)
- Ablation Study
- Experimental Results
1. M-Shaped Network
1.1. Overall Network Architecture
- The proposed M-shaped network, serving as the backbone subnet of the PM-Net, is a multi-input multi-loss encoder-decoder network inspired by U-Net. A modified four-stage encoder-decoder architecture is used.
- In the encoder path, 3×3 convolutional layers and 2×2 max pooling with a stride of two are used.
- The convolutional layer consists of a convolution, batch norm and ReLU.
- In the decoder path, 3×3 convolutional layers and 2×2 up-sampling with a stride of two are used.
- In both paths, the higher stage has a coarser scale while the lower stage has a finer scale.
- A modified skip connection is used, which is mentioned later.
- The detailed configuration of each stage is given in the paper; a rough sketch of one encoder/decoder stage follows.
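As a minimal sketch of the stages described above (the number of convolutions per stage, the channel widths, and the use of a transposed convolution for the 2×2 up-sampling are assumptions for illustration, not the paper's exact configuration):

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """One encoder/decoder stage body: 3x3 convolutional layers,
    each consisting of convolution, batch norm, and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Encoder stage: conv block followed by 2x2 max pooling with stride 2.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
# Decoder stage: 2x2 up-sampling with stride 2 (assumed here to be a
# transposed convolution) followed by a conv block.
up = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
```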
1.2. Multi-Input Pyramid
- To exploit coarse and middle scale information, the max pooling is employed to sequentially down-sample the input image and construct a four-stage image pyramid with four decreasing scales {S, 1/2S, 1/4S, 1/8S}.
- The input at each scale is passed through a 3×3 convolutional layer to produce feature maps, which are then concatenated to the corresponding feature maps of the encoder path.
The multi-input pyramid transfers more blur information from the coarse scale to the fine scale in the feature extraction procedure, integrating richer blur information.
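A minimal sketch of this multi-input pyramid, assuming PyTorch (the channel count of 64 is an illustrative assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_input_pyramid(image, num_scales=4):
    """Sequentially down-sample the input with 2x2 max pooling to get
    the four-scale image pyramid {S, 1/2 S, 1/4 S, 1/8 S}."""
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(F.max_pool2d(pyramid[-1], kernel_size=2, stride=2))
    return pyramid

# Each scale goes through its own 3x3 conv to produce feature maps that
# are concatenated with the encoder features at the same scale.
scale_convs = nn.ModuleList(
    [nn.Conv2d(3, 64, kernel_size=3, padding=1) for _ in range(4)]
)

image = torch.randn(1, 3, 512, 640)  # e.g., one 640x512 RGB input
pyramid_feats = [conv(x) for conv, x in zip(scale_convs, build_input_pyramid(image))]
# pyramid_feats[k] is concatenated (torch.cat) with stage k of the encoder path.
```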
1.3. Multi-Loss Pyramid
- The four stages of the decoder path produce four probability maps Mk (k=1, 2, 3, 4) at different scales.
- The class-balanced cross-entropy loss function is used.
- The branch loss lk at stage k is defined by the class-balanced cross-entropy (see the reconstruction after this list).
- β is the weight balancing the losses from positive and negative samples. In this study, it is set to 0.5, which achieved the best performance in a scan from 0.1 to 0.9.
- The subnet loss is the weighted sum of the branch losses (also reconstructed below).
- where αk is the weight balancing multi-level losses.
- α1=0.2, α2=0.3, α3=0.3, and α4=0.2.
- Higher weights are assigned to middle-level stages in order to better balance and merge hierarchical losses.
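A plausible reconstruction of the two formulas from the definitions above, assuming the standard HED-style class-balanced cross-entropy (the paper's exact notation may differ; here Y₊ and Y₋ denote the sets of blurred (positive) and sharp (negative) pixels, and Mk(j) is the predicted blur probability at pixel j):

```latex
l_k = -\beta \sum_{j \in Y_{+}} \log M_k(j)
      \;-\; (1-\beta) \sum_{j \in Y_{-}} \log\!\big(1 - M_k(j)\big),
\qquad
l_s = \sum_{k=1}^{4} \alpha_k \, l_k
```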
1.4. New Skip Connection
- A new skip module (marked by red dashed boxes) is established.
- Two additional 3×3 convolutional layers are employed before encoder feature maps are concatenated to the corresponding decoder feature maps.
The new skip module is proposed to transfer more low-level morphologic features into higher-level semantic space and improve the feature merging capability.
- The last decoder layer outputs a blur probability map at the same resolution as the input of the M-shaped network.
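A minimal sketch of this new skip module (the channel counts are illustrative assumptions; each 3×3 convolutional layer follows the paper's convolution + batch norm + ReLU convention):

```python
import torch
import torch.nn as nn

class SkipModule(nn.Module):
    """Two extra 3x3 convolutional layers refine the encoder features
    before they are concatenated with the corresponding decoder features."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, enc_feat, dec_feat):
        # Refine the encoder features, then concatenate along channels.
        return torch.cat([self.refine(enc_feat), dec_feat], dim=1)
```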
2. Pyramid M-Shaped Network (PM-Net)
- The PM-Net consists of a number of M-shaped subnets and a unified fusion layer.
- Each subnet corresponds to one input scale, so the multi-scale subnets also form a model pyramid.
- The blur probability maps obtained at the last decoder stage in every M-shape subnet are up-sampled to the same resolution as the raw input image.
- Then, a unified 1×1 convolutional layer is used to merge these output probability maps.
- The total loss of the entire PM-Net is the weighted sum of the subnet losses and the fusion loss (see the reconstruction after this list).
- where wn is the weight balancing the different subnet losses (lsn) and N is the number of subnets; wl is the weight of the loss of the final fusion layer (lf).
- It provides a comprehensive supervision from low morphologic level to high semantic level.
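A plausible reconstruction of the total loss from the description above:

```latex
L = \sum_{n=1}^{N} w_n \, l_s^{\,n} \;+\; w_l \, l_f
```

And a minimal sketch of the unified fusion layer, assuming PyTorch (the interpolation mode and the final sigmoid are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionLayer(nn.Module):
    """Up-sample each subnet's blur probability map to the raw input
    resolution, then merge them with a unified 1x1 convolution."""
    def __init__(self, num_subnets=3):
        super().__init__()
        self.fuse = nn.Conv2d(num_subnets, 1, kernel_size=1)

    def forward(self, subnet_maps, out_size):
        # subnet_maps: list of (B, 1, h_n, w_n) probability maps.
        upsampled = [
            F.interpolate(m, size=out_size, mode='bilinear', align_corners=False)
            for m in subnet_maps
        ]
        # Sigmoid keeps the fused output a probability map (assumption).
        return torch.sigmoid(self.fuse(torch.cat(upsampled, dim=1)))
```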
3. Ablation Study
- For in-dataset experiments, the images and ground truths from BDD (Shi's Dataset) are used, with 80% of images for training and the remaining 20% for testing.
- A validation set (10% of the training set) and an early stopping method are used during training to avoid overfitting.
- First, the PM-Net with a complete input pyramid achieves the highest F1-score.
- Second, the PM-Net with a complete loss pyramid performs the best.
- In PM^q-Net, q is the number of subnets: PM¹-Net uses input image scale {S}; PM²-Net uses {S, 1/2S}; PM³-Net uses {S, 1/2S, 1/4S}; PM⁴-Net uses {S, 1/2S, 1/4S, 1/8S}.
- Third, PM³-Net exceeds PM²-Net and PM¹-Net in F1-score, and performs almost the same as PM⁴-Net. PM³-Net is treated as the best model for a 640×512 input image.
4. Experimental Results
4.1. Qualitative Results
- Obviously, the proposed PM-Nets perform the best defocus blur detection, with accurate judgement on sharp but homogeneous regions (e.g., solid-colored coat, skin, fur, feather) and pseudo-sharp backgrounds (e.g., background light speckles).
- It has better detection than Park CVPR’17 [22] & Zeng TIP’19 [25], etc.
- For motion blur detection, PM-Nets perform significantly better and closer to the ground truth than other SOTA approaches.
- The yellow curve marks the sharp but homogeneous region such as feather, skin, etc.
- We can clearly see that these anomalous regions including sharp but homogeneous regions (e.g. feather and skin marked by yellow curves) and pseudo-sharp backgrounds (e.g. light speckles marked by purple curves) are accurately distinguished by PM-Net.
- It can be clearly observed that the blurred regions including motion-blurred foreground (marked by yellow ellipse) and defocused background (marked by red dotted circle) are accurately detected.
4.2. Quantitative Results
- For defocus-only blur detection, PM¹-Net achieves an F1-score of 0.876 and an MAE of 0.102. PM³-Net achieves the highest F1-score of 0.893 and the smallest MAE of 0.095.
- For both motion and defocus blur detection, PM¹-Net achieves an F1-score of 0.873 and an MAE of 0.104. PM³-Net achieves the highest F1-score of 0.884 and the smallest MAE of 0.096, outperforming Zeng TIP’19 [25], etc.
- Precision and recall (P-R) curves are generated for different methods by varying the threshold within the range [0, 255] to produce binary segmentations of final blur maps.
- Evidently, PM-Nets achieve the highest precision within the recall range from 0 to 1, outperforming Park CVPR’17 & Zeng TIP’19, etc.
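A minimal sketch of this P-R evaluation procedure (names and array conventions are assumptions; blur_map holds values in [0, 255] and gt_mask is a boolean ground-truth mask):

```python
import numpy as np

def pr_curve(blur_map, gt_mask):
    """Binarize the final blur map at each threshold in [0, 255] and
    compute precision/recall against the ground-truth mask."""
    precisions, recalls = [], []
    for t in range(256):
        pred = blur_map >= t
        tp = np.logical_and(pred, gt_mask).sum()
        precisions.append(tp / max(pred.sum(), 1))   # guard empty predictions
        recalls.append(tp / max(gt_mask.sum(), 1))   # guard empty ground truth
    return np.array(precisions), np.array(recalls)
```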
4.3. Cross-Dataset Results
- The 500 images in the challenging defocused dataset (CDD) are tested.
- The 256×256 images in CDD are tested directly using the weights of a PM-Net pre-trained on BDD.
- PM³-Net achieves an F1-score of 0.885 and an MAE of 0.098.
- Both the achieved F1-score and MAE on this challenging dataset are significantly superior to the results of the method in [24].
- This demonstrates the superior generalization capacity of the proposed method.
4.4. Running Time
- PM-Net has a very fast detection speed at the millisecond level.
- The runtime of PM-Nets ranges from 27 ms to 61 ms for a single 640×512 image. In contrast, all other algorithms take much longer, from a few seconds to hundreds of seconds.
Reference
[2019 IEEE ACCESS] [PM-Net]
Accurate and Fast Blur Detection Using a Pyramid M-Shaped Deep Neural Network
Blur Detection / Defocus Map Estimation
2017 [Park CVPR’17 / DHCF / DHDE] 2018 [Purohit ICIP’18] [BDNet] [DBM] [BTBNet] 2019 [Khajuria ICIIP’19] [Zeng TIP’19] [PM-Net] 2020 [BTBCRL (BTBNet + CRLNet)]