Review — BDNet: Automatic Extraction of Blur Regions on a Single Image Based on Semantic Segmentation (Blur Detection)

ResNet + FCN-2s, Outperforms U-Net, SegNet & FCN-2s(VGG16)

Sik-Ho Tsang
6 min read · Jan 16, 2021
First Row: Images, Second Row: Ground Truth Masks

In this story, Automatic Extraction of Blur Regions on a Single Image Based on Semantic Segmentation, BDNet, by Southeast University, and International Joint Laboratory of Information Display and Visualization, is reviewed. In this paper:

  • Blur Detection Net, BDNet, is proposed to well integrate global image-level context and cross-layer context information by combining ResNet and Fully Convolutional Network (FCN).

This is a paper in 2020 IEEE ACCESS, where ACCESS is an open access journal with a high impact factor of 3.745. (Sik-Ho Tsang @ Medium)

Outline

  1. Datasets
  2. BDNet: Network Architecture
  3. Experimental Results

1. Datasets

  • Three datasets are used in the experiments: BlurRaw, BlurDB1 and BlurDB2.
  • BlurRaw: This is the Shi et al. dataset, which contains 1000 annotated images: 296 with motion blur and 704 with defocus blur.
  • Based on BlurRaw, two datasets are generated called BlurDB1 and BlurDB2.
Details about BlurDB1
  • BlurDB1: Image patches of size 256×256 are randomly cropped from the raw images: 12 patches from each motion-blur image and 5 patches from each out-of-focus image.
  • Finally, 7072 images in total are obtained, with 6727 training images and 345 testing images.
Details about BlurDB2
  • BlurDB2: Image patches are cropped from top left to bottom right with a 1/2-area overlap between every two adjacent patches (see the cropping sketch after this list).
  • Finally, 9134 images in total are obtained. 10% of the dataset is used for testing and 90% for training.
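The paper does not release preprocessing code, but both cropping schemes are simple to reproduce. Below is a minimal NumPy sketch; the function names are mine, and interpreting the 1/2-area overlap as a stride of half the patch side is an assumption.

```python
import numpy as np

def random_crops(img, n_patches, size=256, rng=None):
    """BlurDB1-style: randomly crop n_patches patches of size x size."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    patches = []
    for _ in range(n_patches):
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        patches.append(img[y:y + size, x:x + size])
    return patches

def sliding_crops(img, size=256):
    """BlurDB2-style: crop from top left to bottom right with overlap."""
    h, w = img.shape[:2]
    stride = size // 2  # assumption: half-side stride gives the 1/2-area overlap
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]
```

For BlurDB1, `random_crops(img, 12)` would be applied to motion-blur images and `random_crops(img, 5)` to out-of-focus images, with the same crops taken from the ground-truth masks.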

2. BDNet: Network Architecture

BDNet: Network Architecture
  • In BDNet, a ResNet is used as the encoder to extract features, and an FCN is used as the decoder to produce dense segmentation results, as sketched below.
  • The network is trained with transfer learning so that blur regions can be detected successfully.
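The exact layer wiring is in the paper; as an illustration only, the overall encoder-decoder idea can be sketched in PyTorch as below. This is my own simplified approximation (it stops fusing skips at stride 4, while the paper's FCN-2s decoder goes further, and the ResNet-34 backbone and all layer sizes are assumptions), assuming a recent torchvision:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34, ResNet34_Weights

class BDNetSketch(nn.Module):
    """ResNet encoder + FCN-style decoder with skip fusion (a sketch)."""
    def __init__(self, num_classes=2):
        super().__init__()
        r = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)  # transfer learning
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)   # stride 4
        self.enc = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        self.drop = nn.Dropout2d(0.5)  # Dropout at the top of the ResNet
        # 1x1 "score" convolutions turning features into class maps
        self.scores = nn.ModuleList(
            [nn.Conv2d(c, num_classes, 1) for c in (64, 128, 256, 512)])
        # 2x deconvolutions between fusion steps, then a final 4x upsampling
        self.ups = nn.ModuleList(
            [nn.ConvTranspose2d(num_classes, num_classes, 4, 2, 1, bias=False)
             for _ in range(3)])
        self.final = nn.ConvTranspose2d(num_classes, num_classes, 8, 4, 2,
                                        bias=False)

    def forward(self, x):
        feats, h = [], self.stem(x)
        for block in self.enc:
            h = block(h)
            feats.append(h)                       # strides 4, 8, 16, 32
        s = self.scores[3](self.drop(feats[3]))   # coarsest prediction
        for up, skip, score in zip(self.ups, feats[2::-1], self.scores[2::-1]):
            s = up(s) + score(skip)               # upsample, then fuse the skip
        return self.final(s)                      # back to input resolution
```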

2.1. ResNet as Encoder

  • Simply increasing the depth of a neural network eventually stops improving accuracy and even degrades it, which is known as the degradation problem.
  • A deep residual learning framework is used to handle this problem.
  • Generally, the residual unit with identity mapping is used (a minimal code sketch follows this list):

    x_{l+1} = x_l + F(x_l, W_l)

  • F denotes the residual function to be learned, e.g. a stack of convolution layers, with weights W_l.
  • With many residual units stacked, the feature x_L of any deeper unit L can be written in terms of any shallower unit l:

    x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)
  • Specifically, the encoder part (ResNet) of BDNet is initialized with weights pre-trained on ImageNet.
  • (I won’t spend too much time on ResNet here. If interested, please feel free to visit ResNet.)
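For reference, here is a minimal PyTorch sketch of such an identity-mapping residual unit (the basic two-convolution variant, not necessarily the exact blocks used in BDNet):

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Identity-mapping residual unit: x_{l+1} = x_l + F(x_l)."""
    def __init__(self, channels):
        super().__init__()
        # F: a small stack of convolutions with BN and ReLU
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.f(x))  # identity skip + residual function
```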

2.2. Fully Convolutional Network (FCN) as Decoder

  • FCNs use in-network upsampling layers (deconvolutional layers initialized by bilinear upsampling) to produce a dense prediction.
  • To obtain accurate and detailed segmentation, a skip architecture is introduced that combines semantic information from deep layers with fine appearance information from shallow layers, which improves the performance of FCNs dramatically.
  • FCN is extended to FCN-2s (the original FCN stops at FCN-8s). Thus, more low-level features can be fused into the final results.
  • However, the scales of features from different layers may be quite different. Therefore, the scale layer is replaced with a batch normalization (BN) layer in BDNet.
  • A skip connection is added after each block with a downsampling operation, or after the concatenation layers that follow a downsampling block.
  • Moreover, deconvolution layers are utilized for upsampling, and a Dropout layer is employed at the top of the ResNet to avoid overfitting, which leads to a better result.
  • All deconvolution layers in the FCNs are initialized with bilinear kernels (see the initialization sketch after this list).
  • The cross-entropy loss is utilized.
  • The input of the network is a 224×224 color image with three RGB channels. The input images are preprocessed by subtracting the mean value of each channel.
  • The training data are augmented by random cropping and horizontal or vertical flipping. Finally, the model is trained for 100 epochs, taking 7 hours and 55 minutes.
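The bilinear initialization of the deconvolution layers is the standard FCN recipe. A PyTorch sketch (my own helper, mirroring the FCN reference implementation, with a 2-class, 2x-upsampling example):

```python
import torch
import torch.nn as nn

def bilinear_kernel(in_ch, out_ch, k):
    """Build an (in_ch, out_ch, k, k) bilinear-interpolation kernel."""
    factor = (k + 1) // 2
    center = factor - 1 if k % 2 == 1 else factor - 0.5
    og = torch.arange(k, dtype=torch.float32)
    filt = 1 - (og - center).abs() / factor        # 1D triangular filter
    kernel2d = filt[:, None] * filt[None, :]       # outer product -> 2D kernel
    weight = torch.zeros(in_ch, out_ch, k, k)
    for i in range(min(in_ch, out_ch)):
        weight[i, i] = kernel2d                    # one kernel per channel
    return weight

# A 2x-upsampling deconvolution initialized as bilinear interpolation
up = nn.ConvTranspose2d(2, 2, kernel_size=4, stride=2, padding=1, bias=False)
with torch.no_grad():
    up.weight.copy_(bilinear_kernel(2, 2, 4))
```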

3. Experimental Results

3.1. Training and Validation

The trends of the training and validation loss curves
  • The model used here is BDNet (with BN, Dropout) trained on BlurDB2.
  • 10% of training set is used for validation.
  • The trends of the two curves are consistent, which demonstrates that the model is not overfitting.
The performance between different folds
  • k-fold cross validation is also performed with k = 3, as sketched below.
  • The performance across different folds is quite similar, especially after about 60 epochs. These experiments further demonstrate the robustness and generalization ability of the proposed method.
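A minimal sketch of how such a 3-fold split can be set up; the paper does not give its splitting code, so the scikit-learn usage and the dataset size below are assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold

num_images = 9134  # illustrative: the size of BlurDB2
indices = np.arange(num_images)
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kf.split(indices)):
    # train a fresh BDNet on train_idx and validate on val_idx here
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val images")
```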

3.2. Quantitative Comparison

SOTA Comparison on BlurRaw
  • The data for evaluation comes from the BlurRaw dataset.
  • As we can see from the table, BDNet outperforms all other methods in terms of all metrics (a sketch of how these metrics are computed follows this list). BDNet achieves a mean IoU of 0.800, with pixel accuracy and mean accuracy both over 0.9.
  • Also, the mean inference time of BDNet is less than 10 seconds, and no post-processing is needed.
The results of BDNet, U-Net, SegNet, DSS, FCN2s(VGG16).
  • DSS is used for saliency detection, whereas SegNet and U-Net are used for semantic segmentation.
  • In general, BDNet gets a good result in a simple yet effective way.
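The reported metrics are the standard semantic-segmentation measures from the FCN paper. A compact NumPy sketch of how they can be computed from integer label maps (my own helper, not the authors' evaluation code):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes=2):
    """Compute pixel accuracy, mean accuracy and mean IoU from label maps."""
    pred = pred.astype(np.int64).ravel()
    gt = gt.astype(np.int64).ravel()
    # Confusion matrix: hist[i, j] = pixels with ground truth i predicted as j
    hist = np.bincount(num_classes * gt + pred,
                       minlength=num_classes ** 2).reshape(num_classes,
                                                           num_classes)
    pixel_acc = np.diag(hist).sum() / hist.sum()
    mean_acc = np.nanmean(np.diag(hist) / hist.sum(axis=1))
    iou = np.diag(hist) / (hist.sum(axis=0) + hist.sum(axis=1) - np.diag(hist))
    return pixel_acc, mean_acc, np.nanmean(iou)
```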

3.3. Qualitative Comparison

Detection results of different methods
  • Some pictures, such as images 7–10, do not achieve the expected results using BDNet.
  • A large number of pixels are misclassified when a large, texturally flat area is present. This problem could be solved by fusing more global context information or by training on datasets with larger images.
  • Another problem is that the boundary is not determined accurately if the blur boundary is not sharp enough. This may be improved by instance-aware learning techniques or by post-processing methods such as a CRF (Conditional Random Field, sketched below) or clustering.
  • More training data will definitely be helpful for getting better results.
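CRF refinement is only suggested by the authors, not implemented in the paper. As an illustration, here is a hedged sketch of what such post-processing could look like with the `pydensecrf` package; all hyperparameters (`sxy`, `srgb`, `compat`) are typical values from the segmentation literature, not tuned for blur detection:

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probs, iters=5):
    """Refine a 2-class blur probability map with a dense CRF.

    image: HxWx3 uint8 RGB image; probs: 2xHxW softmax output of the network.
    """
    h, w = image.shape[:2]
    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_softmax(probs))      # -log(prob) unaries
    d.addPairwiseGaussian(sxy=3, compat=3)           # smoothness term
    d.addPairwiseBilateral(sxy=80, srgb=13,          # appearance term
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(iters)
    return np.argmax(q, axis=0).reshape(h, w)        # refined label map
```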

3.4. Ablation Study

Results of Different FCNs
  • We can see that the results get better as more skip layers are added. The VGG16-FCN2s model achieves the best results in terms of all metrics.
  • This suggests that FCN-2s can still improve the results for the blur detection problem, especially when the segmentation boundaries are not sharp enough.
  • The above table shows that batch normalization (BN) layer, joint training and Dropout layer are helpful.
  • BDNet (with BN, Dropout) increases mean IoU by 1.4% on BlurDB2 and 1.8% on BlurRaw compared with BDNet (w/o BN).
  • (There are still a lot of ablation study results for BDNet. If interested, please feel free to read the paper.)
