[Review] Purohit ICIP’18: Learning Based Single Image Blur Detection (Blur Detection)
In this story, Learning Based Single Image Blur Detection and Segmentation, Purohit ICIP’18, by Indian Institute of Technology Madras, is reviewed. In this paper:
- Global context and local features are jointly learned in the network. Two sub-networks are trained to perform the task at global (image) and local (patch) levels.
- The pixel-level probabilities estimated by the two networks are aggregated and then fed to an MRF-based framework, which returns a refined, dense segmentation map.
This is a paper in 2018 ICIP. (Sik-Ho Tsang @ Medium)
- Proposed Network Architecture
- Experimental Results
1. Proposed Network Architecture
1.1. Patch Level Classification Network (Bottom)
- Small 30×30 image patches are used as input.
- These small image patches are obtained from synthetically blurred images.
- Each ConvBlock has 2 convolutional layers containing filters of size 3×3.
- In the first ConvBlock, each conv has 32 filters: the first conv is followed by ReLU, and the second conv, with a stride of 2, is followed by Batch Norm (BN-Inception / Inception-v2) and ReLU.
- The two subsequent ConvBlocks carry the same structure but have 64 and 128 filters respectively.
- After that, a fully connected layer brings the dimension down to 1, and a Sigmoid layer outputs the probability of the patch being blurred.
Aggregating the blur probabilities for patches distributed over the input image forms a coarse estimate of the segmentation map.
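To make the layer sizes concrete, here is a minimal sketch that traces the feature-map shape through the three ConvBlocks. It assumes zero padding of 1 on every 3×3 convolution and that only the second convolution in each block has a stride of 2. The padding is not stated above, so the resulting sizes are illustrative only, and the helper names `conv_out` and `trace_patch_network` are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_patch_network(size=30, filters=(32, 64, 128)):
    """Return (channels, height, width) after each ConvBlock.

    Assumption: padding of 1 everywhere; this is not given in the review.
    """
    shapes = []
    for channels in filters:
        size = conv_out(size, stride=1)  # first conv + ReLU
        size = conv_out(size, stride=2)  # second conv (stride 2) + BN + ReLU
        shapes.append((channels, size, size))
    return shapes


shapes = trace_patch_network()
for block, shape in enumerate(shapes, start=1):
    print(f"ConvBlock {block}: {shape}")

# Number of inputs to the fully connected layer under these assumptions
channels, height, width = shapes[-1]
fc_inputs = channels * height * width
print("FC inputs:", fc_inputs)
```

Under these assumptions, the 30×30 patch shrinks to 15×15, 8×8, and 4×4 across the three blocks before the fully connected layer.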
- The training examples come from 500 all-in-focus images containing diverse scenes and textures obtained from Flickr dataset and 300 sharp images selected from the ILSVRC dataset.
- To simulate motion blur, 1000 realistic motion blur kernels are generated, following an approach cited in the paper.
- Optical blur is simulated using Gaussian blur kernels with variance σ². σ is varied from 0.5 to 4 in steps of 0.2.
- Each image I is convolved with N blur kernels randomly selected from our set to generate N blurred images.
- 2×10⁵ image patches of size 30×30 are extracted from random locations in these images, but only patches whose entropy (a measure of textureness) is greater than 4.5 are kept for training.
- This dataset is split into 80% for training, 10% for validation, and 10% for testing.
- During testing, overlapping patches are extracted from the input image and passed through the network to obtain their blur probabilities.
- Each estimated probability is assigned to every pixel in its patch; where patches overlap, the values are averaged.
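Two steps from the list above lend themselves to a short sketch: the entropy filter applied to training patches and the test-time averaging of overlapping patch probabilities. The code below is a hedged illustration, not the authors' implementation. It assumes the entropy is the Shannon entropy (in bits) of the 8-bit grey-level histogram, which the review does not specify, and the function names `patch_entropy` and `aggregate_patch_probs` are hypothetical.

```python
import numpy as np


def patch_entropy(patch, bins=256):
    """Shannon entropy (bits) of the grey-level histogram of a uint8 patch.

    Assumption: this is the 'textureness' entropy meant in the text.
    """
    hist = np.bincount(patch.ravel(), minlength=bins).astype(float)
    prob = hist[hist > 0] / hist.sum()
    return float(np.sum(prob * np.log2(1.0 / prob)))


def aggregate_patch_probs(image_shape, patch_probs, patch=30):
    """Average per-patch blur probabilities into a per-pixel map.

    patch_probs: iterable of (top, left, probability) tuples.
    """
    total = np.zeros(image_shape, dtype=float)
    count = np.zeros(image_shape, dtype=float)
    for top, left, prob in patch_probs:
        total[top:top + patch, left:left + patch] += prob
        count[top:top + patch, left:left + patch] += 1
    # Pixels covered by no patch stay at 0 instead of dividing by zero
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)


# Entropy filter: a flat patch is rejected, a textured one is kept
rng = np.random.default_rng(0)
flat = np.full((30, 30), 128, dtype=np.uint8)
textured = rng.integers(0, 256, size=(30, 30), dtype=np.uint8)
keep = [p for p in (flat, textured) if patch_entropy(p) > 4.5]
print("patches kept:", len(keep))

# Aggregation: two patches that overlap in columns 15..29
coarse = aggregate_patch_probs((40, 60), [(0, 0, 1.0), (0, 15, 0.0)])
print(coarse[0, 0], coarse[0, 20], coarse[0, 40])
```

In the overlap region the map takes the mean of the two patch probabilities, which is what smooths the coarse estimate between neighbouring patches.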
1.2. Image Level Classification Network (Top)
- A fully convolutional encoder-decoder network with skip connections is trained; it is inspired by Pix2Pix.
- Pix2Pix was originally proposed for image-to-image translation tasks, and its generator has a network architecture similar to U-Net. (If interested, please feel free to read Pix2Pix and U-Net.)
- It is trained using the dataset provided in Park CVPR’17.
- It is noted that when the training data is limited in size, including an adversarial loss term helps ensure that the estimated segmentation map is both semantically meaningful and close to the ground truth.
- The original L1 loss in Pix2Pix is replaced by a binary cross entropy loss, since the targets are binary images, and the final layer is a Sigmoid layer.
- The number of filters is fixed at 64 throughout the network instead of doubling at each downsampling step, since the task does not require learning very deep hierarchical features.
- The dataset in Park CVPR’17 is used, i.e. 1000 images with human labelled blur regions, among which 296 are partially motion-blurred and 704 are defocus-blurred.
- It is divided into training and test sets (90%-10%).
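As a rough illustration of why binary cross entropy suits binary target maps, the following sketch computes the mean per-pixel loss between a sigmoid output and a binary mask. It covers only this reconstruction term; the adversarial term and the weighting between the two are not shown because the review does not give them. The function name, the `eps` clipping value, and the "1 = blurred" label convention are assumptions.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel binary cross entropy between probabilities and a binary mask."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())


# Assumed convention: 1 = blurred, 0 = sharp
target = np.array([[1.0, 1.0], [0.0, 0.0]])
good = np.array([[0.9, 0.8], [0.1, 0.2]])   # mostly agrees with the mask
bad = np.array([[0.2, 0.1], [0.8, 0.9]])    # mostly disagrees with the mask

loss_good = binary_cross_entropy(good, target)
loss_bad = binary_cross_entropy(bad, target)
print(loss_good < loss_bad)
```

The loss grows sharply as a confident prediction moves to the wrong side of the mask, which penalises confident misclassification of a pixel more strongly than an L1 distance would.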
1.3. Final Detection and Segmentation
- The outputs of the two networks have complementary properties.
- Pixel-wise multiplication is performed: b(p) = b1(p) * b2(p).
The patch-level network is able to detect blurred regions with sufficient accuracy. However, sharp but homogeneous regions are also flagged as blurred in this map and the edges are not aligned.
On the other hand, the image level regression network’s map contains refined edges and is able to correctly classify homogeneous regions, but sometimes misclassifies blurred regions as sharp too.
- The probabilities b(p) are fed to an MRF for post-processing to obtain the final result.
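The fusion rule b(p) = b1(p) * b2(p) can be sketched in a few lines. The toy values below are made up to mirror the behaviour described above (a false alarm of the patch-level map on a homogeneous region), and the MRF refinement is not included. The function name `fuse_maps` is hypothetical.

```python
import numpy as np


def fuse_maps(b1, b2):
    """Pixel-wise product b(p) = b1(p) * b2(p) of two probability maps."""
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    if b1.shape != b2.shape:
        raise ValueError("maps must have the same shape")
    return b1 * b2


# Columns: blurred region, sharp homogeneous region, sharp textured region
b1 = np.array([[0.9, 0.8, 0.1]])  # patch-level map: false alarm in column 2
b2 = np.array([[0.8, 0.1, 0.1]])  # image-level map: column 2 correctly sharp
b = fuse_maps(b1, b2)
print(b)
```

The product stays high only where both maps agree that a pixel is blurred, so the false alarm on the homogeneous region is suppressed before post-processing.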
2. Experimental Results
2.1. Qualitative Comparison
- The 100 test images in Park CVPR’17 are used for testing.
- The segmentation results returned by the proposed algorithm are very close to the ground truth. They are significantly more accurate than prior art.
2.2. Quantitative Comparison
- For binary segmentation, a simple thresholding method is applied to the defocus maps.
- It is observed that the proposed approach can effectively segment the image into defocused and focused regions.
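The thresholding step can be illustrated with a minimal sketch. The threshold of 0.5 and the pixel-accuracy score are assumptions chosen for illustration; the review states neither the threshold nor the evaluation metric used in the paper.

```python
import numpy as np


def binarize(blur_map, threshold=0.5):
    """Boolean mask that is True where the map exceeds the threshold."""
    return np.asarray(blur_map, dtype=float) > threshold


def pixel_accuracy(mask, ground_truth):
    """Fraction of pixels where the predicted mask matches the ground truth."""
    return float(np.mean(mask == np.asarray(ground_truth, dtype=bool)))


blur_map = np.array([[0.9, 0.6], [0.4, 0.1]])   # toy probability map
ground_truth = np.array([[1, 1], [1, 0]])       # toy binary labels
mask = binarize(blur_map)
accuracy = pixel_accuracy(mask, ground_truth)
print(mask)
print(accuracy)
```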