Review — Khajuria ICIIP’19: Blur Detection in Identity Images (Blur Detection)

Outperforms BRISQUE Using SVM

  • There are 250,000 identity images without ground-truth labels. Some are blurred and some are clear.
  • In addition to the naturally blurred images, five types of artificial blur are synthesized from the clear images.
  • A CNN is proposed to classify an image as blur or clear. It achieves 98.05% accuracy on about 113,000 test images, outperforming BRISQUE features with an SVM.

Outline

  1. Data Preparation
  2. Blur Synthesis
  3. Proposed CNN: Network Architecture
  4. Experimental Results

1. Data Preparation

  • The data comprises 250,000 unlabeled images.
  • First, BRISQUE, a statistical image quality assessment (IQA) method, is used to rate the images coarsely.
  • BRISQUE extracts 36 features from each image.
  • These feature vectors are then fed to a support vector regressor (SVR), which regresses a score for each image.
  • Images with lower scores were generally clear, and images with higher scores were generally blurred.
  • By observation, two threshold values, 40 and 75, are identified.
  • Images with a score less than 40 were labeled clear.
  • Images with a score greater than 75 were labeled blur.
  • Only 26% of the images fall in these two intervals, i.e. a score less than 40 or greater than 75. (The remaining samples, nearly 74% of the data, are still ambiguous.)
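
The thresholding step above can be sketched in a few lines of Python. This is only an illustration: the function name `label_from_score` and the sample scores are made up, and the treatment of scores exactly equal to 40 or 75 (left unlabeled here) is an assumption, since the text only says "less than" and "greater than".

```python
from typing import Optional

CLEAR_MAX = 40.0  # scores below this were labeled clear (threshold from the text)
BLUR_MIN = 75.0   # scores above this were labeled blur (threshold from the text)

def label_from_score(score: float) -> Optional[str]:
    """Map a BRISQUE-style score to 'clear', 'blur', or None if ambiguous."""
    if score < CLEAR_MAX:
        return "clear"
    if score > BLUR_MIN:
        return "blur"
    return None  # falls in the ambiguous middle band

# Made-up scores, for illustration only.
scores = [12.3, 40.0, 58.9, 75.0, 91.4]
labels = [label_from_score(s) for s in scores]
print(labels)
```

Only the images that receive a non-None label would be carried forward as labeled data.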

2. Blur Synthesis

  • The degradation in an image can be mathematically modeled as g(x, y) = f(x, y) ∗ h(x, y) + n(x, y), where ∗ denotes 2-D convolution.
  • Here g(x, y), f(x, y), h(x, y) and n(x, y) represent the degraded image, the original image, the point spread function (PSF) and the noise respectively.
  • The additive noise n(x, y) is ignored in this paper.
  • Five variants of the PSF are applied.
  • For each PSF, two different kernel sizes (K), i.e. K1 and K2, are chosen.
  • Thus, 10 kinds of blur are synthesized.
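
To make the noise-free degradation model concrete, here is a minimal pure-Python sketch of convolving a grayscale image with a PSF kernel. The function name `apply_psf`, the edge-replication border handling, and the toy 5×5 image are all assumptions for illustration. The paper's actual implementation is not described in this review.

```python
def apply_psf(image, kernel):
    """Convolve a 2-D grayscale image (list of rows) with a PSF kernel.

    Borders are handled by replicating edge pixels (an assumption).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Convolution: the kernel offset is subtracted from the
                    # output position, then clamped to stay inside the image.
                    yy = min(max(y - (j - ry), 0), h - 1)
                    xx = min(max(x - (i - rx), 0), w - 1)
                    acc += kernel[j][i] * image[yy][xx]
            out[y][x] = acc
    return out

# Toy example: a single bright pixel spread out by a 3x3 uniform PSF.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0
uniform = [[1.0 / 9.0] * 3 for _ in range(3)]
blurred = apply_psf(image, uniform)
for row in blurred:
    print([round(v, 2) for v in row])
```

Because the kernel sums to 1, the total intensity of the image is preserved while the single bright pixel is spread over its 3×3 neighborhood.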

2.1. Gaussian Blur

  • The Gaussian PSF is applied with kernel sizes K1 = 11 and K2 = 13.
Gaussian Blur
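
A normalized Gaussian kernel of the two sizes mentioned above could be built as follows. Note that the standard deviation `sigma` is not given in this review, so the value 2.0 used here is purely an assumption, as is the helper name `gaussian_kernel`.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    c = (size - 1) / 2.0  # kernel center
    raw = [
        [math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]

# Kernel sizes from the text; sigma = 2.0 is an assumed value.
k1 = gaussian_kernel(11, sigma=2.0)
k2 = gaussian_kernel(13, sigma=2.0)
print(len(k1), len(k2))
```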

2.2. Out Of Focus Blur

  • The out-of-focus PSF is defined in terms of R, the “radius” of the blur, and C, the “center” of the PSF.
  • The kernel sizes are K1 = 7 and K2 = 9.
Out Of Focus Blur
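
Out-of-focus blur is commonly modeled as a uniform disk: the kernel is constant inside a circle of radius R around the center C and zero outside. The sketch below follows that common model, which may differ in detail from the paper's formula. The choice R = (K − 1) / 2 and the function name `disk_kernel` are assumptions.

```python
def disk_kernel(size):
    """Return a size x size uniform-disk kernel normalized to sum to 1."""
    c = (size - 1) / 2.0  # center C of the PSF
    r = (size - 1) / 2.0  # radius R of the blur (assumed to span the kernel)
    mask = [
        [1.0 if (x - c) ** 2 + (y - c) ** 2 <= r * r else 0.0
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, mask))
    return [[v / total for v in row] for row in mask]

# Kernel sizes from the text.
d1 = disk_kernel(7)
d2 = disk_kernel(9)
for row in d1:
    print(" ".join("#" if v > 0 else "." for v in row))
```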

2.3. Horizontal Motion Blur

  • This blur simulates horizontal motion using a horizontal motion filter with kernel sizes K1 = 7 and K2 = 9.
Horizontal Motion Blur

2.4. Vertical Motion Blur

  • This blur simulates vertical motion using a vertical motion filter with kernel sizes K1 = 9 and K2 = 11.
Vertical Motion Blur
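
A common way to build the two motion filters is a K×K kernel that is zero everywhere except for one line of equal weights through the center: the middle row for horizontal motion and the middle column for vertical motion. The sketch below assumes that form. The function names are hypothetical.

```python
def horizontal_motion_kernel(size):
    """size x size kernel with weights 1/size along the middle row."""
    kernel = [[0.0] * size for _ in range(size)]
    mid = size // 2
    for x in range(size):
        kernel[mid][x] = 1.0 / size
    return kernel

def vertical_motion_kernel(size):
    """Transpose of the horizontal kernel: weights along the middle column."""
    horizontal = horizontal_motion_kernel(size)
    return [list(col) for col in zip(*horizontal)]

# Kernel sizes from the text: 7 and 9 (horizontal), 9 and 11 (vertical).
h7 = horizontal_motion_kernel(7)
v9 = vertical_motion_kernel(9)
print(h7[3])
print([row[4] for row in v9])
```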

2.5. Box Blur

  • This is the averaging filter, which smooths the image, with kernel sizes K1 = 7 and K2 = 9.
  • val(x, y) gives the value of the kernel at position (x, y).
Box Blur

2.6. Data Augmentation

  • The 66,163 clear images are divided into 10 parts, one for each of the 10 blur types.
  • This results in 6,616 images per degradation type.
  • There are now 67,170 blur images (66,160 artificially blurred and 1,010 naturally blurred) and 66,163 clear images.
  • In total, there are 133,333 images.
  • To train the CNN, images are resized to a standard size of 320×240 using bicubic interpolation.
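
The dataset counts above can be checked with a few lines of arithmetic. The variable names are just for illustration. All numbers come from the text.

```python
n_clear = 66163          # clear images
n_natural_blur = 1010    # naturally blurred images
n_blur_types = 5 * 2     # 5 PSFs x 2 kernel sizes

per_type = n_clear // n_blur_types        # images per degradation type
n_synthetic = per_type * n_blur_types     # artificially blurred images
n_blur = n_synthetic + n_natural_blur     # all blur images
n_total = n_blur + n_clear                # whole dataset

print(per_type, n_synthetic, n_blur, n_total)
```

The integer division leaves 3 clear images without a synthesized counterpart, which is why 66,160 rather than 66,163 blurred images are produced.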

3. Proposed CNN: Network Architecture

Proposed CNN: Network Architecture
  • The first convolution layer takes a 320×240×3 image as input.
  • It has 32 kernels of size 3×3 with a stride of 1.
  • It outputs 318×238×32, followed by ReLU.
  • Max pooling of size 2×2 with a stride of 2 pixels is then applied, which outputs 159×119×32.
  • Dropout with a drop rate of 0.25 is applied after the first convolution layer, i.e. 25% of these connections are dropped.
  • The second convolution layer has 64 kernels of size 3×3 with a stride of 1. Dropout with a drop rate of 0.5 is applied.
  • It outputs 157×117×64, followed by ReLU.
  • Max pooling of size 2×2 with a stride of 2 pixels is then applied, which outputs 78×58×64.
  • The result is fed to a fully connected (FC) layer of 250 units with ReLU and dropout with a drop rate of 0.5.
  • The output of the FC layer is input to a softmax function.
Details of the Proposed CNN
  • There are 72,404,144 parameters to train, as shown in the table above, which is economical compared to many SOTA architectures.
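
The layer shapes and the parameter total can be reproduced with simple arithmetic. The sketch below assumes unpadded ("valid") convolutions, one bias per kernel or unit, and a 2-unit softmax output for the blur and clear classes. These assumptions are not stated explicitly in the review, but they reproduce the reported figures.

```python
def conv_out(h, w, k=3, stride=1):
    """Output size of an unpadded convolution."""
    return (h - k) // stride + 1, (w - k) // stride + 1

def pool_out(h, w, k=2, stride=2):
    """Output size of an unpadded max pooling."""
    return (h - k) // stride + 1, (w - k) // stride + 1

h, w, c = 320, 240, 3
h, w = conv_out(h, w)                 # first convolution
conv1_params = 3 * 3 * c * 32 + 32    # weights + biases
shape_conv1 = (h, w, 32)
h, w = pool_out(h, w)
shape_pool1 = (h, w, 32)
h, w = conv_out(h, w)                 # second convolution
conv2_params = 3 * 3 * 32 * 64 + 64
shape_conv2 = (h, w, 64)
h, w = pool_out(h, w)
shape_pool2 = (h, w, 64)

flat = h * w * 64                     # flattened input to the FC layer
fc_params = flat * 250 + 250
out_params = 250 * 2 + 2              # 2-class softmax layer (assumed)

total = conv1_params + conv2_params + fc_params + out_params
print(shape_conv1, shape_pool1, shape_conv2, shape_pool2)
print(total)
```

Almost all of the parameters sit in the FC layer, because the flattened 78×58×64 feature map is connected to each of the 250 units.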

4. Experimental Results

Dataset
  • The training data is drawn by stratified sampling from the whole set of 133,333 images. Around 12% and 3% of the data are used for training and validation, i.e. around 15,000 and 5,000 images respectively.
  • Thus, the remaining 113,333 images are used for testing.
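
Stratified sampling means that each split keeps roughly the same blur-to-clear ratio as the full dataset. A minimal sketch of such a split is shown below. The function name `stratified_split`, the fixed seed, and the toy label list are assumptions, and the paper's actual sampling procedure is not described in this review.

```python
import random
from collections import defaultdict

def stratified_split(labels, n_train, n_val, seed=0):
    """Return (train, val, test) index lists with class ratios preserved.

    Per-class counts are rounded, so totals can be off by one in general.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n = len(labels)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        t = round(n_train * len(indices) / n)  # this class's share of train
        v = round(n_val * len(indices) / n)    # this class's share of val
        train += indices[:t]
        val += indices[t:t + v]
        test += indices[t + v:]
    return train, val, test

# Toy dataset of 100 labels, for illustration only.
toy_labels = ["blur"] * 60 + ["clear"] * 40
train_idx, val_idx, test_idx = stratified_split(toy_labels, n_train=15, n_val=5)
print(len(train_idx), len(val_idx), len(test_idx))
```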
Performance Comparison
  • BRISQUE: The feature vector has 36 dimensions. A linear support vector classifier is trained on these extracted features. This classifier achieves 94.51% accuracy on the test samples, i.e. 113,333 images.
  • The proposed CNN model is tested on the 113,333 images and achieves an accuracy of 98.05%.
Confusion Matrix of BRISQUE using SVM
  • From the above table, the misclassification rate for class ’blur’ is more than 9%.
Confusion Matrix of Proposed CNN Model
  • By comparison, the proposed approach misclassifies less than 3% of the samples of class ’blur’.
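
The per-class misclassification rates quoted above are read off a confusion matrix by dividing the off-diagonal count in a row by that row's total. The sketch below shows the calculation. The counts in `confusion` are made-up placeholders, not the paper's numbers, and the function name `per_class_error` is hypothetical.

```python
def per_class_error(confusion):
    """confusion[true_label][predicted_label] = count.

    Returns the fraction of each true class that was misclassified.
    """
    return {
        true_label: 1.0 - row[true_label] / sum(row.values())
        for true_label, row in confusion.items()
    }

# Placeholder counts for illustration only (NOT from the paper).
confusion = {
    "blur": {"blur": 970, "clear": 30},
    "clear": {"blur": 10, "clear": 990},
}
rates = per_class_error(confusion)
correct = sum(confusion[label][label] for label in confusion)
accuracy = correct / sum(sum(row.values()) for row in confusion.values())
print(rates, accuracy)
```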

PhD, Researcher. I share what I learn. :) Reads: https://bit.ly/33TDhxG, LinkedIn: https://www.linkedin.com/in/sh-tsang/, Twitter: https://twitter.com/SHTsang3
