Review — E-Net+B-Net: Deep Multi-Scale Feature Learning for Defocus Blur Estimation (Blur Detection)

Outperforms DMENet & Park CVPR’17 / DHCF / DHDE

Sik-Ho Tsang
Jan 17, 2021 (4 min read)


Overview of the deep defocus blur estimation method

In this story, Deep Multi-Scale Feature Learning for Defocus Blur Estimation (E-Net+B-Net), by Trinity College Dublin and Federal University of Rio Grande do Sul, is briefly reviewed. In this paper:

  • A Convolutional Neural Network (CNN) is designed to jointly tackle:
  1. E-Net: The discrimination of depth edges (i.e., edges that lie at depth discontinuities) from pattern edges (i.e., edges that lie at relatively constant depth values).
  2. B-Net: multi-scale blur estimation for pattern edges.
  • A fast edge-aware guided filter is used to propagate the blur information estimated at pattern edge points to homogeneous regions, while penalizing propagation across depth edges.

This is a 2020 arXiv paper. (Sik-Ho Tsang @ Medium)


  1. B-Net: Multi-scale Blur Estimation
  2. E-Net: Discrimination of Depth Edges from Pattern Edges
  3. Experimental Results

1. B-Net: Multi-scale Blur Estimation

Overview of the feature extraction networks for depth and pattern edge separation and defocus blur estimation
  • B-Net is fed with image patches centered at pattern edges.
  • It consists of two cascaded sub-networks: f1-NET (green shaded box) and b-NET (yellow shaded box). Sub-network f1-NET receives three patches of different sizes (PB1 = 41×41, PB2 = 27×27 and PB3 = 15×15).
  • These multi-scale patches, after a series of convolutional filters, ReLUs and max pooling layers, are concatenated at the point where they reach the same spatial size.
  • The output of f1-NET is then fed to b-NET. The goal of b-NET is to extract deep features fB specialized to encode the blur level of the multiscale patches.
  • Then fB is fed to Classification Network I, which consists of 3 fully connected layers: two hidden layers (300 and 150 nodes, respectively) and one output layer with 23 nodes, which classifies the blur into 23 levels. The last layer is a softmax layer.
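
To make the multi-scale input concrete, here is a minimal sketch of how the three patches could be cut around a single pattern edge pixel. It only uses the patch sizes quoted above; the function names, the grayscale list-of-rows image format, and the border clamping rule are my own assumptions, not details from the paper.

```python
# Patch sizes from the review: PB1 = 41x41, PB2 = 27x27, PB3 = 15x15.
PATCH_SIZES = (41, 27, 15)


def extract_patch(image, row, col, size):
    """Return a size x size patch centered at (row, col).

    `image` is a list of rows (grayscale). Out-of-bounds pixels are filled by
    clamping to the nearest border pixel; this padding rule is an assumption.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - half, row + half + 1):
        rr = min(max(r, 0), height - 1)
        patch.append([image[rr][min(max(c, 0), width - 1)]
                      for c in range(col - half, col + half + 1)])
    return patch


def multiscale_patches(image, row, col, sizes=PATCH_SIZES):
    """Extract one patch per scale, all centered at the same edge pixel."""
    return [extract_patch(image, row, col, s) for s in sizes]


# Toy 64x64 image whose pixel value encodes its position.
image = [[r * 64 + c for c in range(64)] for r in range(64)]
patches = multiscale_patches(image, 30, 30)
```

All three patches share the same center pixel, so each branch of f1-NET would see the same edge point with a different amount of surrounding context.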

2. E-Net: Discrimination of Depth Edges from Pattern Edges

2.1. E-Net

  • E-Net includes a separate branch that is fed with a fixed-size patch (PE1 = 41×41) and aims to extract features tailored to the edge classification problem.
  • This branch is called sub-network f2-NET and is shown in a purple shaded box.
  • The outputs of f1-NET (low-level blur information) and f2-NET (edge classification features) are then fused together.
  • Fused information is then sent to e-NET, which consists of a set of convolutional layers with ReLU activation functions to extract deep features fE specialized for pattern and depth edge classification.
  • Finally, fE is sent to a classification network called “Classification Network II”, which consists of 2 fully connected hidden layers of 300 and 150 nodes, respectively, with ReLU activations.
  • The output layer has only two nodes (pattern or depth edge) with softmax activation.
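
The two classification networks share the same shape: hidden layers of 300 and 150 nodes followed by a softmax output (23 nodes for blur levels, 2 nodes for edge type). Below is a rough, untrained sketch of such a head in plain Python. The feature size `FEATURE_DIM`, the random weights, and the class order ("pattern", "depth") are placeholders I made up for illustration; the review does not give them.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random small weights; a trained network would load learned values."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_head(feature_dim, n_classes, rng):
    """Hidden layers of 300 and 150 nodes, then an n_classes output layer."""
    sizes = [feature_dim, 300, 150, n_classes]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def classify(features, head):
    """ReLU on the hidden layers, softmax on the output layer."""
    x = features
    for weights, biases in head[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = head[-1]
    return softmax(dense(x, weights, biases))


rng = random.Random(0)
FEATURE_DIM = 64  # hypothetical size of fE; the review does not state it
f_e = [rng.random() for _ in range(FEATURE_DIM)]

edge_head = make_head(FEATURE_DIM, n_classes=2, rng=rng)
probs = classify(f_e, edge_head)
# Class order is an assumption made for this sketch.
label = ("pattern", "depth")[probs.index(max(probs))]
```

Calling `make_head(FEATURE_DIM, n_classes=23, rng=rng)` instead gives a head with the shape described for Classification Network I.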

2.2. Full Blur Map

  • The Domain Transform (DT) filter used in [36] is adopted with a modification that adds a penalty at depth edges.
  • It propagates blur information estimated at pattern edge points to homogeneous regions, at the same time penalizing the propagation over depth edges.
  • (Since it is not a deep network, I don’t focus on it too much. Also, it makes the approach not end-to-end.)
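
To give a feel for the idea, here is a 1D toy sketch of a domain-transform-style recursive filter that spreads sparse blur estimates and is discouraged from crossing depth edges. This is not the paper's filter: the additive `edge_penalty` term, the normalization by a filtered mask, the single left-right pass, and all parameter values are my own assumptions for illustration.

```python
import math


def recursive_pass(values, weights):
    """One left-to-right, then right-to-left recursive smoothing pass.

    weights[i] in [0, 1] controls how much sample i-1 leaks into sample i.
    """
    out = list(values)
    for i in range(1, len(out)):
        out[i] += weights[i] * (out[i - 1] - out[i])
    for i in range(len(out) - 2, -1, -1):
        out[i] += weights[i + 1] * (out[i + 1] - out[i])
    return out


def propagate_blur(guide, sparse_blur, known, depth_edge,
                   sigma_s=20.0, sigma_r=0.2, edge_penalty=50.0):
    """Spread blur values known at a few samples over the whole 1D signal."""
    a = math.exp(-math.sqrt(2.0) / sigma_s)
    weights = [0.0] * len(guide)
    for i in range(1, len(guide)):
        # Domain-transform distance between neighboring samples.
        d = 1.0 + (sigma_s / sigma_r) * abs(guide[i] - guide[i - 1])
        if depth_edge[i]:
            d += edge_penalty  # hypothetical penalty for crossing a depth edge
        weights[i] = a ** d
    # Filter the sparse values and their mask, then normalize.
    num = recursive_pass([b * k for b, k in zip(sparse_blur, known)], weights)
    den = recursive_pass([float(k) for k in known], weights)
    return [n / dn if dn > 1e-12 else 0.0 for n, dn in zip(num, den)]


n = 20
guide = [0.5] * n                    # flat guide signal (toy example)
sparse_blur = [0.0] * n
known = [0] * n
sparse_blur[3], known[3] = 2.0, 1    # blur estimate on the left side
sparse_blur[16], known[16] = 6.0, 1  # blur estimate on the right side
depth_edge = [False] * n
depth_edge[10] = True                # depth discontinuity between samples 9 and 10

with_penalty = propagate_blur(guide, sparse_blur, known, depth_edge)
without_penalty = propagate_blur(guide, sparse_blur, known, [False] * n)
```

With the penalty, the propagated values stay close to 2 on the left of the depth edge and close to 6 on the right; without it, the two estimates blend smoothly across the boundary.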

3. Experimental Results

3.1. Synthetic Data

  • To train B-Net, 250 images from ILSVRC and 250 images from MS-COCO are blurred with disk kernels to generate fully blurred images.
  • Although blurring the whole image with a single spatially-invariant blur kernel is clearly an oversimplification since it does not impose any blur variations due to depth changes, this approach has generalized well to the blur estimation problem.
  • To train E-Net, 200 salient regions from HKU-IS are used as foreground objects, while 100 images from ILSVRC and 100 images from MS-COCO are used to compose the backgrounds.
  • They are alpha-blended with 4 levels of blur scale.
  • B-Net is trained first, then E-Net, though they have weight-sharing layers. Cross-entropy loss is used.
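
The building blocks of this synthetic pipeline are simple enough to sketch. The snippet below shows a hard-edged disk kernel, a plain convolution, per-pixel alpha blending, and a single-sample cross-entropy loss. The exact kernel discretization, border handling, and function names are assumptions on my side; the paper may differ in these details.

```python
import math


def disk_kernel(radius):
    """Normalized binary disk kernel (hard edge, no anti-aliasing)."""
    half = int(math.ceil(radius))
    kernel = [[1.0 if dx * dx + dy * dy <= radius * radius else 0.0
               for dx in range(-half, half + 1)]
              for dy in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def blur(image, kernel):
    """Convolve a grayscale image (list of rows) using clamped borders."""
    half = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, row in enumerate(kernel):
                yy = min(max(y + ky - half, 0), h - 1)
                for kx, k in enumerate(row):
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += k * image[yy][xx]
            out[y][x] = acc
    return out


def alpha_blend(foreground, background, alpha):
    """Per-pixel composite: alpha * fg + (1 - alpha) * bg."""
    return [[a * f + (1.0 - a) * b for f, b, a in zip(fr, br, ar)]
            for fr, br, ar in zip(foreground, background, alpha)]


def cross_entropy(probs, target):
    """Cross-entropy loss for one sample with a one-hot target index."""
    return -math.log(probs[target])


# Toy example: a vertical step edge blurred with a radius-2 disk.
sharp = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
blurred = blur(sharp, disk_kernel(2.0))
```

A larger disk radius spreads the step edge over more pixels, which is the cue B-Net has to map back to one of the blur levels.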

3.2. Validation

Depth and pattern edge separation via E-Net
  • In the first blurry image, three different depth layers can be seen (from left to right), and E-Net manages to distinguish most of the edge points that present depth discontinuities (abrupt blur change).
  • In the second image, there is an abrupt depth transition from the red wall to the background, and E-Net correctly labels these boundary points as depth edges, with a few false negatives.
Quantitative evaluation of blur maps for the dataset provided in [21]
  • The proposed method outperforms all the competing approaches, including the recent end-to-end method DMENet [28] and Park CVPR’17 / DHCF / DHDE [3].
  • Although DMENet has the fastest execution speed, the proposed method presents a very good compromise between MAE and running time when compared to other SOTA methods.
  • (There are also ablation experiments in the paper. If interested, please feel free to read it.)
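
For reference, MAE between an estimated blur map and its ground truth is just the average absolute per-pixel difference. A minimal sketch, assuming both maps are same-sized lists of rows (the paper's exact evaluation protocol is not covered here):

```python
def mean_absolute_error(predicted, ground_truth):
    """Average absolute per-pixel difference between two blur maps."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


# Toy 2x2 blur maps (made-up values).
pred = [[1.0, 2.0], [3.0, 4.0]]
gt = [[1.0, 2.5], [2.0, 4.0]]
mae = mean_absolute_error(pred, gt)  # (0 + 0.5 + 1 + 0) / 4 = 0.375
```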


