Review — CENet: Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network (Blur Detection)

An Ensemble Network With Enhanced Diversity, Outperforming BTBNet & Park CVPR’17 / DHCF

7 min read · Jan 4, 2021

In this story, Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network, CENet, by Dalian University of Technology, is reviewed. In this paper:

A novel learning strategy is proposed that breaks the defocus blur detection (DBD) problem into multiple smaller defocus blur detectors, so that their estimation errors can cancel each other out.
Cross-negative and self-negative correlations and an error function are designed to enhance ensemble diversity and balance individual accuracy.

This is a paper in 2019 CVPR with over 15 citations. (Sik-Ho Tsang @ Medium)

Outline

Problem Formulation
Single Detector Network (SENet)
Multi-detector Ensemble Network (MENet)
Proposed Cross-Ensemble Network (CENet)
CENet: Network Architecture & Training & Testing
Experimental Results

1. Problem Formulation

The network here ensembles multiple small networks and can be trained end-to-end. However, diversity among the small networks is not guaranteed.

To enhance the diversity, the authors introduce a novel error function.
Depending on the number of detectors, the network models are divided into three categories: single detector network (SENet), multi-detector ensemble network (MENet) and cross-ensemble network (CENet).
Each is described below.

2. Single Detector Network (SENet)

SENet here stands for Single Detector Network (not the Squeeze-and-Excitation Network, SENet, used in image classification).
It uses a single parameterized detector f and finds the set of parameters w that minimizes the expected mean squared error.
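
The objective just described can be written compactly. This is a sketch in standard notation rather than the paper's exact equation: the symbols x (input image), d (ground-truth DBD map) and N (number of training pairs) are assumptions introduced here for illustration.

```latex
w^{*} = \arg\min_{w} \; E\left\{ \big(f(x; w) - d\big)^{2} \right\}
\;\approx\; \arg\min_{w} \; \frac{1}{N} \sum_{n=1}^{N} \big(f(x_n; w) - d_n\big)^{2}
```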

Due to the lack of diversity, SENet can hardly achieve optimal results.

3. Multi-detector Ensemble Network (MENet)

MENet contains a group of detectors: F = {f1, f2, …, fK}, where each fk has its own parameter vector wk, and K is the total number of detectors.
This group of detectors can be trained jointly.
With a uniformly weighted average, the ensemble output ^f is the mean of the K detector outputs.

Treating the ensemble ^f as a single learning unit, its expected error follows the bias-variance decomposition, where the shorthand expectation operator E{·} is used to represent the generalization ability.
Combining this decomposition with the uniform average and rearranging gives an expression with two items: the first item is the weighted average error of the individuals, and the second item measures the amount of correlation between the ensemble and each individual.
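
For reference, the quantities in this derivation can be sketched in standard notation. The forms below are the textbook uniform average, bias-variance decomposition and ambiguity decomposition from the ensemble-learning literature, with d assumed to denote the ground truth. They are not copied from the paper, whose intermediate step may differ.

```latex
% Uniformly weighted ensemble of K detectors
\hat{f}(x) = \frac{1}{K} \sum_{k=1}^{K} f_k(x; w_k)

% Bias-variance decomposition of the ensemble treated as a single learner
E\left\{ (\hat{f} - d)^{2} \right\}
  = \big( E\{\hat{f}\} - d \big)^{2}
  + E\left\{ \big( \hat{f} - E\{\hat{f}\} \big)^{2} \right\}

% Two-item form: average individual error minus spread around the ensemble
(\hat{f} - d)^{2}
  = \frac{1}{K} \sum_{k=1}^{K} (f_k - d)^{2}
  - \frac{1}{K} \sum_{k=1}^{K} (f_k - \hat{f})^{2}
```

The last identity shows why diversity helps: for a fixed average individual error, a larger spread of the detectors around the ensemble lowers the ensemble error.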
Based on the above, each detector in the group is trained with its own objective loss, in which a non-negative weight λ expresses the trade-off between these two items.
The second term of the loss penalizes the correlation of each detector with the others, giving a better trade-off between accuracy and diversity and reducing the overall loss.
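
A minimal numeric sketch may make the two items concrete. It assumes the common negative-correlation form e_k = ½(f_k − d)² − λ(f_k − ^f)² for a single pixel. The function names, the ½ factor and the toy values are illustrative assumptions, not the paper's code.

```python
from typing import List, Tuple


def ensemble_mean(outputs: List[float]) -> float:
    """Uniformly weighted ensemble output for one pixel."""
    return sum(outputs) / len(outputs)


def two_item_decomposition(outputs: List[float], target: float) -> Tuple[float, float, float]:
    """Return (ensemble error, average individual error, diversity item)."""
    k = len(outputs)
    f_hat = ensemble_mean(outputs)
    avg_error = sum((f - target) ** 2 for f in outputs) / k
    diversity = sum((f - f_hat) ** 2 for f in outputs) / k
    return (f_hat - target) ** 2, avg_error, diversity


def menet_losses(outputs: List[float], target: float, lam: float) -> List[float]:
    """Per-detector loss (assumed form): accuracy item minus lam times the diversity item."""
    f_hat = ensemble_mean(outputs)
    return [0.5 * (f - target) ** 2 - lam * (f - f_hat) ** 2 for f in outputs]


# Toy example: three detectors predicting one pixel whose ground truth is 1.0.
outputs = [0.9, 0.6, 0.3]
target = 1.0
ens_err, avg_err, div = two_item_decomposition(outputs, target)
# The ensemble error equals the average individual error minus the diversity item.
print(ens_err, avg_err - div)
print(menet_losses(outputs, target, lam=0.1))
```

With λ = 0 each detector is trained independently on accuracy alone. Increasing λ rewards detectors for moving away from the ensemble mean.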

Although MENet improves the detection accuracy over SENet, it is limited when the input image has a small-scale focused area or large-scale homogeneous regions.
The main reason is that MENet does not effectively encourage diversity among its detectors.

4. Proposed Cross-Ensemble Network (CENet)

CENet is constructed with two groups of defocus blur detectors, F’ = {f’1, f’2, …, f’K} and F’’ = {f’’1, f’’2, …, f’’K}.

Each detector is negatively correlated not only with the other detectors of its own group, but also with those of the other group.

For each individual detector in the first group, the loss has three items:

The first item assures accuracy.
The second item enhances diversity within the current group.
The third item improves diversity with respect to the other group.
The loss of each detector in the second group is analogous to that of the first group.
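
One way to write these three items is sketched below. The form is an assumption, chosen to match the description above and the roles of λ (self diversity) and γ (cross diversity) in Section 6.1. The ½ factor, the symbol d for the ground truth, and measuring the cross item against the other group's ensemble mean are all assumptions, not the paper's exact equations.

```latex
% First group: accuracy, self-negative correlation, cross-negative correlation
e'_k = \tfrac{1}{2} (f'_k - d)^{2}
     - \lambda \, (f'_k - \hat{f}')^{2}
     - \gamma \, (f'_k - \hat{f}'')^{2}

% Second group: the same form with the roles of the two groups swapped
e''_k = \tfrac{1}{2} (f''_k - d)^{2}
      - \lambda \, (f''_k - \hat{f}'')^{2}
      - \gamma \, (f''_k - \hat{f}')^{2}

% Group ensembles
\hat{f}' = \frac{1}{K} \sum_{k=1}^{K} f'_k, \qquad
\hat{f}'' = \frac{1}{K} \sum_{k=1}^{K} f''_k
```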

The two groups of detectors are alternately optimized to enhance diversity.
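
The cross-ensemble idea can be illustrated with a small single-pixel sketch. It assumes each detector's loss is an accuracy item minus λ times its squared distance to its own group's mean, minus γ times its squared distance to the other group's mean. The function name `cenet_losses` and all values are hypothetical.

```python
from typing import List, Tuple


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def cenet_losses(
    group_a: List[float],
    group_b: List[float],
    target: float,
    lam: float,
    gamma: float,
) -> Tuple[List[float], List[float]]:
    """Per-detector losses for both groups under the assumed three-item form."""
    mean_a, mean_b = mean(group_a), mean(group_b)

    def loss(f: float, own_mean: float, other_mean: float) -> float:
        accuracy = 0.5 * (f - target) ** 2        # first item: accuracy
        self_div = lam * (f - own_mean) ** 2      # second item: diversity within the group
        cross_div = gamma * (f - other_mean) ** 2  # third item: diversity across groups
        return accuracy - self_div - cross_div

    losses_a = [loss(f, mean_a, mean_b) for f in group_a]
    losses_b = [loss(f, mean_b, mean_a) for f in group_b]
    return losses_a, losses_b


# Toy example: two detectors per group, ground truth 1.0.
losses_a, losses_b = cenet_losses([0.8, 0.4], [1.0, 0.6], target=1.0, lam=0.1, gamma=0.01)
print(losses_a)
print(losses_b)
```

In an alternating scheme, only `losses_a` would drive updates while the second group is held fixed, and vice versa on the next step. Setting γ = 0 reduces each group to an independent MENet-style loss.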

5. CENet: Network Architecture & Training & Testing

5.1. Network Architecture

VGG16 is employed as the backbone.
The proposed CENet includes two sub-networks, FENet and DBD-CENet.
A feature-shared FENet is designed to extract low-level features.
FENet is constructed with the first two convolutional blocks (CB_1 and CB_2) of VGG16.
In DBD-CENet, each branch consists of the last three fully convolutional blocks of VGG16: CB_3, CB_4 and CB_5 (or CB’_3, CB’_4 and CB’_5 for the second branch).
Each branch is followed by a convolution layer with K channels that converts the 512 channels into K DBD detectors. This convolution layer is called the detector generation layer (DGL).
Finally, the DBD map is obtained by combining the defocus blur detectors produced by the two branches.
For a fair comparison with CENet, SENet and MENet are given the same capacity as CENet by doubling the channels in the last three fully convolutional blocks of VGG16.
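
The channel layout described above can be traced with a few lines of plain Python. This is only a bookkeeping sketch: the block output widths are the standard VGG16 ones, while the 3-channel RGB input, the branch names and the function names are assumptions. No convolution is actually run.

```python
# Output channels of the five VGG16 convolutional blocks (standard VGG16 widths).
VGG16_BLOCK_CHANNELS = {"CB_1": 64, "CB_2": 128, "CB_3": 256, "CB_4": 512, "CB_5": 512}


def trace_blocks(block_names, in_channels):
    """Return [(name, in_ch, out_ch)] for a chain of blocks and the final channel count."""
    layers = []
    for name in block_names:
        out_channels = VGG16_BLOCK_CHANNELS[name]
        layers.append((name, in_channels, out_channels))
        in_channels = out_channels
    return layers, in_channels


def cenet_layout(num_detectors=64):
    """Channel layout of the shared FENet and the two DBD-CENet branches."""
    # FENet: shared low-level feature extractor (assumes a 3-channel RGB input).
    fenet, shared_channels = trace_blocks(["CB_1", "CB_2"], in_channels=3)
    layout = {"FENet": fenet}
    for branch in ("branch_1", "branch_2"):
        blocks, channels = trace_blocks(["CB_3", "CB_4", "CB_5"], shared_channels)
        # Detector generation layer (DGL): maps 512 channels to K detector maps.
        layout[branch] = blocks + [("DGL", channels, num_detectors)]
    return layout


for part, layers in cenet_layout(num_detectors=64).items():
    print(part, layers)
```

With K = 64 per branch, the two branches together provide 2K = 128 detectors that are combined into the final DBD map.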

5.2. Training

First, FENet and one branch of DBD-CENet are trained, starting from VGG16 parameters pretrained on ImageNet.
Then, FENet is fixed and the other branch is initialized with the trained parameters of the first branch.
Finally, the two branches of DBD-CENet are fine-tuned alternately, switching every epoch.
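
The three training stages can be laid out as a simple schedule. The sketch below only lists which parts are trainable at each stage. The stage names, the choice of which branch is updated first, and the function `training_schedule` are hypothetical.

```python
def training_schedule(num_finetune_epochs):
    """List the training stages as (stage_name, trainable_parts) tuples."""
    schedule = [
        # Stage 1: FENet and the first branch start from ImageNet-pretrained VGG16 weights.
        ("train_fenet_and_branch_1", ["FENet", "branch_1"]),
        # Stage 2: FENet is frozen; branch_2 is initialized as a copy of branch_1 (no update step).
        ("copy_branch_1_to_branch_2", []),
    ]
    for epoch in range(num_finetune_epochs):
        # Stage 3: alternate the branch being updated every epoch
        # (starting with branch_1 is an assumption).
        active = "branch_1" if epoch % 2 == 0 else "branch_2"
        schedule.append((f"finetune_epoch_{epoch}", [active]))
    return schedule


for stage, trainable in training_schedule(num_finetune_epochs=4):
    print(stage, trainable)
```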

5.3. Testing

In the test stage, the final DBD map is computed from the outputs of the two groups of detectors.

Each map is binarized with an adaptive threshold, which is 1.5 times the mean value of the DBD map.
MAE, F-measure, and Precision-Recall (PR) curves are used for evaluation.
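
The test-time steps can be sketched on flattened maps as follows. Several details are assumptions rather than the paper's stated procedure: the final map is taken as the uniform average over all detectors of both groups, the threshold comparison is strict, and the F-measure uses β² = 0.3 as is common in this literature. All function names are hypothetical.

```python
def combine_maps(group_a, group_b):
    """Average all detector maps (each a flat list of floats) from both groups."""
    maps = group_a + group_b
    return [sum(m[i] for m in maps) / len(maps) for i in range(len(maps[0]))]


def binarize(dbd_map, factor=1.5):
    """Adaptive threshold: factor times the mean value of the DBD map."""
    threshold = factor * sum(dbd_map) / len(dbd_map)
    return [1 if v > threshold else 0 for v in dbd_map]


def mae(pred, gt):
    """Mean absolute error between a continuous map and the ground truth."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(binary, gt, beta_sq=0.3):
    """Weighted harmonic mean of precision and recall (beta_sq = 0.3 is an assumption)."""
    tp = sum(1 for b, g in zip(binary, gt) if b == 1 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / sum(binary)
    recall = tp / sum(gt)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 4-pixel example with one detector per group.
final_map = combine_maps([[0.9, 0.8, 0.1, 0.0]], [[0.7, 0.6, 0.3, 0.0]])
gt = [1, 1, 0, 0]
binary = binarize(final_map)
print(final_map, binary, mae(final_map, gt), f_measure(binary, gt))
```

Note that MAE is computed on the continuous map while the F-measure is computed on the binarized one. A PR curve would instead sweep a fixed threshold over the map's value range.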

6. Experimental Results

6.1. Effectiveness of parameters γ and λ

**Effectiveness of parameters γ and λ on both DUT and CUHK datasets.**

604 images of the CUHK blur dataset are used for training; the remaining 100 images and the DUT dataset are used for testing. (It seems that there is no validation set for the ablation study.)
A larger γ encourages cross diversity and a larger λ encourages self diversity.
But too large a γ or λ reduces individual accuracy.
With λ set to 0, γ = 0.01 produces the best results.
Then, with γ fixed at 0.01, λ is adjusted; λ = 0.1 achieves the best results on both datasets.

6.2. Selection of parameter K

**Effectiveness of parameter K on both DUT and CUHK datasets.**

Considering model complexity and computational efficiency, the authors set K to 64, which already achieves state-of-the-art results.

6.3. Visual Comparison of SENet, MENet, and CENet

**Comparison of DBD Maps: From Top to Bottom, Images, SENet, MENet, CENet and Ground Truth**

CENet consistently produces DBD maps closest to the ground truth.

6.4. Visual Comparison With SOTA Approaches

**Visual comparison of DBD maps. First four from DUT dataset. Last four from CUHK dataset.**

It can be seen that CENet highlights the focused area most uniformly and produces the sharpest boundaries in the transition region.
CENet also gives the best results for focused areas of different scales (e.g., the small-scale and large-scale ones in the second and third rows).
In the fifth row of the figure, almost all methods except CENet produce noise because of the cluttered background and homogeneous regions.

6.5. Quantitative Results

**Quantitative comparison of F-measure and MAE scores**

In terms of F-measure, CENet outperforms the second-best method, BTBNet, by 2.9% and 5.2% on DUT and CUHK, respectively.
Moreover, CENet lowers the MAE scores significantly on both datasets.
Meanwhile, CENet is also highly efficient, running at 15.63 FPS, which is 6.2 times faster than the second-fastest method, SS.