Review — CENet: Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network (Blur Detection)
Ensemble Network With Enhanced Diversity, Outperforms BTBNet & Park CVPR’17 / DHCF
In this story, Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network, CENet, by Dalian University of Technology, is reviewed. In this paper:
- A novel learning strategy is proposed that breaks the defocus blur detection (DBD) problem into multiple smaller defocus blur detectors, so that their estimation errors can cancel each other out.
- Cross-negative and self-negative correlations and an error function are designed to enhance ensemble diversity and balance individual accuracy.
This is a paper in 2019 CVPR with over 15 citations. (Sik-Ho Tsang @ Medium)
Outline
- Problem Formulation
- Single Detector Network (SENet)
- Multi-detector Ensemble Network (MENet)
- Proposed Cross-Ensemble Network (CENet)
- CENet: Network Architecture & Training & Testing
- Experimental Results
1. Problem Formulation
- The proposed network ensembles multiple small networks and can be trained end-to-end. However, diversity among the small networks is not guaranteed.
- To enhance the diversity, the authors propose a novel error function.
- Depending on the number of detectors, the network models are divided into three categories: single detector network (SENet), multi-detector ensemble network (MENet) and cross-ensemble network (CENet).
- Each of them is described in the sections below.
2. Single Detector Network (SENet)
- SENet here stands for Single Detector Network (not the Squeeze-and-Excitation Network, SENet, used in image classification).
- It trains a single parameterized detector f by finding the set of parameters w that minimizes the expected mean squared error.
- Due to the lack of diversity, SENet can hardly achieve optimal results.
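Written out, the SENet objective can be sketched as follows. The symbols x (input image), D (ground-truth DBD map) and N (number of training samples) are assumptions introduced here for illustration, and the paper's exact notation may differ.

```latex
% Expected mean squared error of a single detector f with parameters w,
% approximated by the empirical average over N training pairs (x_n, D_n)
w^{*} = \arg\min_{w} \; E\big\{ (f(x; w) - D)^{2} \big\}
\;\approx\; \arg\min_{w} \; \frac{1}{N} \sum_{n=1}^{N} \big( f(x_{n}; w) - D_{n} \big)^{2}
```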
3. Multi-detector Ensemble Network (MENet)
- MENet contains a group of detectors: F = {f1, f2, …, fK}, where each fk has its own parameter vector wk, and K is the total number of detectors.
- This group of detectors can be trained jointly.
- With a uniformly weighted average, the ensemble output is f̂ = (1/K) Σk fk.
- Treating the ensemble f̂ as a single learning unit, its expected squared error can be split by the bias-variance decomposition,
- where the shorthand expectation operator E{·} is used to represent the generalization ability.
- Substituting the uniform average into this decomposition expresses the ensemble error in terms of the individual detectors.
- After some rearrangement, the ensemble error is written as two items: the first is the weighted average error of the individuals, and the second measures the amount of correlation between the ensemble and each individual.
- Based on this, each detector in the group is trained with its own objective loss built from these two items, where the non-negative weight λ expresses the trade-off between them.
- The second term penalizes the correlation of each detector with the others, giving a better trade-off between accuracy and diversity and reducing the overall loss.
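The chain of equations described above can be sketched in LaTeX as follows. This is a reconstruction based on the standard negative-correlation-learning derivation, assuming D denotes a noise-free ground truth; the paper's exact notation, averaging over samples and constant factors may differ.

```latex
% Uniformly weighted ensemble of K detectors
\hat{f} = \frac{1}{K} \sum_{k=1}^{K} f_{k}

% Bias-variance decomposition of the ensemble treated as one learning unit
E\{(\hat{f} - D)^{2}\} = \big( E\{\hat{f}\} - D \big)^{2} + E\{(\hat{f} - E\{\hat{f}\})^{2}\}

% Rearranged form: average individual error minus average deviation from the ensemble
(\hat{f} - D)^{2} = \frac{1}{K} \sum_{k=1}^{K} (f_{k} - D)^{2} - \frac{1}{K} \sum_{k=1}^{K} (f_{k} - \hat{f})^{2}

% Assumed per-detector objective with trade-off weight \lambda \ge 0
e_{k} = (f_{k} - D)^{2} - \lambda \, (f_{k} - \hat{f})^{2}
```

The third line is an algebraic identity for a uniform average, which is why increasing the spread of the detectors around f̂ can lower the ensemble error as long as the individual errors do not grow faster.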
Although MENet improves the detection accuracy over SENet, it is limited when the input image has a small-scale focused area or large-scale homogeneous regions.
The main reason is that MENet does not effectively encourage diversity among these detectors.
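As a minimal NumPy sketch of the MENet idea, the snippet below checks numerically that, for a uniform average, the ensemble error equals the average individual error minus the average squared deviation of the detectors from the ensemble, and then computes one loss per detector. The function names, the toy data and the exact form of the diversity term (squared deviation from the ensemble mean) are assumptions for illustration, not the authors' code.

```python
import numpy as np


def ensemble_mean(preds):
    """Uniformly weighted ensemble output; preds has shape (K, H, W)."""
    return preds.mean(axis=0)


def menet_losses(preds, target, lam):
    """Assumed per-detector loss: accuracy term minus lam times a diversity term."""
    f_hat = ensemble_mean(preds)
    accuracy = ((preds - target) ** 2).mean(axis=(1, 2))   # error of each detector
    diversity = ((preds - f_hat) ** 2).mean(axis=(1, 2))   # spread around the ensemble
    return accuracy - lam * diversity


# Toy data (hypothetical): K noisy detector maps around a binary ground truth.
rng = np.random.default_rng(0)
K, H, W = 4, 8, 8
target = (rng.random((H, W)) > 0.5).astype(float)
preds = np.clip(target + 0.3 * rng.standard_normal((K, H, W)), 0.0, 1.0)

f_hat = ensemble_mean(preds)
ensemble_err = ((f_hat - target) ** 2).mean()
avg_individual_err = ((preds - target) ** 2).mean()
avg_ambiguity = ((preds - f_hat) ** 2).mean()

# The identity holds exactly (up to floating-point error) for a uniform average.
assert np.isclose(ensemble_err, avg_individual_err - avg_ambiguity)

print(menet_losses(preds, target, lam=0.1))
```

With lam set to 0, each loss reduces to the plain mean squared error of that detector, which corresponds to training the detectors without any diversity pressure.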
4. Proposed Cross-Ensemble Network (CENet)
- CENet is constructed with two groups of defocus blur detectors, F’ = {f’1, f’2, …, f’K} and F’’ = {f’’1, f’’2, …, f’’K}.
Each detector is negatively correlated not only with the other detectors in its own group, but also with those in the other group.
- The loss for each individual detector in the first group has three items:
- The first item is to assure accuracy.
- The second item aims to enhance diversity within the current group.
- The third item focuses on improving diversity with respect to the other group.
- The loss of each detector in the second group takes the same form as that of the first group.
- The two groups of detectors are alternately optimized to enhance diversity.
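The three-item loss described above can be sketched in NumPy as follows. The squared-deviation form of the two diversity items, the function name `cenet_losses` and the toy inputs are assumptions made for illustration; only the roles of the three items and of the weights λ and γ come from the story.

```python
import numpy as np


def cenet_losses(own, other, target, lam, gamma):
    """Assumed per-detector loss for one group of a cross-ensemble.

    own, other: detector maps of the current and the other group, shape (K, H, W).
    target: ground-truth map, shape (H, W).
    """
    own_mean = own.mean(axis=0)      # ensemble of the current group
    other_mean = other.mean(axis=0)  # ensemble of the other group
    accuracy = ((own - target) ** 2).mean(axis=(1, 2))        # item 1: accuracy
    self_div = ((own - own_mean) ** 2).mean(axis=(1, 2))      # item 2: self diversity
    cross_div = ((own - other_mean) ** 2).mean(axis=(1, 2))   # item 3: cross diversity
    return accuracy - lam * self_div - gamma * cross_div


# Toy usage (hypothetical data): the same function serves both groups,
# with the arguments swapped for the second group.
rng = np.random.default_rng(1)
target = (rng.random((8, 8)) > 0.5).astype(float)
group_a = np.clip(target + 0.3 * rng.standard_normal((4, 8, 8)), 0.0, 1.0)
group_b = np.clip(target + 0.3 * rng.standard_normal((4, 8, 8)), 0.0, 1.0)

loss_a = cenet_losses(group_a, group_b, target, lam=0.1, gamma=0.01)
loss_b = cenet_losses(group_b, group_a, target, lam=0.1, gamma=0.01)
print(loss_a, loss_b)
```

Setting gamma to 0 removes the cross-group item, so each group is then trained like an independent MENet under this sketch.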
5. CENet: Network Architecture & Training & Testing
5.1. Network Architecture
- VGG16 is employed as the backbone.
- The proposed CENet includes two sub-networks, FENet and DBD-CENet.
- A feature-shared FENet is designed to extract low-level features.
- FENet is constructed with the first two convolutional blocks (CB_1 and CB_2) of VGG16.
- In DBD-CENet, each branch consists of the last three fully convolutional blocks of VGG16: CB_3, CB_4 and CB_5 (or CB’_3, CB’_4 and CB’_5).
- Each branch is followed by a convolution layer with K output channels, which converts the 512 channels into K DBD detectors. This convolution layer is called the detector generation layer (DGL).
- Finally, the DBD map is obtained by combining the defocus blur detectors produced by the two branches.
- To achieve a fair comparison with CENet, SENet and MENet are given the same capacity as CENet by doubling the channels in the last three fully convolutional blocks of VGG16.
5.2. Training
- First, FENet and one branch of DBD-CENet are trained, starting from VGG16 parameters pretrained on ImageNet.
- Then, FENet is fixed and the other branch is initialized with the trained parameters of the first branch.
- Finally, the two branches of DBD-CENet are fine-tuned iteratively, alternating between them every epoch.
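The last step can be read as an alternating schedule in which one branch is updated while the other is held fixed, switching every epoch. The sketch below only generates that schedule; the branch names and the function `alternating_schedule` are hypothetical, and the exact switching rule used by the authors is an assumption.

```python
def alternating_schedule(num_epochs, branches=("branch_1", "branch_2")):
    """Yield (epoch, trainable_branch, frozen_branch), switching every epoch."""
    for epoch in range(num_epochs):
        trainable = branches[epoch % 2]
        frozen = branches[(epoch + 1) % 2]
        yield epoch, trainable, frozen


for epoch, trainable, frozen in alternating_schedule(4):
    # In a real training loop, only the parameters of `trainable` would be
    # updated in this epoch, while FENet and `frozen` stay fixed.
    print(epoch, trainable, frozen)
```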
5.3. Testing
- In the test stage, the final DBD map is computed by combining the outputs of the two groups of detectors.
- Each map is binarized with an adaptive threshold, which is 1.5 times the mean value of the DBD map.
- MAE, F-measure, and Precision-Recall (PR) curves are used for evaluation.
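The test-stage steps above can be sketched in NumPy as follows. Uniform averaging over all 2K detector maps is an assumed fusion rule, and β² = 0.3 in the F-measure is a common convention in this literature rather than a value stated in the story; the function names and the toy data are hypothetical.

```python
import numpy as np


def fuse_maps(group_a, group_b):
    """Average all detector outputs of both groups (assumed fusion rule)."""
    return np.concatenate([group_a, group_b], axis=0).mean(axis=0)


def binarize(dbd_map, factor=1.5):
    """Binarize with an adaptive threshold of factor times the map's mean."""
    return dbd_map > factor * dbd_map.mean()


def mae(dbd_map, gt):
    """Mean absolute error between the DBD map and the ground truth."""
    return float(np.abs(dbd_map - gt).mean())


def f_measure(binary, gt, beta2=0.3, eps=1e-8):
    """Weighted harmonic mean of precision and recall (beta2 is an assumption)."""
    gt = gt.astype(bool)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / (binary.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall + eps))


# Toy example: perfect detectors on an 8x8 map with a 4x4 positive region.
gt = np.zeros((8, 8))
gt[2:6, 2:6] = 1.0
group_a = np.stack([gt, gt])
group_b = np.stack([gt, gt])

dbd_map = fuse_maps(group_a, group_b)
binary = binarize(dbd_map)
print(mae(dbd_map, gt), f_measure(binary, gt))
```

Sweeping a fixed threshold over the map instead of using the adaptive one would give the precision and recall pairs needed for a PR curve.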
6. Experimental Results
6.1. Effectiveness of parameters γ and λ
- 604 images of the CUHK blur dataset are used for training, and the remaining 100 images and the DUT dataset are used for testing. (It seems that no validation set is used for the ablation study.)
- A larger γ will encourage cross diversity and a larger λ will encourage self diversity.
- However, values of γ and λ that are too large will reduce individual accuracy.
- With λ set to 0, γ = 0.01 produces the best results.
- Then, with γ fixed at 0.01, the parameter λ is adjusted. λ = 0.1 achieves the best results on both datasets.
6.2. Selection of parameter K
- Considering model complexity and computational efficiency, the authors set K to 64, which already achieves state-of-the-art performance.
6.3. Visual Comparison of SENet, MENet, and CENet
- CENet consistently produces DBD maps closest to the ground truth.
6.4. Visual Comparison With SOTA Approaches
- It can be seen that CENet highlights the focused area most uniformly and produces the sharpest boundaries in the transition region.
- CENet gives the best results for focused areas of different scales (e.g., the small-scale and large-scale ones in the second and third rows).
- In the fifth row of the figure, almost all methods except CENet produce noise because of the cluttered background and homogeneous region.
6.5. Quantitative Results
- In terms of F-measure, CENet outperforms the second-best method, BTBNet, by 2.9% and 5.2% on DUT and CUHK respectively.
- Moreover, CENet lowers the MAE scores significantly on both datasets.
- Meanwhile, CENet is also highly efficient, running at 15.63 FPS, which is 6.2 times faster than the second-fastest method, SS.
- PR-curves and F-measure scores are displayed in the above two figures.
- CENet performs favorably against other methods on both datasets.
Reference
[2019 CVPR] [CENet]
Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network
Blur Detection / Defocus Map Estimation
2017 [Park CVPR’17 / DHCF / DHDE] 2018 [Purohit ICIP’18] [BDNet] [DBM] [BTBNet] 2019 [Khajuria ICIIP’19] [Zeng TIP’19] [PM-Net] [CENet] 2020 [BTBCRL (BTBNet + CRLNet)]