Review — YOLOv4: Optimal Speed and Accuracy of Object Detection

YOLOv4, Using a Better Backbone (CSPNet), Bag of Freebies (BoF), and Bag of Specials (BoS), Outperforms EfficientDet, ASFF, NAS-FPN, CenterNet, CornerNet, etc.

YOLOv4 (YouTube link provided from Author’s Medium, link at the bottom)
YOLOv4 runs twice as fast as EfficientDet with comparable performance, and it improves YOLOv3’s AP and FPS by 10% and 12%, respectively.

Outline

1. YOLOv4: Network Architecture

YOLOv4: Network Architecture (Figure from https://aiacademy.tw/yolo-v4-intro/)

1.1. Selection of Backbone

1.2. Selection of Additional Blocks

2. Additional Improvements

2.1. Mosaic

Mosaic represents a new method of data augmentation
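
To make the idea concrete, here is a minimal sketch of Mosaic, which mixes four training images into one. It is not the Darknet implementation: the function name `mosaic`, the list-of-rows image format, and the rule for picking the split point are all assumptions made for illustration, and each source image is assumed to be at least as large as the output. Bounding-box remapping, which a real detector pipeline also needs, is left out.

```python
import random

def mosaic(images, out_h, out_w, rng=None):
    """Tile four images into one out_h x out_w canvas around a random split point.

    images: four 2D lists (rows of pixel values), each at least out_h x out_w.
    Returns the canvas and the (row, col) split point.
    """
    assert len(images) == 4, "Mosaic mixes exactly four images"
    rng = rng or random.Random()
    # Hypothetical choice: keep the split point in the central half of the
    # canvas so that every quadrant is non-empty.
    cy = rng.randint(out_h // 4, 3 * out_h // 4)
    cx = rng.randint(out_w // 4, 3 * out_w // 4)
    # (top, left, height, width) of each quadrant on the canvas
    quads = [
        (0, 0, cy, cx),                    # top-left
        (0, cx, cy, out_w - cx),           # top-right
        (cy, 0, out_h - cy, cx),           # bottom-left
        (cy, cx, out_h - cy, out_w - cx),  # bottom-right
    ]
    canvas = [[0] * out_w for _ in range(out_h)]
    for img, (top, left, h, w) in zip(images, quads):
        # Paste the top-left h x w crop of the source image into its quadrant.
        for r in range(h):
            canvas[top + r][left:left + w] = img[r][:w]
    return canvas, (cy, cx)
```

Because each output image contains content from four different images, every sample shows objects in four different contexts, which is one reason a smaller mini-batch can still provide varied statistics.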

2.2. Self-Adversarial Training (SAT)

2.3. Cross mini-Batch Normalization (CmBN)

Cross mini-Batch Normalization

2.4. Modified Spatial Attention Module (SAM)
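
In the paper, SAM is modified from spatial-wise attention to point-wise attention: instead of pooling across channels to get one spatial mask, the feature map goes through a convolution and a sigmoid, and the resulting mask (same shape as the input) is multiplied element-wise with the input. The sketch below illustrates that idea only. The function name `pointwise_attention`, the nested-list tensor layout, and the use of a 1×1 convolution are assumptions for illustration, not details taken from the Darknet code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pointwise_attention(feature, weight, bias):
    """Point-wise attention sketch.

    feature: nested list of shape [C][H][W]
    weight:  [C][C] matrix of an assumed 1x1 convolution
    bias:    [C] bias of that convolution
    Returns feature * sigmoid(conv1x1(feature)), same shape as feature.
    """
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for k in range(C):
        for i in range(H):
            for j in range(W):
                # 1x1 convolution: mix channels at a single spatial position
                logit = bias[k] + sum(weight[k][c] * feature[c][i][j] for c in range(C))
                # Each element gets its own gate in (0, 1)
                out[k][i][j] = feature[k][i][j] * sigmoid(logit)
    return out
```

With all-zero weights and biases, every gate equals 0.5, so the output is simply the input scaled by one half, which is a convenient sanity check.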

2.5. Modified Path Aggregation Network (PAN)

3. Bag of Freebies (BoF) and Bag of Specials (BoS) for Backbone and Detector

3.1. BoF for Backbone

3.2. BoS for Backbone

3.3. BoF for Detector

3.4. BoS for Detector

4. Ablation Study

4.1. Influence of Different Features on Classifier Training (ImageNet)

Various methods of data augmentation are tested to choose the best one
Mish Activation
Influence of BoF and Mish on the CSPResNeXt-50 classifier accuracy
Influence of BoF and Mish on the CSPDarknet-53 classifier accuracy
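
For reference, Mish is defined as Mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). A minimal scalar sketch follows; the helper names are my own, and a real framework would apply this element-wise to tensors.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable form of log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))
```

Unlike ReLU, Mish is smooth and lets small negative values pass through (for example, `mish(-1.0)` is about -0.303), while behaving almost like the identity for large positive inputs.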

4.2. Influence of BoF on Detector Training (MS COCO)

Ablation Studies of Bag-of-Freebies. (CSPResNeXt50-PANet-SPP, 512×512).

4.3. Influence of BoS on Detector Training (MS COCO)

Ablation Studies of Bag-of-Specials. (Size 512×512)

4.4. Influence of Different Backbones and Pretrained Weightings on Detector Training

Using different classifier pre-trained weightings for detector training (all other training parameters are similar in all models).

4.5. Influence of Different Minibatch Size on Detector Training

Influence of different minibatch size on Detector training

5. SOTA Comparison

Comparison of the speed and accuracy of different object detectors, using one GPU of either Maxwell, Pascal, or Volta type
Comparison of the speed and accuracy of different object detectors on the MS COCO dataset (test-dev 2017)
