Brief Review — PP-YOLOv2: A Practical Object Detector
Improves PP-YOLO by Advancing Training Strategies
PP-YOLOv2: A Practical Object Detector
PP-YOLOv2, by Baidu Inc., Waseda University
2021 arXiv v1, Over 130 Citations (Sik-Ho Tsang @ Medium)
Object Detection
- PP-YOLOv2 improves on PP-YOLO by combining several effective refinements.
Outline
- PP-YOLOv2
- Results
1. PP-YOLOv2
1.1. What Strategies Help
- Path Aggregation Network (PAN): In PP-YOLO, FPN is used. PP-YOLOv2 uses a more advanced PAN.
- Mish Activation: YOLOv4 and YOLOv5 use Mish as the activation function. To keep the backbone unchanged, so that the pretrained backbone can still be used, Mish is applied in the detection neck instead of the backbone.
- Larger Input Size: The batch size is reduced from 24 to 12 images per GPU, which makes room to expand the largest input size from 608 to 768.
- IoU Aware Branch: A soft label format is used for the IoU aware loss:
$$\text{loss} = -t \log(\sigma(p)) - (1 - t) \log(1 - \sigma(p))$$
- where t is the IoU between the anchor and its matched ground-truth bounding box, p is the raw output of the IoU aware branch, and σ is the sigmoid function. The IoU aware loss is computed only for positive samples.
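The soft-label IoU aware loss described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the PaddleDetection implementation: the function names (`iou_aware_loss`, `batch_iou_aware_loss`), the `eps` clamp, and the boolean `positive_mask` are hypothetical, and the loss is taken to be a binary cross-entropy on σ(p) with the IoU t as a soft target.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def iou_aware_loss(p: float, t: float, eps: float = 1e-12) -> float:
    """Soft-label cross-entropy for a single positive sample.

    p: raw (pre-sigmoid) output of the IoU aware branch
    t: IoU between the anchor and its matched ground-truth box, in [0, 1]
    eps: hypothetical clamp to keep log() finite
    """
    s = min(max(sigmoid(p), eps), 1.0 - eps)
    return -t * math.log(s) - (1.0 - t) * math.log(1.0 - s)


def batch_iou_aware_loss(logits, ious, positive_mask):
    """Sum the loss over positive samples only; negatives are skipped."""
    return sum(
        iou_aware_loss(p, t)
        for p, t, is_positive in zip(logits, ious, positive_mask)
        if is_positive
    )


if __name__ == "__main__":
    # Second anchor is a negative sample, so it does not contribute.
    print(batch_iou_aware_loss([0.0, 5.0], [0.5, 0.0], [True, False]))
```

Because t is a soft target rather than 0 or 1, the loss for a given sample is smallest when σ(p) equals t, so the branch is pushed to predict the localization quality itself.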
1.2. What Strategies Do Not Help
- The authors also tried some other tricks but found that they did not help:
- Cosine Learning Rate Decay;
- Backbone Parameter Freezing;
- SiLU Activation.
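For reference, the two activations mentioned in this section, Mish (adopted in the neck) and SiLU (tried but not adopted), can be written with their standard definitions: Mish(x) = x · tanh(softplus(x)) and SiLU(x) = x · σ(x). The sketch below is a scalar, standard-library illustration only; it says nothing about why one worked better than the other in the authors' experiments.

```python
import math


def softplus(x: float) -> float:
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mish(x: float) -> float:
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), also known as Swish with beta = 1."""
    return x * sigmoid(x)


if __name__ == "__main__":
    # Print both activations side by side on a few sample inputs.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  mish={mish(x):+.4f}  silu={silu(x):+.4f}")
```

Both functions are smooth, pass through zero at x = 0, and approach the identity for large positive x, so they are close relatives of each other.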
2. Results
2.1. Ablation Study
- Following YOLOv4, the estimated time and FPS do not include result decoding and NMS.
The mAP of model E, which includes all strategies, increases to 49.1% without any loss of efficiency.
2.2. SOTA Comparisons
- With a similar FPS, PP-YOLOv2 outperforms YOLOv4-CSP by 2% mAP and surpasses YOLOv5l by 1.3% mAP.
- Besides, when the backbone is changed from ResNet50 to ResNet101, PP-YOLOv2 achieves performance comparable to YOLOv5x while being 15.9% faster.
Overall, compared with other state-of-the-art methods, PP-YOLOv2 has certain advantages in the balance of speed and accuracy.
- As PP-YOLOv2 is implemented using PaddlePaddle (PP), adapting TensorRT for PP-YOLOv2 is much easier than for other detectors. Specifically, the Paddle inference engine with TensorRT, FP16 precision, and batch size = 1 further improves PP-YOLOv2’s inference speed. The speed-up ratios for PP-YOLOv2 (R50) and PP-YOLOv2 (R101) are 54.6% and 73%, respectively.
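To make the percentages above concrete, here is a small sketch of the arithmetic, assuming that "speed-up ratio" and "X% faster" both mean the relative increase in FPS over a baseline. That reading is an assumption, and the FPS values in the example are placeholders chosen for round numbers, not measurements from the paper.

```python
def speedup_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative FPS gain over the baseline, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


def fps_after_speedup(baseline_fps: float, percent: float) -> float:
    """FPS obtained after applying a relative speed-up given in percent."""
    return baseline_fps * (1.0 + percent / 100.0)


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds at batch size 1."""
    return 1000.0 / fps


if __name__ == "__main__":
    base = 100.0  # placeholder baseline FPS, not a reported number
    for name, pct in (("R50", 54.6), ("R101", 73.0)):
        fast = fps_after_speedup(base, pct)
        print(f"{name}: {base:.1f} -> {fast:.1f} FPS, "
              f"{latency_ms(base):.2f} -> {latency_ms(fast):.2f} ms per image")
```

Note that a 73% gain in FPS does not cut latency by 73%: under this reading, latency shrinks by a factor of 1.73, which is a reduction of about 42%.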