Brief Review — PP-YOLOv2: A Practical Object Detector

Improves PP-YOLO by Advancing Training Strategies

3 min read · Jun 2, 2024

**Comparison of the proposed PP-YOLOv2 and other object detectors**

PP-YOLOv2: A Practical Object Detector
PP-YOLOv2, by Baidu Inc., Waseda University
2021 arXiv v1, Over 130 Citations (Sik-Ho Tsang @ Medium)
Object Detection
2014 … 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] [HRNetV2, HRNetV2p] [MDETR] [TPH-YOLOv5] 2022 [Pix2Seq] [MViTv2] [SF-YOLOv5] [GLIP] [TPH-YOLOv5++] [YOLOv6] 2023 [YOLOv7] [YOLOv8] 2024 [YOLOv9]

By combining multiple effective refinements, PP-YOLO’s performance is boosted, resulting in PP-YOLOv2.

Outline

PP-YOLOv2
Results

1. PP-YOLOv2

1.1. What Strategies Help

Path Aggregation Network (PAN): In PP-YOLO, FPN is used. PP-YOLOv2 uses a more advanced PAN.
Mish Activation: YOLOv4 and YOLOv5 use Mish as the activation function. To keep the backbone unchanged so that pretrained backbone weights can still be used, the Mish activation function is applied in the detection neck instead of the backbone.
Larger Input Size: To expand the largest input size from 608 to 768, the batch size is reduced from 24 images per GPU to 12 images per GPU.
IoU Aware Branch: A soft label format is used for the IoU aware loss:

$$\text{loss} = -t \log(\sigma(p)) - (1 - t)\log(1 - \sigma(p))$$

where t is the IoU between the anchor and its matched ground-truth bounding box, p is the raw output of the IoU aware branch, and σ is the sigmoid function. The IoU aware loss is computed only for positive samples.
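As a rough illustration of the soft-label idea, the loss can be read as a binary cross-entropy whose target is the IoU t rather than a hard 0/1 label. The sketch below is a minimal pure-Python version, not the PaddleDetection implementation: the function names, the `eps` clamp, and the choice to average over positive samples are assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def iou_aware_loss(p: float, t: float, eps: float = 1e-12) -> float:
    """Soft-label binary cross-entropy for one sample.

    p: raw (pre-sigmoid) output of the IoU aware branch.
    t: IoU between the anchor and its matched ground-truth box, in [0, 1].
    eps: clamp to avoid log(0); an assumption of this sketch.
    """
    s = min(max(sigmoid(p), eps), 1.0 - eps)
    return -t * math.log(s) - (1.0 - t) * math.log(1.0 - s)


def mean_iou_aware_loss(logits, ious, positive_mask) -> float:
    """Average the loss over positive samples only.

    Averaging (rather than summing) is an assumption of this sketch.
    """
    losses = [
        iou_aware_loss(p, t)
        for p, t, is_pos in zip(logits, ious, positive_mask)
        if is_pos
    ]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Two anchors: the first is positive, the second is ignored.
    print(mean_iou_aware_loss([0.0, 5.0], [0.5, 0.9], [True, False]))
```

With a soft target, the loss for a given t is smallest when σ(p) equals t, so the branch is pushed to predict the IoU itself rather than a binary decision.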

1.2. What Strategies Do Not Help

The authors also tried some other tricks but found that they did not help:
Cosine Learning Rate Decay;
Backbone Parameter Freezing;
SiLU Activation.
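For reference, the two activations discussed in this section, Mish (adopted in the neck) and SiLU (tried but not kept), can be written in a few lines. This is a minimal sketch using their standard definitions, mish(x) = x · tanh(softplus(x)) and silu(x) = x · σ(x); the helper names are chosen for this example and are not taken from the PP-YOLOv2 code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def silu(x: float) -> float:
    """SiLU (also called Swish with beta = 1): x * sigmoid(x)."""
    return x * sigmoid(x)


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  mish={mish(x):+.4f}  silu={silu(x):+.4f}")
```

Both functions are smooth, pass through zero, and approach the identity for large positive inputs, so they differ mainly in how they gate values near zero.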

2. Results

2.1. Ablation Study

Following YOLOv4, the reported inference time and FPS do not include result decoding or NMS.

The mAP of model E, which includes all strategies, increases to 49.1% without any loss of efficiency.

2.2. SOTA Comparisons

With a similar FPS, PP-YOLOv2 outperforms YOLOv4-CSP by 2% mAP and surpasses YOLOv5l by 1.3% mAP.
Besides, when PP-YOLOv2’s backbone is replaced from ResNet50 to ResNet101, PP-YOLOv2 achieves performance comparable to YOLOv5x while being 15.9% faster.

Overall, compared with other state-of-the-art methods, PP-YOLOv2 has certain advantages in the balance of speed and accuracy.

As PP-YOLOv2 is implemented using PaddlePaddle (PP), adapting TensorRT for PP-YOLOv2 is much easier than for other detectors. Specifically, the Paddle inference engine with TensorRT, FP16 precision, and batch size = 1 further improves PP-YOLOv2’s inference speed. The speed-up ratios for PP-YOLOv2 (R50) and PP-YOLOv2 (R101) are 54.6% and 73%, respectively.
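To make the speed-up figures concrete, the sketch below treats the speed-up ratio as the relative FPS increase over the baseline. That definition is an assumption of this example, as are the two FPS values, which are not stated in this review: 68.9 FPS (baseline) and 106.5 FPS (TensorRT, FP16) are the R50 figures as recalled from the paper's abstract, and they should be checked against the paper before being relied on.

```python
def speedup_percent(baseline_fps: float, accelerated_fps: float) -> float:
    """Relative FPS increase over the baseline, in percent."""
    return (accelerated_fps / baseline_fps - 1.0) * 100.0


if __name__ == "__main__":
    # Assumed R50 figures (not given in this review): 68.9 FPS without
    # TensorRT and 106.5 FPS with TensorRT at FP16, batch size 1.
    r50 = speedup_percent(68.9, 106.5)
    print(f"PP-YOLOv2 (R50) speed-up: {r50:.1f}%")  # prints 54.6%
```

Under these assumptions the result agrees with the 54.6% quoted above for the R50 model.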


Written by Sik-Ho Tsang