Brief Review — PP-YOLOv2: A Practical Object Detector

Improves PP-YOLO by Advancing Training Strategies

Sik-Ho Tsang
3 min read · Jun 2, 2024
Comparison of the proposed PP-YOLOv2 and other object detectors

PP-YOLOv2: A Practical Object Detector, by Baidu Inc., Waseda University
2021 arXiv v1, Over 130 Citations (Sik-Ho Tsang @ Medium)

Object Detection
2014 ... 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] [HRNetV2, HRNetV2p] [MDETR] [TPH-YOLOv5] 2022 [Pix2Seq] [MViTv2] [SF-YOLOv5] [GLIP] [TPH-YOLOv5++] [YOLOv6] 2023 [YOLOv7] [YOLOv8] 2024 [YOLOv9]

  • PP-YOLOv2 boosts the performance of PP-YOLO by combining multiple effective refinements.


  1. PP-YOLOv2
  2. Results

1. PP-YOLOv2


1.1. What Strategies Help

  • Path Aggregation Network (PAN): PP-YOLO uses FPN, which builds its feature pyramid with a top-down path only. PP-YOLOv2 replaces it with PAN, which adds a bottom-up path on top of FPN.
  • Mish Activation: YOLOv4 and YOLOv5 use Mish as the activation function. To keep the backbone unchanged so that the pretrained backbone can still be used, Mish is applied in the detection neck instead of the backbone.
  • Larger Input Size: The batch size is reduced from 24 to 12 images per GPU so that the largest input size can be expanded from 608 to 768.
  • IoU Aware Branch: A soft label format is used for the IoU aware loss:
      loss = -t · log(σ(p)) - (1 - t) · log(1 - σ(p))
    where t is the IoU between the anchor and its matched ground-truth bounding box, p is the raw output of the IoU aware branch, and σ is the sigmoid function. The IoU aware loss is computed for positive samples only.
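To make two of the refinements above concrete, here is a minimal sketch of the Mish activation and of the soft-label IoU aware loss, written as binary cross-entropy between σ(p) and the IoU target t. It is an illustration under stated assumptions, not the PaddleDetection implementation: the function names, the eps guard, and the choice to sum over positives (rather than average) are all my own.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    # softplus(x) = log(1 + exp(x)), computed in a form that avoids overflow.
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def iou_aware_loss(p: float, t: float, eps: float = 1e-12) -> float:
    """Soft-label binary cross-entropy for a single positive sample.

    p: raw (pre-sigmoid) output of the IoU aware branch.
    t: IoU between the anchor and its matched ground-truth box, in [0, 1].
    eps: small guard against log(0); an assumption of this sketch.
    """
    s = sigmoid(p)
    return -t * math.log(s + eps) - (1.0 - t) * math.log(1.0 - s + eps)


def batch_iou_aware_loss(logits, ious, positive_mask):
    """Sum the loss over positive samples only; negatives contribute nothing.

    The reduction (sum here) is an assumption of this sketch.
    """
    return sum(
        iou_aware_loss(p, t)
        for p, t, is_pos in zip(logits, ious, positive_mask)
        if is_pos
    )
```

With a soft label, the loss for a given t is smallest when σ(p) equals t, so the branch is pushed to predict the IoU itself rather than a hard 0/1 label.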

1.2. What Strategies Do Not Help

  • The authors also tried some other tricks but found that they did not help:
  • Cosine Learning Rate Decay;
  • Backbone Parameter Freezing;
  • SiLU Activation.
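For readers unfamiliar with two of the rejected tricks, the sketch below shows the standard definitions of a cosine learning rate schedule and of the SiLU activation. The function names, the min_lr parameter, and the example values are assumptions made for illustration; the review does not give the exact settings the authors tried.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under cosine decay from base_lr down to min_lr."""
    # Clamp progress to [0, 1] so steps past the end stay at min_lr.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    z = math.exp(x)
    return x * z / (1.0 + z)


if __name__ == "__main__":
    # Illustrative values only: print the schedule at a few points of a 100-step run.
    for step in (0, 25, 50, 75, 100):
        print(step, cosine_lr(step, total_steps=100, base_lr=0.01))
```

The schedule starts at base_lr, passes through the midpoint between base_lr and min_lr halfway through training, and ends at min_lr.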

2. Results

2.1. Ablation Study

Ablation Study
  • Following YOLOv4, the reported inference time and FPS do not include the bounding-box decoder and NMS.

The mAP of model E, which includes all strategies, increases to 49.1% without any loss of efficiency.

2.2. SOTA Comparisons

SOTA Comparisons
  • With a similar FPS, PP-YOLOv2 outperforms YOLOv4-CSP by 2% mAP and surpasses YOLOv5l by 1.3% mAP.
  • Besides, when the backbone is changed from ResNet50 to ResNet101, PP-YOLOv2 achieves performance comparable to YOLOv5x while being 15.9% faster.

Overall, compared with other state-of-the-art methods, PP-YOLOv2 has certain advantages in the balance of speed and accuracy.

  • As PP-YOLOv2 is implemented using PaddlePaddle (PP), adapting TensorRT for PP-YOLOv2 is much easier than for other detectors. Specifically, the Paddle inference engine with TensorRT, FP16 precision, and batch size = 1 further improves PP-YOLOv2’s inference speed. The speed-up ratios for PP-YOLOv2(R50) and PP-YOLOv2(R101) are 54.6% and 73%, respectively.
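As a quick aid for reading percentages such as "15.9% faster" or a "54.6% speed-up", the sketch below converts between a baseline FPS, a new FPS, and the relative speed-up. It assumes the ratios are measured on FPS (throughput) rather than on latency. The helper names and the FPS value of 100 are made up for illustration and are not figures from the paper.

```python
def speedup_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative FPS gain of new_fps over baseline_fps, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


def fps_after_speedup(baseline_fps: float, speedup_pct: float) -> float:
    """FPS obtained by applying a relative speed-up (in percent) to a baseline."""
    return baseline_fps * (1.0 + speedup_pct / 100.0)


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds at batch size 1."""
    return 1000.0 / fps


if __name__ == "__main__":
    base = 100.0  # hypothetical baseline FPS, not a number from the paper
    for pct in (54.6, 73.0):
        fast = fps_after_speedup(base, pct)
        print(f"+{pct}%: {base:.1f} FPS -> {fast:.1f} FPS, "
              f"{latency_ms(base):.2f} ms -> {latency_ms(fast):.2f} ms")
```

Note that a 73% gain in FPS corresponds to a latency reduction of about 42%, not 73%, which is why it matters which quantity a speed-up figure refers to.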


