Review: YOLOv8 (Object Detection)
YOLOv8, 2023, by ultralytics (Sik-Ho Tsang @ Medium)
Object Detection
- Since no paper or document has described the YOLOv8 model architecture and training strategies so far, I looked for papers that use YOLOv8 for transfer learning, to see whether they say more about what is new in YOLOv8 compared with the previous YOLOs.
- I found one such paper, Real-Time Flying Object Detection with YOLOv8, which mentions some differences between YOLOv8 and YOLOv5.
- The mmyolo GitHub repository also provides a model diagram of the YOLOv8 architecture.
Outline
- YOLOv8
- Results
1. YOLOv8
- (Since it is difficult to go into details without any papers/docs, if there is something wrong, please feel free to tell me.)
- The authors of Real-Time Flying Object Detection with YOLOv8 mention that YOLOv8 was trained on a blend of the COCO dataset and several other datasets, while YOLOv5 was trained primarily on the COCO dataset.
- For the neck, similar to YOLOv5, YOLOv8 also uses the methods of FPN and PAN.
- For the head, similar to YOLOv6 and YOLOX, a decoupled head is used.
- Similar to YOLOv6, YOLOv8 is also an anchor-free object detector: it directly predicts the center of an object instead of the offset from a known anchor box. This reduces the number of box predictions, which speeds up post-processing.
- YOLOv8 uses Soft-NMS, a variant of the NMS technique used in YOLOv5. Instead of discarding overlapping bounding boxes outright, Soft-NMS lowers their confidence scores according to how much they overlap a higher-scoring box.
- The loss function in YOLOv8 is:
- This loss function includes the CIoU (complete IoU) loss proposed by Zheng et al. [22] as the box loss, the standard binary cross entropy for multi-label classification as the classification loss (allowing each cell to predict more than one class), and the distribution focal loss proposed by Li et al. [10] as the third term.
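To make the three terms described above more concrete, here is a minimal pure-Python sketch for a single predicted box, a single class probability and a single box-edge distribution. It is only an illustration under my own assumptions, not the Ultralytics implementation: the function names (ciou_loss, bce_loss, dfl_loss, total_loss) and the weights w_box, w_cls and w_dfl are hypothetical, and the weight values are placeholders rather than the values actually used to train YOLOv8.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss (1 - CIoU) for two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter + eps)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((x1 + x2 - gx1 - gx2) ** 2 + (y1 + y2 - gy1 - gy2) ** 2) / 4.0

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


def bce_loss(prob, target, eps=1e-9):
    """Binary cross entropy for one class probability in (0, 1)."""
    return -(target * math.log(prob + eps)
             + (1.0 - target) * math.log(1.0 - prob + eps))


def dfl_loss(logits, target):
    """Distribution focal loss for one box edge.

    logits: scores over integer bins 0 .. len(logits) - 1.
    target: continuous distance in [0, len(logits) - 1].
    """
    # Softmax over the bins (shifted by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Cross entropy on the two bins that bracket the target, each
    # weighted by how close the target is to that bin.
    left = min(int(target), len(logits) - 2)
    right = left + 1
    return -((right - target) * math.log(probs[left])
             + (target - left) * math.log(probs[right]))


def total_loss(box_l, cls_l, dfl_l, w_box=1.0, w_cls=1.0, w_dfl=1.0):
    """Weighted sum of the three terms; the weights are placeholders."""
    return w_box * box_l + w_cls * cls_l + w_dfl * dfl_l


if __name__ == "__main__":
    box_term = ciou_loss((0.0, 0.0, 2.0, 2.0), (0.5, 0.5, 2.5, 2.5))
    cls_term = bce_loss(0.8, 1.0)
    dfl_term = dfl_loss([0.1, 2.0, 1.5, 0.2], 1.4)
    print(total_loss(box_term, cls_term, dfl_term))
```

In a real detector these terms are computed in batches over all assigned predictions and averaged. The single-sample version is only meant to show what each term measures: box overlap, center distance and aspect ratio for CIoU, per-class probability for BCE, and how sharply the predicted distribution concentrates around the true edge distance for DFL.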