Brief Review — Real-Time Flying Object Detection with YOLOv8

YOLOv8 for Flying Object Detection

Sik-Ho Tsang
4 min read · Aug 17, 2024
YOLOv8

Real-Time Flying Object Detection with YOLOv8
YOLOv8 for Flying Object Detection, by Georgia Institute of Technology
2023 arXiv v1, Over 290 Citations (Sik-Ho Tsang @ Medium)


  • First, a generalized YOLOv8 is trained on a data set containing 40 different classes of flying objects, forcing the model to extract abstract feature representations.
  • Then, transfer learning is applied: the parameters learned by the generalized YOLOv8 are used to initialize training on a data set more representative of “real world” environments (e.g. higher frequency of occlusion, small spatial sizes, rotations), producing the refined model.
  • (This tech report also introduces YOLOv1, YOLOv5, and YOLOv8 very briefly. Please feel free to read the tech report if interested.)

Outline

  1. YOLOv8 Generalized Model
  2. YOLOv8 Refined Model

1. YOLOv8 Generalized Model

1.1. Initial Generalized Model

The initial model is trained on a data set [11] of 15,064 images of various flying objects, split 80% for training and 20% for validation. ([11]: new-workspace 0k81p. flying_object_dataset dataset, Mar. 2022.)

  • However, the data set follows a long-tailed distribution and suffers from class imbalance: the drone (25.2% of objects), bird (25%), p-airplane (7.9%), and c-helicopter (6.3%) classes together make up the majority of the data set (64.4%). Published on Roboflow by an unnamed author, the data set was created in 2022 and had been downloaded only 15 times.
  • Each candidate hyperparameter set is trained for 10 epochs, and the best set is chosen based on validation performance.
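The short-trial selection described in the second bullet can be sketched as follows. This is only a minimal illustration, not the authors' code: `select_best_hyperparams`, `fake_train_and_validate`, and the `lr0` candidates are hypothetical names and values, and the toy scoring function stands in for a real 10-epoch training run followed by validation.

```python
from typing import Callable, Dict, List

HyperParams = Dict[str, float]


def select_best_hyperparams(
    candidates: List[HyperParams],
    train_and_validate: Callable[[HyperParams, int], float],
    epochs_per_trial: int = 10,
) -> HyperParams:
    """Run a short trial per candidate and keep the best validation score."""
    if not candidates:
        raise ValueError("no candidates supplied")
    best_score = float("-inf")
    best = candidates[0]
    for hp in candidates:
        # Each candidate gets the same small epoch budget (10 in the review).
        score = train_and_validate(hp, epochs_per_trial)
        if score > best_score:
            best_score, best = score, hp
    return best


def fake_train_and_validate(hp: HyperParams, epochs: int) -> float:
    """Hypothetical stand-in for training + validation (NOT a real run).

    The toy score simply peaks at lr0 = 0.01.
    """
    return -abs(hp["lr0"] - 0.01)


# Hypothetical candidate sets, for illustration only.
candidates = [{"lr0": 0.001}, {"lr0": 0.01}, {"lr0": 0.1}]
best = select_best_hyperparams(candidates, fake_train_and_validate)
print(best)
```

In practice, `train_and_validate` would wrap an actual training call and return a validation metric such as mAP50–95.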

With the best hyperparameter set and 163 epochs of training, the model achieves an mAP50–95 of 0.685 and an average inference speed of 50 fps on 1080p videos.
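For readers less familiar with the metric: mAP50–95 is conventionally the mean average precision averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO-style definition). The sketch below shows the IoU computation and the final averaging step only. The function names are my own, and computing the per-threshold AP values themselves is left out.

```python
from typing import List, Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS: List[float] = [0.50 + 0.05 * i for i in range(10)]


def map50_95(map_per_threshold: Sequence[float]) -> float:
    """Average the class-mean AP over the ten IoU thresholds."""
    if len(map_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one mAP value per IoU threshold")
    return sum(map_per_threshold) / len(map_per_threshold)
```

As a side note, 50 fps corresponds to a budget of 1000 / 50 = 20 ms per frame.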

YOLOv8
  • Due to the large class imbalance, poor performance on the validation set was anticipated in the minority classes.
  • Left: In the confusion matrix, the model is most likely to misclassify an F-14 as an F-18. This type of misclassification typically affects classes with low inter-class variance.
  • Right: Small objects are difficult to detect.
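To make the confusion-matrix reading concrete, here is a small sketch that finds the most frequent misclassification. The counts are invented purely for illustration and are not from the paper. I also assume rows are true classes and columns are predicted classes, and plotting tools differ on this convention.

```python
from typing import List, Sequence, Tuple


def most_confused_pair(
    matrix: Sequence[Sequence[int]], class_names: Sequence[str]
) -> Tuple[str, str, int]:
    """Return (true_class, predicted_class, count) for the largest
    off-diagonal entry. Assumes matrix[i][j] counts objects of true
    class i that were predicted as class j."""
    best = ("", "", -1)
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and count > best[2]:
                best = (class_names[i], class_names[j], count)
    return best


# Hypothetical counts, for illustration only (NOT the paper's results).
names: List[str] = ["F-14", "F-18", "drone"]
toy_matrix = [
    [40, 9, 1],   # true F-14
    [4, 45, 1],   # true F-18
    [0, 1, 49],   # true drone
]
print(most_confused_pair(toy_matrix, names))
```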

1.2. CAM Visualization

F-14 and F-18 CAM Visualization
  • CAM visualization is also performed on those images.

The F-14 and F-18 feature activation maps are very close to each other, which could explain why the model confuses the two.

Small Object CAM Visualization
  • For small objects, more importance is placed on the background, and increasingly granular features are detected in deeper layers.

In summary, there are three challenging conditions: (1) detecting and classifying extremely small objects, (2) identifying flying objects that blend into their background, and (3) classifying different types of flying objects.

2. YOLOv8 Refined Model

2.1. Transfer Learning

A second data set [1] is utilized to apply transfer learning for the refined model. ([1]: AhmedMohsen. drone-detection-new dataset, Apr. 2022)

  • This “real world” data set [1] provides a solid foundation for transfer learning and for effectively extracting flying object feature representations.

2.2. Results

After 190 epochs with the weights learned from the generalized model as the initialization, the refined model achieves an mAP50–95 of 0.835 across the plane, helicopter, and drone classes.
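Initializing from the generalized model's weights could look roughly like the sketch below, assuming the `ultralytics` Python package (`YOLO(...)` and `.train(data=..., epochs=..., imgsz=...)`). The weight and data set paths are hypothetical placeholders, and `imgsz=640` is simply the library default rather than a setting reported in this review.

```python
from typing import Any, Dict


def build_finetune_args(
    data_yaml: str, epochs: int = 190, imgsz: int = 640
) -> Dict[str, Any]:
    """Collect the training arguments for the refinement stage.

    epochs=190 follows the review; imgsz=640 is an assumption.
    """
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz}


def finetune(generalized_weights: str, data_yaml: str, epochs: int = 190):
    """Fine-tune starting from the generalized model's checkpoint."""
    # Imported lazily so the sketch loads without the package installed;
    # running this function requires `pip install ultralytics`.
    from ultralytics import YOLO

    # Loading a checkpoint means training starts from the learned weights
    # instead of a fresh initialization.
    model = YOLO(generalized_weights)
    return model.train(**build_finetune_args(data_yaml, epochs))


# Hypothetical usage (paths are placeholders, not from the paper):
# finetune("runs/generalized/weights/best.pt", "refined_dataset.yaml")
```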

2.3. Visualization

Generalized Model vs Refined Model
  • The first of the four images showcases the model’s ability to identify distant birds.
  • In the second image, the model was tested against a very small drone that occupied only 0.026% of the image while also blending in with its background. The model still correctly detected and classified the drone.
  • The third image shows the model’s ability to identify a tiny passenger airplane occupying 0.063% of the image, which also blends into its surroundings.
  • Finally, the fourth image features a V22 aircraft, which is an underrepresented class and accounts for only 3.57% of the entire dataset.
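To get a feel for how small these objects are, the percentages can be converted into pixel counts. The sketch below assumes a 1920×1080 image purely for illustration, since the actual image resolutions are not given in this review.

```python
def area_percent(box_w: float, box_h: float, img_w: int, img_h: int) -> float:
    """Bounding-box area as a percentage of the image area."""
    return 100.0 * (box_w * box_h) / (img_w * img_h)


def pixels_for_percent(percent: float, img_w: int, img_h: int) -> float:
    """Number of pixels covered by an object occupying `percent` of the image."""
    return percent / 100.0 * img_w * img_h


# Assumed 1080p frame (an illustration, not the paper's stated image size).
IMG_W, IMG_H = 1920, 1080

drone_pixels = pixels_for_percent(0.026, IMG_W, IMG_H)
plane_pixels = pixels_for_percent(0.063, IMG_W, IMG_H)
print(f"drone: ~{drone_pixels:.0f} px, airplane: ~{plane_pixels:.0f} px")
```

Under that assumption, the drone covers only a few hundred pixels, roughly a 23×23 pixel box.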

