Brief Review — YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

YOLOv6, Formed by Adopting Recent Object Detection Advancements from Industry and Academia, Outperforms YOLOv5, YOLOX, and YOLOv7

Sik-Ho Tsang
4 min read · Apr 6, 2024
YOLOv6 (Image from GitHub)

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
YOLOv6, by Meituan Inc.
2022 arXiv v1, Over 1000 Citations (Sik-Ho Tsang @ Medium)

Object Detection
2014 … 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] [HRNetV2, HRNetV2p] [MDETR] [TPH-YOLOv5] 2022 [Pix2Seq] [MViTv2] [SF-YOLOv5] [GLIP] [TPH-YOLOv5++] 2023 [YOLOv7]

  • In this paper, recent object detection advancements from both industry and academia are assimilated, covering network design, training strategies, testing techniques, quantization, and optimization methods.

Outline

  1. YOLOv6
  2. Results

1. YOLOv6

  • A bunch of techniques are applied in YOLOv6.
YOLOv6 Framework

1.1. Network Design

RepBlock, RepConv, and CSPStackRep

Backbone: RepBlock, from RepVGG, is used as the building block of the small networks. For large models, the CSP block from CSPNet is revised into a more efficient CSPStackRep Block.

Neck: The neck of YOLOv6 adopts the PAN topology, following YOLOv4 and YOLOv5. It is enhanced with RepBlocks or CSPStackRep Blocks to form Rep-PAN.

Head: The decoupled head, an idea from YOLOX, is simplified to make it more efficient, and is called the Efficient Decoupled Head.
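
The RepBlock mentioned above relies on the structural re-parameterization idea of RepVGG: parallel 3×3, 1×1, and identity branches used during training can be folded into a single 3×3 convolution for inference. The following is a minimal sketch of that folding, assuming a single channel and no BatchNorm (both are simplifications for illustration; the helper names `conv2d_same` and `fuse_branches` are hypothetical and not taken from the YOLOv6 code base):

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def fuse_branches(k3, k1_scalar, use_identity=True):
    """Merge a 3x3 branch, a 1x1 branch and an identity branch into one 3x3 kernel."""
    fused = [row[:] for row in k3]
    # A 1x1 kernel equals a 3x3 kernel that is zero everywhere except the centre.
    fused[1][1] += k1_scalar
    # The identity branch equals a 1x1 kernel with weight 1.
    if use_identity:
        fused[1][1] += 1.0
    return fused


image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
k3 = [[0.1, 0.0, -0.1], [0.2, 0.5, 0.2], [0.0, -0.3, 0.1]]
k1 = 0.7

# Training-time form: three parallel branches whose outputs are summed.
branch3 = conv2d_same(image, k3)
branch1 = conv2d_same(image, [[k1]])
multi_branch = [
    [branch3[i][j] + branch1[i][j] + image[i][j] for j in range(3)]
    for i in range(3)
]

# Inference-time form: one 3x3 convolution with the fused kernel.
single_branch = conv2d_same(image, fuse_branches(k3, k1))

max_diff = max(
    abs(multi_branch[i][j] - single_branch[i][j])
    for i in range(3)
    for j in range(3)
)
print(max_diff)  # numerically negligible difference between the two forms
```

In a real RepVGG-style block, each branch also has its own BatchNorm, which would be folded into the kernel and bias before the branches are summed.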

1.2. Label Assignment

  • Label assignment is responsible for assigning labels to predefined anchors during the training stage.

Task Alignment Learning (TAL) was first proposed in TOOD, where a unified metric combining the classification score and the predicted box quality is designed. This metric replaces IoU when assigning object labels.

  • TAL is used as the default label assignment strategy in YOLOv6.
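
To make the unified metric more concrete, here is a rough sketch of a TOOD-style alignment metric, t = s^α · u^β, where s is the classification score and u is the IoU between the predicted box and the ground-truth box. The candidates with the largest t are taken as positives. The function names and the values of `alpha`, `beta`, and `topk` are illustrative assumptions, not the settings confirmed by this review:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def task_aligned_topk(cls_scores, pred_boxes, gt_box, alpha=1.0, beta=6.0, topk=2):
    """Rank candidates by t = s**alpha * u**beta and return the top-k indices."""
    metrics = [
        (s ** alpha) * (iou(b, gt_box) ** beta)
        for s, b in zip(cls_scores, pred_boxes)
    ]
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    return order[:topk], metrics


gt_box = (0.0, 0.0, 10.0, 10.0)
pred_boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0)]
cls_scores = [0.3, 0.9, 0.95]

selected, metrics = task_aligned_topk(cls_scores, pred_boxes, gt_box)
print(selected)
```

Note that the third candidate has the highest classification score but a poorly localized box, so it is not selected: the metric rewards predictions that are good at both tasks at once.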

1.3. Loss Functions

  • The loss function is composed of a classification loss, a box regression loss and an optional object loss.

1.3.1. Classification Loss

  • VariFocal Loss (VFL), proposed in VariFocalNet and based on Focal Loss, is used as the classification loss.
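
As a small illustration, VFL treats positives and negatives asymmetrically: a positive sample uses a binary cross-entropy with an IoU-aware soft target q, weighted by q, while a negative sample (q = 0) is down-weighted by α·p^γ. The sketch below works on a single prediction; the default values of `alpha` and `gamma` are assumptions for illustration:

```python
import math


def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Varifocal loss for one predicted score p and one target score q.

    p: predicted IoU-aware classification score in (0, 1)
    q: target score (IoU with the ground truth for positives, 0 for negatives)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    if q > 0:
        # Positive sample: BCE against the soft target q, weighted by q.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: focal-style down-weighting of easy negatives.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


print(varifocal_loss(0.8, 0.8), varifocal_loss(0.2, 0.8))
print(varifocal_loss(0.1, 0.0), varifocal_loss(0.9, 0.0))
```

A confident false positive (large p with q = 0) is penalized far more than an easy negative, and a positive whose score is close to its target q is penalized less than one that is far from it.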

1.3.2. Box Regression Loss

  • SIoU [8] is applied to YOLOv6-N and YOLOv6-T, while others use GIoU.
  • Distribution Focal Loss (DFL) [20] is used in YOLOv6-M/L.
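
For reference, the GIoU loss mentioned above can be sketched as follows. GIoU extends IoU with a penalty based on the smallest box enclosing both the prediction and the target, so that non-overlapping boxes still produce a useful training signal. This is a minimal single-pair version with boxes assumed to be in (x1, y1, x2, y2) format, not the batched implementation used in practice:

```python
def giou_loss(pred, target):
    """Return 1 - GIoU for two boxes in (x1, y1, x2, y2) format."""
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])

    # Intersection and union.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area_p + area_t - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both boxes.
    cx1, cy1 = min(pred[0], target[0]), min(pred[1], target[1])
    cx2, cy2 = max(pred[2], target[2]), max(pred[3], target[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


perfect = giou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
near = giou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))
far = giou_loss((0.0, 0.0, 1.0, 1.0), (5.0, 0.0, 6.0, 1.0))
print(perfect, near, far)
```

Plain IoU would be 0 for both disjoint pairs; with GIoU, the pair that is farther apart receives the larger loss.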

1.3.3. Object Loss

  • Object loss was first proposed in FCOS. Unfortunately, it does not bring any improvement in YOLOv6.

1.4. Industry-handy Improvements

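  • Training duration: extended from 300 epochs to 400 epochs.
  • Self-distillation: the KL-divergence between the predictions of the teacher and the student is minimized. The knowledge distillation loss is: $L_{KD} = KL(p_t \,\|\, p_s)$, where $p_t$ and $p_s$ are the class predictions of the teacher and the student.
  • The overall loss function is now formulated as: $L_{total} = L_{det} + \alpha L_{KD}$, where $L_{det}$ is the detection loss and $\alpha$ balances the two terms.
  • Mosaic augmentation (from YOLOv4): turned off during the last epochs.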
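
The self-distillation objective above can be sketched in a few lines: a KL-divergence between teacher and student class predictions is added to the detection loss with a weighting factor. In this sketch, the softmax over class logits, the weight `alpha`, and the function names are assumptions for illustration, and `det_loss` stands in for whatever detection loss has already been computed:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_teacher, p_student):
    """KL(p_teacher || p_student) for two discrete distributions."""
    return sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )


def total_loss(det_loss, teacher_logits, student_logits, alpha=1.0):
    """Detection loss plus a weighted distillation term."""
    kd = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    return det_loss + alpha * kd


same = total_loss(1.0, [2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
different = total_loss(1.0, [2.0, 0.0, -1.0], [0.0, 2.0, -1.0])
print(same, different)
```

When the student already matches the teacher, the distillation term vanishes and only the detection loss remains.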

1.5. Quantization and Deployment

RepOptimizer [2]
  • Post-training quantization (PTQ) directly quantizes the model with only a small calibration set.
  • RepOptimizer [2] applies gradient re-parameterization at each optimization step. The re-parameterization blocks of YOLOv6 are reconstructed in this fashion and trained with RepOptimizer to obtain PTQ-friendly weights.
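
To illustrate what PTQ does with a calibration set, below is a minimal sketch of symmetric per-tensor INT8 quantization: the scale is derived from the largest absolute value seen during calibration, and values are then rounded and clamped to the integer range. This is a generic textbook scheme under stated assumptions, not the specific quantization pipeline used for YOLOv6, and the function names are hypothetical:

```python
def calibrate_scale(calibration_values, num_bits=8):
    """Derive a symmetric quantization scale from calibration data."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clamp to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in q_values]


calibration = [-1.27, 0.5, 1.0, 0.02]
scale = calibrate_scale(calibration)

values = [0.5, -1.27, 2.0]  # 2.0 lies outside the calibrated range
q = quantize(values, scale)
restored = dequantize(q, scale)
print(q, restored)
```

Because a single outlier in the calibration data inflates the scale and coarsens every other value, weight and activation distributions with a narrow range tend to quantize better, which is the usual motivation for seeking PTQ-friendly weights.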
Channel-wise distillation in Quantization-aware training (QAT)
  • Besides, channel-wise distillation [36] (hereafter CW Distill) is adapted within the YOLOv6 framework for QAT. This is also a self-distillation approach, where the teacher network is the student itself in FP32 precision.
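
A rough sketch of the channel-wise distillation idea follows: each channel of a feature map is turned into a probability distribution over spatial positions with a softmax, and the student is trained to match the teacher's distribution channel by channel through a KL-divergence. The temperature and the T²/C normalization are assumptions made for this illustration, and the feature maps here are tiny hand-written examples rather than real network activations:

```python
import math


def spatial_softmax(channel, temperature=1.0):
    """Softmax over all spatial positions of one HxW channel."""
    flat = [v / temperature for row in channel for v in row]
    m = max(flat)
    exps = [math.exp(v - m) for v in flat]
    total = sum(exps)
    return [e / total for e in exps]


def cw_distill_loss(teacher_feat, student_feat, temperature=1.0):
    """Channel-wise KL between teacher and student feature maps.

    teacher_feat, student_feat: lists of channels, each an HxW nested list.
    """
    loss = 0.0
    for t_ch, s_ch in zip(teacher_feat, student_feat):
        p_t = spatial_softmax(t_ch, temperature)
        p_s = spatial_softmax(s_ch, temperature)
        loss += sum(
            pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0
        )
    return (temperature ** 2) * loss / len(teacher_feat)


teacher = [[[1.0, 0.0], [0.0, 3.0]], [[0.5, 0.5], [2.0, 0.0]]]
student_close = [[[0.9, 0.1], [0.0, 2.8]], [[0.4, 0.6], [1.9, 0.1]]]
student_far = [[[3.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [0.5, 0.5]]]

loss_same = cw_distill_loss(teacher, teacher)
loss_close = cw_distill_loss(teacher, student_close)
loss_far = cw_distill_loss(teacher, student_far)
print(loss_same, loss_close, loss_far)
```

In the QAT setting described above, the teacher features would come from the FP32 model and the student features from its quantized counterpart.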

2. Results

AP Against Latency/Throughput
SOTA Comparisons

With these techniques applied, YOLOv6 outperforms YOLOv5, YOLOX, PP-YOLOE, and even YOLOv7.

