Brief Review — YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

YOLOv6, Formed by Adopting Recent Object Detection Advancements from Industry and Academy, Outperforms YOLOv5, YOLOX, YOLOv7

4 min readApr 6, 2024

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
YOLOv6, by Meituan Inc.
2022 arXiv v1, Over 1000 Citations (Sik-Ho Tsang @ Medium)
Object Detection
2014 … 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] [HRNetV2, HRNetV2p] [MDETR] [TPH-YOLOv5] 2022 [Pix2Seq] [MViTv2] [SF-YOLOv5] [GLIP] [TPH-YOLOv5++] 2023 [YOLOv7]
==== My Other Paper Readings Are Also Over Here ====

In this paper, object detection advancements either from industry or academy are heavily assimilated from recent network design, training strategies, testing techniques, quantization and optimization methods.

Outline

YOLOv6
Results

1. YOLOv6

There are bunched of techniques applied to YOLOv6.

1.1. Network Design

Backbone: RepBlock in RepVGG is used as the building block of the small networks. For large models, a more efficient CSP block in CSPNet is revised, as CSPStackRep Block.
Neck: The neck of YOLOv6 adopts PAN topology following YOLOv4 and YOLOv5. The neck is enhanced with RepBlocks or CSPStackRep Blocks to have RepPAN.
Head: The decoupled head, from the idea in YOLOX, is simplified to make it more efficient, called Efficient Decoupled Head.

1.2. Label Assignment

Label assignment is responsible for assigning labels to predefined anchors during the training stage.

Task Alignment Learning (TAL) was first proposed in TOOD, in which a unified metric of classification score and predicted box quality is designed. The IoU is replaced by this metric to assign object labels.

TAL is used as the default label assignment strategy in YOLOv6.

1.3. Loss Functions

The loss function is composed of a classification loss, a box regression loss and an optional object loss.

1.3.1. Classification Loss

VariFocal Loss (VFL) in VariFocalNet, which is based on Focal Loss, is used for the classification loss.

1.3.2. Box Regression Loss

SIoU [8] is applied to YOLOv6-N and YOLOv6-T, while others use GIoU.
Distribution Focal Loss (DFL) [20] is used in YOLOv6-M/L.

1.3.3. Object Loss

Object loss was first proposed in FCOS. It does not bring any improvements unfortunately.

1.4. Industry-handy Improvements

The training duration: is extended from 300 epochs to 400 epochs.
Self-distillation: is used by minimizing the KL-divergence between the prediction of the teacher and the student. The knowledge distillation loss is:

The overall loss function is now formulated as:

Mosaic augmentations (in YOLOv4): are turned off during last epochs.

1.5. Quantization and Deployment

Post-training quantization (PTQ) directly quantizes the model with only a small calibration set.
RepOptimizer [2] is used to have gradient re-parameterization at each optimization step. The re-parameterization blocks of YOLOv6 are reconstructed in this fashion and trained with RepOptimizer to obtain PTQ-friendly weights.

**Channel-wise** **distillation** **in Quantization-aware training (QAT)**

Besides, channel-wise distillation [36] (later as CW Distill) is adapted within the YOLOv6 framework. This is also a self-distillation approach where the teacher network is the student itself in FP32-precision.