Brief Review — YOLOX: Exceeding YOLO Series in 2021

Anchor-Free YOLO, Outperforms and

Sik-Ho Tsang
Mar 25, 2024

YOLOX, by Megvii Technology
2021 arXiv v2, Over 3300 Citations (Sik-Ho Tsang @ Medium)


  • YOLOX is proposed by switching the YOLO detector to an anchor-free manner and applying other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA.

Outline

  1. YOLOX
  2. Results

1. YOLOX

1.1. Decoupled Head

  • YOLOv3 is used as the baseline. Originally, a single coupled head in YOLOv3 predicts classification, regression, and objectness together.

In YOLOX, a decoupled head is proposed. It contains a 1×1 conv layer to reduce the channel dimension, followed by two parallel branches, each with two 3×3 conv layers.
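To make the layer layout concrete, here is a minimal sketch that only lists the conv layers of one decoupled head and counts the outputs per location. It is not the authors' implementation: the hidden width of 256, the layer names, and the three 1×1 output layers (class scores, 4 box values, 1 objectness score) are assumptions for illustration.

```python
def decoupled_head_layers(in_channels, num_classes, hidden=256):
    """List (name, in_ch, out_ch, kernel) for one decoupled head.

    hidden=256 is an assumed width for the reduced channel dimension.
    """
    # 1x1 conv that reduces the channel dimension
    layers = [("stem", in_channels, hidden, 1)]
    # two parallel branches, each with two 3x3 convs
    for branch in ("cls", "reg"):
        layers.append((f"{branch}_conv1", hidden, hidden, 3))
        layers.append((f"{branch}_conv2", hidden, hidden, 3))
    # assumed 1x1 output layers: class scores, box values, objectness
    layers.append(("cls_out", hidden, num_classes, 1))
    layers.append(("reg_out", hidden, 4, 1))
    layers.append(("obj_out", hidden, 1, 1))
    return layers


def outputs_per_location(layers):
    """Sum the channels of all output layers (names ending in '_out')."""
    return sum(out for name, _, out, _ in layers if name.endswith("_out"))


layers = decoupled_head_layers(in_channels=512, num_classes=80)
per_location = outputs_per_location(layers)
```

With 80 classes, each location ends up with 80 + 4 + 1 values, split across separate branches instead of one shared conv.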

  • The lite decoupled head brings an additional 1.1 ms of inference time (11.6 ms vs. 10.5 ms).
  • The decoupled head also greatly improves the convergence speed.

1.2. Strong Data Augmentation

  • Mosaic and MixUp are added for data augmentation.
  • For small models, MixUp is removed and Mosaic is weakened.
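Mosaic stitches four training images into one canvas. As a rough illustration of the layout step only, the sketch below splits a square canvas into four tiles that meet at a chosen center. The function name and the sampling range for the center are assumptions, not values from the paper, and the image resizing and box remapping are left out.

```python
import random


def mosaic_regions(size, center=None, rng=random):
    """Split a size x size canvas into four tiles meeting at `center`.

    Returns (x1, y1, x2, y2) boxes in the order:
    top-left, top-right, bottom-left, bottom-right.
    """
    if center is None:
        # assumed: sample the center from the middle half of the canvas
        lo, hi = size // 4, 3 * size // 4
        center = (rng.randint(lo, hi), rng.randint(lo, hi))
    cx, cy = center
    return [
        (0, 0, cx, cy),        # top-left tile
        (cx, 0, size, cy),     # top-right tile
        (0, cy, cx, size),     # bottom-left tile
        (cx, cy, size, size),  # bottom-right tile
    ]


def area(box):
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


regions = mosaic_regions(640, center=(300, 200))
```

Each of the four source images would then be resized or cropped into its tile, so the four tiles always cover the full canvas.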

1.3. Anchor-Free

  • Originally, clustered anchors are used, which are domain-specific and generalize poorly. The anchor mechanism also increases the complexity of the detection heads, as well as the number of predictions for each image.

The anchor-free mechanism significantly reduces the number of design parameters. The predictions for each location are reduced from 3 to 1, and each one directly predicts 4 values: 2 offsets relative to the top-left corner of the grid cell, plus the height and width of the predicted box.
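A small sketch of what this means in numbers. The exponential width/height parameterization, the treatment of the decoded point as the box center, the 640×640 input, and the strides (8, 16, 32) are assumptions for illustration, not details stated in this review.

```python
import math


def decode_box(tx, ty, tw, th, grid_x, grid_y, stride):
    """Decode one anchor-free prediction into pixel coordinates.

    (tx, ty) are offsets from the top-left corner of grid cell
    (grid_x, grid_y); (tw, th) are assumed to be log-scale sizes.
    """
    cx = (grid_x + tx) * stride
    cy = (grid_y + ty) * stride
    w = math.exp(tw) * stride
    h = math.exp(th) * stride
    return cx, cy, w, h


def num_predictions(input_size, strides=(8, 16, 32), per_location=1):
    """Count predictions over all feature-map locations."""
    return sum((input_size // s) ** 2 * per_location for s in strides)


anchor_based = num_predictions(640, per_location=3)
anchor_free = num_predictions(640, per_location=1)
```

Under these assumptions, going from 3 predictions per location to 1 cuts the total number of predictions per image to one third.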

  • The center location of each object is assigned as the positive sample, and a scale range is pre-defined to designate the FPN level for each object.
  • With multi-positives, the center 3×3 area is assigned as positive instead of only the single center location.
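The center 3×3 rule can be sketched as follows. The function name, the `radius` parameter, and the clipping at the feature-map border are assumptions for illustration.

```python
def center_positive_cells(cx, cy, stride, grid_w, grid_h, radius=1):
    """Return the grid cells in the (2*radius+1)^2 area around an object center.

    (cx, cy) is the object center in pixels; cells outside the
    feature map are dropped.
    """
    gx, gy = int(cx // stride), int(cy // stride)  # cell containing the center
    cells = []
    for y in range(gy - radius, gy + radius + 1):
        for x in range(gx - radius, gx + radius + 1):
            if 0 <= x < grid_w and 0 <= y < grid_h:
                cells.append((x, y))
    return cells


cells = center_positive_cells(cx=100, cy=60, stride=8, grid_w=80, grid_h=80)
```

With `radius=0` this falls back to the single-positive case where only the center cell is positive.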

1.4. SimOTA

  • Four key insights are drawn for advanced label assignment: 1) loss/quality awareness, 2) center prior, 3) a dynamic number of positive anchors for each ground truth (abbreviated as dynamic top-k), and 4) a global view.
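  • SimOTA first calculates the pair-wise matching degree, represented by a cost, for each prediction-ground-truth pair.
  • In SimOTA, the cost between ground truth $g_i$ and prediction $p_j$ is calculated as:

$$c_{ij} = L_{ij}^{cls} + \lambda L_{ij}^{reg}$$

  • where $\lambda$ is a balancing coefficient, and $L_{ij}^{cls}$ and $L_{ij}^{reg}$ are the classification loss and regression loss between $g_i$ and $p_j$.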

For ground truth $g_i$, YOLOX selects the top k predictions with the least cost within a fixed center region as its positive samples. Finally, the grids corresponding to those positive predictions are assigned as positives, while the remaining grids are negatives.
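The selection step can be sketched in a few lines of plain Python. This is a simplified illustration, not the authors' implementation: the cost is taken as classification loss plus a weighted regression loss, the weight `lam=3.0` and the fixed `k` are assumed values, and the sketch does not resolve the case where one prediction is picked by several ground truths.

```python
def simota_assign(cls_loss, reg_loss, in_center, k=2, lam=3.0):
    """Pick the k lowest-cost predictions inside the center region per ground truth.

    cls_loss[i][j], reg_loss[i][j]: losses between ground truth i and prediction j.
    in_center[i][j]: True if prediction j lies in the center region of ground truth i.
    lam: assumed balancing weight on the regression loss.
    Returns one list of positive prediction indices per ground truth.
    """
    positives = []
    for i in range(len(cls_loss)):
        costs = [
            (cls_loss[i][j] + lam * reg_loss[i][j], j)
            for j in range(len(cls_loss[i]))
            if in_center[i][j]  # center prior: ignore predictions outside the region
        ]
        costs.sort()  # ascending cost
        positives.append([j for _, j in costs[:k]])
    return positives


cls_loss = [[0.2, 0.9, 0.1, 0.5]]
reg_loss = [[0.1, 0.1, 0.5, 0.2]]
in_center = [[True, True, True, False]]
assigned = simota_assign(cls_loss, reg_loss, in_center, k=2)
```

In this toy example, prediction 2 has the lowest classification loss but a high regression loss, so it loses to predictions 0 and 1 once both terms are combined.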

  • SimOTA raises the detector from 45.0% AP to 47.3% AP.
[Table: Component Increment]
  • The table above shows the AP increment contributed by each component.

2. Results

YOLOX, developed from , outperforms .

YOLOX-Nano, an even smaller model, is also developed.

SOTA Comparisons

YOLOX outperforms , , and .
