Brief Review — TOOD: Task-aligned One-stage Object Detection

With Task Alignment Learning (TAL), TOOD Outperforms Prior One-Stage Detectors

Sik-Ho Tsang
4 min read · Mar 30, 2024


(Top); TOOD (Bottom) Obtains More Consistent Classification Score Map and IoU Map


TOOD, by Intellifusion Inc., Meituan Inc., ByteDance Inc., Malong LLC, and Alibaba Group
2021 ICCV, Over 460 Citations (Sik-Ho Tsang @ Medium)


  • Task-aligned One-stage Object Detection (TOOD) designs a novel Task-aligned Head (T-Head), which offers a better balance between learning task-interactive and task-specific features, as well as greater flexibility to learn the alignment via a task-aligned predictor.
  • Task Alignment Learning (TAL) is proposed to explicitly pull closer (or even unify) the optimal anchors for the two tasks during training via a designed sample assignment scheme and a task-aligned loss.

Outline

  1. Task-aligned One-stage Object Detection (TOOD)
  2. Results

1. Task-aligned One-stage Object Detection (TOOD)

1.1. Overall Framework

  • T-head and TAL work collaboratively to improve the alignment of classification and localization tasks.
  • Specifically, T-head first predicts the classification scores and the localization (bounding boxes) from the input features.
  • Then TAL computes task alignment signals based on a new task alignment metric, which measures the degree of alignment between the two predictions.
  • Lastly, T-head automatically adjusts its classification probabilities and localization predictions using the learning signals computed by TAL during backpropagation.

1.2. T-Head

T-Head
  • TOOD has an overall pipeline of ‘backbone--head’.
  • Existing one-stage detectors suffer from misalignment between the classification and localization tasks, because the two tasks are predicted by separate parallel heads and their learning diverges.
  • T-head consists of a simple feature extractor followed by two Task-aligned Predictors (TAPs), one for classification and one for localization, in which the task alignment is performed.
  • Each TAP converts the feature maps into either dense classification scores P or dense object bounding boxes B.
  • A probability map M, computed from the task-interactive features, is used to adjust the classification prediction P to P^align:
      $P^{align} = \sqrt{P \times M}$
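  • A spatial offset map O is also learned to adjust the predicted bounding boxes B to B^align, where each of the four box channels c is read from a shifted location (via bilinear interpolation):
      $B^{align}(i, j, c) = B(i + O(i, j, 2 \times c),\ j + O(i, j, 2 \times c + 1),\ c)$
  • The alignment maps M and O are learned automatically from the stack of task-interactive features X^{inter}:
      $M = \sigma(conv_2(\delta(conv_1(X^{inter}))))$,  $O = conv_4(\delta(conv_3(X^{inter})))$
    where σ is the sigmoid function and δ is the ReLU function.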
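A minimal NumPy sketch of the two TAP alignment steps, following the formulas P^align = sqrt(P × M) and B^align(i, j, c) = B(i + O(i, j, 2×c), j + O(i, j, 2×c+1), c). It assumes M and O have already been predicted (the convolution layers that produce them are omitted), gives M a single channel that broadcasts over the classes, and clamps out-of-range sampling locations to the feature-map border. The function names are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def align_classification(P, M):
    # P: (H, W, C) class probabilities; M: (H, W, 1) alignment probabilities.
    # P^align = sqrt(P * M); M broadcasts over the class axis.
    return np.sqrt(P * M)


def bilinear_sample(channel, y, x):
    # Bilinearly interpolate a 2-D map at a real-valued location (y, x).
    # ASSUMPTION: out-of-range locations are clamped to the border.
    H, W = channel.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * channel[y0, x0] + dx * channel[y0, x1]
    bottom = (1 - dx) * channel[y1, x0] + dx * channel[y1, x1]
    return (1 - dy) * top + dy * bottom


def align_boxes(B, O):
    # B: (H, W, 4) boxes as four distance channels; O: (H, W, 8) offsets, two per channel.
    # B^align(i, j, c) = B(i + O(i, j, 2c), j + O(i, j, 2c + 1), c)
    H, W, C = B.shape
    B_align = np.empty_like(B, dtype=float)
    for i in range(H):
        for j in range(W):
            for c in range(C):
                B_align[i, j, c] = bilinear_sample(
                    B[:, :, c], i + O[i, j, 2 * c], j + O[i, j, 2 * c + 1])
    return B_align


if __name__ == "__main__":
    P = np.full((3, 3, 2), 0.5)      # toy scores for 2 classes
    M = np.full((3, 3, 1), 0.125)    # toy alignment probabilities
    print(align_classification(P, M)[0, 0])

    B = np.arange(36, dtype=float).reshape(3, 3, 4)
    O = np.zeros((3, 3, 8))
    O[:, :, 0] = 1.0                 # read channel 0 from one row below
    print(align_boxes(B, O)[:, :, 0])
```

The triple loop is written for readability only; a practical version would vectorize the sampling, for example with a grid-sampling operator.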

1.3. Task Alignment Learning (TAL)

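  • A metric is designed to compute the anchor-level alignment for each instance:
      $t = s^{\alpha} \times u^{\beta}$
  • where s and u denote a classification score and an IoU value, respectively, and α and β control the impact of the two tasks on the alignment metric.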
  • Notably, t plays a critical role in the joint optimization of the two tasks towards the goal of task-alignment.
  • For training sample assignment, the m anchors with the largest t values are selected as positive samples for each instance, while the remaining anchors are used as negatives. Training is then performed with new loss functions designed for task alignment.
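  • To raise the classification scores of well-aligned anchors, the binary labels of the positive anchors are replaced by the normalized alignment t̂, and Binary Cross Entropy (BCE) is computed on the positive anchors for the classification task:
      $L_{cls\_pos} = \sum_{i=1}^{N_{pos}} BCE(s_i, \hat{t}_i)$
    where t̂ is t normalized so that, within each instance, the maximum of t̂ equals the largest IoU value u.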
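  • To mitigate the imbalance between positive and negative samples, the focal loss form is adopted: reformulating the above equation on the positive anchors and adding a term for the negative anchors gives the final loss function for the classification task:
      $L_{cls} = \sum_{i=1}^{N_{pos}} |\hat{t}_i - s_i|^{\gamma} BCE(s_i, \hat{t}_i) + \sum_{j=1}^{N_{neg}} s_j^{\gamma} BCE(s_j, 0)$
    where γ is the focusing parameter.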
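  • Similarly, the bounding box regression loss computed for each positive anchor is re-weighted by t̂, and the GIoU loss is reformulated as:
      $L_{reg} = \sum_{i=1}^{N_{pos}} \hat{t}_i L_{GIoU}(b_i, \bar{b}_i)$
    where b and \bar{b} denote the predicted and the corresponding ground-truth bounding boxes.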
  • The total training loss for TAL is the sum of $L_{cls}$ and $L_{reg}$.
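The TAL steps above (alignment metric, top-m assignment, normalized targets t̂, and the two re-weighted losses) can be sketched in NumPy as follows. This is a hypothetical, simplified sketch rather than the authors' code: it assumes a single foreground class (one score per anchor), uses α = 1 and β = 6 as default values, gives an anchor claimed by several instances to the instance with the larger t, takes the normalization maximum over each instance's positive anchors, and sums the losses without any further normalization. All function names are illustrative.

```python
import numpy as np


def alignment_metric(s, u, alpha=1.0, beta=6.0):
    # Anchor-level task alignment t = s^alpha * u^beta.
    return s ** alpha * u ** beta


def box_area(b):
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou_giou(b, g):
    # Elementwise IoU and GIoU of (x1, y1, x2, y2) boxes; broadcasts over leading dims.
    iw = np.clip(np.minimum(b[..., 2], g[..., 2]) - np.maximum(b[..., 0], g[..., 0]), 0, None)
    ih = np.clip(np.minimum(b[..., 3], g[..., 3]) - np.maximum(b[..., 1], g[..., 1]), 0, None)
    inter = iw * ih
    union = box_area(b) + box_area(g) - inter
    enclose = ((np.maximum(b[..., 2], g[..., 2]) - np.minimum(b[..., 0], g[..., 0]))
               * (np.maximum(b[..., 3], g[..., 3]) - np.minimum(b[..., 1], g[..., 1])))
    iou = inter / union
    return iou, iou - (enclose - union) / enclose


def assign_top_m(t, m):
    # t: (num_gt, num_anchors). Each instance marks its m anchors with the largest t.
    # ASSUMPTION: an anchor chosen by several instances goes to the one with the larger t.
    # Returns the assigned instance index per anchor, or -1 for negative anchors.
    candidate = np.zeros(t.shape, dtype=bool)
    np.put_along_axis(candidate, np.argsort(-t, axis=1)[:, :m], True, axis=1)
    assigned = np.argmax(np.where(candidate, t, -np.inf), axis=0)
    assigned[~candidate.any(axis=0)] = -1
    return assigned


def normalize_targets(t, u, assigned, eps=1e-9):
    # t_hat: rescale t per instance so that max(t_hat) equals the largest IoU u
    # (ASSUMPTION: taken over that instance's positive anchors); negatives keep 0.
    t_hat = np.zeros(t.shape[1])
    for g in range(t.shape[0]):
        pos = assigned == g
        if pos.any():
            t_hat[pos] = t[g, pos] / (t[g, pos].max() + eps) * u[g, pos].max()
    return t_hat


def bce(p, y, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def cls_loss(s, t_hat, assigned, gamma=2.0):
    # ASSUMPTION: one foreground class, so s holds a single score per anchor.
    pos = assigned >= 0
    pos_term = np.abs(t_hat[pos] - s[pos]) ** gamma * bce(s[pos], t_hat[pos])
    neg_term = s[~pos] ** gamma * bce(s[~pos], 0.0)
    return pos_term.sum() + neg_term.sum()


def reg_loss(pred_boxes, gt_boxes, t_hat, assigned):
    # GIoU loss (1 - GIoU) of each positive anchor, re-weighted by its t_hat.
    pos = assigned >= 0
    _, giou = iou_giou(pred_boxes[pos], gt_boxes[assigned[pos]])
    return np.sum(t_hat[pos] * (1.0 - giou))


if __name__ == "__main__":
    # Toy data (made up): 2 instances, 5 predicted boxes with their scores.
    gts = np.array([[1, 1, 3, 3], [4, 4, 6, 7]], dtype=float)
    pred = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [4, 4, 6, 6],
                     [4, 5, 6, 7], [8, 8, 9, 9]], dtype=float)
    s = np.array([0.4, 0.9, 0.6, 0.3, 0.2])
    u, _ = iou_giou(pred[None, :, :], gts[:, None, :])  # (num_gt, num_anchors)
    t = alignment_metric(s, u)
    assigned = assign_top_m(t, m=2)
    t_hat = normalize_targets(t, u, assigned)
    total = cls_loss(s, t_hat, assigned) + reg_loss(pred, gts, t_hat, assigned)
    print(assigned, t_hat.round(3), round(float(total), 4))
```

With β = 6 the IoU term dominates t, so an anchor with a moderate score but a tight box (s = 0.6, u = 0.9) receives a much larger alignment value than one with a high score but a loose box (s = 0.9, u = 0.5); this is what pulls the optimal anchors of the two tasks together.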

2. Results

T-Head Performance

T-Head has better performance than the independent parallel heads.

TAL, used together with TAP, outperforms the other training schemes compared.

SOTA Comparisons

TOOD obtains the best performance among the compared state-of-the-art methods.

