Review — Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet

Simply Replacing Self-Attention Blocks by Feed Forward Layers

  • While the Transformer paper claims that “Attention Is All You Need”, this report asks: “Do You Even Need Attention?”
  • It is a short arXiv report showing the strong results obtained by simply replacing the self-attention blocks with feed-forward layers.

Outline

  1. Replacing Self-Attention Blocks by Feed Forward Layers
  2. Results

1. Replacing Self-Attention Blocks by Feed Forward Layers

The architecture explored in this report is extremely simple, consisting of a patch embedding followed by a series of feed-forward layers.
  • The architecture is identical to that of ViT, with each self-attention layer replaced by a feed-forward layer applied across the patch dimension, as sketched below.
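
The core idea is easy to express in code. Below is a minimal PyTorch sketch of one such block; the class names (FeedForward, FFOnlyBlock) and the hidden-dimension ratios (mlp_ratio, token_ratio) are illustrative assumptions rather than the paper's exact configuration, and details such as the patch embedding and class token are omitted.

    import torch
    import torch.nn as nn

    class FeedForward(nn.Module):
        # Standard two-layer MLP, as used inside ViT blocks.
        def __init__(self, dim, hidden_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, dim),
            )

        def forward(self, x):
            return self.net(x)

    class FFOnlyBlock(nn.Module):
        # ViT-style block with the self-attention sub-layer swapped for a
        # feed-forward layer applied across the patch (token) dimension.
        def __init__(self, dim, num_patches, mlp_ratio=4, token_ratio=0.5):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            # Mixes information across patches; this replaces self-attention.
            self.token_ff = FeedForward(num_patches, int(num_patches * token_ratio))
            self.norm2 = nn.LayerNorm(dim)
            # Standard per-patch channel MLP, unchanged from ViT.
            self.channel_ff = FeedForward(dim, dim * mlp_ratio)

        def forward(self, x):  # x: (batch, num_patches, dim)
            # Transpose so the token feed-forward acts over the patch axis.
            x = x + self.token_ff(self.norm1(x).transpose(1, 2)).transpose(1, 2)
            x = x + self.channel_ff(self.norm2(x))
            return x

    # Example: 196 patches (14x14) at ViT-base width 768.
    block = FFOnlyBlock(dim=768, num_patches=196)
    x = torch.randn(2, 196, 768)
    print(block(x).shape)  # torch.Size([2, 196, 768])

One practical consequence of this swap: because the token feed-forward is a linear layer over a fixed number of patches, the model, unlike attention, is tied to a single sequence length.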

2. Results

Comparison of ImageNet top-1 accuracies across model sizes:
  • Notably, the proposed ViT-base-sized model achieves 74.9% top-1 accuracy without any hyperparameter tuning (i.e., using the same hyperparameters as its ViT counterpart).
  • The primary purpose of this report is to explore the limits of simple architectures, not to break the ImageNet benchmarks.

Reference

[2021 arXiv v1] Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet

