Review — Fixing the train-test resolution discrepancy: FixEfficientNet

Apply FixRes onto EfficientNet for Additional Results

FixEfficientNet (orange curve) surpasses all EfficientNet models, including the models trained with Noisy Student (red curve) and adversarial examples (blue curve). The sws models are from [2].

Fixing the train-test resolution discrepancy: FixEfficientNet
FixEfficientNet, by Facebook AI Research
2020 arXiv v5, Over 200 Citations. (Sik-Ho Tsang @ Medium)

Outline

  1. FixEfficientNet
  2. Experimental Results

1. FixEfficientNet

  • FixRes is a simple but efficient fine-tuning strategy.
  • First, EfficientNet is trained using a smaller input image size (train res).
  • Then, EfficientNet, or only a few of its top layers, is fine-tuned at the target resolution (test res).
  • The only difference from the original FixRes is that the FixRes data augmentation is combined with label smoothing (introduced in Inception-v3) during fine-tuning.
  • (Please feel free to read FixRes for more details if interested.)
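To make the label smoothing step above concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function names `smooth_labels` and `cross_entropy`, and the smoothing factor `epsilon=0.1`, are assumptions chosen for illustration (the review does not state the value used for FixEfficientNet). The sketch follows the Inception-v3 formulation, in which the one-hot target is mixed with a uniform distribution over the K classes.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a label-smoothed target distribution.

    The true class gets (1 - epsilon) + epsilon / K and every other
    class gets epsilon / K, where K is the number of classes.
    epsilon=0.1 is an illustrative default, not a value from the review.
    """
    uniform = epsilon / num_classes
    dist = [uniform] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist


def cross_entropy(logits, target_dist):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_norm) for z, t in zip(logits, target_dist))


# Hypothetical 4-class example: the true class is index 2.
targets = smooth_labels(target_index=2, num_classes=4, epsilon=0.1)
loss = cross_entropy([0.5, -1.0, 3.0, 0.0], targets)
print(targets, loss)
```

With smoothing, the target no longer puts all of its mass on one class, so the loss penalizes over-confident predictions during the fine-tuning stage.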

2. Experimental Results

2.1. ImageNet

Results on ImageNet without external data (single-crop evaluation)
Results on ImageNet with extra training data

FixEfficientNet-L2 surpasses all other results reported in the literature.

  • It achieves 88.5% Top-1 accuracy and 98.7% Top-5 accuracy on the ImageNet-2012 validation benchmark.
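For readers less familiar with the two metrics quoted above, the following sketch shows how Top-1 and Top-5 style accuracy are usually computed. The helper `top_k_accuracy` and the toy scores are hypothetical and only illustrate the definition: a sample counts as correct when its true label is among the k highest-scoring classes.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k top-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy 3-class example (made-up scores, not results from the paper).
scores = [
    [0.10, 0.70, 0.20],  # top prediction: class 1
    [0.50, 0.30, 0.20],  # top prediction: class 0
    [0.20, 0.30, 0.50],  # top prediction: class 2
    [0.40, 0.35, 0.25],  # top prediction: class 0
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

Top-k accuracy can never be lower than Top-1 accuracy on the same predictions, which is why the Top-5 figures in this review are always the higher of the two.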

2.2. ImageNet-Real

Results on ImageNet Real labels
  • There are some incorrect labels in ImageNet. The ImageNet Real (clean) labels are the labels cleaned up by Beyer et al. [5].

With 90.9% Top-1 accuracy and 98.8% Top-5 accuracy, FixEfficientNet-L2 surpasses all other results reported in the literature with these labels.
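Evaluating against cleaned labels differs slightly from the standard protocol, because an image may carry several valid labels. The sketch below assumes the commonly used convention for these labels, which this review does not spell out: a prediction counts as correct if it is in the image's label set, and images left without any label are skipped. The function name `real_accuracy` and the toy data are hypothetical.

```python
def real_accuracy(predictions, real_labels):
    """Top-1 accuracy against multi-label annotations.

    Assumption: images whose label set is empty are excluded from the
    score, and a prediction is correct if it appears in the label set.
    """
    correct = 0
    counted = 0
    for pred, labels in zip(predictions, real_labels):
        if not labels:  # no valid label for this image: skip it
            continue
        counted += 1
        if pred in labels:
            correct += 1
    return correct / counted if counted else 0.0


# Toy example: four images, one of them without any valid label.
predictions = [3, 7, 1, 4]
real_labels = [{3, 5}, {2}, set(), {4}]
acc = real_accuracy(predictions, real_labels)
print(acc)
```

Here three images are scored and two predictions fall inside their label sets, so the accuracy is 2/3.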

2.3. ImageNet-V2

Results on ImageNet-V2 [17] Matched Frequency with extra training data.
Results on ImageNet-V2 [17] Matched Frequency without external data (single-crop evaluation).
  • The ImageNet-V2 [17] dataset was introduced to overcome the lack of a test split in the ImageNet dataset. ImageNet-V2 consists of three novel test sets that replace the ImageNet test set, which is no longer available.
Performance comparison and state of the art on ImageNet-v2, single crop with external data, sorted by top-1 accuracy. NS: Noisy Student [8]. BS: Billion-scale [2].

FixEfficientNet-L2, which is fine-tuned from EfficientNet, establishes the new state of the art on this benchmark with additional data.

I hope to review Noisy Student [8] and Billion-scale [2] in the near future.

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
