Compound Scaling of Depth, Width, and Resolution; Outperforms AmoebaNet, PNASNet, NASNet, SENet, DenseNet, Inception-v4, Inception-v3, Inception-v2, Xception, ResNeXt, PolyNet & ResNet

Figure: Model Size vs. ImageNet Accuracy.

In this story, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (EfficientNet), by Google Research, Brain Team, is presented. In this paper:

  • Model scaling is systematically studied, showing that carefully balancing network depth, width, and resolution can lead to better performance.
  • An effective compound coefficient is proposed to uniformly scale all dimensions of depth/width/resolution (summarised right after this list).
  • With neural architecture search (NAS), EfficientNet is obtained.
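Concretely, the compound coefficient φ (chosen according to the available compute) scales all three dimensions together; transcribed from the paper, where α, β, γ are constants found by a small grid search on the baseline network:

    depth: d = α^φ, width: w = β^φ, resolution: r = γ^φ
    subject to α · β² · γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1

Since FLOPS grow roughly with d · w² · r², the constraint means total FLOPS increase by about 2^φ.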

This is a paper in 2019 ICML with over 1100 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Compound Scaling
  2. EfficientNet Architecture

1. Single Dimension Scaling

Figure: (a) Baseline, (b)-(d) Single Dimension Scaling, (e) Compound Scaling.

1.1. (a) Baseline

  • where F_i^{L_i} denotes layer F_i is repeated L_i times in stage i, (H_i, W_i, C_i) denotes the shape of input tensor X of layer i. …
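For reference, the baseline ConvNet is defined in the paper as a composition over stages (written here in plain text):

    N = ⊙_{i=1..s} F_i^{L_i} ( X_{<H_i, W_i, C_i>} )

i.e., stage i applies layer F_i repeated L_i times to an input of shape (H_i, W_i, C_i), and the stages are chained from i = 1 to s.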


Weakly Supervised Object Localization (WSOL) Using AlexNet

Visual Geometry Group, University of Oxford

In this story, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps (Backprop), by Visual Geometry Group, University of Oxford, is shortly presented. As you may already know, this is a paper from the famous VGG research group. It is called Backprop here since later papers refer to it as Backprop when mentioning it.

Weakly supervised object localization (WSOL) aims to find the bounding box of the main object within an image using only the image-level label, without any bounding-box annotation.

In this paper:

  • Two visualization methods are proposed: one is a gradient-based method and the other is a saliency-based method.
  • For the saliency-based method, GraphCut is utilized for weakly supervised object localization (WSOL). …
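As an illustration of the gradient-based saliency idea (my own PyTorch-style sketch, not the authors' code; model, image, and class_idx are placeholders), the saliency of a pixel is taken as the maximum absolute gradient of the class score over the colour channels:

import torch

def saliency_map(model, image, class_idx):
    # image: tensor of shape (1, 3, H, W); model returns class scores.
    image = image.clone().requires_grad_(True)
    score = model(image)[0, class_idx]      # unnormalised class score
    score.backward()                        # gradients w.r.t. input pixels
    return image.grad.abs().max(dim=1)[0]   # (1, H, W) saliency map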


MixConv Composes MixNets: Similar Performance to MobileNetV3; Outperforms ProxylessNAS, FBNet, DARTS, MnasNet, NASNet, MobileNetV2, ShuffleNet V2, ShuffleNet V1 & MobileNetV1


In this story, MixConv: Mixed Depthwise Convolutional Kernels (MixConv), by Google Brain, is presented. In this paper:

  • A new mixed depthwise convolution (MixConv) is proposed, which naturally mixes up multiple kernel sizes in a single convolution.
  • By integrating MixConv into the AutoML search space, a new family of models is developed, named MixNets.

This is a paper in 2019 BMVC with over 60 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. MixConv Performance on MobileNets
  2. Ablation Study
  3. MixNet
  4. MixNet Performance on ImageNet
  5. Transfer Learning Performance

1. MixConv

Figure: (a) Vanilla Depthwise Conv in MobileNetV2, (b) MixConv.
  • Unlike vanilla depthwise convolution, MixConv partitions channels into groups and applies different kernel sizes to each group.
Figure: A demo of TensorFlow MixConv.
  • More concretely, the input tensor is partitioned into g groups of virtual tensors.
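A minimal TensorFlow sketch of this idea (my own simplification, assuming an equal channel split that divides evenly among the kernel sizes; not the official demo code):

import tensorflow as tf

def mix_conv(x, kernel_sizes=(3, 5, 7)):
    # Split the channels into one group per kernel size.
    groups = tf.split(x, num_or_size_splits=len(kernel_sizes), axis=-1)
    # Apply a depthwise convolution with a different kernel size to each group.
    outputs = [tf.keras.layers.DepthwiseConv2D(k, padding='same')(g)
               for g, k in zip(groups, kernel_sizes)]
    # Concatenate the per-group outputs back along the channel axis.
    return tf.concat(outputs, axis=-1)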


Improves Models for Image Classification, Object Detection & Person Re-identification

Figure: Random Erasing (From Author’s GitHub: https://github.com/zhunzhong07/Random-Erasing) Meow!

In this story, Random Erasing Data Augmentation (Random Erasing, RE), by Xiamen University, University of Technology Sydney, Australian National University, and Carnegie Mellon University, is shortly presented. In this paper:

  • Random Erasing is proposed, which randomly selects a rectangular region in an image and erases its pixels with random values (a minimal sketch is given after this list).
  • This reduces the risk of overfitting and makes the model robust to occlusion.
  • It is complementary to commonly used data augmentation techniques such as random cropping and flipping.
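A minimal NumPy sketch of the procedure (illustrative only; the probability, area, and aspect-ratio ranges below are placeholder values, which the paper treats as hyper-parameters):

import numpy as np

def random_erasing(img, p=0.5, area=(0.02, 0.4), aspect=(0.3, 3.3)):
    # img: uint8 array of shape (H, W, C). With probability p, erase a
    # random rectangle by filling it with random pixel values.
    if np.random.rand() > p:
        return img
    h, w, c = img.shape
    for _ in range(100):                                  # retry until the box fits
        target_area = np.random.uniform(*area) * h * w
        ratio = np.random.uniform(*aspect)
        eh = int(np.sqrt(target_area * ratio))
        ew = int(np.sqrt(target_area / ratio))
        if 0 < eh < h and 0 < ew < w:
            y = np.random.randint(h - eh)
            x = np.random.randint(w - ew)
            img[y:y + eh, x:x + ew] = np.random.randint(0, 256, (eh, ew, c))
            return img
    return img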

This is a paper in 2020 AAAI with over 600 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Ablation Study
  2. Experimental…


Outperforms DropBlock, ShakeDrop, Cutout and mixup

Figure: CutMix: patches are cut and pasted among training images.

In this story, CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (CutMix), by NAVER Corp., LINE Plus Corp., and Yonsei University, is shortly presented. In this paper:

  • Patches are cut and pasted among training images, where the ground truth labels are also mixed proportionally to the area of the patches (see the sketch after this list).
  • By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms the state-of-the-art augmentation strategies.
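A minimal NumPy sketch of the mixing step (illustrative, assuming HWC images and one-hot labels; not the official implementation):

import numpy as np

def cutmix(x_a, y_a, x_b, y_b, alpha=1.0):
    # Sample the combination ratio and a random box covering roughly (1 - lam) of the image.
    lam = np.random.beta(alpha, alpha)
    h, w = x_a.shape[:2]
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    # Paste the box from x_b into x_a.
    x_mixed = x_a.copy()
    x_mixed[y1:y2, x1:x2] = x_b[y1:y2, x1:x2]
    # Mix the labels by the actual area kept from x_a.
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    return x_mixed, lam_adj * y_a + (1 - lam_adj) * y_b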

This is a paper in 2019 ICCV with over 190 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Comparison with Cutout and mixup
  2. Experimental Results

1. CutMix


Outperforms Shake-Shake & RandomDrop (Stochastic Depth) on ResNeXt, ResNet, Wide ResNet (WRN) & PyramidNet

Figure: ShakeDrop: converging to a better minimum.

In this story, ShakeDrop Regularization for Deep Residual Learning (ShakeDrop), by Osaka Prefecture University, and Preferred Networks, Inc., is shortly presented. In this paper:

  • ShakeDrop, a regularization inspired by Shake-Shake, is proposed; it perturbs the residual branch with random coefficients so that, unlike Shake-Shake, it is applicable not only to ResNeXt but also to single-branch architectures such as ResNet, WRN, and PyramidNet.

This is a paper in 2019 IEEE ACCESS with over 40 citations, where IEEE ACCESS is an open-access journal with a high impact factor of 3.745. (Sik-Ho Tsang @ Medium)

Outline

  1. Brief Review of RandomDrop (a.k.a. Stochastic Depth)
  2. ShakeDrop
  3. Experimental Results

1. Brief Review of Shake-Shake

Figure: Shake-Shake.
  • The basic ResNeXt building block, which has a three-branch architecture, is given…
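As a reminder of the Shake-Shake rule on this block: during the training forward pass the two residual branches are combined as

    x_{l+1} = x_l + α · F_1(x_l) + (1 − α) · F_2(x_l),   α ~ U(0, 1),

an independent random coefficient β replaces α in the backward pass, and the expectation α = 0.5 is used at test time.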


A Very Famous Regularization Approach That Prevents Co-Adaptation so as to Reduce Overfitting

In this paper, Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Dropout), by University of Toronto, is shortly presented.

  • The key idea is to randomly drop units (along with their connections) from the neural network during training.
  • This prevents units from co-adapting too much.

This first appeared as a 2012 arXiv preprint with over 5000 citations. It is used in AlexNet (2012 NIPS), which has over 73000 citations and won first place in the 2012 ImageNet competition. Finally, it was published in 2014 JMLR with over 23000 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Experimental Results

1. Dropout

1.1. General Idea

Figure: Dropout.
  • Left: When using the neural network on the left, if some neurons are quite strong, the network will depend on those neurons too much, making the others weak and unreliable. …
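A minimal NumPy sketch of (inverted) dropout for one layer's activations; note the paper instead keeps activations unscaled during training and multiplies the weights by p at test time, which is equivalent in expectation:

import numpy as np

def dropout(a, p_keep=0.5, training=True):
    # Zero each unit with probability 1 - p_keep during training and
    # rescale so that the expected activation matches test time.
    if not training:
        return a
    mask = np.random.rand(*a.shape) < p_keep
    return a * mask / p_keep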


Outperforms Dropout, DropPath from FractalNet, SpatialDropout, Cutout, AutoAugment, and Label Smoothing from Inception-v3


In this story, DropBlock: A regularization method for convolutional networks (DropBlock), by Google Brain, is shortly presented. In this paper:

  • DropBlock, a form of structured Dropout, is proposed where units in a contiguous region of a feature map are dropped together.
  • Applying DropBlock in skip connections in addition to the convolution layers increases the accuracy (a simplified sketch follows this list).
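A simplified NumPy sketch for a single-channel feature map (the block-seed rate gamma here ignores the paper's edge-effect correction; illustrative only, not the paper's exact algorithm):

import numpy as np

def dropblock(feat, block_size=3, drop_prob=0.1):
    # feat: array of shape (H, W). Sample block centres, zero the whole
    # block around each centre, then renormalise the surviving activations.
    h, w = feat.shape
    gamma = drop_prob / (block_size ** 2)          # simplified seed rate
    seeds = np.random.rand(h, w) < gamma
    mask = np.ones((h, w))
    half = block_size // 2
    for y, x in zip(*np.where(seeds)):
        mask[max(0, y - half):y + half + 1, max(0, x - half):x + half + 1] = 0
    return feat * mask * mask.size / max(mask.sum(), 1)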

This is a paper in 2018 NeurIPS with over 200 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Experimental Results

1. DropBlock

Figure: (a) Input Image, (b) Dropout Randomly at Feature Maps, (c) DropBlock at Feature Maps.


Outperforms ERM Variants Using DenseNet, ResNeXt, Pre-Activation ResNet, WRN & ResNet

Figure: mixup (image from https://blog.airlab.re.kr/2019/11/mixup).

In this story, mixup: Beyond Empirical Risk Minimization, by MIT and FAIR, is shortly presented. In this paper:

  • mixup trains a neural network on convex combinations of pairs of examples and their labels.
  • By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples (a minimal sketch follows this list).
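A minimal NumPy sketch of the mixing rule (alpha = 0.2 is only an illustrative value; the paper treats it as a hyper-parameter):

import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    # Convex combination of two examples and their one-hot labels.
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2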

This is a paper in 2018 ICLR with over 1000 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. mixup
  2. Experimental Results

1. Empirical Risk Minimization (ERM)


Outperforms DARTS, MnasNet, PNASNet, NASNet, ShuffleNet V2, MobileNetV2 & CondenseNet

Figure: Differentiable neural architecture search (DNAS) for ConvNet design.

In this story, FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search (FBNet), by UC Berkeley, Princeton University, and Facebook Inc., is presented. In this paper:

  • A differentiable neural architecture search (DNAS) framework is proposed that uses gradient-based methods to optimize ConvNet architectures (the latency-aware objective is sketched after this list).
  • FBNets (Facebook-Berkeley-Nets), a family of models discovered by DNAS, outperform state-of-the-art approaches.
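For orientation on Section 1 below: the search is driven by a latency-aware objective which, in the paper, multiplies the cross-entropy loss by a latency term of the form

    L(a, w_a) = CE(a, w_a) · α · log(LAT(a))^β,

where LAT(a) is the latency of architecture a estimated from a per-block lookup table, and α, β control the trade-off between accuracy and latency.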

This is a paper in 2019 CVPR with over 300 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Latency-Aware Loss Function
  2. Differentiable Neural Architecture Search (DNAS)
  3. Experimental Results

1. Search Space

1.1. Fixed Macro Architecture

Figure: Fixed Macro Architecture.
  • A fixed macro architecture is defined.
  • The first and the last three layers of the network have fixed operators.
  • For the rest of the layers (TBS, To Be Searched), their block type needs to be searched. …

About

Sik-Ho Tsang

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
