# Review: RandAugment

## A Set of Data Augmentation Choices is Randomized

--

RandAugment: Practical automated data augmentation with a reduced search space, RandAugment, by Google Research, Brain Team2020 NeurIPS, Over 600 Citations(Sik-Ho Tsang @ Medium)

Image Classification, Data Augmentation

- For NAS-based data augmentation approaches, such as AutoAugment (AA), large search space is needed to find a set of augmentation techniques. This separate search phase significantly complicates training and is computationally expensive.
**In RandAugment, a simple but effective way is proposed to randomly select a set of augmentation techniques.**

# Outline

**RandAugment****Experimental Results**

**1. RandAugment**

## 1.1. N: Number of Transformation

- The primary goal of RandAugment is to remove the need for a separate search phase on a proxy task.
- A
**parameter-free**procedure is proposed where RandAugment always**selects a transformation with uniform probability 1/***K*. - Given
for a training image,*N*transformations**RandAugment may thus express***K*^*N*potential policies. - With
*N*=14, the number of transformations is: identity,*autoContrast, equalize, rotate, solarize, color, posterize, contrast, brightness, sharpness, shear-x, shear-y, translate-x, translate-y.*

## 1.2. M: Magnitude of Transformation

- The final set of parameters to consider is the
**magnitude**of the each augmentation distortion. - Briefly, each transformation resides on an
**integer scale from 0 to 10**where a value of**10**indicates the**maximum scale**for a given transformation.

**2. Experimental Results**

## 2.1. RandAugment Studies

- The Wide-ResNet (WRN) models are trained with the additional
*K*=14 data augmentations, over*N*=1**a range of global distortion magnitudes**parameterized on a uniform linear scale ranging from*M***[0, 30].**

(a) & (b): Larger networks demand larger data distortions for regularization.

- Dashed lines in (b) & (d): AutoAugment magnitude which is constant.

(c) & (d): Optimal distortion magnitude is larger for models that are trained on larger datasets.

- Two free parameters
*N*and*M*specifying RandAugment are identified through a minimal grid search.

## 2.2. CIFAR & SVHN

**CIFAR-10**: The default augmentations for all methods include flips, pad-and-crop and Cutout.**1 setting for**and*N***5 settings for**(*M**N*=3 and tried 4, 5, 7, 9, and 11 for magnitude) are found using held-out val set.

RandAugment achieves either

competitive (i.e. within 0.1%)or state-of- the-art on CIFAR-10 across four network architectures.

**CIFAR-100**: Similar for CIFAR-100, 2 and 4 settings for*N*and*M*, are sampled respectively. (i.e.*N*={1, 2} and*M*={2, 6, 10, 14}). For WRN, Wide-ResNet-28–2 and Wide-ResNet-28–10,*N*=1,*M*=2 and*N*=2,*M*=14 achieves best results, respectively.

Again, RandAugment achieves competitive or superior results.

**SVHN**:*N*=3 and tried 5, 7, 9, and 11 for magnitude.

WRN Wide-ResNet-28–10 with RandAugment matches the previous state-of-the-art accuracy on SVHN which used a more advanced model.

## 2.3. ImageNet

- RandAugment
**matches the performance**of AutoAugment and Fast AutoAugment on the**smallest model (ResNet-50)**.

On larger models RandAugment significantly outperforms other methods achieving increases of

up to +1.3% above the baseline.

## 2.4. COCO

- AutoAugment expended ~15K GPU hours for search, where as RandAugment was tuned by on merely 6 values of the hyperparameters.
*N*=1 and tried distortion magnitudes between 4 and 9.

RandAugment surpasses the baseline model and provides competitive accuracy with AutoAugment.

## 2.5. Investigating Transformations

Surprisingly,

and lower variation even when included in small subsets of RandAugment transformations, whilerotatecan significantly improve performance.posterizeseems to hurt all subsets of all sizes

## 2.6. Learning the Probabilities for Selecting Image Transformations

is denoted as the*αij***learned probability of selecting image transformation**For*i*for operation*j*.*K*=14 image transformations and*N*=2 operations,*αij*constitutes 28 parameters.- All weights are initialized such that each transformation is equal probability (i.e. RandAugment), and these parameters are
**updated based on how well a model classifies****a held out set of validation images distorted by***αij*.

Learning the probabilities may improve the performance on small-scale tasksand explorations are reserved to larger-scale tasks for the future.

## Reference

[2020 NeurIPS] [RandAugment]

RandAugment: Practical automated data augmentation with a reduced search space

## Image Classification

**1989–2018 … 2019**: [ResNet-38] [AmoebaNet] [ESPNetv2] [MnasNet] [Single-Path NAS] [DARTS] [ProxylessNAS] [MobileNetV3] [FBNet] [ShakeDrop] [CutMix] [MixConv] [EfficientNet] [ABN] [SKNet] [CB Loss] [AutoAugment, AA] [BagNet] [Stylized-ImageNet] [FixRes] [Ramachandran’s NeurIPS’19] [SE-WRN] [SGELU] [ImageNet-V2]**2020**: [Random Erasing (RE)] [SAOL] [AdderNet] [FixEfficientNet] [BiT] [RandAugment]**2021**: [Learned Resizer] [Vision Transformer, ViT]