Review: RandAugment

A Set of Data Augmentation Choices is Randomized

Sik-Ho Tsang
5 min readFeb 9, 2022
RandAugment (Image from here)

RandAugment: Practical automated data augmentation with a reduced search space, RandAugment, by Google Research, Brain Team
2020 NeurIPS, Over 600 Citations (Sik-Ho Tsang @ Medium)
Image Classification, Data Augmentation

  • For NAS-based data augmentation approaches, such as AutoAugment (AA), large search space is needed to find a set of augmentation techniques. This separate search phase significantly complicates training and is computationally expensive.
  • In RandAugment, a simple but effective way is proposed to randomly select a set of augmentation techniques.


  1. RandAugment
  2. Experimental Results

1. RandAugment

1.1. N: Number of Transformation

Python Code for RandAugment
  • The primary goal of RandAugment is to remove the need for a separate search phase on a proxy task.
  • A parameter-free procedure is proposed where RandAugment always selects a transformation with uniform probability 1/K.
  • Given N transformations for a training image, RandAugment may thus express K^N potential policies.
  • With N=14, the number of transformations is: identity, autoContrast, equalize, rotate, solarize, color, posterize, contrast, brightness, sharpness, shear-x, shear-y, translate-x, translate-y.

1.2. M: Magnitude of Transformation

Example images augmented by RandAugment. In these examples N=2 and 3 magnitudes
  • The final set of parameters to consider is the magnitude of the each augmentation distortion.
  • Briefly, each transformation resides on an integer scale from 0 to 10 where a value of 10 indicates the maximum scale for a given transformation.

2. Experimental Results

2.1. RandAugment Studies

Optimal magnitude of augmentation depends on the size of the model and the training set
  • The Wide-ResNet (WRN) models are trained with the additional K=14 data augmentations and N=1, over a range of global distortion magnitudes M parameterized on a uniform linear scale ranging from [0, 30].

(a) & (b): Larger networks demand larger data distortions for regularization.

  • Dashed lines in (b) & (d): AutoAugment magnitude which is constant.

(c) & (d): Optimal distortion magnitude is larger for models that are trained on larger datasets.

  • Two free parameters N and M specifying RandAugment are identified through a minimal grid search.


Test accuracy (%) on CIFAR-10, CIFAR-100, SVHN and SVHN core set
  • CIFAR-10: The default augmentations for all methods include flips, pad-and-crop and Cutout. 1 setting for N and 5 settings for M (N=3 and tried 4, 5, 7, 9, and 11 for magnitude) are found using held-out val set.

RandAugment achieves either competitive (i.e. within 0.1%) or state-of- the-art on CIFAR-10 across four network architectures.

  • CIFAR-100: Similar for CIFAR-100, 2 and 4 settings for N and M, are sampled respectively. (i.e. N={1, 2} and M={2, 6, 10, 14}). For WRN, Wide-ResNet-28–2 and Wide-ResNet-28–10, N=1, M=2 and N=2, M=14 achieves best results, respectively.

Again, RandAugment achieves competitive or superior results.

  • SVHN: N=3 and tried 5, 7, 9, and 11 for magnitude.

WRN Wide-ResNet-28–10 with RandAugment matches the previous state-of-the-art accuracy on SVHN which used a more advanced model.

2.3. ImageNet

Top-1 and Top-5 accuracies (%) on ImageNet
  • RandAugment matches the performance of AutoAugment and Fast AutoAugment on the smallest model (ResNet-50).

On larger models RandAugment significantly outperforms other methods achieving increases of up to +1.3% above the baseline.

2.4. COCO

Mean average precision (mAP) on COCO detection task
  • AutoAugment expended ~15K GPU hours for search, where as RandAugment was tuned by on merely 6 values of the hyperparameters.
  • N=1 and tried distortion magnitudes between 4 and 9.

RandAugment surpasses the baseline model and provides competitive accuracy with AutoAugment.

2.5. Investigating Transformations

Average performance improves when more transformations are included in RandAugment
Average improvement due to each transformation on CIFAR-10 (N=3, M=4)

Surprisingly, rotate can significantly improve performance and lower variation even when included in small subsets of RandAugment transformations, while posterize seems to hurt all subsets of all sizes.

2.6. Learning the Probabilities for Selecting Image Transformations

Differentiable optimization for augmentation can improve RandAugment
  • αij is denoted as the learned probability of selecting image transformation i for operation j. For K=14 image transformations and N=2 operations, αij constitutes 28 parameters.
  • All weights are initialized such that each transformation is equal probability (i.e. RandAugment), and these parameters are updated based on how well a model classifies a held out set of validation images distorted by αij.

Learning the probabilities may improve the performance on small-scale tasks and explorations are reserved to larger-scale tasks for the future.



Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: for Twitter, LinkedIn, etc.