Brief Review — Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

PBA, Starts by Easier Augmentation, Followed by Harder Augmentation

Sik-Ho Tsang
3 min read · Oct 17, 2022
Left: PBA matches AutoAugment’s classification accuracy across a range of different network models on the CIFAR-10 dataset, while requiring 1,000× fewer GPU hours to run. Right: Comparison of pre-computation costs and test set error (%)

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules, PBA, by UC Berkeley, (Unknown named X), and
2019 ICML, Over 200 Citations (Sik-Ho Tsang @ Medium)
Data Augmentation, Image Classification

  • Population Based Augmentation (PBA) is proposed to learn a schedule of augmentation policies.


  1. Population Based Augmentation (PBA)
  2. Results

1. Population Based Augmentation (PBA)

1.1. Learning a Schedule

  • Population Based Training (PBT) is leveraged: a hyperparameter search algorithm that optimizes the parameters of a network jointly with its hyperparameters to maximize performance.
  • The output of PBT is not an optimal hyperparameter configuration but rather a trained model and schedule of hyperparameters.

In PBA, only the learned schedule is of interest; the trained child model is discarded.

  • In PBT, to start, a fixed population of models is randomly initialized and trained in parallel. At certain intervals, an “exploit-and-explore” procedure is applied.

A worse-performing model clones the weights of a better-performing model (i.e., exploitation) and then perturbs the hyperparameters of the cloned model to search the hyperparameter space (i.e., exploration).
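The exploit-and-explore step can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the trial layout (a dict with "weights", "hparams", "score"), the function name exploit_and_explore, and the frac argument are all assumptions made for this example.

```python
import copy
import random

def exploit_and_explore(population, perturb, frac=0.25):
    """One PBT-style step over a population of trials.

    population: list of dicts with keys "weights", "hparams", "score"
                (hypothetical layout, chosen for this sketch).
    perturb:    function that maps a hyperparameter dict to a new one.
    frac:       share of the population treated as "worse" / "better".
    """
    ranked = sorted(population, key=lambda t: t["score"], reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for trial in bottom:
        donor = random.choice(top)
        # Exploit: copy the weights of a better-performing trial.
        trial["weights"] = copy.deepcopy(donor["weights"])
        # Explore: start from the donor's hyperparameters and perturb them.
        trial["hparams"] = perturb(copy.deepcopy(donor["hparams"]))
    return population

# Toy usage: 8 trials whose score equals their index.
trials = [{"weights": [i], "hparams": {"lr": float(i)}, "score": i} for i in range(8)]
exploit_and_explore(trials, lambda h: {"lr": h["lr"] + 0.5})
print(trials[0])  # the weakest trial now carries weights cloned from a top trial
```

Trials outside the bottom fraction are left untouched and simply keep training with their current hyperparameters.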

Left: Algorithm 1, Right: Algorithm 2

1.2. Policy Search Space (Algorithm 1)

Augmentations applied to a CIFAR-10 “car” class image
  • A set of hyperparameters consists of two magnitude values and two probability values for each operation.
  • With 15 operations, this gives 30 operation-magnitude-probability tuples, for a total of 60 hyperparameters.
  • Similar to AutoAugment, there are 10 possibilities for magnitude and 11 possibilities for probability.
  • When augmentations are applied to data, all operations are first shuffled and then applied in turn until a limit is reached. This limit can range from 0 to 2 operations, as shown in Algorithm 1 above.
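The application step described above (shuffle, then apply operations in turn until a limit of 0 to 2 is reached) can be sketched as follows. This is only a sketch under assumptions: the weights used to draw the limit (count_weights) are illustrative and not stated in this post, and the two toy operations on lists of numbers are hypothetical stand-ins for real image augmentations.

```python
import random

def apply_policy(x, policy, ops, rng=random, count_weights=(0.2, 0.3, 0.5)):
    """Apply at most 0, 1, or 2 operations from a shuffled policy.

    policy: list of (op_name, probability, magnitude) tuples.
    ops:    dict mapping op_name to a function f(x, magnitude).
    count_weights: assumed weights for drawing the limit 0, 1, or 2.
    """
    policy = list(policy)           # copy so the caller's order is kept
    rng.shuffle(policy)
    limit = rng.choices([0, 1, 2], weights=count_weights)[0]
    applied = []
    for name, prob, mag in policy:
        if limit == 0:
            break
        if rng.random() < prob:     # each tuple fires with its own probability
            x = ops[name](x, mag)
            applied.append(name)
            limit -= 1
    return x, applied

# Toy stand-ins for image operations (hypothetical, for illustration only).
TOY_OPS = {
    "shift": lambda x, m: [v + m for v in x],
    "scale": lambda x, m: [v * m for v in x],
}
toy_policy = [("shift", 0.5, 1), ("scale", 0.3, 2), ("shift", 0.9, 3), ("scale", 0.1, 4)]
print(apply_policy([1, 2, 3], toy_policy, TOY_OPS))
```

Because the policy is reshuffled on every call, no operation is favored by its position in the list.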
Comparison of AutoAugment and PBA augmentation strategies
  • AutoAugment uses an RNN controller to return the hyperparameters to be used, similar to NASNet.

The PBA search space includes (10×11)³⁰≈1.75×10⁶¹ possibilities, compared to 2.8×10³² for AutoAugment.
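As a quick sanity check on the orders of magnitude, the counts can be evaluated directly. The PBA expression comes from the text above; the AutoAugment expression (16×10×11)¹⁰ is taken from the AutoAugment paper and is an assumption here, since this post does not state it. The helper sci is a hypothetical name for this example.

```python
from math import log10

def sci(n):
    """Return (mantissa, exponent) so that n is about mantissa * 10**exponent."""
    exponent = int(log10(n))
    return n / 10 ** exponent, exponent

# 10 magnitudes x 11 probabilities per tuple, 30 tuples.
pba_space = (10 * 11) ** 30

# Assumption (from the AutoAugment paper, not this post):
# 16 operations x 10 magnitudes x 11 probabilities, 10 operation slots.
autoaugment_space = (16 * 10 * 11) ** 10

for name, size in [("PBA", pba_space), ("AutoAugment", autoaugment_space)]:
    mantissa, exponent = sci(size)
    print(f"{name}: about {mantissa:.2f}e{exponent}")
```

Both results land on the exponents quoted above (10⁶¹ and 10³²), with mantissas close to the rounded figures.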

1.3. Training Flow

  • In each iteration we run an epoch of gradient descent.
  • A trial is evaluated on a validation set not used for PBT training and disjoint from the final test set.
  • A trial is ready to go through the exploit-and-explore process once 3 steps/epochs have elapsed.
  • Exploit: Truncation selection, as in PBT, is used, where a trial in the bottom 25% of the population clones the weights and hyperparameters of a model in the top 25%.
  • Explore: For each hyperparameter, PBA either uniformly resamples from all possible values or perturbs the original value, as shown in Algorithm 2.
  • In the experiments, PBA is run with 16 total trials on the Wide-ResNet-40-2 model to generate augmentation schedules.
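The explore step (resample or perturb each hyperparameter) can be sketched as below. This is a hedged illustration rather than Algorithm 2 itself: the resample probability (resample_prob), the maximum step size (max_step), and the integer encoding of probability and magnitude levels are assumptions for this example and are not given in this post.

```python
import random

N_PROB_LEVELS = 11  # probability levels 0..10 (assumed to stand for 0.0, 0.1, ..., 1.0)
N_MAG_LEVELS = 10   # magnitude levels 0..9 (assumed encoding)

def explore_level(level, n_levels, rng=random, resample_prob=0.2, max_step=3):
    """Either resample uniformly or nudge the level, then clip to the valid range."""
    if rng.random() < resample_prob:
        return rng.randrange(n_levels)       # uniform resample over all values
    step = rng.randint(0, max_step)          # perturbation size (assumed range)
    if rng.random() < 0.5:
        step = -step                         # move down instead of up
    return min(max(level + step, 0), n_levels - 1)

def explore(policy, rng=random):
    """Perturb every (op, prob_level, mag_level) tuple of a cloned policy."""
    return [
        (op,
         explore_level(p, N_PROB_LEVELS, rng),
         explore_level(m, N_MAG_LEVELS, rng))
        for op, p, m in policy
    ]

# Toy usage: 30 tuples, all starting at mid-range levels.
cloned = [(f"op_{i}", 5, 5) for i in range(30)]
print(explore(cloned)[:3])
```

In a full run, this would be called on the hyperparameters a bottom-25% trial has just cloned from a top-25% trial, once the trial has completed its 3 epochs.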

2. Results

Test set error (%) on CIFAR-10, CIFAR-100, and SVHN
  • Overall, the PBA learned schedule slightly outperforms AutoAugment on PyramidNet and Wide-ResNet-28-10, and performs comparably on Shake-Shake models, showing that the learned schedule is competitive with the state of the art.
Ablation Study
  • Training with the PBA Fixed Policy degrades accuracy by 10% on average.

It is hypothesized that the schedule improves training by allowing “easy” augmentations in the initial phase of training while still allowing “harder” augmentations to be added later on.


