Brief Review — Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

PBA: Starts with Easier Augmentation, Followed by Harder Augmentation

Sik-Ho Tsang
Oct 17, 2022
Left: PBA matches AutoAugment’s classification accuracy across a range of different network models on the CIFAR-10 dataset, while requiring 1,000× fewer GPU hours to run. Right: Comparison of pre-computation costs and test set error (%)

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules, PBA, by UC Berkeley, X, and covariant.ai
2019 ICML, Over 200 Citations (Sik-Ho Tsang @ Medium)
Data Augmentation, Image Classification

  • Population Based Augmentation (PBA) is proposed to learn a schedule of augmentation policies.

Outline

  1. Population Based Augmentation (PBA)
  2. Results

1. Population Based Augmentation (PBA)

1.1. Learning a Schedule

  • PBA leverages Population Based Training (PBT), a hyperparameter search algorithm that jointly optimizes a network’s parameters and its hyperparameters to maximize performance.
  • The output of PBT is not an optimal hyperparameter configuration but rather a trained model and schedule of hyperparameters.

Similarly, PBA is only interested in the learned schedule; the trained child model is discarded.

  • In PBT, a fixed population of models is first randomly initialized and trained in parallel. At certain intervals, an “exploit-and-explore” procedure is applied.

Each worse-performing model clones the weights of a better-performing model (exploitation) and then perturbs the hyperparameters of the clone to search the hyperparameter space (exploration).
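The exploit-and-explore loop can be illustrated with a toy sketch. Everything here is an illustrative assumption rather than the actual PBT or PBA setup: each "model" is a single scalar `w` trained on the loss (w - 3)^2, the learning rate is the only hyperparameter, the worse half of the population copies from the better half, and exploration multiplies the copied value by 0.8 or 1.2. The detail worth noticing is that a cloned trial also inherits the donor's hyperparameter history, so the history of the final best trial is the kind of schedule PBT outputs.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trial:
    """A population member: toy 'weights', one hyperparameter, and its history."""
    weight: float
    lr: float
    schedule: list = field(default_factory=list)  # (step, lr) pairs used so far
    score: float = float("-inf")


def train_step(t: Trial) -> None:
    # Toy model: one gradient-descent step on the loss (w - 3)^2.
    t.weight -= t.lr * 2.0 * (t.weight - 3.0)


def evaluate(t: Trial) -> float:
    # Higher is better, so the score is the negative loss.
    return -((t.weight - 3.0) ** 2)


def pbt(pop_size=8, steps=20, ready_every=3, seed=0):
    rng = random.Random(seed)
    population = [Trial(rng.uniform(-5, 5), rng.uniform(0.001, 0.2))
                  for _ in range(pop_size)]
    for t in population:
        t.schedule.append((0, t.lr))

    for step in range(1, steps + 1):
        for t in population:
            train_step(t)
            t.score = evaluate(t)
        if step % ready_every:
            continue
        # Exploit-and-explore (assumed split): each trial in the worse half
        # clones a random trial from the better half, then perturbs its lr.
        ranked = sorted(population, key=lambda x: x.score, reverse=True)
        better, worse = ranked[: pop_size // 2], ranked[pop_size // 2:]
        for loser in worse:
            winner = rng.choice(better)
            loser.weight = winner.weight                   # exploit: copy weights
            loser.schedule = list(winner.schedule)         # inherit the history
            loser.lr = winner.lr * rng.choice([0.8, 1.2])  # explore: perturb
            loser.schedule.append((step, loser.lr))

    best = max(population, key=lambda x: x.score)
    return best.schedule, best.score
```

Calling `schedule, score = pbt()` returns the best trial's list of `(step, lr)` pairs, i.e. which learning rate was in effect from which step on; the trained toy weight itself is thrown away, mirroring how PBA keeps only the schedule.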

Left: Algorithm 1, Right: Algorithm 2

1.2. Policy Search Space (Algorithm 1)

Augmentations applied to a CIFAR-10 “car” class image
  • A set of hyperparameters consists of two (magnitude, probability) pairs for each of the 15 operations.
  • This gives 30 operation-magnitude-probability tuples, for a total of 60 hyperparameters.
  • Similar to AutoAugment, there are 10 possibilities for magnitude and 11 possibilities for probability.
  • When augmentations are applied to data, all operations are first shuffled and then applied in turn, each with its own probability, until a limit is reached. This limit ranges from 0 to 2 operations, as shown in Algorithm 1 above.
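A minimal sketch of this representation and of the Algorithm-1-style application step. Assumptions: the 15 operation names follow AutoAugment's list without SamplePairing, a probability level `p` maps to `p / 10`, the 0-to-2 limit is drawn with weights 0.2/0.3/0.5 (a reading of Algorithm 1; the text above only gives the range), and the operations are hypothetical stand-ins that log what they were asked to do instead of editing pixels.

```python
import random

# Assumed operation set: AutoAugment's operations without SamplePairing.
OPERATIONS = [
    "ShearX", "ShearY", "TranslateX", "TranslateY", "Rotate",
    "AutoContrast", "Invert", "Equalize", "Solarize", "Posterize",
    "Contrast", "Color", "Brightness", "Sharpness", "Cutout",
]
NUM_MAGNITUDES = 10     # magnitude levels 0..9
NUM_PROBABILITIES = 11  # probability levels 0..10, i.e. 0.0, 0.1, ..., 1.0


def random_policy(rng):
    """Two (op, prob_level, mag_level) tuples per operation: 30 tuples, 60 values."""
    return [
        (op, rng.randrange(NUM_PROBABILITIES), rng.randrange(NUM_MAGNITUDES))
        for op in OPERATIONS
        for _ in range(2)
    ]


def apply_policy(image, policy, rng, ops):
    """Shuffle the tuples, then apply ops in turn until `count` have fired."""
    # Assumed limit distribution; the review only says it ranges from 0 to 2.
    count = rng.choices([0, 1, 2], weights=[0.2, 0.3, 0.5])[0]
    shuffled = list(policy)
    rng.shuffle(shuffled)
    for op, prob_level, mag_level in shuffled:
        if count == 0:
            break
        # Each operation fires with its own probability.
        if rng.random() < prob_level / (NUM_PROBABILITIES - 1):
            count -= 1
            image = ops[op](image, mag_level)
    return image


def make_tracing_ops():
    """Hypothetical stand-ins: record (name, magnitude) instead of editing pixels."""
    return {name: (lambda img, mag, name=name: img + [(name, mag)])
            for name in OPERATIONS}


rng = random.Random(0)
policy = random_policy(rng)
print(apply_policy([], policy, rng, make_tracing_ops()))
```

Because the limit is sampled per call, the same policy can leave one image untouched and apply two operations to the next.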
Comparison of AutoAugment and PBA augmentation strategies
  • AutoAugment uses an RNN controller, similar to NASNet, to output the hyperparameters to be used.

The PBA search space contains (10×11)³⁰ ≈ 1.75×10⁶¹ possibilities, compared with 2.8×10³² for AutoAugment.
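The arithmetic behind these counts, as a quick check. The AutoAugment formula is an assumption taken from the AutoAugment paper (5 sub-policies of 2 operations, each picking one of 16 operations, 10 magnitudes and 11 probabilities); this review only quotes the result.

```python
# PBA: 30 operation-magnitude-probability tuples, each choosing one of
# 10 magnitudes and one of 11 probabilities independently.
pba_space = (10 * 11) ** 30

# AutoAugment, for comparison. Assumption: the (16 x 10 x 11)^10 count from
# the AutoAugment paper (5 sub-policies x 2 ops, 16 ops, 10 mags, 11 probs).
autoaugment_space = (16 * 10 * 11) ** 10

print(f"PBA:         {float(pba_space):.3e}")  # → 1.745e+61
print(f"AutoAugment: {float(autoaugment_space):.3e}")
print(f"ratio:       {pba_space / autoaugment_space:.1e}")
```

The exact AutoAugment product is about 2.85×10³², so the PBA space is larger by roughly 29 orders of magnitude.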

1.3. Training Flow

  • In each iteration, one epoch of gradient descent is run.
  • A trial is evaluated on a validation set not used for PBT training and disjoint from the final test set.
  • A trial is ready to go through the exploit-and-explore process once 3 steps/epochs have elapsed.
  • Exploit: As in PBT, truncation selection is used: a trial in the bottom 25% of the population clones the weights and hyperparameters of a model in the top 25%.
  • Explore: For each hyperparameter, PBA either uniformly resamples from all possible values or perturbs the original value, as shown in Algorithm 2.
  • In the experiments, PBA is run with 16 trials in total on the Wide-ResNet-40-2 model to generate augmentation schedules.
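The exploit and explore steps can be sketched on a population of dictionaries. Assumed details not stated in this review: each trial holds a flat hyperparameter list laid out as `[p0, m0, p1, m1, ...]` (60 integers), the donor is picked uniformly from the top 25%, and explore resamples each value with probability 0.2 and otherwise shifts it by 0 to 3 levels in a random direction before clipping (a reading of Algorithm 2, not a confirmed implementation).

```python
import copy
import random

PROB_MAX = 10  # probability levels 0..10 (even positions)
MAG_MAX = 9    # magnitude levels 0..9 (odd positions)


def explore(hparams, rng, resample_p=0.2):
    """Algorithm-2-style explore on a flat list [p0, m0, p1, m1, ...].

    Assumed constants: resample with probability `resample_p`, otherwise
    shift by 0-3 levels in a random direction and clip to the valid range.
    """
    new = []
    for i, value in enumerate(hparams):
        upper = PROB_MAX if i % 2 == 0 else MAG_MAX
        if rng.random() < resample_p:
            new.append(rng.randint(0, upper))             # uniform resample
        else:
            step = rng.choice([0, 1, 2, 3])
            step = step if rng.random() < 0.5 else -step  # random direction
            new.append(min(upper, max(0, value + step)))  # perturb and clip
    return new


def exploit_and_explore(population, rng, frac=0.25):
    """Truncation selection: each bottom-`frac` trial copies a top-`frac` one."""
    ranked = sorted(population, key=lambda t: t["score"], reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for trial in bottom:
        donor = rng.choice(top)
        trial["weights"] = copy.deepcopy(donor["weights"])  # exploit: weights
        trial["hparams"] = explore(donor["hparams"], rng)   # explore the copy
    return population
```

With the 16-trial population used in PBA, `k` is 4, so the four lowest-scoring trials are overwritten at each exploit-and-explore step while the rest keep training unchanged.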

2. Results

Test set error (%) on CIFAR-10, CIFAR-100, and SVHN
  • Overall, the PBA-learned schedule slightly outperforms AutoAugment on PyramidNet and Wide-ResNet-28-10 and performs comparably on Shake-Shake models, showing that the learned schedule is competitive with the state of the art.
Ablation Study
  • Training with the PBA Fixed Policy degrades accuracy by 10% on average.

It is hypothesized that the schedule improves training by allowing “easy” augmentations in the initial phase of training while still allowing “harder” augmentations to be added later on.
