Brief Review — Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

PBA: Starts with Easier Augmentations, Followed by Harder Augmentations

Oct 17, 2022

**Left: PBA matches** **AutoAugment**’s classification accuracy across a range of network models on the CIFAR-10 dataset while requiring 1,000× fewer GPU hours. Right: Comparison of pre-computation costs and test set error (%).

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules, PBA, by UC Berkeley, X, and covariant.ai
2019 ICML, Over 200 Citations (Sik-Ho Tsang @ Medium)
Data Augmentation, Image Classification

Population Based Augmentation (PBA) is proposed to learn a schedule of augmentation policies.

Outline

1. Population Based Augmentation (PBA)
2. Results

1. Population Based Augmentation (PBA)

1.1. Learning a Schedule

PBA builds on Population Based Training (PBT), a hyperparameter search algorithm that optimizes a network's parameters jointly with its hyperparameters to maximize performance.
The output of PBT is not a single optimal hyperparameter configuration but a trained model together with a schedule of hyperparameters.

In PBA, only the learned schedule is of interest; the child model itself is discarded, as in AutoAugment.

In PBT, a fixed population of models is first randomly initialized and trained in parallel. At certain intervals, an “exploit-and-explore” procedure is applied.

A poorly performing model clones the weights of a better-performing model (exploitation), and the hyperparameters of the clone are then perturbed to search the hyperparameter space (exploration).

**Left: Algorithm 1, Right: Algorithm 2**

1.2. Policy Search Space (Algorithm 1)

**Augmentations applied to a CIFAR-10 “car” class image**

The set of hyperparameters consists of two magnitude values and two probability values for each operation, i.e., each operation appears twice.
This gives 30 operation-magnitude-probability tuples, for a total of 60 hyperparameters.
Similar to AutoAugment, there are 10 possibilities for magnitude and 11 possibilities for probability.
When augmentations are applied to data, all operations are first shuffled and then applied in turn until a limit is reached. This limit can range from 0 to 2 operations, as shown in Algorithm 1 above.
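
The shuffle-then-apply procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `apply_policy`, `make_toy_op`, and the `limit_weights` used to draw the limit of 0, 1, or 2 operations are hypothetical, and a plain Python list stands in for the image.

```python
import random


def apply_policy(x, policy, rng=random, limit_weights=(0.2, 0.3, 0.5)):
    """Apply at most `limit` operations from `policy` to `x`.

    policy: list of (op, prob, mag) tuples, where op is a callable op(x, mag).
    limit_weights: assumed probabilities of drawing a limit of 0, 1, or 2.
    """
    ops = list(policy)
    rng.shuffle(ops)  # all operations are shuffled first
    limit = rng.choices([0, 1, 2], weights=limit_weights)[0]
    applied = 0
    for op, prob, mag in ops:
        if applied >= limit:  # stop once the limit is reached
            break
        if rng.random() < prob:  # each operation fires with its own probability
            x = op(x, mag)
            applied += 1
    return x


def make_toy_op(name):
    """Hypothetical stand-in for an image operation: it only logs its call."""
    def op(x, mag):
        return x + [(name, mag)]
    return op


# 30 toy tuples, each with probability 1.0 and a magnitude level in 0..9.
toy_policy = [(make_toy_op(f"op{i}"), 1.0, i % 10) for i in range(30)]
result = apply_policy([], toy_policy, random.Random(0))
print(result)  # a list with at most two (name, magnitude) entries
```

Because the operations are reshuffled on every call, each training example can receive a different pair of operations even though the hyperparameters are the same.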

**Comparison of** **AutoAugment** **and PBA augmentation strategies**

AutoAugment uses an RNN controller to return the hyperparameters to be used, similar to NASNet.

The PBA search space includes (10×11)³⁰≈1.75×10⁶¹ possibilities, compared to 2.8×10³² for AutoAugment.
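
The PBA count can be checked with a few lines of arithmetic. The variable names are illustrative, and the figure of 15 operations is inferred from the 30 tuples at two copies per operation.

```python
num_operations = 15        # inferred: 30 tuples / 2 copies per operation
copies_per_operation = 2
magnitude_choices = 10
probability_choices = 11

num_tuples = num_operations * copies_per_operation   # 30 tuples
num_hyperparameters = 2 * num_tuples                 # 60 hyperparameters

# Each tuple independently picks one magnitude and one probability level.
search_space_size = (magnitude_choices * probability_choices) ** num_tuples
print(num_tuples, num_hyperparameters)
print(f"{search_space_size:.3e}")  # on the order of 10^61
```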

1.3. Training Flow

In each iteration we run an epoch of gradient descent.
A trial is evaluated on a validation set not used for PBT training and disjoint from the final test set.
A trial is ready to go through the exploit-and-explore process once 3 steps/epochs have elapsed.
Exploit: Truncation selection, as in PBT, is used: a trial in the bottom 25% of the population clones the weights and hyperparameters of a model in the top 25%.
Explore: For each hyperparameter, PBA either uniformly resamples from all possible values or perturbs the original value, as shown in Algorithm 2.
In the experiments, PBA is run with 16 total trials on the Wide-ResNet-40-2 model to generate augmentation schedules.
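
The exploit and explore steps above can be sketched as follows, under stated assumptions. The trial layout (a dict with `score`, `weights`, and `hparams`), the integer-level encoding of the hyperparameters, and the constants `resample_prob=0.2` and `max_step=3` are illustrative choices, not values taken from this review.

```python
import copy
import random


def explore(hparams, domains, rng, resample_prob=0.2, max_step=3):
    """Return a perturbed copy of `hparams` (a list of integer levels).

    domains[i] is the number of discrete levels allowed for hparams[i].
    resample_prob and max_step are illustrative assumptions.
    """
    new = []
    for value, n_levels in zip(hparams, domains):
        if rng.random() < resample_prob:
            value = rng.randrange(n_levels)  # uniform resample from the domain
        else:
            step = rng.randint(0, max_step)  # perturb the original value
            value += step if rng.random() < 0.5 else -step
        new.append(min(max(value, 0), n_levels - 1))  # clamp to the domain
    return new


def exploit_and_explore(population, domains, rng, frac=0.25):
    """Truncation selection: each bottom-`frac` trial clones a top-`frac` trial."""
    ranked = sorted(population, key=lambda t: t["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for trial in bottom:
        donor = rng.choice(top)
        trial["weights"] = copy.deepcopy(donor["weights"])          # exploit
        trial["hparams"] = explore(donor["hparams"], domains, rng)  # explore


rng = random.Random(0)
# 30 (magnitude, probability) pairs: 10 and 11 levels respectively.
DOMAINS = [10, 11] * 30
population = [
    {
        "score": rng.random(),  # stand-in for validation accuracy
        "weights": [i],         # stand-in for model weights
        "hparams": [rng.randrange(n) for n in DOMAINS],
    }
    for i in range(16)
]
exploit_and_explore(population, DOMAINS, rng)
```

In a full run this step would be called every 3 epochs, after each trial has been trained and re-scored on the validation set; the scores here are random placeholders.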

2. Results

**Test set error (%) on CIFAR-10, CIFAR-100, and SVHN**

Overall, the PBA learned schedule leads AutoAugment slightly on PyramidNet and Wide-ResNet-28-10, and performs comparably on Shake-Shake models, showing that the learned schedule is competitive with the state of the art.

Training with the PBA Fixed Policy degrades accuracy by about 10% on average compared with training with the full schedule.

It is hypothesized that the schedule improves training by allowing “easy” augmentations in the initial phase of training while still allowing “harder” augmentations to be added later on.

Reference

[2019 ICML] [PBA]
Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
