Review — BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis

BigGAN & BigGAN-deep Generate High-Resolution Images

5 min read · Aug 22, 2023

**Class-conditional samples generated by BigGAN**

Large Scale GAN Training for High Fidelity Natural Image Synthesis,
BigGAN, BigGAN-deep, by Heriot-Watt University, and DeepMind,
2019 ICLR, Over 4500 Citations (Sik-Ho Tsang @ Medium)
Generative Adversarial Network (GAN)
Image Synthesis: 2014 … 2019 [SAGAN] 2020 [GAN Overview]

BigGAN is proposed by scaling up SAGAN, together with an enhanced architecture: shared class embeddings and noise vector skip connections.
Orthogonal regularization is applied to improve the generator performance. A simple truncation trick is proposed to allow fine control over the trade-off between sample fidelity and variety.
ImageNet images at 128×128, 256×256 and 512×512, which were high resolutions at that time, can be generated.

Outline

BigGAN: Scaling Up GAN & Enhanced Architecture
BigGAN: Truncation Trick & Orthogonal Regularization
Results

1. BigGAN: Scaling Up GAN & Enhanced Architecture

**Scaling Up SAGAN With Different Techniques**

1.1. Baseline

SAGAN with hinge loss is used as baseline. Class information is provided to G with class-conditional Batch Norm and provided to D with projection.
The optimization settings follow SAGAN (notably employing Spectral Norm in G), with the modification that the learning rates are halved and two D steps are taken per G step. Moving averages of G’s weights are used during evaluation.
Progressive growing in Progressive GAN is found to be NOT necessary.
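
As a rough illustration of the baseline ingredients above, here is a minimal NumPy sketch of the hinge loss and of a moving average of G's weights. The function names, the dictionary-of-arrays weight format and the `decay` default are assumptions for illustration, not taken from the authors' code.

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss computed from raw (unbounded) D outputs."""
    loss_real = np.mean(np.maximum(0.0, 1.0 - d_real))
    loss_fake = np.mean(np.maximum(0.0, 1.0 + d_fake))
    return loss_real + loss_fake

def g_hinge_loss(d_fake):
    """Generator hinge loss: push D's output on generated samples upwards."""
    return -np.mean(d_fake)

def ema_update(avg_weights, weights, decay=0.9999):
    """Exponential moving average of G's weights (decay value is an assumption)."""
    return {k: decay * avg_weights[k] + (1.0 - decay) * weights[k] for k in weights}
```

With two D steps per G step, `d_hinge_loss` would be minimized twice for every minimization of `g_hinge_loss`, and the averaged weights would only be used when sampling for evaluation.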

1.2. Scaling Up

Rows 1–4 of Table 1: Simply increasing the batch size by a factor of 8 improves the state-of-the-art Inception Score (IS) by 46%. This is a result of each batch covering more modes, providing better gradients for both networks.

However, training at this scale may become unstable and undergo complete training collapse, so scores are obtained from checkpoints saved just before collapse.

Row 5: Then the width (number of channels) in each layer is increased by 50%. This leads to a further IS improvement of 21% due to the increased capacity of the model relative to the complexity of the dataset.

1.3. Shared Class Embeddings: Rows 6–9 (Shared)

Row 6 (Shared): The class embeddings c used for the conditional Batch Norm layers in G contain a large number of weights.
Instead of having a separate layer for each embedding, a shared embedding is used, which is linearly projected to each layer’s gains and biases.

This reduces computation and memory costs, and improves training speed (in number of iterations required to reach a given performance) by 37%.
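
The shared embedding idea can be sketched as follows. This is a minimal NumPy illustration only: the sizes, the names `shared_embedding`, `gain_proj` and `bias_proj`, and the choice to center the gain at 1 are assumptions, not details given in this review.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, embed_dim = 10, 8   # illustrative sizes (assumed)
layer_channels = [16, 8]         # hypothetical channel counts of two G blocks

# One embedding table shared by every conditional Batch Norm layer.
shared_embedding = rng.normal(size=(num_classes, embed_dim))

# Each layer only owns two small linear projections: embedding -> gains, biases.
gain_proj = [rng.normal(scale=0.02, size=(embed_dim, ch)) for ch in layer_channels]
bias_proj = [rng.normal(scale=0.02, size=(embed_dim, ch)) for ch in layer_channels]

def conditional_batch_norm(x, class_id, layer, eps=1e-5):
    """x: (batch, channels, H, W) activations; class_id: (batch,) integer labels."""
    c = shared_embedding[class_id]          # (batch, embed_dim)
    gain = 1.0 + c @ gain_proj[layer]       # (batch, channels); centering at 1 is assumed
    bias = c @ bias_proj[layer]             # (batch, channels)
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gain[:, :, None, None] * x_hat + bias[:, :, None, None]

x = rng.normal(size=(4, 16, 2, 2))
y = conditional_batch_norm(x, np.array([0, 3, 3, 7]), layer=0)
```

The per-layer cost drops from a full `num_classes × channels` embedding to an `embed_dim × channels` projection, which is where the savings come from in this sketch.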

1.4. Noise Vector Skip Connection: Rows 7–9 (Skip-z)

Row 7 (Skip-z): Add direct skip connections from the noise vector z to multiple layers of G rather than just the initial layer. The intuition behind this design is to allow G to use the latent space to directly influence features at different resolutions and levels of hierarchy.

In BigGAN, this is accomplished by splitting z into one chunk per resolution, and concatenating each chunk to the conditional vector c which gets projected to the Batch Norm gains and biases.
In BigGAN-deep, an even simpler design is used, concatenating the entire z with the conditional vector without splitting it into chunks.

Skip-z provides a modest performance improvement of around 4%, and improves training speed by a further 18%.
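
The two skip-z variants described above can be sketched like this. It is a minimal NumPy illustration: the function names and the dimensions in the usage lines are assumptions, and how the very first layer of G is fed is left out.

```python
import numpy as np

def biggan_conditioning(z, c, num_resolutions):
    """BigGAN-style skip-z: one chunk of z per resolution, concatenated with c."""
    chunks = np.split(z, num_resolutions, axis=1)   # z's width must divide evenly
    return [np.concatenate([c, chunk], axis=1) for chunk in chunks]

def biggan_deep_conditioning(z, c, num_resolutions):
    """BigGAN-deep-style skip-z: the entire z is concatenated with c for every block."""
    full = np.concatenate([c, z], axis=1)
    return [full for _ in range(num_resolutions)]

z = np.zeros((2, 120))   # illustrative latent size (assumed)
c = np.ones((2, 128))    # illustrative shared class embedding size (assumed)
per_block = biggan_conditioning(z, c, num_resolutions=5)
per_block_deep = biggan_deep_conditioning(z, c, num_resolutions=5)
```

Each vector in the returned lists is what would be projected to the Batch Norm gains and biases of the corresponding block.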

1.5. BigGAN Model Architecture

**BigGAN Model Architecture (Paper Appendix)**

1.6. BigGAN-deep Model Architecture

**BigGAN-deep Model Architecture (Paper Appendix)**

2. BigGAN: Truncation Trick & Orthogonal Regularization

2.1. Truncation Trick

GANs can employ an arbitrary prior p(z), yet the vast majority of previous works have chosen to draw z from either N(0, I) or U[-1, 1].

Taking a model trained with z ~ N(0, I) and sampling z from a truncated normal (where values which fall outside a range are resampled to fall inside that range) immediately provides a boost to IS and FID.
This is called the Truncation Trick: truncating a z vector by resampling the values with magnitude above a chosen threshold leads to improvement in individual sample quality at the cost of reduction in overall sample variety.
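
A minimal NumPy sketch of this resampling, assuming a positive threshold; the function name `truncated_z` and its arguments are illustrative only:

```python
import numpy as np

def truncated_z(batch_size, dim, threshold, rng):
    """Sample z ~ N(0, I), resampling every entry whose magnitude exceeds threshold."""
    z = rng.standard_normal((batch_size, dim))
    out_of_range = np.abs(z) > threshold
    while out_of_range.any():
        # Redraw only the offending entries and check them again.
        z[out_of_range] = rng.standard_normal(int(out_of_range.sum()))
        out_of_range = np.abs(z) > threshold
    return z

z = truncated_z(batch_size=8, dim=128, threshold=0.5, rng=np.random.default_rng(0))
```

A smaller threshold keeps z closer to the mode of the prior, which is the fidelity-versus-variety knob described above. Very small thresholds make this simple loop slow, since most redraws are rejected.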

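Figure 2(a): As in the figure above, reducing the truncation threshold leads to a direct increase in IS (analogous to precision). FID penalizes lack of variety (analogous to recall) but also rewards precision, so FID first improves moderately and then degrades sharply as the threshold approaches zero and variety diminishes.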
Figure 2(b): Some of the larger models are not amenable to truncation, producing saturation artifacts.

2.2. Orthogonal Regularization (Rows 8 & 9)

To make G amenable to truncation, G is conditioned to be smooth. For this, Orthogonal Regularization (Brock et al., 2017) is considered, which directly enforces the orthogonality condition:
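
For reference, the regularizer as stated in the BigGAN paper, with W a weight matrix and β a hyperparameter, can be written as:

```latex
R_\beta(W) = \beta \, \lVert W^\top W - I \rVert_F^2
```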

This regularization is known to often be too limiting (Miyato et al., 2018). It is modified to relax the constraint while still imparting the desired smoothness.

The diagonal terms are removed from the regularization, so that it minimizes the pairwise cosine similarity between filters but does not constrain their norm:
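
In the paper's notation the relaxed regularizer is R_β(W) = β‖WᵀW ⊙ (1 − I)‖²_F, where 1 denotes the all-ones matrix and ⊙ is the element-wise product. Below is a minimal NumPy sketch of both the original and the relaxed penalty, assuming the filters are the columns of W. The function names and the `beta` default are illustrative, not the authors' code.

```python
import numpy as np

def ortho_penalty(W, beta=1e-4):
    """Original form: beta * ||W^T W - I||_F^2 (also pushes filter norms towards 1)."""
    gram = W.T @ W
    return beta * np.sum((gram - np.eye(gram.shape[0])) ** 2)

def relaxed_ortho_penalty(W, beta=1e-4):
    """Relaxed form: beta * ||W^T W * (1 - I)||_F^2 (off-diagonal terms only)."""
    gram = W.T @ W
    off_diagonal = gram * (1.0 - np.eye(gram.shape[0]))
    return beta * np.sum(off_diagonal ** 2)

W = 2.0 * np.eye(3)  # orthogonal columns, but with norm 2 instead of 1
original = ortho_penalty(W, beta=1.0)
relaxed = relaxed_ortho_penalty(W, beta=1.0)
```

For this `W` the original penalty is non-zero because the column norms differ from 1, while the relaxed penalty is zero because the columns are already mutually orthogonal. This is the sense in which the norm is left unconstrained.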