# Review — BigBiGAN: Large Scale Adversarial Representation Learning

## Outperforms Many Self-Supervised Learning (SSL) Approaches, e.g. RotNet; Also Matches CPCv2

Large Scale Adversarial Representation Learning (BigBiGAN), by DeepMind, 2019 NeurIPS, Over 480 Citations (Sik-Ho Tsang @ Medium)

Generative Adversarial Network (GAN), Image Synthesis: 2014 … 2019 [SAGAN] [BigGAN] 2020 [GAN Overview]


**BigBiGAN** is proposed, **built upon the state-of-the-art BigGAN model**, extending it to representation learning by **adding an encoder** and **modifying the discriminator**.

- It achieves the SOTA in **unsupervised (self-supervised) representation learning on ImageNet**, as well as in **unconditional image generation**.
- BigBiGAN is used as a comparison baseline in many SSL papers.

# Outline

1. **BigBiGAN**
2. **Results**

# 1. BigBiGAN

## 1.1. Addition of Encoder E

- Given a **distribution** *Px* of data *x* (e.g., images), and a **distribution** *Pz* of latents *z* (normally *z* ~ *N*(0, 1)), the generator *G* models a conditional distribution P(*x*|*z*) of data *x* given latent inputs *z* sampled from the latent prior *Pz*, as in the standard GAN generator.
- Compared to BigGAN, an **encoder** *E* is added.

The encoder *E* models the inverse conditional distribution P(*z*|*x*), predicting latents *z* given data *x* sampled from the data distribution *Px*.
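As a toy sketch of these two conditional models, *G* and *E* can be written as two small maps. All sizes and the linear/tanh stand-ins below are hypothetical illustration choices, not the paper's BigGAN generator or RevNet encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, X_DIM = 8, 32  # hypothetical toy sizes, not the paper's

# Generator G: models P(x|z) -- a linear map + tanh stands in
# for BigGAN's deep generator here.
W_g = rng.normal(size=(Z_DIM, X_DIM))

def G(z):
    return np.tanh(z @ W_g)

# Encoder E: models P(z|x), predicting a latent for each image.
# BigBiGAN's E is a RevNet; a linear map stands in here.
W_e = rng.normal(size=(X_DIM, Z_DIM))

def E(x):
    return x @ W_e

z = rng.standard_normal(Z_DIM)   # z ~ Pz = N(0, I)
x_hat = G(z)                     # generated data x̂
z_hat = E(x_hat)                 # inferred latent ẑ
print(x_hat.shape, z_hat.shape)  # (32,) (8,)
```

The point of the sketch is only the direction of each map: *G* goes latent → data, *E* goes data → latent, and together they define the two joint distributions that the discriminator below compares.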

## 1.2. Joint Discriminator D

- Besides the addition of *E*, the other modification to the GAN in the BiGAN framework is a **joint discriminator** *D*, which takes as input data-latent pairs (*x*, *z*) (rather than just data *x* as in a standard GAN), and **learns to discriminate between pairs from the data distribution and encoder, versus the generator and latent distribution.**
- Concretely, its **inputs** are **pairs (*x* ~ *Px*, ẑ ~ *E*(*x*)) and (x̂ ~ *G*(*z*), *z* ~ *Pz*)**, and the **goal of *G* and *E*** is to **“fool” the discriminator** by making the two joint distributions *PxE* and *PGz* from which **these pairs are sampled indistinguishable.**
- **The adversarial minimax objective in ALI**, analogous to that of the GAN framework, was defined as follows:
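The objective itself appears to have been an image in the original post and is missing here; reconstructed from the ALI paper (to the best of my reading, with the notation above), it is:

```latex
\min_{G,E}\;\max_{D}\;
\mathbb{E}_{x \sim P_x}\!\left[\log D\big(x, E(x)\big)\right]
+ \mathbb{E}_{z \sim P_z}\!\left[\log\Big(1 - D\big(G(z), z\big)\Big)\right]
```

i.e. *D* tries to score encoder pairs (*x*, *E*(*x*)) as real and generator pairs (*G*(*z*), *z*) as fake, while *E* and *G* jointly push in the opposite direction.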

## 1.3. Addition of Unary Terms

Additional **unary terms** are used in the learning objective, which are functions only of either the data *x* or the latents *z*. These unary terms intuitively guide optimization in the “right direction” by explicitly enforcing this property.

- Concretely, the **discriminator loss** *LD* and the **encoder-generator loss** *LEG* are defined as follows, based on **scalar discriminator “score” functions** *s*• and the corresponding **per-sample losses** *l*•:
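The loss definitions (images in the original post) are missing here; reconstructed from the BigBiGAN paper's hinge formulation as I recall it, with *y* = +1 marking encoder pairs and *y* = −1 marking generator pairs, they read:

```latex
\ell_{EG}(x, z) = s_x(x) + s_z(z) + s_{xz}(x, z)

\mathcal{L}_{EG} =
\mathbb{E}_{x \sim P_x,\, \hat{z} \sim E(x)}\big[\ell_{EG}(x, \hat{z})\big]
- \mathbb{E}_{z \sim P_z,\, \hat{x} \sim G(z)}\big[\ell_{EG}(\hat{x}, z)\big]

\ell_{D}(x, z, y) = h\big(y\, s_x(x)\big) + h\big(y\, s_z(z)\big) + h\big(y\, s_{xz}(x, z)\big)

\mathcal{L}_{D} =
\mathbb{E}_{x \sim P_x,\, \hat{z} \sim E(x)}\big[\ell_D(x, \hat{z}, +1)\big]
+ \mathbb{E}_{z \sim P_z,\, \hat{x} \sim G(z)}\big[\ell_D(\hat{x}, z, -1)\big]
```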

- where *h*(*t*) = max(0, 1 − *t*) is a “hinge” used to regularize the discriminator.
- The discriminator *D* includes three submodules: *F*, *H*, and *J*. *F* is a ConvNet and *H* is an MLP. *J* is a function of the outputs of *F* and *H*.
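A minimal numpy sketch of these pieces, assuming linear/tanh stand-ins for *F* and *H* and an elementwise-product form for *J* (toy choices for illustration; the paper's submodules are learned networks):

```python
import numpy as np

rng = np.random.default_rng(1)
X_DIM, Z_DIM, FEAT = 32, 8, 16  # hypothetical toy sizes

# Stand-ins for the three discriminator submodules:
# F (a ConvNet on x in the paper), H (an MLP on z), and J (on both).
W_f = rng.normal(size=(X_DIM, FEAT)) * 0.1
W_h = rng.normal(size=(Z_DIM, FEAT)) * 0.1
a_x, a_z, a_xz = rng.normal(size=(3, FEAT)) * 0.1

def F(x): return np.tanh(x @ W_f)
def H(z): return np.tanh(z @ W_h)

# Unary scores s_x, s_z and joint score s_xz = J(F(x), H(z)).
def s_x(x): return float(F(x) @ a_x)
def s_z(z): return float(H(z) @ a_z)
def s_xz(x, z): return float((F(x) * H(z)) @ a_xz)

def h(t):  # the "hinge" that regularizes the discriminator
    return max(0.0, 1.0 - t)

def loss_D(x, z, y):
    # y = +1 for (x ~ Px, ẑ ~ E(x)) pairs; y = -1 for (x̂ ~ G(z), z ~ Pz)
    return h(y * s_x(x)) + h(y * s_z(z)) + h(y * s_xz(x, z))

def loss_EG(x, z, y):
    # E and G minimize scores on encoder pairs, maximize on generator pairs
    return y * (s_x(x) + s_z(z) + s_xz(x, z))

x, z = rng.standard_normal(X_DIM), rng.standard_normal(Z_DIM)
print(loss_D(x, z, +1) >= 0.0)  # True: hinge terms are non-negative
```

Note how the unary terms *s_x* and *s_z* let the discriminator judge an image (or a latent) on its own, independently of its partner in the pair, which is exactly the property the text says the unary terms enforce.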

# 2. Results

## 2.1. Ablation Studies

- **Latent distribution**: Instead of using a deterministic *E* and *z* directly, the **final** model uses a **stochastic *E* (Var)** with **reparametrized sampling**, where *z* = *μ* + *εσ*, *ε* ~ *N*(0, *I*). This non-deterministic Base model achieves **significantly better classification performance.**
- **Unary loss terms (*sx*, *sz*)**: The *x* unary term has a **large positive effect on generation performance**, with the Base and *x* Unary Only rows having significantly better IS and FID than the *z* Unary Only and No Unaries rows.
- ***G* capacity**: A **powerful image generator** is indeed **important for learning good representations via the encoder**. Assuming this relationship holds in the future, better generative models are likely to lead to further improvements in representation learning.
- **Bidirection**: With an **enhanced** *E* taking **higher input resolutions**, generation with BigBiGAN in terms of **FID** **is substantially improved over the standard GAN.**
- **High resolution *E* with varying resolution *G***: BigBiGAN achieves **better representation learning results as the *G* resolution increases**, up to the full *E* resolution of 256×256, but the overall model is **much slower to train.** The remainder uses the 128×128 resolution for *G* only.
- ***E* architecture**: **Improvements** are observed **from RevNet-50, with double-width RevNet** outperforming a ResNet of the same capacity (rows RevNet ×2 and ResNet ×2), and **further gains with an even larger quadruple-width RevNet model** (row RevNet ×4), which is used for the final results.
- **Decoupled *E*/*G* optimization**: The *E* optimizer is decoupled from that of *G*, and **simply using a 10× higher learning rate for *E* dramatically accelerates training.**
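The reparametrized sampling from the latent-distribution ablation can be sketched as follows; the mean/std values here are placeholders standing in for *E*'s output heads:

```python
import numpy as np

rng = np.random.default_rng(2)
Z_DIM = 8  # toy latent size

# Suppose the encoder predicts a mean and a (positive) std per latent
# dimension; in the real model these would come from E's output heads.
mu = rng.standard_normal(Z_DIM)
sigma = np.exp(rng.standard_normal(Z_DIM) * 0.1)  # exp keeps sigma > 0

# Reparametrized sampling: z = mu + eps * sigma, eps ~ N(0, I).
# z stays stochastic, yet gradients can flow back to mu and sigma.
eps = rng.standard_normal(Z_DIM)
z = mu + eps * sigma
```

The design point is that the randomness is isolated in *ε*, so the sampling step is differentiable with respect to the encoder's outputs *μ* and *σ*.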

## 2.2. Unsupervised ImageNet

The BigBiGAN approach, based purely on generative models, performs well for representation learning: it is state-of-the-art among recent unsupervised learning results, improving upon a recently published result from RotNet (rotation prediction pre-training with the same representation learning architecture) from 55.4% to 60.8% top-1 accuracy.

- It also **matches** the results of the concurrent work **CPCv2**.

## 2.3. **Unconditional Image Generation**

BigBiGAN significantly improves both **IS** and **FID** over the baseline unconditional BigGAN generation.

## 2.4. Image Reconstruction

These reconstructions are **far from pixel-perfect**, likely due in part to the fact that no reconstruction cost is explicitly enforced by the objective; reconstructions are not even computed at training time. However, they may provide some intuition for what features the encoder *E* learns to model.

For example, when the input image contains a dog, person, or a food item, the reconstruction is often a different instance of the same “category” with similar pose, position, and texture.