# Review — ALI: Adversarially Learned Inference (GAN)

## Not Only Mapping from Latent Space to Data Space, But Also Mapping from Data Space to Latent Space, Outperforms DCGAN


In this story, **Adversarially Learned Inference (ALI)**, by Université de Montréal, Stanford, New York University, and a CIFAR Fellow, is briefly reviewed. In ALI:

- **The generation network** maps samples from stochastic latent variables to the data space.
- **The inference network** maps training examples in data space to the space of latent variables.
- **The discriminative network** is trained to distinguish between joint latent/data-space samples from the generative network and joint samples from the inference network.

This is a paper in **2017 ICLR** with over **1000 citations**. (Sik-Ho Tsang @ Medium)

The idea is the same as BiGAN; the two were proposed independently and published at the same conference (2017 ICLR). Papers often cite ALI and BiGAN together when discussing this idea.

# Outline

1. **ALI: Overall Structure**
2. **Experimental Results**

**1. ALI: Overall Structure**

- Similar to BiGAN, in order to match the joint distributions, an adversarial game is played, as shown above.
- *Gz*: **encoder**. *Gx*: **decoder**.

Joint pairs (*x*, *z*) are drawn either from *q*(*x*, *z*) or *p*(*x*, *z*), and a discriminator network learns to discriminate between the two, while the encoder and decoder networks are trained to fool the discriminator.
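For reference, the game described above can be written as a value function. The form below follows the ALI paper, with the assumption that *q*(*x*, *z*) = *q*(*x*)*q*(*z* | *x*) is the encoder joint, *p*(*x*, *z*) = *p*(*z*)*p*(*x* | *z*) is the decoder joint, and *D*(*x*, *z*) is the discriminator's estimated probability that a pair came from the encoder joint:

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{q(x, z)}\big[\log D(x, z)\big]
  + \mathbb{E}_{p(x, z)}\big[\log\big(1 - D(x, z)\big)\big]
```

Here *G* stands for both *Gz* (encoder) and *Gx* (decoder), which are updated jointly against *D*.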

- If we treat *Gz* and *Gx* in ALI as the **encoder** *E* and the **decoder (generator)** *G* respectively, it is a bidirectional GAN (BiGAN).

Unlike the GAN, where the discriminator sees only *x* as input, in BiGAN/ALI *D* sees both *x* and *z*, i.e., the observation and its latent representation together.

For a true sample, *x* is given (it is taken from the training set) and the corresponding *z* is generated by the encoder *E*.

For a fake sample, *z* is given (it is sampled from *p*(*z*)) and its corresponding *x* is generated by the generator *G*.

- The encoder *E* is also implemented as a deep neural network and (as in the AE) its architecture is usually taken as the inverse of *G*.
- It is trained just like the generator, namely by back-propagating from the loss function defined at the output of the discriminator.
- ⊕ is the vector concatenation operation, which concatenates the flattened *x* and the latent vector *z* before they are input into the discriminator.
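The way the two kinds of discriminator inputs are assembled can be sketched in a few lines of plain Python. This is only a toy illustration: `encoder` and `generator` below are hypothetical stand-ins for the trained networks *E* and *G* (they are not the paper's models), and `concat` plays the role of ⊕.

```python
import random

def flatten(image):
    """Flatten a 2-D image (a list of rows) into a 1-D list."""
    return [pixel for row in image for pixel in row]

def concat(x_flat, z):
    """The concatenation step: flattened x followed by latent vector z."""
    return list(x_flat) + list(z)

def encoder(x_flat, latent_dim):
    """Hypothetical stand-in for E: average equal-sized chunks of x."""
    chunk = len(x_flat) // latent_dim
    return [sum(x_flat[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(latent_dim)]

def generator(z, data_dim):
    """Hypothetical stand-in for G: tile the latent code up to data_dim."""
    return [z[i % len(z)] for i in range(data_dim)]

def make_discriminator_inputs(image, latent_dim, rng):
    """Build one 'true' pair (x, E(x)) and one 'fake' pair (G(z), z)."""
    x_real = flatten(image)
    z_hat = encoder(x_real, latent_dim)                  # z inferred from real x
    z_prior = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # z ~ p(z)
    x_fake = generator(z_prior, len(x_real))             # x generated from z
    real_pair = concat(x_real, z_hat)     # discriminator target: true
    fake_pair = concat(x_fake, z_prior)   # discriminator target: fake
    return real_pair, fake_pair

rng = random.Random(0)
image = [[0.0, 0.5], [1.0, 0.25]]
real_pair, fake_pair = make_discriminator_inputs(image, latent_dim=2, rng=rng)
```

Both pairs have the same length (data dimension plus latent dimension), so a single discriminator can score either one. In a real model the discriminator, encoder and generator would all be neural networks trained by back-propagation.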

Once training is complete, just like we can use the generator to predict *x* for a new *z*, we can use the encoder to predict *z* for any *x*.

**2. Experimental Results**

## 2.1. Samples and Reconstruction

- The figures below show (a) samples and (b) reconstructions produced by ALI for different datasets.
- For the reconstructions in (b), odd columns are original samples from the validation set and even columns are the corresponding reconstructions.

- We observe that reconstructions are not always faithful reproductions of the inputs. They **retain the same crispness and quality characteristic** of adversarially-trained models, but oftentimes make mistakes in capturing exact object placement, color, style and (in extreme cases) object identity.
- Note that the **ALI training objective does not involve an explicit reconstruction loss.**

## 2.2. Latent Space Interpolations

- By linearly interpolating between *z*1 and *z*2 and passing the intermediary points through the decoder, the input-space interpolations shown in the plot above are generated.
- **Smooth transitions** are observed between pairs of examples, and **intermediary images remain believable.**
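The interpolation step itself is simple to sketch. The snippet below is a minimal illustration in plain Python; the helper names `lerp` and `interpolate` are made up for this example, and the decoder call is left out because it depends on the trained model.

```python
def lerp(z1, z2, t):
    """Linear interpolation between latent vectors z1 and z2 at position t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z1, z2)]

def interpolate(z1, z2, steps):
    """Return `steps` evenly spaced latent codes from z1 to z2 (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z1, z2, i / (steps - 1)) for i in range(steps)]

z1 = [0.0, 1.0]
z2 = [1.0, -1.0]
path = interpolate(z1, z2, steps=5)
# Each element of `path` would then be passed through the decoder
# to obtain one image of the interpolation sequence.
```

The first and last entries of `path` are the two endpoints, and the entries in between are the intermediary latent codes.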

## 2.3. Semi-Supervised Learning

- ALI's inference network is used to extract features, and an SVM is then trained on these features to predict the class. This achieves a misclassification rate that is roughly 3.00% lower than DCGAN.

- ALI's performance is also investigated when label information is taken into account during training.
- The discriminator takes *x* and *z* as input and outputs a distribution over *K*+1 classes, where *K* is the number of categories.
- The above table shows that ALI offers a modest improvement over Salimans et al. (2016), more specifically for 1000 and 2000 labeled examples.
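To make the *K*+1-way output concrete, here is a small sketch in plain Python. It assumes, following the convention of Salimans et al. (2016), that the extra class marks generated pairs and that it is stored as the last logit; the function names and the example logits are invented for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def split_discriminator_output(logits):
    """Split K+1 logits into K class probabilities and one 'generated' probability.

    Assumption: the last entry is the extra (K+1)-th class.
    """
    probs = softmax(logits)
    return probs[:-1], probs[-1]

logits = [2.0, 0.5, -1.0, 0.0]  # K = 3 categories plus 1 extra class
class_probs, p_fake = split_discriminator_output(logits)
predicted_class = max(range(len(class_probs)), key=class_probs.__getitem__)
```

With labeled examples the discriminator is trained to predict the correct one of the *K* categories, while the extra class gives it a way to flag pairs it believes were generated.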

It is conjectured that the latent representation learned by ALI is better untangled with respect to the classification task and that it generalizes better.

## 2.4. Conditional Generation

- ALI is extended to match a conditional distribution, where *y* represents a fully observed conditioning variable, e.g. **attributes in the CelebA dataset.**

- We can treat this as ALI+CGAN.
- **(I) to (IV): A single fixed latent code** *z* is sampled.
- **(a) to (l): Attributes are then varied** uniformly over rows across all columns in the following sequence: (b) black hair; (c) brown hair; (d) blond hair; (e) black hair, wavy hair; (f) blond hair, bangs; (g) blond hair, receding hairline; (h) blond hair, balding; (i) black hair, smiling; (j) black hair, smiling, mouth slightly open; (k) black hair, smiling, mouth slightly open, eyeglasses; (l) black hair, smiling, mouth slightly open, eyeglasses, wearing hat.
- (Since ALI is similar to BiGAN, I don't describe it in much detail. If interested, please feel free to read the paper for more details.)

## Reference

[2017 ICLR] [ALI]

Adversarially Learned Inference

## Generative Adversarial Network (GAN)

- **Image Synthesis** [GAN] [CGAN] [LAPGAN] [AAE] [DCGAN] [CoGAN] [SimGAN] [BiGAN] [ALI]
- **Image-to-image Translation** [Pix2Pix] [UNIT]
- **Super Resolution** [SRGAN & SRResNet] [EnhanceNet] [ESRGAN]
- **Blur Detection** [DMENet]
- **Camera Tampering Detection** [Mantini’s VISAPP’19]
- **Video Coding** [VC-LAPGAN] [Zhu TMM’20] [Zhong ELECGJ’21]