# Review — EBGAN: Energy-Based Generative Adversarial Network (GAN)

## Using an Autoencoder at the Discriminator, a Repelling Regularizer at the Generator

In this story, **Energy-based Generative Adversarial Network** (EBGAN), by New York University and Facebook Artificial Intelligence Research, is briefly reviewed. In this paper:

- EBGAN views the discriminator as an energy function that **attributes low energies to the regions near the data manifold** and **higher energies to other regions.**
- Similar to the probabilistic GANs, the **generator** is seen as being trained to **produce contrastive samples with minimal energies**, while the **discriminator** is trained to **assign high energies to these generated samples.**

This is a paper in **2017 ICLR** with over **1000 citations**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Energy-Based Model**
2. **Loss Functions**
3. **Autoencoder Used at Discriminator**
4. **Experimental Results**

# 1. Energy-Based Model

- **Supervised learning** falls into this framework: for each *X* in the training set, **the energy of the pair (*X*, *Y*)** takes **low** values when *Y* is the correct **label** and **higher** values for **incorrect** *Y*’s.
- Similarly, when **modeling *X* alone** within an **unsupervised learning** setting, **lower energy is attributed to the data manifold.**
- The term **contrastive sample** is often used to refer to a data point causing an energy pull-up, such as the incorrect *Y*’s in supervised learning and points from low data density regions in unsupervised learning.
- (Btw, contrastive learning is crucial in self-supervised learning.)

# 2. Loss Functions

- **The output of the discriminator** goes through an objective functional in order to **shape the energy function**, **attributing low energy to the real data samples** and **higher energy to the generated (“fake”) ones.**
- Two different losses are used, one to train *D* and the other to train *G*, in order to get better quality gradients when the generator is far from convergence.
- **Given a positive margin** *m*, a data sample *x* and a generated sample *G*(*z*), the discriminator loss *LD* and the generator loss *LG* are formally defined by:
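The two losses from the paper can be written as:

```latex
L_D(x, z) = D(x) + \left[\, m - D\big(G(z)\big) \,\right]^{+}
\qquad
L_G(z) = D\big(G(z)\big)
```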

- where [*a*]⁺ = max(0, *a*).
- Minimizing *LG* with respect to the parameters of *G* is similar to maximizing the second term of *LD*. It has the same minimum, but non-zero gradients when *D*(*G*(*z*)) ≥ *m*.
- When *D*(*G*(*z*)) ≥ *m*, the hinge term [*m* − *D*(*G*(*z*))]⁺ in *LD* vanishes; *m* is a hyperparameter.
- If the system reaches a Nash equilibrium, then the generator *G* produces samples that are indistinguishable from the distribution of the dataset.
- (There is a mathematical proof of the optimality of the solution. Please feel free to read the paper.)
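As a minimal NumPy sketch of these definitions (the function names are mine, not from the paper):

```python
import numpy as np

def discriminator_loss(d_real, d_fake, m):
    """L_D = D(x) + [m - D(G(z))]^+ , with [a]^+ = max(0, a)."""
    return d_real + np.maximum(0.0, m - d_fake)

def generator_loss(d_fake):
    """L_G = D(G(z)): the generator simply minimizes the energy of its samples."""
    return d_fake

# When D(G(z)) >= m, the hinge term vanishes and L_D reduces to D(x).
print(discriminator_loss(0.2, 1.5, m=1.0))  # -> 0.2
print(generator_loss(1.5))                  # -> 1.5
```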

# 3. Autoencoder Used At Discriminator

## 3.1. Reasons for Using an Autoencoder

- In EBGAN, the discriminator *D* is structured as an auto-encoder.
- Rather than using a single bit of target information to train the model, the reconstruction-based output offers diverse targets for the discriminator.
- With the binary logistic loss, only two targets are possible, so the gradients of different samples within a minibatch are likely far from orthogonal.
- On the other hand, **the reconstruction loss will likely produce very different gradient directions within the minibatch.**
- When trained with some regularization terms, **auto-encoders have the ability to learn an energy manifold without supervision or negative examples.**
- Even when an EBGAN auto-encoding model is trained to reconstruct a real sample, the discriminator contributes to discovering the data manifold by itself.
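As a toy sketch of the idea (the linear layers and sizes below are my own stand-ins, not the paper’s architecture), the discriminator’s energy is the per-sample reconstruction error of an auto-encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear auto-encoder standing in for the EBGAN discriminator:
# D(x) = ||Dec(Enc(x)) - x||, i.e. the reconstruction error is the energy.
W_enc = rng.normal(size=(8, 32)) * 0.1   # encoder: 32-d input -> 8-d code
W_dec = rng.normal(size=(32, 8)) * 0.1   # decoder: 8-d code -> 32-d output

def encode(x):
    return np.tanh(x @ W_enc.T)

def discriminator_energy(x):
    code = encode(x)
    recon = code @ W_dec.T
    # Per-sample L2 reconstruction error = energy assigned by D
    return np.linalg.norm(recon - x, axis=-1)

batch = rng.normal(size=(4, 32))
energies = discriminator_energy(batch)
print(energies.shape)  # one scalar energy per sample -> (4,)
```

Because each sample gets its own real-valued reconstruction target, the gradient directions within a minibatch are far more diverse than with a single real/fake bit.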

## 3.2. EBGAN-PT: Repelling Regularizer

- **One common issue** in training auto-encoders is that the model may learn little more than an **identity function**, meaning that it **attributes zero energy to the whole space.**
- The model must be pushed to give higher energy to points outside the data manifold.
- The proposed repelling regularizer **purposely keeps the model from producing samples that are clustered in one or only a few modes of** *pdata*. It involves a **Pulling-away Term (PT)** that operates at a representation level.
- Formally, let *S* denote a batch of sample representations taken from the encoder output layer. Cosine similarity is used:
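With *N* samples in the batch, the pulling-away term from the paper is:

```latex
f_{PT}(S) = \frac{1}{N(N-1)} \sum_{i} \sum_{j \neq i}
\left( \frac{S_i^{\top} S_j}{\lVert S_i \rVert\, \lVert S_j \rVert} \right)^{2}
```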

- PT operates on a mini-batch and **attempts to orthogonalize the pairwise sample representations.**
- This variant is denoted “**EBGAN-PT**”. Note that **PT is used in the generator loss** but not in the discriminator loss.
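A minimal NumPy sketch of the pulling-away term (variable names are mine):

```python
import numpy as np

def pulling_away_term(S):
    """f_PT(S): mean squared pairwise cosine similarity over a batch of
    encoder representations S (shape: N x d), taken over all pairs i != j."""
    N = S.shape[0]
    # Normalize each representation to unit length.
    S_norm = S / np.linalg.norm(S, axis=1, keepdims=True)
    cos = S_norm @ S_norm.T           # pairwise cosine similarities
    off_diag = cos**2 - np.eye(N)     # drop the i == j terms (cos = 1)
    return off_diag.sum() / (N * (N - 1))

# Mutually orthogonal representations give f_PT = 0, the minimum.
S = np.eye(3)
print(pulling_away_term(S))  # -> 0.0
```

Adding this term to the generator loss pushes the generated samples’ representations apart, discouraging collapse onto one or a few modes.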

# 4. Experimental Results

## 4.1. MNIST Generation

- Some hyperparameters are found by grid search to find the best GAN and EBGAN.

- The generated MNIST digits shown above are from the models with the best inception score. The GAN results contain some noisy or unrecognizable digits.
- EBGAN and EBGAN-PT produce better results.

## 4.2. Semi-Supervised Learning on PI-MNIST

- The potential of using the EBGAN framework for semi-supervised learning is shown on permutation-invariant MNIST (PI-MNIST), using 100, 200 and 1000 labels.
- A crucial ingredient in enabling the EBGAN framework for semi-supervised learning is to **gradually decay the margin value** *m* of the discriminator loss. The rationale is to **let the discriminator punish the generator less when** *pG* gets closer to the data manifold.
- This margin decaying schedule is found by hyperparameter search.
- **The contrastive samples** can be thought of as an **extension to the dataset** that **provides more information to the classifier.**
- Using the Ladder Network as a baseline, a large improvement is achieved with EBGAN.
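The paper finds its decay schedule by hyperparameter search; purely as an illustration (the linear schedule below is hypothetical, not the paper’s), a gradually decayed margin might look like:

```python
def decayed_margin(step, total_steps, m_init=1.0, m_final=0.0):
    """Hypothetical linear decay of the margin m: the discriminator's
    hinge term punishes the generator less as training progresses."""
    frac = min(step / total_steps, 1.0)
    return m_init + (m_final - m_init) * frac

print(decayed_margin(0, 1000))     # -> 1.0
print(decayed_margin(1000, 1000))  # -> 0.0
```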

## 4.3. LSUN & CelebA

- The EBGAN framework is used with a deep convolutional architecture to generate 64×64 RGB images.

## 4.4. ImageNet

- Compared with the datasets experimented with so far, **ImageNet** presents an **extensively larger and wilder space**, so modeling the data distribution with a generative model becomes **very challenging**.
- Despite the difficulty of generating images at high resolution, EBGANs are able to **learn that objects appear in the foreground**, together with various background components resembling grass texture, sea under the horizon, mirrored mountains in the water, buildings, etc.
- In addition, the **256×256 dog-breed generations**, although **far from realistic**, do reflect some knowledge about the appearance of dogs, such as their bodies, fur and eyes.

## Reference

[2017 ICLR] [EBGAN]

Energy-based Generative Adversarial Network

Some Figures: https://www.slideshare.net/MingukKang/ebgan

## Generative Adversarial Network (GAN)

- **Image Synthesis**: [GAN] [CGAN] [LAPGAN] [AAE] [DCGAN] [CoGAN] [SimGAN] [BiGAN] [ALI] [LSGAN] [EBGAN]
- **Image-to-image Translation**: [Pix2Pix] [UNIT] [CycleGAN] [MUNIT]
- **Super Resolution**: [SRGAN & SRResNet] [EnhanceNet] [ESRGAN]
- **Blur Detection**: [DMENet]
- **Camera Tampering Detection**: [Mantini’s VISAPP’19]
- **Video Coding**: [VC-LAPGAN] [Zhu TMM’20] [Zhong ELECGJ’21]