# Brief Review — WGAN-GP: Improved Training of Wasserstein GANs

**WGAN With Gradient Penalty, Instead of WGAN With Weight Clipping**

**Improved Training of Wasserstein GANs (WGAN-GP)**, by Montreal Institute for Learning Algorithms, Courant Institute of Mathematical Sciences, and CIFAR Fellow. 2017 NIPS, over 9400 citations. (Sik-Ho Tsang @ Medium)


- **Wasserstein GAN (WGAN)** makes progress toward stable training of GANs, but sometimes can still generate only **poor samples** or **fail to converge**. This is due to the use of **weight clipping in WGAN to enforce a Lipschitz constraint on the critic**.
- In this paper, **WGAN-GP** proposes an alternative: **penalize the norm of the gradient of the critic with respect to its input.**

# Outline

1. **Preliminaries**
2. **WGAN With Gradient Penalty (WGAN-GP)**
3. **Results**

# 1. Preliminaries

## 1.1. Standard GAN

- The game between the generator *G* and the discriminator *D* is the **minimax objective:**
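The minimax objective, with $\mathbb{P}_r$ the data distribution and $\mathbb{P}_g$ the generator's distribution, is:

$$\min_G \max_D \; \mathbb{E}_{x \sim \mathbb{P}_r}[\log D(x)] + \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[\log(1 - D(\tilde{x}))]$$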

It is **unstable** during training.

## 1.2. **Wasserstein GAN (WGAN)**

- WGAN proposes instead using the **Earth-Mover (also called Wasserstein-1) distance** *W*(*q*, *p*).
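Via the Kantorovich-Rubinstein duality, the WGAN value function becomes:

$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})]$$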

- where *D* ranges over the set of 1-Lipschitz functions.
- A function *f* is *K*-Lipschitz if |*f*(*x*₁) − *f*(*x*₂)| ≤ *K*‖*x*₁ − *x*₂‖ for all *x*₁, *x*₂.
- Intuitively, when *x* and *y* are close, *f*(*x*) and *f*(*y*) are also close.
- Thus, when *f* is the critic (or discriminator) and is **enforced to satisfy the Lipschitz constraint, training is stable.**

WGAN uses weight clipping (the `clip` function in line 7 of the WGAN training algorithm) to enforce a Lipschitz constraint on the critic.
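Weight clipping can be sketched as follows in PyTorch; the helper name and the toy critic are illustrative, and `c = 0.01` is the clipping threshold used in the original WGAN paper.

```python
import torch
import torch.nn as nn

def clip_critic_weights(critic: nn.Module, c: float = 0.01) -> None:
    """Clamp every critic parameter into [-c, c] in place (WGAN's hard constraint)."""
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)

# Toy critic for illustration only.
critic = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

# Called after each critic optimizer step in the WGAN training loop.
clip_critic_weights(critic, c=0.01)

max_abs = max(p.abs().max().item() for p in critic.parameters())
print(max_abs <= 0.01)  # True: every weight now lies in [-0.01, 0.01]
```

This hard clamp is exactly what WGAN-GP replaces: clipping restricts the critic's capacity and can push weights to the two extremes, motivating a soft penalty instead.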

# 2. WGAN With Gradient Penalty (WGAN-GP)

Instead of weight clipping, in WGAN-GP, a **gradient penalty (GP)** is applied as a **soft constraint**.
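The full critic loss with the gradient penalty, where $\hat{x}$ is sampled uniformly along straight lines between pairs of points from $\mathbb{P}_r$ and $\mathbb{P}_g$, is:

$$L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})] - \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\left[\left(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\right)^2\right]$$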

- *λ* = 10 is found to work well.
- **No critic batch normalization**: layer normalization is used as a drop-in replacement for batch normalization.
- **Two-sided penalty**: WGAN-GP encourages the norm of the gradient to go towards 1 (two-sided penalty) instead of just staying below 1 (one-sided penalty).
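The gradient penalty can be sketched in PyTorch as below; the function name and the toy critic are illustrative, not the authors' code. `x_hat` is interpolated between real and fake samples, and the penalty pushes the per-sample gradient norm toward 1.

```python
import torch
import torch.nn as nn

def gradient_penalty(critic: nn.Module,
                     real: torch.Tensor,
                     fake: torch.Tensor,
                     lam: float = 10.0) -> torch.Tensor:
    """Two-sided WGAN-GP penalty: lam * E[(||grad_x_hat D(x_hat)||_2 - 1)^2]."""
    eps = torch.rand(real.size(0), 1)            # one interpolation weight per sample
    x_hat = eps * real + (1.0 - eps) * fake      # points on lines between real and fake
    x_hat.requires_grad_(True)
    d_hat = critic(x_hat)
    # create_graph=True so the penalty itself can be backpropagated through.
    grads, = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)
    grad_norm = grads.norm(2, dim=1)             # per-sample gradient norm
    return lam * ((grad_norm - 1.0) ** 2).mean()

# Toy critic and data for illustration only.
critic = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
real = torch.randn(4, 8)
fake = torch.randn(4, 8)
gp = gradient_penalty(critic, real, fake, lam=10.0)
```

In training, `gp` is added to the critic's Wasserstein loss before the backward pass; because it is a soft penalty, the critic keeps full capacity, unlike hard weight clipping.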

# 3. Results

## 3.1. Random Architectures

Starting from the DCGAN architecture, a set of architecture variants is defined by changing model settings.

WGAN-GP successfully trains many architectures from this set on which the standard GAN objective nearly always fails.

Only WGAN-GP succeeded in training every architecture with a shared set of hyperparameters.

## 3.2. CIFAR-10

WGAN-GP converges more slowly (in wall-clock time) than DCGAN, but its score is more stable at convergence.

Left: An architecture is found that establishes a **new state-of-the-art Inception Score** on **unsupervised CIFAR-10**.

Right: The proposed conditional model outperforms all others except SGAN.

A deep **ResNet** is successfully trained on 128 × 128 LSUN bedrooms, as shown above.