Brief Review — WGAN: Wasserstein Generative Adversarial Networks

WGAN, Improving Training Stability

Sik-Ho Tsang
3 min read · Jun 27, 2023


Figure: WGAN algorithm on the LSUN indoor dataset.


WGAN, by Courant Institute of Mathematical Sciences and Facebook AI Research (FAIR)
2017 ICML, Over 13000 Citations (Sik-Ho Tsang @ Medium)

Generative Adversarial Network (GAN)
Image Synthesis: 2014–2019

  • GAN training is unstable and may suffer from mode collapse. With WGAN, training becomes stable.

Outline

  1. WGAN
  2. Results

1. WGAN

  • (Since there are many blogs explaining WGAN in great detail, I would rather provide a very short article here.)

1.1. Problem of Standard GAN

  • In a standard GAN, a two-player min-max game is used to train the discriminator and generator, with a log (cross-entropy) loss; the objective is recalled after this list.
  • The training is not stable and may also cause mode collapse.
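For reference, the standard GAN objective is the two-player min-max game (notation as in Goodfellow et al., 2014; D is the discriminator, G the generator):

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

When D becomes too good, the log terms saturate and the generator receives vanishing gradients, which is one root of the instability.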

1.2. WGAN

WGAN Algorithm
  • The sigmoid output is removed; the discriminator becomes a critic.
  • The critic and generator no longer use the log loss; the loss is formulated as below:
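In my notation (following the paper), the critic approximates the Wasserstein-1 distance through the Kantorovich-Rubinstein duality:

```latex
W(\mathbb{P}_r, \mathbb{P}_\theta)
= \sup_{\|f\|_L \le 1}
\; \mathbb{E}_{x \sim \mathbb{P}_r}\big[f(x)\big]
- \mathbb{E}_{x \sim \mathbb{P}_\theta}\big[f(x)\big]
```

The critic f_w maximizes this difference over its (clipped) weights w, while the generator g_θ minimizes the term -E_{z~p(z)}[f_w(g_θ(z))]. No log and no sigmoid appear anywhere.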

The original discriminator learns very quickly to distinguish between fake and real, and as expected provides no reliable gradient information.

The critic, however, can’t saturate, and converges to a linear function that gives remarkably clean gradients everywhere.

  • After that, weight clipping clip(·) is applied to the critic weights; a minimal snippet follows this list.
  • RMSProp is used for the weight updates; the authors found that a momentum-based optimizer such as Adam caused the loss to blow up and the samples to get worse.
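A minimal PyTorch-style sketch of the clipping step, assuming a `critic` module is defined elsewhere (the paper uses c = 0.01):

```python
c = 0.01  # clipping constant from the paper
# After each critic update, clamp every weight into [-c, c]
# so that the critic stays within a family of (roughly) K-Lipschitz functions.
for p in critic.parameters():
    p.data.clamp_(-c, c)
```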

WGAN is summarized as below:

  • (Lines 3–4): Each time, sample a batch of real samples x, as well as a batch of noise samples z from the prior distribution.
  • (Line 5): f is the critic model and g is the generator. The output layer of the critic f uses a linear activation function instead of a sigmoid.
  • Take the average prediction of the critic f for the real samples x, minus the average prediction of the critic f for the fake samples g(z).
  • (Lines 6–7): RMSProp, one of the standard weight update algorithms, is used to update the critic weights w. These weights are then clipped to the range [-c, c].
  • (Lines 9–11): A batch of noise samples z is sampled. This time, the average critic prediction for the fake samples g(z) is used to update the generator, again through RMSProp. A runnable sketch of the whole loop is given after this list.
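Putting the steps together, below is a minimal, self-contained PyTorch-style sketch of the training loop. The toy MLP networks, the `sample_real` placeholder, and the dimensions are mine for illustration only; the hyperparameters (n_critic = 5, c = 0.01, RMSProp with lr = 5e-5) follow the paper.

```python
import torch
import torch.nn as nn

# Toy setup so the sketch runs end-to-end; the paper uses DCGAN-style nets on images.
z_dim, x_dim = 64, 784
generator = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
critic = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 1))  # linear output, no sigmoid

lr, c, n_critic, batch = 5e-5, 0.01, 5, 64
opt_c = torch.optim.RMSprop(critic.parameters(), lr=lr)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=lr)

def sample_real(n):
    # Placeholder for a real data batch (e.g. LSUN images flattened to x_dim).
    return torch.randn(n, x_dim)

for step in range(1000):
    # (Lines 2-8) Update the critic n_critic times per generator update.
    for _ in range(n_critic):
        real = sample_real(batch)
        z = torch.randn(batch, z_dim)
        fake = generator(z).detach()
        # Critic maximizes E[f(x)] - E[f(g(z))]; we minimize the negative.
        loss_c = -(critic(real).mean() - critic(fake).mean())
        opt_c.zero_grad()
        loss_c.backward()
        opt_c.step()
        # (Line 7) Clip the critic weights into [-c, c] to keep f_w roughly Lipschitz.
        for p in critic.parameters():
            p.data.clamp_(-c, c)

    # (Lines 9-11) Update the generator: minimize -E[f(g(z))].
    z = torch.randn(batch, z_dim)
    loss_g = -critic(generator(z)).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```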

2. Results

Training Curves of WGAN
  • This is the first time in the literature that such a property is shown: the GAN loss exhibits convergence behaviour.

The lower the loss, the better the sample quality.

Training Curves of JS

However, when the JS estimate is used, the loss does not reflect sample quality.

  • The above figures show the samples generated for these three architectures using both the WGAN and standard GAN algorithms.
  • Particularly in Figure 6, the standard GAN fails when the generator does not use batch norm and has a constant number of filters at every layer, while WGAN still works well.

The authors also mentioned that they did not see any evidence of mode collapse for the WGAN algorithm in any of their experiments.
