Brief Review — AC-GAN: Conditional Image Synthesis With Auxiliary Classifier GANs

AC-GAN, Conditioned on Both Noise Vector and Class Label

Sik-Ho Tsang
3 min read · Aug 13, 2023
ImageNet Samples Generated by AC-GAN

Conditional Image Synthesis With Auxiliary Classifier GANs
AC-GAN, by Google Brain
2017 ICML, Over 3400 Citations (Sik-Ho Tsang @ Medium)

Generative Adversarial Network (GAN)
Image Synthesis: 2014 … 2019 [SAGAN]
==== My Other Paper Readings Are Also Over Here ====

  • Auxiliary Classifier GAN (AC-GAN) is proposed, which employs label conditioning and yields 128×128 image samples exhibiting global coherence.

Outline

  1. Auxiliary Classifier GAN (AC-GAN)
  2. Results

1. Auxiliary Classifier GAN (AC-GAN)

1.1. Loss Functions

In AC-GAN, every generated sample has a corresponding class label c ~ p_c in addition to the noise z. The generator G uses both to generate images X_fake = G(c, z).

  • The discriminator gives both a probability distribution over sources and a probability distribution over the class labels, P(S | X), P(C | X) = D(X).
  • The objective function has two parts: the log-likelihood of the correct source, L_S, and the log-likelihood of the correct class, L_C (written out below).

D is trained to maximize L_S + L_C, while G is trained to maximize L_C - L_S. AC-GANs learn a representation for z that is independent of the class label.
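Written out, following the definitions in the paper, the two terms are:

```latex
L_S = \mathbb{E}\big[\log P(S = \mathrm{real} \mid X_{\mathrm{real}})\big]
    + \mathbb{E}\big[\log P(S = \mathrm{fake} \mid X_{\mathrm{fake}})\big]

L_C = \mathbb{E}\big[\log P(C = c \mid X_{\mathrm{real}})\big]
    + \mathbb{E}\big[\log P(C = c \mid X_{\mathrm{fake}})\big]
```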

  • Structurally, this model is not tremendously different from existing models. However, this modification to the standard GAN formulation produces excellent results and appears to stabilize training.
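In code, the modification really is small. Below is a minimal PyTorch-style sketch of the two objectives, not the authors' original implementation: the function names are illustrative, it assumes a discriminator D that returns a source logit and per-class logits, and the generator uses the common non-saturating trick (maximizing log P(S = real | X_fake)) as a stand-in for minimizing L_S directly.

```python
# Illustrative sketch of the AC-GAN objectives (not the authors' code).
# Assumes D(x) -> (src_logit, cls_logits) and G(z, y) -> images.
import torch
import torch.nn.functional as F

def d_loss(D, G, x_real, y_real, z, y_fake):
    """D maximizes L_S + L_C; here we minimize the equivalent negation."""
    x_fake = G(z, y_fake).detach()           # do not backprop into G
    src_real, cls_real = D(x_real)
    src_fake, cls_fake = D(x_fake)
    # L_S: log-likelihood of the correct source (real vs. fake)
    l_s = (F.binary_cross_entropy_with_logits(src_real, torch.ones_like(src_real))
         + F.binary_cross_entropy_with_logits(src_fake, torch.zeros_like(src_fake)))
    # L_C: log-likelihood of the correct class, on both real and fake samples
    l_c = F.cross_entropy(cls_real, y_real) + F.cross_entropy(cls_fake, y_fake)
    return l_s + l_c

def g_loss(D, G, z, y_fake):
    """G maximizes L_C - L_S: fool the source head, satisfy the class head."""
    src_fake, cls_fake = D(G(z, y_fake))
    l_s = F.binary_cross_entropy_with_logits(src_fake, torch.ones_like(src_fake))
    l_c = F.cross_entropy(cls_fake, y_fake)
    return l_s + l_c
```

Training then alternates as usual: update D's parameters to minimize d_loss, then G's to minimize g_loss.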

1.2. Model

  • The structure of the AC-GAN model permits separating large datasets into subsets by class and training a generator and discriminator for each subset. All ImageNet experiments are conducted using an ensemble of 100 AC-GANs, each trained on a 10-class split for 50,000 mini-batches of size 100.
  • Broadly speaking, the architecture of the generator G is a series of deconvolutional layers that transform the noise z and class c into an image. Two variants of the model architecture are trained for generating images at 128×128 and 64×64 spatial resolutions.
  • The discriminator D is a deep convolutional neural network with a Leaky ReLU nonlinearity (an illustrative skeleton of both networks follows).
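For orientation, here is a toy PyTorch skeleton of such a pair. The layer counts, filter widths, latent dimension, and the 32×32 output are placeholders (the paper trains 64×64 and 128×128 variants), and injecting c via an embedding concatenated with z is one common choice, not necessarily the paper's exact mechanism.

```python
# Toy AC-GAN architecture sketch; dimensions are illustrative only.
import torch
import torch.nn as nn

N_CLASSES, Z_DIM = 10, 110  # 10 classes per ImageNet split; Z_DIM is assumed

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, N_CLASSES)
        self.net = nn.Sequential(  # a series of "deconvolution" layers
            nn.ConvTranspose2d(Z_DIM + N_CLASSES, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 32x32 toy output
        )

    def forward(self, z, y):
        h = torch.cat([z, self.embed(y)], dim=1)     # condition on class c
        return self.net(h.view(h.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(  # deep CNN with Leaky ReLU
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.src_head = nn.Linear(256 * 4 * 4, 1)          # P(S | X) logit
        self.cls_head = nn.Linear(256 * 4 * 4, N_CLASSES)  # P(C | X) logits

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.src_head(h), self.cls_head(h)
```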

2. Results

2.1. ImageNet

ImageNet Samples

2.2. Latent Space Interpolation

Latent Space Interpolation
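Such figures are typically produced by linearly interpolating between two noise vectors while holding the class label fixed. A minimal sketch, assuming the Generator interface from the snippet above (this helper is hypothetical, not from the paper):

```python
# Hypothetical helper: decode points on the line between z0 and z1 for a fixed class.
import torch

def interpolate(G, z0, z1, y, steps=8):
    ts = torch.linspace(0.0, 1.0, steps)
    zs = torch.stack([(1 - t) * z0 + t * z1 for t in ts])  # (steps, z_dim)
    ys = y.repeat(steps)                                   # same class for all points
    with torch.no_grad():
        return G(zs, ys)                                   # (steps, 3, H, W)
```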
  • For detailed results, please feel free to read the paper directly.
