Brief Review — AC-GAN: Conditional Image Synthesis With Auxiliary Classifier GANs

AC-GAN, Conditioned on Both Noise Vector and Class Label

Sik-Ho Tsang
3 min read · Aug 13, 2023
ImageNet Samples Generated by AC-GAN

Conditional Image Synthesis With Auxiliary Classifier GANs
AC-GAN, by Google Brain
2017 ICML, Over 3400 Citations (Sik-Ho Tsang @ Medium)

Generative Adversarial Network (GAN)
Image Synthesis: 2014 … 2019 [SAGAN]
==== My Other Paper Readings Are Also Over Here ====

  • Auxiliary Classifier GAN (AC-GAN) is proposed, which employs label conditioning and yields 128×128 image samples exhibiting global coherence.

Outline

  1. Auxiliary Classifier GAN (AC-GAN)
  2. Results

1. Auxiliary Classifier GAN (AC-GAN)

1.1. Loss Functions

In AC-GAN, every generated sample has a corresponding class label c ~ p_c in addition to the noise z. The generator G uses both to generate images X_fake = G(c, z).

  • The discriminator gives both a probability distribution over sources and a probability distribution over the class labels, P(S | X), P(C | X) = D(X).
  • The objective function has two parts: the log-likelihood of the correct source, L_S, and the log-likelihood of the correct class, L_C (written out below).

D is trained to maximize L_S + L_C, while G is trained to maximize L_C - L_S. AC-GANs learn a representation for z that is independent of the class label.
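Written out, following the definitions in the paper, the two terms are:

```latex
L_S = \mathbb{E}\big[\log P(S = \mathrm{real} \mid X_{\mathrm{real}})\big]
    + \mathbb{E}\big[\log P(S = \mathrm{fake} \mid X_{\mathrm{fake}})\big]

L_C = \mathbb{E}\big[\log P(C = c \mid X_{\mathrm{real}})\big]
    + \mathbb{E}\big[\log P(C = c \mid X_{\mathrm{fake}})\big]
```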

  • Structurally, this model is not tremendously different from existing models. However, this modification to the standard GAN formulation produces excellent results and appears to stabilize training.
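In code, the modification really is small. Below is a minimal PyTorch-style sketch of the two objectives, not the authors' original implementation: the function names are illustrative, it assumes a discriminator D that returns a source logit and per-class logits, and the generator uses the common non-saturating trick (maximizing log P(S = real | X_fake)) as a stand-in for minimizing L_S directly.

```python
# Illustrative sketch of the AC-GAN objectives (not the authors' code).
# Assumes D(x) -> (src_logit, cls_logits) and G(z, y) -> images.
import torch
import torch.nn.functional as F

def d_loss(D, G, x_real, y_real, z, y_fake):
    """D maximizes L_S + L_C; here we minimize the equivalent negation."""
    x_fake = G(z, y_fake).detach()           # do not backprop into G
    src_real, cls_real = D(x_real)
    src_fake, cls_fake = D(x_fake)
    # L_S: log-likelihood of the correct source (real vs. fake)
    l_s = (F.binary_cross_entropy_with_logits(src_real, torch.ones_like(src_real))
         + F.binary_cross_entropy_with_logits(src_fake, torch.zeros_like(src_fake)))
    # L_C: log-likelihood of the correct class, on both real and fake samples
    l_c = F.cross_entropy(cls_real, y_real) + F.cross_entropy(cls_fake, y_fake)
    return l_s + l_c

def g_loss(D, G, z, y_fake):
    """G maximizes L_C - L_S: fool the source head, satisfy the class head."""
    src_fake, cls_fake = D(G(z, y_fake))
    l_s = F.binary_cross_entropy_with_logits(src_fake, torch.ones_like(src_fake))
    l_c = F.cross_entropy(cls_fake, y_fake)
    return l_s + l_c
```

Training then alternates as usual: update D's parameters to minimize d_loss, then G's to minimize g_loss.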

1.2. Model

  • The structure of the AC-GAN model permits separating large datasets into subsets by class and training a generator and discriminator for each subset. All ImageNet experiments are conducted using an ensemble of 100 AC-GANs, each trained on a 10-class split for 50,000 mini-batches of size 100.
  • Broadly speaking, the architecture of the generator G is a series of deconvolutional layers that transform the noise z and class c into an image. Two variants of the model architecture are trained for generating images at 128×128 and 64×64 spatial resolutions.
  • The discriminator D is a deep convolutional neural network with a Leaky ReLU nonlinearity (an illustrative skeleton of both networks follows).
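For orientation, here is a toy PyTorch skeleton of such a pair. The layer counts, filter widths, latent dimension, and the 32×32 output are placeholders (the paper trains 64×64 and 128×128 variants), and injecting c via an embedding concatenated with z is one common choice, not necessarily the paper's exact mechanism.

```python
# Toy AC-GAN architecture sketch; dimensions are illustrative only.
import torch
import torch.nn as nn

N_CLASSES, Z_DIM = 10, 110  # 10 classes per ImageNet split; Z_DIM is assumed

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, N_CLASSES)
        self.net = nn.Sequential(  # a series of "deconvolution" layers
            nn.ConvTranspose2d(Z_DIM + N_CLASSES, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 32x32 toy output
        )

    def forward(self, z, y):
        h = torch.cat([z, self.embed(y)], dim=1)     # condition on class c
        return self.net(h.view(h.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(  # deep CNN with Leaky ReLU
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.src_head = nn.Linear(256 * 4 * 4, 1)          # P(S | X) logit
        self.cls_head = nn.Linear(256 * 4 * 4, N_CLASSES)  # P(C | X) logits

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.src_head(h), self.cls_head(h)
```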

2. Results

2.1. ImageNet

ImageNet Samples

2.2. Latent Space Interpolation

Latent Space Interpolation
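Such figures are typically produced by linearly interpolating between two noise vectors while holding the class label fixed. A minimal sketch, assuming the Generator interface from the snippet above (this helper is hypothetical, not from the paper):

```python
# Hypothetical helper: decode points on the line between z0 and z1 for a fixed class.
import torch

def interpolate(G, z0, z1, y, steps=8):
    ts = torch.linspace(0.0, 1.0, steps)
    zs = torch.stack([(1 - t) * z0 + t * z1 for t in ts])  # (steps, z_dim)
    ys = y.repeat(steps)                                   # same class for all points
    with torch.no_grad():
        return G(zs, ys)                                   # (steps, 3, H, W)
```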
  • For detailed results, please feel free to read the paper directly.
