Brief Review — StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

StarGAN, Single Generator for Multiple Domains

5 min read · Aug 15, 2023

**Multi-Domain Image-to-Image Translation**

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
StarGAN, by Korea University, Clova AI Research, NAVER Corp., The College of New Jersey, and Hong Kong University of Science & Technology
2018 CVPR, Over 3400 Citations (Sik-Ho Tsang @ Medium)
Image-to-image Translation: 2017 [Pix2Pix] [UNIT] [CycleGAN] 2018 [MUNIT]

StarGAN is proposed as a scalable approach that performs image-to-image translation across multiple domains using only a single model.

Outline

1. StarGAN
2. Model Architecture
3. Results

1. StarGAN

1.1. Conceptual Idea

**Left: Conventional Image-to-Image GANs, Right: StarGAN**

StarGAN (right) uses a single generator for all domains, whereas conventional cross-domain models (left) need a separate generator for every ordered pair of domains, i.e. k(k−1) generators for k domains.

1.2. StarGAN Loss Functions

The goal is to train G to translate an input image x into an output image y conditioned on the target domain label c, G(x, c) → y.

The discriminator D produces probability distributions over both sources and domain labels, D : x → {Dsrc(x), Dcls(x)}.

1.2.1. Adversarial Loss

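The adversarial loss is:

$$\mathcal{L}_{adv} = \mathbb{E}_{x}[\log D_{src}(x)] + \mathbb{E}_{x,c}[\log(1 - D_{src}(G(x, c)))]$$

where Dsrc(x) is the probability distribution over sources given by D. G tries to minimize this objective, while D tries to maximize it.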
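
As a rough numerical illustration of this loss, the sketch below averages log Dsrc over real images and log(1 − Dsrc) over generated ones. The function name and the probability values are made up for illustration; in practice they would come from the discriminator.

```python
import math

def adversarial_loss(d_src_real, d_src_fake):
    """Batch estimate of the adversarial loss.

    d_src_real: D_src(x) for real images, each value in (0, 1).
    d_src_fake: D_src(G(x, c)) for generated images, each value in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_src_real) / len(d_src_real)
    fake_term = sum(math.log(1.0 - p) for p in d_src_fake) / len(d_src_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs (not taken from the paper).
confident = adversarial_loss([0.9, 0.8], [0.1, 0.2])  # D separates real from fake
fooled = adversarial_loss([0.9, 0.8], [0.6, 0.7])     # G fools D more often
```

The value is higher when D separates real from fake well and lower when G fools D, which matches the roles above: D maximizes it and G minimizes it.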

1.2.2. Domain Classification Loss

For a given input image x and a target domain label c, the goal is to translate x into an output image y that is properly classified as the target domain c.
To achieve this, an auxiliary classifier is added on top of D, and a domain classification loss is imposed when optimizing both D and G.
The objective is decomposed into two terms: a domain classification loss of real images used to optimize D, and a domain classification loss of fake images used to optimize G. The former is defined as:

$$\mathcal{L}_{cls}^{r} = \mathbb{E}_{x,c'}[-\log D_{cls}(c'|x)]$$

where the term Dcls(c’|x) represents a probability distribution over domain labels computed by D. By minimizing this objective, D learns to classify a real image x to its corresponding original domain c’.
The domain classification loss for fake images is defined as:

$$\mathcal{L}_{cls}^{f} = \mathbb{E}_{x,c}[-\log D_{cls}(c|G(x, c))]$$

G tries to minimize this objective to generate images that can be classified as the target domain c.
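
Both terms are negative log-likelihoods of a domain label, so they can be sketched with one helper. The snippet below assumes mutually exclusive domains and a softmax over the classifier's logits; multi-attribute labels would need a per-attribute sigmoid instead. The logits and the function names are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def domain_cls_loss(logits, domain_index):
    """-log D_cls(domain | image) for a single image."""
    return -math.log(softmax(logits)[domain_index])

# Real image whose original domain c' is 2: this term is used to train D.
loss_real = domain_cls_loss([0.1, 0.3, 2.5], domain_index=2)

# Generated image whose target domain c is 0: this term is used to train G.
loss_fake = domain_cls_loss([0.2, 1.9, 0.4], domain_index=0)
```

Here the real image is classified correctly with high probability, so its loss is small, while the generated image is still classified as domain 1 rather than the target domain 0, so G receives a large loss.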

1.2.3. Reconstruction Loss

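A cycle consistency loss, as in CycleGAN, is applied to the generator so that translated images preserve the content of the input while changing only the domain-related part:

$$\mathcal{L}_{rec} = \mathbb{E}_{x,c,c'}[\lVert x - G(G(x, c), c') \rVert_{1}]$$

where G takes the translated image G(x, c) and the original domain label c’ as input and tries to reconstruct the original image x.
The L1 norm is used as the reconstruction loss.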
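
A minimal sketch of the reconstruction loss on flattened pixel values is shown below. It averages the absolute differences over pixels, which is the L1 norm up to a constant factor; that averaging, the function name, and the pixel values are assumptions made for illustration.

```python
def l1_reconstruction_loss(original, reconstructed):
    """Mean absolute difference between x and G(G(x, c), c')."""
    assert len(original) == len(reconstructed)
    diffs = (abs(a - b) for a, b in zip(original, reconstructed))
    return sum(diffs) / len(original)

# Hypothetical flattened images with pixel values in [-1, 1].
x = [0.0, 0.5, 1.0, -1.0]
x_rec = [0.1, 0.5, 0.8, -1.0]  # stands in for G(G(x, c), c')

loss_rec = l1_reconstruction_loss(x, x_rec)
```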

1.2.4. Full Objective

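The full objectives for D and G are:

$$\mathcal{L}_{D} = -\mathcal{L}_{adv} + \lambda_{cls}\,\mathcal{L}_{cls}^{r}$$

$$\mathcal{L}_{G} = \mathcal{L}_{adv} + \lambda_{cls}\,\mathcal{L}_{cls}^{f} + \lambda_{rec}\,\mathcal{L}_{rec}$$

where λcls = 1 and λrec = 10.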
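
The weighting can be sketched as two small functions, one per network. D minimizes the negated adversarial loss plus its classification term, and G minimizes the adversarial loss plus its classification and reconstruction terms. The function names and the example loss values are hypothetical; only the two weights come from the text above.

```python
LAMBDA_CLS = 1.0   # weight of the domain classification loss
LAMBDA_REC = 10.0  # weight of the reconstruction loss

def discriminator_objective(l_adv, l_cls_real):
    # D maximizes the adversarial loss, so it enters with a minus sign.
    return -l_adv + LAMBDA_CLS * l_cls_real

def generator_objective(l_adv, l_cls_fake, l_rec):
    return l_adv + LAMBDA_CLS * l_cls_fake + LAMBDA_REC * l_rec

# Made-up loss values for a single training step.
loss_d = discriminator_objective(l_adv=-0.5, l_cls_real=0.25)
loss_g = generator_objective(l_adv=-0.5, l_cls_fake=2.0, l_rec=0.125)
```

With λrec = 10, even a small reconstruction error contributes noticeably to the generator objective.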

1.2.5. Mask Vector

**Illustrative Example for Mask Vectors**

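A mask vector m is introduced so that StarGAN can ignore unspecified labels and focus on the explicitly known labels provided by a particular dataset.

In StarGAN, m is an n-dimensional one-hot vector, with n being the number of datasets. The unified label is the concatenation $\tilde{c} = [c_1, \dots, c_n, m]$, where c_i is the label vector of the i-th dataset; the labels of the known dataset are filled in and the remaining, unknown labels are set to zero.

An illustrative example of mask vectors is shown above.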
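
To make the label layout concrete, here is a small sketch that builds the unified label for two datasets: the known dataset's labels are copied in, the other dataset's slots are zero-filled, and the one-hot mask is appended. The helper name, the label sizes (5 and 8) and the example labels are assumptions chosen for illustration.

```python
def unified_label(label, dataset_index, label_sizes):
    """Build [c_1, ..., c_n, m], zero-filling the labels of unknown datasets."""
    assert len(label) == label_sizes[dataset_index]
    parts = []
    for i, size in enumerate(label_sizes):
        parts.extend(label if i == dataset_index else [0] * size)
    # One-hot mask vector telling the model which dataset's labels are valid.
    mask = [1 if i == dataset_index else 0 for i in range(len(label_sizes))]
    return parts + mask

# Assumed sizes: 5 attributes for the first dataset, 8 expressions for the second.
LABEL_SIZES = [5, 8]

celeba_label = unified_label([1, 0, 0, 1, 1], 0, LABEL_SIZES)
rafd_label = unified_label([0, 0, 1, 0, 0, 0, 0, 0], 1, LABEL_SIZES)
```

Both results have the same length (5 + 8 + 2 = 15), so a single generator can consume labels from either dataset.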

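1.2.6. Stabilize Training

To stabilize training, the adversarial loss is replaced by the WGAN-GP objective:

$$\mathcal{L}_{adv} = \mathbb{E}_{x}[D_{src}(x)] - \mathbb{E}_{x,c}[D_{src}(G(x, c))] - \lambda_{gp}\,\mathbb{E}_{\hat{x}}[(\lVert \nabla_{\hat{x}} D_{src}(\hat{x}) \rVert_{2} - 1)^{2}]$$

where x̂ is sampled uniformly along a straight line between a pair of a real and a generated image, and λgp = 10.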
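
The gradient penalty in WGAN-GP is evaluated at points interpolated between a real and a generated image, and it pushes the norm of the critic's gradient at those points towards 1. The toy sketch below shows that computation only. It assumes a stand-in critic D(x) = 0.5 · Σ x_i², whose gradient is simply x, in place of a real network with automatic differentiation. All names and values are hypothetical.

```python
import math
import random

def interpolate(real, fake, alpha):
    # Point on the straight line between a real and a generated sample.
    return [alpha * r + (1.0 - alpha) * f for r, f in zip(real, fake)]

def toy_critic_grad(x):
    # Gradient of the toy critic D(x) = 0.5 * sum(x_i ** 2), which is x itself.
    return list(x)

def gradient_penalty(real, fake, alpha):
    x_hat = interpolate(real, fake, alpha)
    grad_norm = math.sqrt(sum(g * g for g in toy_critic_grad(x_hat)))
    return (grad_norm - 1.0) ** 2

random.seed(0)
alpha = random.random()  # uniform sample in [0, 1)
penalty = gradient_penalty([3.0, 4.0], [0.0, 0.0], alpha)
```

In an actual training loop the gradient would come from the framework's autograd, and the penalty would be averaged over the batch and scaled by the penalty weight.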

2. Model Architecture

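The generator, adapted from CycleGAN, consists of two stride-2 convolutional layers for downsampling, six residual blocks, and two stride-2 transposed convolutional layers for upsampling.

Instance normalization is used for the generator, but no normalization is used for the discriminator.
The discriminator follows PatchGAN, which classifies whether local image patches are real or fake.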
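
A PatchGAN discriminator outputs a grid of real/fake scores, one per image patch, rather than a single score. The sketch below estimates the size of that grid with the standard convolution output-size formula. The layer settings (six 4×4 convolutions with stride 2 and padding 1) and the 128×128 input are assumptions about the discriminator configuration, not values stated in this review.

```python
def conv_output_size(size, kernel, stride, padding):
    # Standard output-size formula for a convolution along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

def patch_grid_size(image_size, num_layers=6, kernel=4, stride=2, padding=1):
    """Side length of the real/fake score grid after the strided convolutions."""
    size = image_size
    for _ in range(num_layers):
        size = conv_output_size(size, kernel, stride, padding)
    return size

grid = patch_grid_size(128)  # assumed 128x128 input
```

Under these assumptions each layer halves the resolution, so a 128×128 input yields a 2×2 grid of patch scores.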

3. Results

3.1. Qualitative Results

On CelebA facial attribute transfer, StarGAN produces translation results of higher visual quality on test data than the cross-domain models.

On RaFD facial expression synthesis, StarGAN generates the most natural-looking expressions while properly maintaining the personal identity and facial features of the input.

To distinguish between the model trained only on RaFD and the model trained on both CelebA and RaFD, the former is denoted as StarGAN-SNG (single) and the latter is denoted as StarGAN-JNT (joint).

By utilizing both CelebA and RaFD, StarGAN-JNT can improve shared low-level tasks such as facial keypoint detection and segmentation, which is beneficial to learning facial expression synthesis.