Brief Review — StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

StarGAN, Single Generator for Multiple Domains

Sik-Ho Tsang
Aug 15, 2023
Multi-Domain Image-to-Image Translation

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
StarGAN, by Korea University, Clova AI Research, NAVER Corp., The College of New Jersey, and Hong Kong University of Science & Technology
2018 CVPR, Over 3400 Citations (Sik-Ho Tsang @ Medium)

Image-to-image Translation: 2017 [Pix2Pix] [UNIT] [CycleGAN] 2018 [MUNIT]

  • StarGAN is a scalable approach that performs image-to-image translation across multiple domains using only a single model.

Outline

  1. StarGAN
  2. Model Architecture
  3. Results

1. StarGAN

1.1. Conceptual Idea

Left: Conventional Image-to-Image GANs, Right: StarGAN

StarGAN (right) uses a single generator for all domains, whereas conventional cross-domain models (left) need a separate generator for every ordered pair of domains, i.e. k(k−1) generators for k domains.

1.2. StarGAN Loss Functions

StarGAN Overview

The goal is to train G to translate an input image x into an output image y conditioned on the target domain label c, G(x, c) → y.

  • The discriminator D produces probability distributions over both sources and domain labels, D : x → {Dsrc(x), Dcls(x)}.
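How the label c enters G is not spelled out above. One way to condition a convolutional generator, and to my understanding the one the paper uses, is to tile the label over the spatial dimensions and concatenate it with the image depth-wise. Below is a minimal pure-Python sketch with nested lists; the function name `concat_label` and the channel-first layout are assumptions made for illustration, not the authors' code.

```python
def concat_label(image, label):
    """Append one constant H x W plane per label entry to a C x H x W image.

    image: list of C channels, each a list of H rows of W floats.
    label: list of n numbers (the target domain label c).
    Returns a new (C + n) x H x W nested list; the input is not modified.
    """
    height = len(image[0])
    width = len(image[0][0])
    # Copy the image channels so the caller's data stays untouched.
    channels = [[row[:] for row in channel] for channel in image]
    # Tile each label entry into a constant plane and stack it as a channel.
    for value in label:
        channels.append([[float(value)] * width for _ in range(height)])
    return channels


# Toy usage: a 3-channel 2x2 image and a 5-dimensional target label.
x = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(3)]
c = [1, 0, 0, 1, 0]
g_input = concat_label(x, c)
print(len(g_input), len(g_input[0]), len(g_input[0][0]))
```

The generator then sees the label at every spatial location, so the same convolutional weights can produce different translations depending on c.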

1.2.1. Adversarial Loss

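  • The adversarial loss, which makes generated images indistinguishable from real ones, is:
    $$\mathcal{L}_{adv} = \mathbb{E}_{x}[\log D_{src}(x)] + \mathbb{E}_{x,c}[\log(1 - D_{src}(G(x, c)))]$$
  • where Dsrc(x) is the probability distribution over sources given by D. G tries to minimize this objective, while D tries to maximize it.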
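As a rough numerical illustration, the adversarial objective E[log Dsrc(x)] + E[log(1 − Dsrc(G(x, c)))] can be estimated from a batch of discriminator outputs. This is a sketch only: the function name and the toy probabilities are made up, and a real implementation would compute this on network outputs inside a deep learning framework.

```python
import math


def adversarial_loss(d_src_real, d_src_fake):
    """Batch estimate of E[log Dsrc(x)] + E[log(1 - Dsrc(G(x, c)))].

    d_src_real: Dsrc outputs (probabilities in (0, 1)) on real images.
    d_src_fake: Dsrc outputs on generated images G(x, c).
    """
    real_term = sum(math.log(p) for p in d_src_real) / len(d_src_real)
    fake_term = sum(math.log(1.0 - p) for p in d_src_fake) / len(d_src_fake)
    return real_term + fake_term


# A confident, correct D pushes the value toward 0, its maximum ...
strong_d = adversarial_loss([0.9, 0.95], [0.1, 0.05])
# ... while a D that scores fakes as real yields a much lower value.
fooled_d = adversarial_loss([0.9, 0.95], [0.9, 0.95])
print(strong_d > fooled_d)
```

D wants this value high and G wants it low, which is the usual minimax game.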

1.2.2. Domain Classification Loss

  • For a given input image x and a target domain label c, the goal is to translate x into an output image y, which is properly classified to the target domain c.
  • To achieve this, an auxiliary classifier is added on top of D, and a domain classification loss is imposed when optimizing both D and G.
  • The objective is decomposed into two terms: a domain classification loss of real images used to optimize D, and a domain classification loss of fake images used to optimize G. In detail, the former is defined as:
    $$\mathcal{L}^{r}_{cls} = \mathbb{E}_{x,c'}[-\log D_{cls}(c'|x)]$$
  • where the term Dcls(c’|x) represents a probability distribution over domain labels computed by D. By minimizing this objective, D learns to classify a real image x to its corresponding original domain c’.
  • On the other hand, the loss function for the domain classification of fake images is defined as:
    $$\mathcal{L}^{f}_{cls} = \mathbb{E}_{x,c}[-\log D_{cls}(c|G(x, c))]$$
  • G tries to minimize this objective to generate images that can be classified as the target domain c.
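Both classification terms are negative log-likelihoods of the form −log Dcls(label | image); they differ only in which images and which labels are used. The sketch below assumes mutually exclusive domains (one index per image) and uses made-up probabilities; for multi-attribute labels, a per-attribute binary cross-entropy would be the analogous choice.

```python
import math


def domain_cls_loss(cls_probs, labels):
    """Mean negative log-likelihood -log Dcls(label | image) over a batch.

    cls_probs: one probability distribution over domains per image.
    labels: index of the domain each image should be classified as.
    """
    losses = [-math.log(probs[label]) for probs, label in zip(cls_probs, labels)]
    return sum(losses) / len(losses)


# Real-image term (optimizes D): Dcls on real x, scored against the original domain c'.
loss_cls_real = domain_cls_loss([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
# Fake-image term (optimizes G): Dcls on G(x, c), scored against the target domain c.
loss_cls_fake = domain_cls_loss([[0.2, 0.2, 0.6]], [2])
print(round(loss_cls_real, 4), round(loss_cls_fake, 4))
```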

1.2.3. Reconstruction Loss

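  • Minimizing the adversarial and classification losses alone does not guarantee that the translated image preserves the content of the input. A cycle consistency loss, similar to CycleGAN, is therefore added:
    $$\mathcal{L}_{rec} = \mathbb{E}_{x,c,c'}[\lVert x - G(G(x, c), c') \rVert_{1}]$$
  • where G takes in the translated image G(x, c) and the original domain label c’ as input and tries to reconstruct the original image x.
  • The L1 norm is used as the reconstruction loss. Note that the same generator is used twice: first to translate, then to reconstruct.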
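The translate-then-reconstruct cycle can be sketched on flattened toy images. The `toy_generator` below is a hypothetical stand-in for G (it only shifts brightness toward an assumed per-domain mean); it is there to make the cycle G(G(x, c), c’) concrete, not to model the real network. Implementations often average over pixels instead of summing, which only changes the scale.

```python
def reconstruction_loss(originals, reconstructions):
    """Batch mean of the L1 norm ||x - x_rec||_1 over flattened images."""
    norms = [
        sum(abs(p - q) for p, q in zip(x, x_rec))
        for x, x_rec in zip(originals, reconstructions)
    ]
    return sum(norms) / len(norms)


# Hypothetical toy "generator": shifts an image so that its mean brightness
# matches an assumed mean for the target domain. It stands in for G(x, c) only.
DOMAIN_MEAN = {0: 0.4, 1: 0.7}


def toy_generator(image, target_domain):
    shift = DOMAIN_MEAN[target_domain] - sum(image) / len(image)
    return [pixel + shift for pixel in image]


x = [0.2, 0.4, 0.6]                           # flattened image from domain 0
translated = toy_generator(x, 1)              # G(x, c)
reconstructed = toy_generator(translated, 0)  # G(G(x, c), c')
cycle_loss = reconstruction_loss([x], [reconstructed])
print(cycle_loss < 1e-9)
```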

1.2.4. Full Objective

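  • The full objectives for D and G are:
    $$\mathcal{L}_{D} = -\mathcal{L}_{adv} + \lambda_{cls}\,\mathcal{L}^{r}_{cls}$$
    $$\mathcal{L}_{G} = \mathcal{L}_{adv} + \lambda_{cls}\,\mathcal{L}^{f}_{cls} + \lambda_{rec}\,\mathcal{L}_{rec}$$
  • where the superscripts r and f denote the domain classification losses on real and fake images, and λcls = 1 and λrec = 10.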
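Combining the individual terms is simple arithmetic. The sketch below plugs in the weights quoted above (λcls = 1, λrec = 10); the loss values are invented purely to show how the terms add up, and the function names are hypothetical.

```python
LAMBDA_CLS = 1.0
LAMBDA_REC = 10.0


def discriminator_objective(l_adv, l_cls_real, lambda_cls=LAMBDA_CLS):
    """L_D = -L_adv + lambda_cls * L_cls^r (D maximizes L_adv, so it is negated)."""
    return -l_adv + lambda_cls * l_cls_real


def generator_objective(l_adv, l_cls_fake, l_rec,
                        lambda_cls=LAMBDA_CLS, lambda_rec=LAMBDA_REC):
    """L_G = L_adv + lambda_cls * L_cls^f + lambda_rec * L_rec."""
    return l_adv + lambda_cls * l_cls_fake + lambda_rec * l_rec


# Invented values for one training step.
l_d = discriminator_objective(l_adv=-0.5, l_cls_real=0.3)
l_g = generator_objective(l_adv=-0.5, l_cls_fake=0.4, l_rec=0.05)
print(round(l_d, 6), round(l_g, 6))
```

With λrec = 10, even a small reconstruction error contributes as much to the generator objective as the other two terms, which pushes G to preserve the input's content.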

1.2.5. Mask Vector

Illustrative Example for Mask Vectors

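When training on multiple datasets, each dataset provides only part of the label information (e.g. CelebA has no expression labels and RaFD has no hair color labels). A mask vector m is therefore introduced, which allows StarGAN to ignore unspecified labels and focus on the explicitly known labels provided by a particular dataset.

  • In StarGAN, an n-dimensional one-hot vector is used to represent m, with n being the number of datasets.
  • The unified label is the concatenation c̃ = [c1, ..., cn, m], where ci is the label vector of the i-th dataset and the unknown label vectors are set to zero.
  • An illustrative example of mask vectors is shown above.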
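A unified label with a mask vector can be assembled as follows. This is a sketch under assumptions: the known dataset's labels are kept, every other dataset's label slots are filled with zeros, and a one-hot mask marks which dataset the sample came from. The helper name `unified_label` and the label sizes (5 CelebA attributes, 8 RaFD expressions) are illustrative choices.

```python
def unified_label(known_label, dataset_index, label_dims):
    """Build [c_1, ..., c_n, m] with zeros for the datasets whose labels are unknown.

    known_label: label vector of the dataset the sample comes from.
    dataset_index: which dataset that is (0-based).
    label_dims: label vector length of each dataset.
    """
    if len(known_label) != label_dims[dataset_index]:
        raise ValueError("label length does not match the dataset's label size")
    parts = []
    for i, dim in enumerate(label_dims):
        parts.extend(known_label if i == dataset_index else [0] * dim)
    # One-hot mask vector m telling the generator which label block is valid.
    mask = [1 if i == dataset_index else 0 for i in range(len(label_dims))]
    return parts + mask


LABEL_DIMS = [5, 8]  # assumed sizes: 5 CelebA attributes, 8 RaFD expressions
celeba = unified_label([1, 0, 0, 1, 1], 0, LABEL_DIMS)
rafd = unified_label([0, 0, 1, 0, 0, 0, 0, 0], 1, LABEL_DIMS)
print(celeba)
print(rafd)
```

Both vectors have the same length (5 + 8 + 2 = 15 here), so a single generator can consume samples from either dataset.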

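1.2.6. Stabilize Training

  • To stabilize training and generate higher-quality images, the adversarial loss is replaced with the WGAN-GP objective:
    $$\mathcal{L}_{adv} = \mathbb{E}_{x}[D_{src}(x)] - \mathbb{E}_{x,c}[D_{src}(G(x, c))] - \lambda_{gp}\,\mathbb{E}_{\hat{x}}[(\lVert \nabla_{\hat{x}} D_{src}(\hat{x}) \rVert_{2} - 1)^{2}]$$
  • where x̂ is sampled uniformly along a straight line between a pair of real and generated images, and λgp = 10.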
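The gradient penalty at the heart of WGAN-GP penalizes the critic when the norm of its input gradient, taken at a random point between a real and a generated sample, differs from 1. The sketch below illustrates this for a toy critic on 2-D points and approximates the gradient with central finite differences; a real implementation would use automatic differentiation, and all names here are hypothetical.

```python
import math
import random


def numerical_gradient(critic, point, eps=1e-6):
    """Central finite-difference gradient of critic at point (a list of floats)."""
    grad = []
    for i in range(len(point)):
        plus = point[:i] + [point[i] + eps] + point[i + 1:]
        minus = point[:i] + [point[i] - eps] + point[i + 1:]
        grad.append((critic(plus) - critic(minus)) / (2 * eps))
    return grad


def gradient_penalty(critic, real, fake, rng=random):
    """(||grad critic(x_hat)||_2 - 1)^2 at a random interpolate x_hat."""
    alpha = rng.random()  # uniform in [0, 1)
    x_hat = [alpha * r + (1 - alpha) * f for r, f in zip(real, fake)]
    grad = numerical_gradient(critic, x_hat)
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return (grad_norm - 1.0) ** 2


def linear_critic(x):
    # Toy critic whose gradient is (3, 4) everywhere, so its norm is 5.
    return 3.0 * x[0] + 4.0 * x[1]


penalty = gradient_penalty(linear_critic, real=[1.0, 2.0], fake=[0.0, -1.0])
print(round(penalty, 3))
```

In the full objective this penalty is averaged over a batch and scaled by the gradient-penalty weight before being subtracted from the critic's score difference.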

2. Model Architecture

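  • The generator, adapted from CycleGAN, consists of two stride-2 convolutional layers for downsampling, six residual blocks, and two stride-2 transposed convolutional layers for upsampling: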
Generator
  • The discriminator is:
Discriminator
  • Instance normalization is used for the generator but no normalization for the discriminator.
  • PatchGANs are leveraged for the discriminator network.
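A PatchGAN discriminator outputs a grid of real/fake scores, one per image patch, rather than a single score. The size of that grid follows from the standard convolution output formula. The layer settings below (six 4×4 stride-2 convolutions followed by a 3×3 stride-1 real/fake head) are my reading of the paper's architecture table and should be treated as assumptions.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_map_size(image_size, num_downsampling_layers=6):
    """Side length of the real/fake score map for a square input image."""
    size = image_size
    # Assumed: each downsampling layer is a 4x4 convolution, stride 2, padding 1.
    for _ in range(num_downsampling_layers):
        size = conv_output_size(size, kernel=4, stride=2, padding=1)
    # Assumed real/fake head: 3x3 convolution, stride 1, padding 1 (keeps the size).
    return conv_output_size(size, kernel=3, stride=1, padding=1)


print(patch_map_size(128))
```

Under these assumptions a 128×128 input is halved six times (128, 64, 32, 16, 8, 4, 2), so the discriminator scores a 2×2 grid of overlapping patches.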

3. Results

3.1. Qualitative Results

CelebA

StarGAN provides a higher visual quality of translation results on test data compared to the cross-domain models.

RaFD

StarGAN clearly generates the most natural-looking expressions while properly maintaining the personal identity and facial features of the input.

CelebA + RaFD
  • To distinguish between the model trained only on RaFD and the model trained on both CelebA and RaFD, the former is denoted as StarGAN-SNG (single) and the latter is denoted as StarGAN-JNT (joint).

By utilizing both CelebA and RaFD, StarGAN-JNT can improve shared low-level tasks such as facial keypoint detection and segmentation, which benefits facial expression synthesis.

3.2. Quantitative Results

  • Amazon Mechanical Turk (AMT) is used to assess single and multiple attribute transfer tasks.

StarGAN obtained the majority of votes for best transferring attributes in all cases.

Model Size

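StarGAN needs about 7 times fewer parameters than DIAT and 14 times fewer than CycleGAN, because those cross-domain methods train a separate model for each translation task while StarGAN covers all of them with a single generator and discriminator pair.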

