Review — CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (GAN)

Using Cycle Consistency Loss for Unpaired Image-to-Image Translation, Outperforms CoGAN, BiGAN, ALI & SimGAN

CycleGAN learns to automatically “translate” an image from one domain into the other.
  • This mapping is highly under-constrained, so it is coupled with an inverse mapping F: Y→X, and a cycle consistency loss is introduced to enforce F(G(X)) ≈ X (and vice versa).

Outline

  1. Paired & Unpaired Training Data
  2. Cycle Consistency
  3. CycleGAN
  4. Ablation Study
  5. Quantitative Evaluation
  6. Qualitative Results

1. Paired & Unpaired Training Data

Paired training data (left) and Unpaired training data (right)
  • Most supervised learning is applied to paired training data. However, obtaining paired training data can be difficult and expensive.
  • Unpaired training data consists of a source set {xi} (xi ∈ X) and a target set {yj} (yj ∈ Y), with no information provided as to which xi matches which yj.

2. Cycle Consistency

  • A mapping G: X→Y should be learnt such that the output ŷ = G(x), x ∈ X, is indistinguishable from images y ∈ Y by an adversary trained to classify ŷ apart from y.
  • The optimal G thereby translates the domain X to a domain Ŷ distributed identically to Y.
  • Yet, there can be infinitely many such mappings G, which makes the objective difficult to optimize on its own; standard procedures often lead to the well-known problem of mode collapse.
  • A further property is therefore exploited: translation should be “cycle consistent”.

3. CycleGAN

(a) CycleGAN containing two mapping functions G and F, (b) forward cycle-consistency loss, (c) backward cycle-consistency loss

3.1. Adversarial Loss

  • For the mapping function G: X→Y and its discriminator DY, an adversarial loss is applied (see below).
  • A similar adversarial loss is introduced for the mapping function F: Y→X and its discriminator DX.
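From the paper, the adversarial loss for the forward mapping is:

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]$$

where G tries to minimize this objective against an adversary DY that tries to maximize it; $\mathcal{L}_{GAN}(F, D_X, Y, X)$ is defined analogously for the reverse mapping.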

3.2. Cycle Consistency Loss

  • Adversarial losses alone cannot guarantee that the learned function can map an individual input xi to a desired output yi.
  • It is argued that the learned mapping functions should be cycle-consistent.
  • Forward cycle consistency: for each image x from domain X, the image translation cycle should be able to bring x back to the original image, i.e., x → G(x) → F(G(x)) ≈ x.
  • Similarly, for backward cycle consistency: y → F(y) → G(F(y)) ≈ y.
The input images x, output images G(x) and the reconstructed images F(G(x))
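In the paper, cycle consistency is enforced with an L1 reconstruction penalty in both directions:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$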

3.3. Full Objective

  • The total loss combines the two adversarial losses with the cycle consistency loss, weighted by λ.
  • The mappings are obtained by solving a minimax problem over G, F, DX and DY, as written below.
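From the paper, the full objective is:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda\,\mathcal{L}_{cyc}(G, F)$$

and the optimization problem to be solved is:

$$G^*, F^* = \arg\min_{G, F}\,\max_{D_X, D_Y}\,\mathcal{L}(G, F, D_X, D_Y)$$

with λ = 10 used in the experiments.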

3.4. Viewed As Autoencoder

  • CycleGAN can be viewed as training two “autoencoders”: learning one autoencoder F∘G: X→X jointly with another G∘F: Y→Y.
  • However, each has a special internal structure: it maps an image to itself via an intermediate representation that is a translation of the image into another domain.
  • Such a setup can also be seen as a special case of “adversarial autoencoders”, which use an adversarial loss to train the bottleneck layer of an autoencoder to match an arbitrary target distribution.
  • In CycleGAN, the target distribution for the X→X autoencoder is that of the domain Y.

3.5. Architecture

  • The architecture for the generative networks is adopted from Johnson et al. [23], which has shown impressive results for neural style transfer and super-resolution.
  • This network contains two stride-2 convolutions, several residual blocks, and two fractionally-strided convolutions with stride 1/2.
  • 6 residual blocks are used for 128×128 images, and 9 blocks for 256×256 and higher-resolution training images.
  • As in Johnson et al. [23], instance normalization is used.
  • For the discriminator networks, 70×70 PatchGANs [22, 30, 29] are used, which aim to classify whether 70×70 overlapping image patches are real or fake. Such a patch-level discriminator architecture has fewer parameters than a full-image discriminator and can work on arbitrarily-sized images in a fully convolutional fashion.
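To make the description concrete, below is a minimal PyTorch sketch of the generator and the 70×70 PatchGAN discriminator as described above. The layer counts (two stride-2 convolutions, residual blocks, two fractionally-strided convolutions, instance normalization throughout) follow this description; details such as reflection padding, the 64→128→256 channel widths, and the LeakyReLU slope are assumptions drawn from common CycleGAN implementations.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block: two 3x3 convolutions with instance norm and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class ResNetGenerator(nn.Module):
    """Johnson-style generator: 7x7 stem, two stride-2 convs (down),
    n residual blocks, two fractionally-strided convs (up), 7x7 output conv."""
    def __init__(self, n_blocks=9):  # 9 blocks for 256x256, 6 for 128x128
        super().__init__()
        layers = [
            nn.ReflectionPad2d(3),
            nn.Conv2d(3, 64, kernel_size=7),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True),
            # two stride-2 convolutions (downsampling)
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(256),
            nn.ReLU(inplace=True),
        ]
        layers += [ResidualBlock(256) for _ in range(n_blocks)]
        layers += [
            # two fractionally-strided convolutions (upsampling, stride 1/2)
            nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(128),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(3),
            nn.Conv2d(64, 3, kernel_size=7),
            nn.Tanh(),
        ]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

class PatchDiscriminator(nn.Module):
    """70x70 PatchGAN: fully convolutional, outputs a grid of real/fake scores,
    one score per overlapping image patch, so any input size works."""
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, kernel_size=4, stride=1, padding=1),
            nn.InstanceNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.model(x)
```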

3.6. Implementation

  • The negative log likelihood objective is replaced by a least-squares loss.
  • To train G, the least-squares loss below is minimized (with a corresponding objective for the discriminator).
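From the paper, G is trained to minimize

$$\mathbb{E}_{x \sim p_{data}(x)}\big[(D_Y(G(x)) - 1)^2\big]$$

while DY is trained to minimize

$$\mathbb{E}_{y \sim p_{data}(y)}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[D_Y(G(x))^2\big]$$

Below is a minimal PyTorch sketch of one training step combining this least-squares adversarial loss with the cycle consistency loss, reusing the ResNetGenerator and PatchDiscriminator sketches above. The λ = 10 weight and the 0.0002 learning rate follow the paper; the Adam betas and the helper structure are assumptions, and the paper's buffer of previously generated images for discriminator updates is omitted for brevity.

```python
import itertools

import torch
import torch.nn.functional as F_nn

# Reuses the ResNetGenerator / PatchDiscriminator sketches above.
G, F = ResNetGenerator(), ResNetGenerator()            # G: X -> Y, F: Y -> X
D_X, D_Y = PatchDiscriminator(), PatchDiscriminator()
lam = 10.0  # cycle consistency weight lambda from the paper

opt_g = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(itertools.chain(D_X.parameters(), D_Y.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

def train_step(real_x, real_y):
    # --- generator update: least-squares GAN loss + cycle consistency ---
    fake_y, fake_x = G(real_x), F(real_y)
    pred_fy, pred_fx = D_Y(fake_y), D_X(fake_x)
    loss_g = (F_nn.mse_loss(pred_fy, torch.ones_like(pred_fy))    # G fools D_Y
              + F_nn.mse_loss(pred_fx, torch.ones_like(pred_fx))  # F fools D_X
              + lam * F_nn.l1_loss(F(fake_y), real_x)   # forward cycle: F(G(x)) ~ x
              + lam * F_nn.l1_loss(G(fake_x), real_y))  # backward cycle: G(F(y)) ~ y
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    # --- discriminator update: real patches -> 1, fake patches -> 0 ---
    pred_ry, pred_fy = D_Y(real_y), D_Y(fake_y.detach())
    pred_rx, pred_fx = D_X(real_x), D_X(fake_x.detach())
    loss_d = (F_nn.mse_loss(pred_ry, torch.ones_like(pred_ry))
              + F_nn.mse_loss(pred_fy, torch.zeros_like(pred_fy))
              + F_nn.mse_loss(pred_rx, torch.ones_like(pred_rx))
              + F_nn.mse_loss(pred_fx, torch.zeros_like(pred_fx)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_g.item(), loss_d.item()
```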

4. Ablation Study

FCN-scores for different CycleGAN variants on Cityscapes labels→photo
Classification performance of photo→labels for different CycleGAN variants
  • CycleGAN with the cycle loss in only one direction is also tried; this is found to often incur training instability and cause mode collapse.
Different variants of our method for mapping labels↔photos trained on cityscapes
  • GAN alone and GAN+forward suffer from mode collapse, producing identical label maps regardless of the input photo.

5. Quantitative Evaluation

AMT “real vs fake” test on maps→aerial photos
  • All the baselines almost never fooled participants.
FCN-scores for different methods, evaluated on Cityscapes labels→photo.
Classification performance of photo→labels for different methods on cityscapes
  • It is noted that Pix2Pix is trained on paired data, so its results can be treated as an upper bound on performance.

6. Qualitative Results

Different methods for mapping labels→photos trained on Cityscapes images.
Different methods for mapping aerial photos↔maps on Google Maps
Collection style transfer
Other translation problems

Reference

[2017 ICCV] [CycleGAN]
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
