Review — CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (GAN)
Using Cycle Consistency Loss for Unpaired Image-to-Image Translation, Outperforms CoGAN, BiGAN, ALI & SimGAN
In this story, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN), by the Berkeley AI Research (BAIR) Laboratory, UC Berkeley, is reviewed.
For many tasks, paired training data will not be available.
In this paper:
- CycleGAN is designed to translate an image from a source domain X to a target domain Y in the absence of paired examples, i.e. G: X→Y.
- Because this mapping is highly under-constrained, it is coupled with an inverse mapping F: Y→X, and a cycle consistency loss is introduced to enforce F(G(X))≈X (and vice versa).
This is a paper in 2017 ICCV with over 8300 citations. (Sik-Ho Tsang @ Medium)
Outline
- Paired & Unpaired Training Data
- Cycle Consistency
- CycleGAN
- Ablation Study
- Quantitative Evaluation
- Qualitative Results
1. Paired & Unpaired Training Data
- Paired training data consists of training examples {xi, yi}, where the correspondence between xi and yi exists.
- Most supervised learning is applied to paired training data. However, obtaining paired training data can be difficult and expensive.
- Unpaired training data consists of a source set {xi} (xi∈X) and a target set {yi} (yi∈Y), with no information provided as to which xi matches which yj.
CycleGAN seeks to learn to translate between domains without paired input-output examples.
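To make the setup concrete, here is a minimal PyTorch-style sketch of an unpaired dataset that samples the two domains independently. The class name, folder arguments, and random pairing are illustrative assumptions, not the authors' released code:

```python
import os
import random

from PIL import Image
from torch.utils.data import Dataset

class UnpairedDataset(Dataset):
    """Sketch: serves (x, y) pairs with NO correspondence between the two domains."""
    def __init__(self, root_x, root_y, transform=None):
        self.paths_x = sorted(os.path.join(root_x, f) for f in os.listdir(root_x))
        self.paths_y = sorted(os.path.join(root_y, f) for f in os.listdir(root_y))
        self.transform = transform

    def __len__(self):
        return max(len(self.paths_x), len(self.paths_y))

    def __getitem__(self, i):
        x = Image.open(self.paths_x[i % len(self.paths_x)]).convert('RGB')
        # y is drawn at random, so file ordering cannot induce accidental pairing.
        y = Image.open(random.choice(self.paths_y)).convert('RGB')
        if self.transform is not None:
            x, y = self.transform(x), self.transform(y)
        return x, y
```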
2. Cycle Consistency
- A mapping G: X→Y should be learnt such that the output ŷ = G(x), x∈X, is indistinguishable from images y∈Y by an adversary trained to classify ŷ apart from y.
- The optimal G thereby translates the domain X to a domain Ŷ distributed identically to Y.
- Yet, there can be infinitely many mappings G that induce the same distribution over ŷ, so this objective alone is difficult to optimize. Standard procedures often lead to the well-known problem of mode collapse, where all input images map to the same output image.
- A property should be exploited, i.e. translation should be “cycle consistent”.
Mathematically, if we have a translator G: X→Y and another translator F: Y→X, then G and F should be inverses of each other.
A cycle consistency loss is added that encourages F(G(x))≈x and G(F(y))≈y.
Combining this loss with adversarial losses on domains X and Y yields the full objective for unpaired image-to-image translation.
3. CycleGAN
3.1. Adversarial Loss
- For the mapping function G: X→Y and its discriminator DY, the adversarial loss is:
L_GAN(G, DY, X, Y) = E_y~pdata(y)[log DY(y)] + E_x~pdata(x)[log(1 − DY(G(x)))]
- where G tries to generate images G(x) that look similar to images from domain Y, while DY aims to distinguish between translated samples G(x) and real samples y.
- A similar adversarial loss for the mapping function F: Y→X and its discriminator DX are introduced.
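As a minimal PyTorch sketch of this term, using the cross-entropy form written above (the paper actually trains with a least-squares variant; see Section 3.6). The function name and the F_nn alias are my own:

```python
import torch
import torch.nn.functional as F_nn  # aliased so it does not clash with the mapping F

def adversarial_losses(G, D_Y, x, y):
    """Sketch of the GAN losses for G: X -> Y and its discriminator D_Y."""
    fake_y = G(x)
    # D_Y's update: label real y as 1, translated G(x) as 0 (detach so no grad reaches G).
    pred_real, pred_fake = D_Y(y), D_Y(fake_y.detach())
    d_loss = (F_nn.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
              + F_nn.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))
    # G's update: push D_Y to label G(x) as real.
    pred_fake_g = D_Y(fake_y)
    g_loss = F_nn.binary_cross_entropy_with_logits(pred_fake_g, torch.ones_like(pred_fake_g))
    return g_loss, d_loss
```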
3.2. Cycle Consistency Loss
- Adversarial losses alone cannot guarantee that the learned function can map an individual input xi to a desired output yi.
- It is argued that the learned mapping functions should be cycle-consistent.
- Forward cycle consistency: For each image x from domain X, the image translation cycle should be able to bring x back to the original image, i.e., x→G(x)→F(G(x))≈x.
- Similarly for backward cycle consistency: y→F(y)→G(F(y))≈y.
- The cycle consistency loss is an L1 penalty on both reconstructions:
Lcyc(G, F) = E_x~pdata(x)[‖F(G(x)) − x‖₁] + E_y~pdata(y)[‖G(F(y)) − y‖₁]
- With this loss, the reconstructed images F(G(x)) end up matching closely to the input images x.
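In code, the cycle term is just two L1 penalties; a sketch under the same assumptions as above (F_net stands for the mapping F, renamed to avoid shadowing torch.nn.functional):

```python
import torch.nn.functional as F_nn

def cycle_consistency_loss(G, F_net, x, y):
    """Sketch: L1 penalties pulling F(G(x)) back to x and G(F(y)) back to y."""
    forward_cycle = F_nn.l1_loss(F_net(G(x)), x)   # x -> G(x) -> F(G(x)) ≈ x
    backward_cycle = F_nn.l1_loss(G(F_net(y)), y)  # y -> F(y) -> G(F(y)) ≈ y
    return forward_cycle + backward_cycle
```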
3.3. Full Objective
- The total loss is:
L(G, F, DX, DY) = L_GAN(G, DY, X, Y) + L_GAN(F, DX, Y, X) + λ·Lcyc(G, F)
- where λ controls the relative importance of the two objectives (λ = 10 in the paper).
- The loss to be solved is the minimax problem:
G*, F* = arg min_{G,F} max_{DX,DY} L(G, F, DX, DY)
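Reusing the sketches from Sections 3.1 and 3.2, the full objective can be assembled as follows (the helper is illustrative; lam=10.0 matches the paper's λ):

```python
def full_objective(G, F_net, D_X, D_Y, x, y, lam=10.0):
    """Sketch: both adversarial terms plus the λ-weighted cycle loss."""
    g_xy_loss, d_y_loss = adversarial_losses(G, D_Y, x, y)      # G: X -> Y vs. D_Y
    g_yx_loss, d_x_loss = adversarial_losses(F_net, D_X, y, x)  # F: Y -> X vs. D_X
    gen_loss = g_xy_loss + g_yx_loss + lam * cycle_consistency_loss(G, F_net, x, y)
    # Generators minimize gen_loss; each discriminator minimizes its own loss.
    return gen_loss, d_x_loss, d_y_loss
```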
3.4. Viewed As Autoencoder
- CycleGAN can be viewed as training two “autoencoders”: one autoencoder F∘G: X→X is learned jointly with another G∘F: Y→Y.
- However, each has a special internal structure: it maps an image to itself via an intermediate representation that is a translation of the image into another domain.
- Such a setup can also be seen as a special case of “adversarial autoencoders”, which use an adversarial loss to train the bottleneck layer of an autoencoder to match an arbitrary target distribution.
- In CycleGAN, the target distribution for the X→X autoencoder is that of the domain Y.
3.5. Architecture
- The architecture for generative networks from Johnson [23] is used, which has shown impressive results for neural style transfer and super resolution.
- This network contains two stride-2 convolutions, several residual blocks, and two fractionally-strided convolutions with stride 1/2.
- 6 blocks are used for 128×128 images and 9 blocks are used for 256×256 and higher resolution training images.
- As in Johnson’s network, instance normalization is used.
- For the discriminator networks, 70×70 PatchGANs [22, 30, 29] are used, which aim to classify whether 70×70 overlapping image patches are real or fake. Such a patch-level discriminator architecture has fewer parameters than a full-image discriminator and can work on arbitrarily-sized images in a fully convolutional fashion.
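A PyTorch sketch of both networks as described above. The channel widths (64→128→256 in the generator, 64→128→256→512 in the discriminator) follow the commonly used configuration and should be read as assumptions:

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block: two 3x3 convs with instance norm and a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)

def resnet_generator(n_blocks=9):  # 6 blocks for 128x128 inputs, 9 for 256x256 and up
    return nn.Sequential(
        nn.ReflectionPad2d(3), nn.Conv2d(3, 64, 7), nn.InstanceNorm2d(64), nn.ReLU(True),
        nn.Conv2d(64, 128, 3, 2, 1), nn.InstanceNorm2d(128), nn.ReLU(True),   # stride-2 conv
        nn.Conv2d(128, 256, 3, 2, 1), nn.InstanceNorm2d(256), nn.ReLU(True),  # stride-2 conv
        *[ResnetBlock(256) for _ in range(n_blocks)],
        nn.ConvTranspose2d(256, 128, 3, 2, 1, output_padding=1),              # "stride 1/2"
        nn.InstanceNorm2d(128), nn.ReLU(True),
        nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1),
        nn.InstanceNorm2d(64), nn.ReLU(True),
        nn.ReflectionPad2d(3), nn.Conv2d(64, 3, 7), nn.Tanh())

def patchgan_70():
    """70x70 PatchGAN: fully convolutional, emits one real/fake logit per patch."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),  # no norm on the first layer
        nn.Conv2d(64, 128, 4, 2, 1), nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, True),
        nn.Conv2d(128, 256, 4, 2, 1), nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, True),
        nn.Conv2d(256, 512, 4, 1, 1), nn.InstanceNorm2d(512), nn.LeakyReLU(0.2, True),
        nn.Conv2d(512, 1, 4, 1, 1))                          # logit map; no sigmoid
```

Stacking these 4×4 convolutions gives each output logit a 70×70 receptive field on the input, which is where the name comes from.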
3.6. Implementation
- The negative log likelihood objective is replaced by a least-squares loss, which is more stable during training.
- To train G, the following loss is minimized: E_x~pdata(x)[(D(G(x)) − 1)²]
- To train D, the following loss is minimized: E_y~pdata(y)[(D(y) − 1)²] + E_x~pdata(x)[D(G(x))²]
- Discriminators are updated using a history of generated images rather than the ones produced by the latest generators. An image buffer is kept that stores the 50 previously created images.
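A sketch of the least-squares objectives and the 50-image history buffer. The 50/50 return-or-swap policy mirrors the common implementation of this trick (borrowed from SimGAN to reduce model oscillation) and is an assumption here; the buffer below also handles one image at a time for simplicity:

```python
import random
import torch

def lsgan_g_loss(D, fake):
    """Sketch: G minimizes E[(D(G(x)) - 1)^2]."""
    return torch.mean((D(fake) - 1) ** 2)

def lsgan_d_loss(D, real, fake):
    """Sketch: D minimizes E[(D(y) - 1)^2] + E[D(G(x))^2]."""
    return torch.mean((D(real) - 1) ** 2) + torch.mean(D(fake.detach()) ** 2)

class ImagePool:
    """History buffer: discriminators see a mix of current and past fakes."""
    def __init__(self, size=50):
        self.size, self.images = size, []

    def query(self, image):
        if len(self.images) < self.size:   # fill the buffer first
            self.images.append(image)
            return image
        if random.random() < 0.5:          # half the time, return an old fake and store the new one
            i = random.randrange(self.size)
            old, self.images[i] = self.images[i], image
            return old
        return image                       # otherwise pass the current fake through
```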
4. Ablation Study
- Removing the GAN loss substantially degrades results, as does removing the cycle-consistency loss.
- CycleGAN with the cycle loss in only one direction is also tried. It is found that it often incurs training instability and causes mode collapse.
5. Quantitative Evaluation
- “real vs fake” perceptual studies are run on Amazon Mechanical Turk (AMT), with 25 participants.
- All the baselines almost never fooled participants.
CycleGAN can fool participants on around a quarter of trials.
- The FCN score is also used: a pre-trained FCN predicts a label map for a generated photo, which is then compared against the ground-truth labels using standard semantic segmentation metrics.
- It is noted that Pix2Pix is trained on paired data, so its performance can be treated as an upper bound.
In both cases, CycleGAN again outperforms the baselines: e.g. CoGAN, BiGAN, ALI and SimGAN.
6. Qualitative Results
- There are many impressive results in the paper. Please feel free to read the paper if interested.
Reference
[2017 ICCV] [CycleGAN]
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Generative Adversarial Network (GAN)
Image Synthesis [GAN] [CGAN] [LAPGAN] [DCGAN] [CoGAN] [SimGAN] [BiGAN] [ALI]
Image-to-image Translation [Pix2Pix] [UNIT] [CycleGAN]
Super Resolution [SRGAN & SRResNet] [EnhanceNet] [ESRGAN]
Blur Detection [DMENet]
Camera Tampering Detection [Mantini’s VISAPP’19]
Video Coding [VC-LAPGAN] [Zhu TMM’20] [Zhong ELECGJ’21]