# Review — CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (GAN)

**Using Cycle Consistency Loss for Unpaired Image-to-Image Translation; Outperforms CoGAN, BiGAN, ALI & SimGAN**

In this story, **Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks** (CycleGAN), by the Berkeley AI Research (BAIR) Laboratory, UC Berkeley, is reviewed.

For many tasks, paired training data will not be available.

In this paper:

- CycleGAN is designed to **translate an image from a source domain** *X* to a target domain *Y* in the absence of paired examples, i.e. a mapping *G*: *X* → *Y*.
- Because this mapping is highly under-constrained, it is coupled with **an inverse mapping** *F*: *Y* → *X*, and a **cycle consistency loss** is introduced to **enforce** *F*(*G*(*x*)) ≈ *x* (and vice versa).

This is a paper in **2017 ICCV** with over **8300 citations**. (Sik-Ho Tsang @ Medium)

# Outline

1. **Paired & Unpaired Training Data**
2. **Cycle Consistency**
3. **CycleGAN**
4. **Ablation Study**
5. **Quantitative Evaluation**
6. **Qualitative Results**

# 1. Paired & Unpaired Training Data

- **Paired training data** consists of training examples {*xi*, *yi*}, where the correspondence between *xi* and *yi* exists. Most supervised learning is applied to such paired training data. **However, obtaining paired training data can be difficult and expensive.**
- **Unpaired training data** consists of a source set {*xi*} (*xi* ∈ *X*) and a target set {*yj*} (*yj* ∈ *Y*), with no information provided as to which *xi* matches which *yj*.

CycleGAN seeks to learn to translate between domains without paired input-output examples.

# 2. Cycle Consistency

- **A mapping** *G*: *X* → *Y* should be learnt such that the output ŷ = *G*(*x*), *x* ∈ *X*, is indistinguishable from images *y* ∈ *Y* by an adversary trained to classify ŷ apart from *y*.
- The optimal *G* thereby translates the domain *X* to a domain *Ŷ* distributed identically to *Y*.
- Yet, there can be infinitely many such mappings *G*; this objective is difficult to optimize, and standard procedures often lead to the well-known problem of mode collapse.
- A property should therefore be exploited: **translation should be “cycle consistent”.**

Mathematically, if we have a translator *G*: *X* → *Y* and another translator *F*: *Y* → *X*, then *G* and *F* should be inverses of each other.

A cycle consistency loss is added that encourages *F*(*G*(*x*)) ≈ *x* and *G*(*F*(*y*)) ≈ *y*. Combining this loss with adversarial losses on domains *X* and *Y* yields the full objective for unpaired image-to-image translation.

# 3. CycleGAN

## 3.1. Adversarial Loss

- For the mapping function *G*: *X* → *Y* and its discriminator *DY*, the adversarial loss is:

*LGAN*(*G*, *DY*, *X*, *Y*) = E*y*[log *DY*(*y*)] + E*x*[log(1 − *DY*(*G*(*x*)))]

- where *G* tries to generate images *G*(*x*) that look similar to images from domain *Y*, while *DY* aims to distinguish between translated samples *G*(*x*) and real samples *y*.
- A similar adversarial loss is introduced for the mapping function *F*: *Y* → *X* and its discriminator *DX*.
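As a concrete illustration, the adversarial loss can be sketched as a plain function of the discriminator's outputs. This is a minimal sketch, not the paper's code: `d_real` and `d_fake` are hypothetical placeholder arrays standing in for *DY*(*y*) and *DY*(*G*(*x*)).

```python
import numpy as np

# Minimal sketch of L_GAN(G, D_Y, X, Y), assuming D_Y outputs probabilities
# in (0, 1). `d_real` / `d_fake` are placeholder arrays standing in for
# D_Y(y) and D_Y(G(x)); no actual networks are involved.
def adversarial_loss(d_real, d_fake, eps=1e-8):
    """E_y[log D_Y(y)] + E_x[log(1 - D_Y(G(x)))]: D maximizes this, G minimizes it."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# An undecided discriminator (0.5 everywhere) gives 2 * log(0.5), about -1.386.
print(adversarial_loss(np.full(4, 0.5), np.full(4, 0.5)))
```

The two expectations become simple means over batches; the `eps` guard is only there to keep the logarithm finite at 0 and 1.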

## 3.2. Cycle Consistency Loss

- Adversarial losses alone cannot guarantee that the learned function maps an individual input *xi* to a desired output *yi*. It is argued that the learned mapping functions should be cycle-consistent.
- **Forward cycle consistency**: for each image *x* from domain *X*, the image translation cycle should be able to bring *x* back to the original image, i.e., *x* **→** *G*(*x*) **→** *F*(*G*(*x*)) ≈ *x*.
- Similarly for **backward cycle consistency**: *y* **→** *F*(*y*) **→** *G*(*F*(*y*)) ≈ *y*.
- The cycle consistency loss penalizes both directions with an L1 norm:

*Lcyc*(*G*, *F*) = E*x*[||*F*(*G*(*x*)) − *x*||1] + E*y*[||*G*(*F*(*y*)) − *y*||1]

- The reconstructed images *F*(*G*(*x*)) end up matching closely to the input images *x*.
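The forward and backward cycle terms can be sketched numerically; here the arrays are toy stand-ins for images and their reconstructions, not the outputs of trained generators.

```python
import numpy as np

# Sketch of the cycle consistency loss L_cyc(G, F) with L1 norms. The arrays
# are toy stand-ins for images and their reconstructions.
def cycle_consistency_loss(x, f_g_x, y, g_f_y):
    """E_x[||F(G(x)) - x||_1] + E_y[||G(F(y)) - y||_1]."""
    forward = np.mean(np.abs(f_g_x - x))   # forward cycle: x -> G(x) -> F(G(x))
    backward = np.mean(np.abs(g_f_y - y))  # backward cycle: y -> F(y) -> G(F(y))
    return forward + backward

x = np.array([0.0, 1.0])
y = np.array([1.0, 0.0])
print(cycle_consistency_loss(x, x, y, y))  # perfect reconstruction -> 0.0
```

The loss is zero exactly when both cycles reproduce their inputs, which is the behavior the paper reports for the trained networks.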

## 3.3. Full Objective

- The total loss is:

*L*(*G*, *F*, *DX*, *DY*) = *LGAN*(*G*, *DY*, *X*, *Y*) + *LGAN*(*F*, *DX*, *Y*, *X*) + *λ* *Lcyc*(*G*, *F*)

- where *λ* controls the relative importance of the two objectives.
- The loss to be solved:

*G*\*, *F*\* = arg min over *G*, *F* of max over *DX*, *DY* of *L*(*G*, *F*, *DX*, *DY*)
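The combination is simple scalar bookkeeping once the individual terms are computed; in this sketch the arguments are already-computed loss values, and λ = 10 is the weight used in the paper.

```python
# Sketch of the full objective: the two adversarial losses plus the weighted
# cycle loss. The arguments are placeholder scalar loss values, not real
# network outputs; lambda = 10 matches the weight used in the paper.
def full_objective(l_gan_g, l_gan_f, l_cyc, lam=10.0):
    """L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + lambda * L_cyc(G, F)."""
    return l_gan_g + l_gan_f + lam * l_cyc

print(full_objective(0.0, 0.0, 1.0))  # cycle term alone, weighted -> 10.0
```

The large λ reflects how strongly the cycle constraint is weighted relative to the adversarial terms.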

## 3.4. Viewed As Autoencoder

- CycleGAN can be viewed as training two “autoencoders”: one autoencoder *F*∘*G*: *X* → *X* is learnt jointly with another *G*∘*F*: *Y* → *Y*.
- However, each has a special internal structure: it maps an image to itself via an intermediate representation that is a translation of the image into another domain.
- Such a setup can also be seen as a special case of “adversarial autoencoders”, which use an adversarial loss to train the bottleneck layer of an autoencoder to match an arbitrary target distribution.
- In CycleGAN, the target distribution for the *X* → *X* autoencoder is that of the domain *Y*.

## 3.5. Architecture

- The architecture of the **generative networks from Johnson [23] is used**, which has shown impressive results for neural style transfer and super-resolution. This network contains two stride-2 convolutions, several residual blocks, and two fractionally-strided convolutions with stride 1/2.
- 6 residual blocks are used for 128×128 images, and 9 blocks for 256×256 and higher-resolution training images.
- As in Johnson’s network, instance normalization is used.
- For the **discriminator** networks, **70×70 PatchGANs** [22, 30, 29] are used, which aim to **classify whether 70×70 overlapping image patches are real or fake**. Such a patch-level discriminator architecture has fewer parameters than a full-image discriminator and can work on arbitrarily-sized images in a fully convolutional fashion.
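The “70×70” figure is the receptive field of one discriminator output unit, which can be verified with standard receptive-field arithmetic. The layer configuration below (4×4 kernels, three stride-2 convolutions followed by two stride-1 convolutions) is an assumption taken from the common PatchGAN layout, since the review does not list the layers explicitly.

```python
# Receptive-field arithmetic for the 70x70 PatchGAN discriminator.
# Layer list is an assumed common PatchGAN layout (C64-C128-C256-C512 plus a
# 1-channel output conv), all with 4x4 kernels; not spelled out in the review.
def receptive_field(layers):
    """Walk backwards from one output unit: rf_in = (rf_out - 1) * stride + kernel."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]  # (kernel, stride) per conv
print(receptive_field(patchgan))  # -> 70
```

Because the discriminator is fully convolutional, each output unit scores one such 70×70 patch, and the patch scores are averaged.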

## 3.6. Implementation

- The negative log likelihood objective is replaced by a **least-squares loss**, which is more stable during training and generates higher-quality results.
- To train *G*, the loss E*x*[(*DY*(*G*(*x*)) − 1)²] is minimized.
- To train *DY*, the loss E*y*[(*DY*(*y*) − 1)²] + E*x*[*DY*(*G*(*x*))²] is minimized.
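The two least-squares objectives can be sketched directly; as before, `d_real` and `d_fake` are hypothetical placeholder arrays standing in for the discriminator outputs *DY*(*y*) and *DY*(*G*(*x*)).

```python
import numpy as np

# Sketch of the least-squares GAN objectives. `d_real` / `d_fake` are
# placeholder arrays standing in for D_Y(y) and D_Y(G(x)).
def generator_ls_loss(d_fake):
    """Train G: minimize E_x[(D_Y(G(x)) - 1)^2]."""
    return np.mean((d_fake - 1.0) ** 2)

def discriminator_ls_loss(d_real, d_fake):
    """Train D: minimize E_y[(D_Y(y) - 1)^2] + E_x[D_Y(G(x))^2]."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

# If G fully fools D (D_Y(G(x)) = 1), G's loss is 0; a perfect D
# (real -> 1, fake -> 0) also reaches 0 on its own objective.
print(generator_ls_loss(np.array([1.0])), discriminator_ls_loss(np.array([1.0]), np.array([0.0])))
```

Compared with the log loss, the quadratic penalties keep gradients finite even for confidently misclassified samples, which is the stability benefit noted above.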

- Discriminators are updated using a history of generated images rather than only the ones produced by the latest generators. An image buffer that stores the 50 previously generated images is kept.
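The history buffer can be sketched as below. The 0.5 swap probability is an assumption borrowed from common implementations of this trick; the review only states the buffer size of 50.

```python
import random

# Sketch of the generated-image history buffer: store up to `capacity` past
# fakes and, once full, swap the incoming image for a stored one half the
# time. The 0.5 swap probability is an assumed detail from common
# implementations; only the buffer size (50) is stated in the review.
class ImageBuffer:
    def __init__(self, capacity=50):
        self.capacity = capacity
        self.images = []

    def query(self, image):
        """Return an image to feed the discriminator, possibly from history."""
        if len(self.images) < self.capacity:
            self.images.append(image)   # still filling: pass the new image through
            return image
        if random.random() < 0.5:
            idx = random.randrange(self.capacity)
            old, self.images[idx] = self.images[idx], image  # swap in the new fake
            return old                  # ...and train D on an older one
        return image

buf = ImageBuffer(capacity=2)
print(buf.query("fake_0"))  # buffer not full yet -> returns "fake_0"
```

Feeding the discriminator a mix of current and older fakes reduces oscillation between the generator and discriminator during training.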

# 4. Ablation Study

- Removing the GAN loss substantially degrades results, as does removing the cycle-consistency loss.
- CycleGAN with the cycle loss in only one direction is also tried; it is found that this often incurs training instability and causes mode collapse.

# 5. Quantitative Evaluation

- “real vs fake” perceptual studies are run on Amazon Mechanical Turk (AMT), with 25 participants.
- All the baselines almost never fooled participants.
- CycleGAN can fool participants on around a quarter of trials.

- The FCN score is also used: an FCN predicts a label map for a generated photo, which is then compared against the ground truth using standard semantic segmentation metrics.
- It is noted that **Pix2Pix works on paired data**, which can be treated as **upper-bound performance**.

In both cases, CycleGAN again outperforms the baselines: e.g. CoGAN, BiGAN, ALI and SimGAN.

# 6. Qualitative Results

- There are many impressive results in the paper. Please feel free to read the paper if interested.

## Reference

[2017 ICCV] [CycleGAN]

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

## Generative Adversarial Network (GAN)

**Image Synthesis**: [GAN] [CGAN] [LAPGAN] [DCGAN] [CoGAN] [SimGAN] [BiGAN] [ALI]
**Image-to-image Translation**: [Pix2Pix] [UNIT] [CycleGAN]
**Super Resolution**: [SRGAN & SRResNet] [EnhanceNet] [ESRGAN]
**Blur Detection**: [DMENet]
**Camera Tampering Detection**: [Mantini’s VISAPP’19]
**Video Coding**: [VC-LAPGAN] [Zhu TMM’20] [Zhong ELECGJ’21]