Brief Review — Generative Semantic Manipulation with Mask-Contrasting GAN
Generative Semantic Manipulation with Mask-Contrasting GAN
Mask Contrast-GAN, by Carnegie Mellon University, Sun Yat-sen University
2018 ECCV, Over 50 Citations (Sik-Ho Tsang @ Medium)
- A contrasting GAN (contrast-GAN) with a novel adversarial contrasting objective is proposed, which can perform all types of semantic translations with a single category-conditional generator.
- Distance comparisons between samples are used in the training objective, enforcing the manipulated data to be semantically closer to the real data of the target category than the input data is.
Outline
- Contrast-GAN
- Mask Contrast-GAN
- Results
1. Contrast-GAN
- LSGAN is used for the adversarial loss: the discriminator minimizes E_y[(D(y) − 1)²] + E_x[D(G(x, cy))²] while the generator minimizes E_x[(D(G(x, cy)) − 1)²], which gives more stable gradients than the standard cross-entropy GAN loss.
- The feature representation of the manipulated result y′ should be closer to those of the real data {y} in target domain Y than to that of x in input domain X, under the object semantic cy.
- The generator aims to minimize the contrasting distance Q(·), where fx, fy and fy′ are the feature embeddings of the images x, y and y′ respectively: Q rewards fy′ for being closer to fy than to fx.
- The discriminator, conversely, aims to maximize the contrasting distance.
- To further reduce the space of possible mapping functions, the cycle-consistency loss from CycleGAN is also used, which constrains the mappings (induced by the generator G) between two object semantics to be inverses of each other: translating x to semantic cy and back to cx should reconstruct x, penalized with an L1 norm.
- Therefore, the full objective combines the LSGAN loss, the contrasting loss, and the cycle-consistency loss (see the sketch after this list),
- so that G tries to minimize this objective against a set of adversarial discriminators {Dcy} that try to maximize it.
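For concreteness, below is a minimal PyTorch-style sketch of these losses. The softmax contrast over squared feature distances, the function names, and the conditional generator signature G(x, c) are illustrative assumptions, not the paper's exact formulation of Q(·).

```python
import torch
import torch.nn.functional as F

def contrasting_distance(f_manip, f_target, f_source):
    """Softmax-style contrast over squared feature distances (illustrative).

    Rewards the manipulated features f_manip (fy') for lying closer to the
    target-domain features f_target (fy) than to the source features
    f_source (fx). The generator minimizes this quantity, while the
    discriminator maximizes it.
    """
    d_target = ((f_manip - f_target) ** 2).sum(dim=1)  # ||fy' - fy||^2
    d_source = ((f_manip - f_source) ** 2).sum(dim=1)  # ||fy' - fx||^2
    logits = torch.stack([-d_target, -d_source], dim=1)
    # Negative log-probability that fy' is matched to fy rather than fx.
    return -F.log_softmax(logits, dim=1)[:, 0].mean()

def cycle_consistency(G, x, c_x, c_y):
    """CycleGAN-style L1 cycle loss for a category-conditional generator G:
    translating x to semantic c_y and back to c_x should reconstruct x."""
    return (G(G(x, c_y), c_x) - x).abs().mean()

# A full objective would sum the LSGAN term with these two losses, each
# weighted by a hyperparameter (the weights here are hypothetical):
#   L = L_lsgan + lambda_contrast * L_contrast + lambda_cyc * L_cyc
```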
2. Mask Contrast-GAN
- An image contains objects and background; a mask is needed to crop out the object to be manipulated.
- The object mask is obtained from the dataset, e.g., the MS COCO segmentation masks.
- With the object mask, a masking operation and a subsequent spatial cropping operation are performed. The background image is obtained by applying the inverse mask to the input image.
- Then, an encoder-decoder architecture is used, taking the target category cy as an additional input.
- The target category cy is encoded as a one-hot vector, which is passed through a linear layer to obtain a 64-dimensional feature embedding. This feature is then replicated spatially.
- The manipulated region is warped back to the original image resolution and combined with the background image via an additive operation to obtain the final manipulated image (see the sketch after this list).
- Both the local discriminators {Dcy} defined in the proposed contrast-GAN and a global image discriminator DI are used.
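The data flow of the masked generator can be sketched as follows in PyTorch. The layer choices, module names, and fixed-size crop handling are illustrative assumptions; only the mask → crop → encode + category embedding → decode → paste-back pipeline follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedManipulator(nn.Module):
    """Sketch of a mask-conditioned generator (placeholder architecture)."""

    def __init__(self, num_categories, embed_dim=64, crop_size=128):
        super().__init__()
        self.crop_size = crop_size
        self.embed = nn.Linear(num_categories, embed_dim)  # one-hot -> 64-d
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128 + embed_dim, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, image, mask, box, c_y):
        # 1) Mask out the object; the inverse mask gives the background image.
        object_region = image * mask
        background = image * (1.0 - mask)

        # 2) Spatially crop the object region to a fixed working resolution
        #    (box = (y0, y1, x0, x1) is the object's bounding box).
        y0, y1, x0, x1 = box
        crop = F.interpolate(object_region[:, :, y0:y1, x0:x1],
                             size=(self.crop_size, self.crop_size))

        # 3) Encode the crop, then tile the 64-d category embedding of c_y
        #    spatially and feed both to the decoder.
        feat = self.encoder(crop)
        emb = self.embed(c_y)[:, :, None, None]
        emb = emb.expand(-1, -1, feat.size(2), feat.size(3))
        manipulated = self.decoder(torch.cat([feat, emb], dim=1))

        # 4) Warp the manipulated region back to the original resolution and
        #    recombine it with the background via an additive operation.
        out = torch.zeros_like(image)
        out[:, :, y0:y1, x0:x1] = F.interpolate(manipulated,
                                                size=(y1 - y0, x1 - x0))
        return out * mask + background
```

Conditioning a single generator on cy in this way is what allows one network to handle every category pair, instead of training a separate generator per translation.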
3. Results
3.1. FCN Scores
In both cases, the proposed contrast-GAN with its new adversarial contrasting objective outperforms state-of-the-art methods on unpaired image-to-image translation.
3.2. Human Perception Test
The method substantially outperforms the baseline on all tasks.
3.3. Qualitative Results
- The mask is also applied to CycleGAN, which is treated as the baseline.
- The baseline often only translates very low-level information (e.g., color changes) and fails to edit the shapes and key characteristics (e.g., structure) that truly convey a specific high-level object semantic.
- In contrast, the proposed contrast-GAN tends to perform subtle yet critical changes to object shapes and textures to satisfy the target semantic while preserving the original object characteristics.
- Fig. 7: The original GAN networks often render the whole image with the target texture, ignoring the particular image content at different locations/regions.
- Fig. 6 & Fig. 7: Mask Contrast-GAN shows a promising capability of manipulating object semantics while retaining the original shapes, viewpoints, and interactions with the background.