Brief Review — Generative Semantic Manipulation with Mask-Contrasting GAN

Mask Contrast-GAN, Outperforms CoGAN, BiGAN, CycleGAN

Sik-Ho Tsang
4 min read · Sep 11, 2023
Visualizations: Source and Target Have Different Shapes
More Visualizations

Generative Semantic Manipulation with Mask-Contrasting GAN
Mask Contrast-GAN, by Carnegie Mellon University, Sun Yat-sen University
2018 ECCV, Over 50 Citations (Sik-Ho Tsang @ Medium)

Generative Adversarial Network (GAN)
Image-to-image Translation: 2017 [Pix2Pix] [UNIT] [CycleGAN] 2018 [MUNIT] [StarGAN] [pix2pixHD] [SaGAN]

  • A contrasting GAN (contrast-GAN) is proposed, with a novel adversarial contrasting objective that can perform all types of semantic translation with a single category-conditional generator.
  • Distance comparisons between samples are used as the training objective, enforcing the manipulated data to be semantically closer to the real data of the target category than to the input data.

Outline

  1. Contrast-GAN
  2. Mask Contrast-GAN
  3. Results

1. Contrast-GAN

Contrast-GAN Overview
  • The feature representation of the manipulated result y’ should be closer to those of the real data {y} in the target domain Y than to that of the input x in the input domain X, given the target object semantic cy.
  • The generator aims to minimize the contrasting distance Q(·), which is computed from fx, fy and fy’, the feature embeddings of the images x, y and y’ respectively.
  • The discriminator aims to maximize the same contrasting distance.
  • To further reduce the space of possible mapping functions, the cycle-consistency loss from CycleGAN is also used. It constrains the mappings (induced by the generator G) between two object semantics to be inverses of each other.
  • The full objective combines the contrasting loss with the cycle-consistency loss: G tries to minimize it against a set of adversarial discriminators {Dcy} that try to maximize it.
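
To make the objective above more concrete, here is a minimal Python sketch. It is not the paper's implementation: the softmax-over-negative-distances form of Q, the use of the mean target feature to summarize {y}, the mean-absolute-error cycle term, and the weight `lam` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def mean_vector(vectors):
    """Element-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def contrasting_distance(f_y_prime, f_x, f_y_set):
    """Assumed form of Q: negative log-softmax over negative distances.

    The value is small when f_y_prime is nearer to the mean target
    feature than to the input feature f_x, and large otherwise.
    """
    d_target = l2(f_y_prime, mean_vector(f_y_set))
    d_input = l2(f_y_prime, f_x)
    # -log(e^-d_target / (e^-d_target + e^-d_input)) = log(1 + e^(d_target - d_input))
    return math.log1p(math.exp(d_target - d_input))

def cycle_l1(x, x_reconstructed):
    """Mean absolute error between an input and its round-trip reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_reconstructed)) / len(x)

def full_objective(contrast_value, cycle_value, lam=10.0):
    """Contrasting term plus weighted cycle term (lam is an assumed weight)."""
    return contrast_value + lam * cycle_value

# Toy features: the input sits at the origin, the targets average to [1, 2].
f_x = [0.0, 0.0]
targets = [[1.0, 1.0], [1.0, 3.0]]
near_target = contrasting_distance([1.0, 2.0], f_x, targets)
near_input = contrasting_distance([0.0, 0.0], f_x, targets)
print(near_target < near_input)
```

In this sketch the generator would be rewarded for moving fy’ toward the target features (lower Q), while the discriminators, which produce the embeddings, would be trained to increase Q.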

2. Mask Contrast-GAN

Mask Contrast-GAN
  • An image contains objects and background, so a mask is needed to crop out the object to be manipulated.
  • The object mask is obtained from the dataset, e.g. the MS COCO segmentation masks.
  • With the object mask obtained, a masking operation and a subsequent spatial cropping operation are performed. The background image is computed by applying the inverse mask map to the input image.
  • Then, an encoder-decoder architecture is used, which also takes the target category cy as input.
  • The target category cy is represented as a one-hot vector, which is passed through a linear layer to obtain a 64-dimensional feature embedding. This embedding is then replicated spatially.
  • The manipulated region is warped back to the original image resolution and then combined with the background image via an additive operation to get the final manipulated image.
  • Both the local discriminators {Dcy} defined in the proposed contrast-GAN and a global image discriminator DI are used.
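
The masking, cropping, and additive compositing steps above can be sketched in plain Python on single-channel images stored as nested lists. This is only an illustration under stated assumptions: the resize to and from the generator's working resolution is skipped, the generator output is re-masked before compositing, and all function names (including the `generator` callable) are hypothetical. The category conditioning is shown separately as a one-hot vector passed through a linear map and replicated spatially; how the generator consumes it is left out.

```python
def apply_mask(image, mask):
    """Multiply each pixel by its 0/1 mask value."""
    return [[p * m for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]

def invert_mask(mask):
    """Swap foreground and background in a 0/1 mask."""
    return [[1 - m for m in row] for row in mask]

def bounding_box(mask):
    """Tight (top, bottom, left, right) box around the nonzero mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def crop(image, box):
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]

def paste(region, box, height, width):
    """Place a cropped region back onto an all-zero canvas of the full size."""
    top, _, left, right = box
    canvas = [[0] * width for _ in range(height)]
    for r, row in enumerate(region):
        canvas[top + r][left:right] = row
    return canvas

def manipulate(image, mask, generator):
    """Mask, crop, edit, paste back, and add the untouched background."""
    background = apply_mask(image, invert_mask(mask))
    box = bounding_box(mask)
    obj = crop(apply_mask(image, mask), box)
    # Assumption: edits are restricted to the mask before compositing.
    edited = apply_mask(generator(obj), crop(mask, box))
    foreground = paste(edited, box, len(image), len(image[0]))
    return [[f + b for f, b in zip(frow, brow)]
            for frow, brow in zip(foreground, background)]

def one_hot(index, num_classes):
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def linear(vector, weights):
    """Linear layer without bias; weights has one row per output dimension."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def replicate_spatially(embedding, height, width):
    """Copy the same embedding vector to every spatial location."""
    return [[list(embedding) for _ in range(width)] for _ in range(height)]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[0, 0, 0], [0, 1, 1], [0, 0, 1]]
result = manipulate(image, mask, lambda reg: [[10 * p for p in row] for row in reg])
print(result)
```

For the conditioning, the post's 64-dimensional embedding would correspond to a weight matrix with 64 rows and one column per category.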

3. Results

3.1. FCN Scores

Labels to Photos
Photos to Labels

In both cases, the proposed contrast-GAN with its new adversarial contrasting objective outperforms the state-of-the-art methods on unpaired image-to-image translation.
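
For readers unfamiliar with this kind of evaluation, the sketch below computes three segmentation metrics from a confusion matrix. It assumes the scores here are the per-pixel accuracy, per-class accuracy, and class IoU commonly reported in Pix2Pix/CycleGAN-style evaluations; the post itself does not list the metrics, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """Count pixels by (true class, predicted class)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix

def segmentation_scores(matrix):
    """Per-pixel accuracy, mean per-class accuracy, and mean class IoU."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    class_accs, ious = [], []
    for i in range(n):
        true_i = sum(matrix[i])                       # pixels whose label is i
        pred_i = sum(matrix[r][i] for r in range(n))  # pixels predicted as i
        if true_i == 0:
            continue  # skip classes absent from the ground truth
        class_accs.append(matrix[i][i] / true_i)
        ious.append(matrix[i][i] / (true_i + pred_i - matrix[i][i]))
    return {
        "pixel_acc": correct / total,
        "class_acc": sum(class_accs) / len(class_accs),
        "class_iou": sum(ious) / len(ious),
    }

# Four flattened pixels, two classes.
scores = segmentation_scores(confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2))
print(scores)
```

In a labels-to-photos setting, the predicted labels would come from a pretrained segmentation network run on the generated photos, so higher scores suggest the generated images are more recognizable.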

3.2. Human Perception Test

Human Perception Test

The method substantially outperforms the baseline on all tasks.

3.3. Qualitative Results

  • The mask is also applied to CycleGAN, which serves as the baseline.
  • The baseline method often translates only very low-level information (e.g. color changes) and fails to edit the shapes and key characteristics (e.g. structure) that truly convey a specific high-level object semantic.

In contrast, the proposed contrast-GAN tends to make small yet critical changes to object shapes and textures to satisfy the target semantic while preserving the original object characteristics.

  • Fig. 7: The original GAN networks often render the whole image with the target texture and ignore the particular image content at different locations/regions.

Fig. 6 & Fig. 7: Mask Contrast-GAN shows a promising capability to manipulate object semantics while retaining the original shapes, viewpoints, and interactions with the background.
