Brief Review — AdaIN: Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

Adaptive Instance Normalization (AdaIN), Later Adopted in Many Applications, Such as StyleGAN

Sik-Ho Tsang
4 min read · Sep 8, 2023

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
AdaIN, by Cornell University
2017 ICCV, Over 3600 Citations (Sik-Ho Tsang @ Medium)

Neural Style Transfer
2016 [Artistic Style Transfer] [Image Style Transfer] [Perceptual Loss] [GAN-CLS, GAN-INT, GAN-CLS-INT] 2017 [StyleNet, Instance Norm (IN)]
==== My Other Paper Readings Are Also Over Here ====

  • An Adaptive Instance Normalization (AdaIN) layer is proposed that aligns the channel-wise mean and variance of the content features with those of the style features.
  • Later, AdaIN is widely adopted in many applications, such as StyleGAN.

Outline

  1. AdaIN
  2. Results

1. AdaIN

1.1. AdaIN

  • Adaptive Instance Normalization (AdaIN), a simple extension to Instance Norm (IN), is proposed.

AdaIN receives a content input x and a style input y, and simply aligns the channel-wise mean and variance of x to match those of y:
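With μ(·) and σ(·) denoting the mean and standard deviation computed per channel across spatial locations (as in IN), the operation is:

$$\mathrm{AdaIN}(x, y) = \sigma(y)\left(\frac{x - \mu(x)}{\sigma(x)}\right) + \mu(y)$$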

  • Unlike BN or IN, AdaIN has no learnable affine parameters. Instead, it adaptively computes the affine parameters from the style input.
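To make the operation concrete, a minimal PyTorch-style sketch is shown below; the function name adain and the eps constant are illustrative choices, not taken from the paper.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    # content_feat, style_feat: (N, C, H, W) feature maps from the encoder.
    # Channel-wise mean/std are computed over the spatial dimensions only,
    # as in Instance Norm; eps guards against division by zero.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # Normalize the content features, then re-scale and shift them
    # with the style statistics.
    return s_std * (content_feat - c_mean) / c_std + s_mean
```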

1.2. Model Architecture

  • The style transfer network T takes a content image c and an arbitrary style image s as inputs.
  • A simple encoder-decoder architecture is used, in which the encoder f is fixed to the first few layers (up to relu4_1) of a pre-trained VGG-19; a sketch of obtaining such an encoder follows this list.
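A rough sketch of how such a fixed encoder can be built, assuming a recent torchvision (the slicing index for relu4_1 is my reading of the layer ordering and should be double-checked):

```python
import torch
import torchvision

# Keep VGG-19 layers up to relu4_1 as the fixed encoder f.
# In torchvision's vgg19().features ordering, relu4_1 is index 20,
# so slicing at 21 keeps conv1_1 ... relu4_1.
vgg = torchvision.models.vgg19(weights="DEFAULT")
encoder = torch.nn.Sequential(*list(vgg.features.children())[:21]).eval()
for p in encoder.parameters():
    p.requires_grad_(False)  # encoder is frozen; only the decoder g is trained
```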

After encoding the content and style images in feature space, both feature maps are fed to an AdaIN layer that aligns the mean and variance of the content feature maps to those of the style feature maps, producing the target feature maps t:
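In the paper's notation:

$$t = \mathrm{AdaIN}(f(c), f(s))$$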

  • A randomly initialized decoder g is trained to map t back to the image space, generating the stylized image T(c, s).
  • The loss function used to train the decoder is a weighted combination of the content loss Lc and the style loss Ls.
  • The AdaIN output t (rather than the features of the content image c) is used as the content target for Lc.
  • The style loss Ls matches the channel-wise mean and standard deviation statistics of the features at layers φi of VGG-19, rather than Gram matrices; the equations are written out below.
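Written out, following the paper, the stylized output and the training objective are:

$$T(c, s) = g(t), \qquad \mathcal{L} = \mathcal{L}_c + \lambda \mathcal{L}_s$$

$$\mathcal{L}_c = \lVert f(g(t)) - t \rVert_2$$

$$\mathcal{L}_s = \sum_{i=1}^{L} \lVert \mu(\phi_i(g(t))) - \mu(\phi_i(s)) \rVert_2 + \sum_{i=1}^{L} \lVert \sigma(\phi_i(g(t))) - \sigma(\phi_i(s)) \rVert_2$$

A minimal sketch of how these losses could be computed in PyTorch is given below; the helper names (mean_std, adain_losses), the MSE form of the distances, and the lambda value are assumptions for illustration, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def mean_std(feat, eps=1e-5):
    # Channel-wise mean/std over spatial dims; both have shape (N, C, 1, 1).
    return feat.mean(dim=(2, 3), keepdim=True), feat.std(dim=(2, 3), keepdim=True) + eps

def adain_losses(gen_feats, t, style_feats, lam=10.0):
    # gen_feats: list of VGG features phi_i(g(t)) of the decoded image,
    #            with the deepest entry playing the role of f(g(t)).
    # t: the AdaIN output used as the content target.
    # style_feats: list of VGG features phi_i(s) of the style image.
    # lam: style weight lambda (hypothetical value).
    content_loss = F.mse_loss(gen_feats[-1], t)
    style_loss = 0.0
    for g_i, s_i in zip(gen_feats, style_feats):
        g_mean, g_std = mean_std(g_i)
        s_mean, s_std = mean_std(s_i)
        style_loss = style_loss + F.mse_loss(g_mean, s_mean) + F.mse_loss(g_std, s_std)
    return content_loss + lam * style_loss
```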

2. Results

2.1. Qualitative Results


The quality of the stylized images is quite competitive with Improved Texture Network (Ulyanov’s) [52] and Image Style Transfer (Gatys’s) [16] for many images (e.g., rows 1, 2, and 3).

  • In some other cases (e.g., row 5), the proposed method is slightly behind the quality of Improved Texture Network (Ulyanov’s) [52] and Image Style Transfer (Gatys’s) [16]. This is not unexpected, as there is a three-way trade-off between speed, flexibility, and quality.

2.2. Speed Analysis


The proposed algorithm runs at 56 and 15 FPS for 256 × 256 and 512 × 512 images respectively, making it possible to process arbitrary user-uploaded styles in real-time.

Among algorithms applicable to arbitrary styles, the proposed method is nearly 3 orders of magnitude faster than Image Style Transfer (Gatys’s) [16] and 1–2 orders of magnitude faster than Chen’s [6].
