Brief Review — Texture Networks: Feed-forward Synthesis of Textures and Stylized Images

Texture Network, Hundreds of Times Faster

Sik-Ho Tsang
4 min read · Sep 2, 2023
Texture Networks, Similar Perceptual Quality as Image Style Transfer But Much Faster

Texture Networks: Feed-forward Synthesis of Textures and Stylized Images (Texture Network), by Computer Vision Group, Skoltech & Yandex; Visual Geometry Group, University of Oxford
2016 ICML, Over 1000 Citations (Sik-Ho Tsang @ Medium)

Style Transfer
2016 [Artistic Style Transfer] [Image Style Transfer] [Perceptual Loss] [GAN-CLS, GAN-INT, GAN-CLS-INT]

  • Artistic Style Transfer/Image Style Transfer require a slow and memory-consuming optimization process.
  • In this paper, Texture Network is proposed, an alternative approach that moves the computational burden to a learning stage. The resulting feed-forward network is lightweight and hundreds of times faster at generation time.

Outline

  1. Texture Network
  2. Results

1. Texture Network

Texture Network

The aim is to train a feed-forward generator network g which takes a noise sample z as input and produces a texture sample g(z) as output.

For style transfer, this texture network is extended to take both a noise sample z and a content image y and then output a new image g(y, z) where the texture has been applied to y as a visual style.

1.1. Texture Synthesis

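  • The Gram matrix $G^l(x)$ is defined as the matrix of scalar (inner) products between the feature maps at the l-th layer: $G^l_{ij}(x) = \langle F^l_i(x), F^l_j(x) \rangle$, where $F^l_i(x)$ is the i-th feature map of layer l.
  • The texture loss is the Gram matrix loss between images x and x0, summed over a set of layers $L_T$: $\mathcal{L}_T(x; x_0) = \sum_{l \in L_T} \| G^l(x) - G^l(x_0) \|_2^2$
  • where x is the input image, and x0 is the reference texture instance.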
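
The Gram matrix and texture loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the feature maps are assumed to be given as arrays of shape (C, H, W), one per layer, and random arrays stand in for the activations of a real feature extractor. No normalization is applied here, although implementations often divide by the number of spatial positions.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of one layer's feature maps.

    features: array of shape (C, H, W), i.e. C feature maps.
    Returns a (C, C) matrix of inner products between flattened maps.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)  # each row is one flattened feature map
    return flat @ flat.T

def texture_loss(feats_x, feats_x0):
    """Sum over layers of squared differences between Gram matrices.

    feats_x, feats_x0: lists of (C, H, W) arrays, one entry per layer.
    """
    return sum(
        float(np.sum((gram_matrix(fx) - gram_matrix(f0)) ** 2))
        for fx, f0 in zip(feats_x, feats_x0)
    )

# Stand-in activations for two layers (hypothetical shapes).
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 8, 8)), rng.standard_normal((6, 4, 4))]

# Shuffling spatial positions (same permutation for every channel)
# leaves the Gram matrix unchanged: it captures statistics, not layout.
perm = rng.permutation(8 * 8)
shuffled = feats[0].reshape(4, -1)[:, perm].reshape(4, 8, 8)
```

Because the Gram matrix sums over spatial positions, it discards where a feature occurs and keeps only how features co-occur, which is why it suits texture description.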

Multi-scale architectures are used, which results in images with smaller texture loss and better perceptual quality while using fewer parameters and training faster.

  • As shown above, the input noise z comprises K random tensors zi. Each random noise tensor is first processed by a sequence of convolutional and non-linear activation layers, then upsampled by a factor of two, and finally concatenated. K=5.
  • Each conv block contains three convolutional layers (3×3, 3×3, 1×1) with ReLU activations. Upsampling uses nearest-neighbor interpolation, and batch normalization is applied.
  • The last full-resolution tensor is ultimately mapped to an RGB image x by a bank of 1×1 filters.
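  • Only the texture loss $\mathcal{L}_T$ is minimized, in expectation over the noise: $\theta^* = \arg\min_\theta \mathbb{E}_{z}\left[\mathcal{L}_T(g(z; \theta), x_0)\right]$, where θ denotes the generator's parameters.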
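
The upsample-and-concatenate step of the multi-scale design can be sketched as follows. This is only a shape-level illustration under stated assumptions: the conv blocks and batch normalization are omitted (treated as identity), each noise tensor is assumed to have 3 channels, and each scale is assumed to have twice the side length of the previous one. The helper names are hypothetical.

```python
import numpy as np

def upsample_nearest_2x(t):
    """Nearest-neighbour upsampling of a (C, H, W) tensor by a factor of two."""
    return t.repeat(2, axis=1).repeat(2, axis=2)

def join(coarse, fine):
    """Upsample the coarser tensor and concatenate it with the finer one
    along the channel axis."""
    up = upsample_nearest_2x(coarse)
    assert up.shape[1:] == fine.shape[1:], "spatial sizes must match after upsampling"
    return np.concatenate([up, fine], axis=0)

def multiscale_merge(tensors):
    """Merge tensors ordered coarsest to finest.

    In the real network a conv block would process the tensors before and
    after each join; here that processing is left out.
    """
    merged = tensors[0]
    for finer in tensors[1:]:
        merged = join(merged, finer)
    return merged

rng = np.random.default_rng(0)
sizes = [16, 32, 64, 128, 256]  # K = 5 scales, assumed to double each time
noise = [rng.uniform(size=(3, s, s)) for s in sizes]
merged = multiscale_merge(noise)
print(merged.shape)  # (15, 256, 256)
```

In this sketch the channel count grows by 3 at every join, so five scales yield 15 channels at full resolution. In the actual generator the channel counts are set by the conv blocks instead.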

1.2. Style Transfer

The architecture is the same as the one used for texture synthesis but the noise tensors zi are concatenated with downsampled versions of the input image y. K=6.
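
One way to picture the style-transfer inputs is as a pyramid in which every scale holds a downsampled copy of y plus noise channels. The sketch below makes several assumptions that the text does not specify: downsampling by 2×2 average pooling, uniform noise with 3 channels per scale, and a hypothetical `noise_scale` parameter that multiplies the noise magnitude.

```python
import numpy as np

def downsample_2x(img):
    """Halve a (C, H, W) image by averaging each 2x2 block
    (an assumed choice of downsampling filter)."""
    c, h, w = img.shape
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def build_inputs(content, k=6, noise_channels=3, noise_scale=1.0, rng=None):
    """Return k tensors, coarsest first, each stacking
    [downsampled content, noise] along the channel axis.

    content: (C, H, W) array with H and W divisible by 2**(k - 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    pyramid = [content]
    for _ in range(k - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    inputs = []
    for level in reversed(pyramid):  # coarsest scale first
        noise = noise_scale * rng.uniform(size=(noise_channels,) + level.shape[1:])
        inputs.append(np.concatenate([level, noise], axis=0))
    return inputs

# A 3-channel 64x64 stand-in for the content image y.
y = np.ones((3, 64, 64))
inputs = build_inputs(y, k=6, rng=np.random.default_rng(0))
print([t.shape for t in inputs])
# [(6, 2, 2), (6, 4, 4), (6, 8, 8), (6, 16, 16), (6, 32, 32), (6, 64, 64)]
```

Each tensor would then be fed to the generator at its own scale, in place of the pure noise tensors used for texture synthesis.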

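  • A content loss is used, which compares feature activations at corresponding spatial locations: $\mathcal{L}_C(x; y) = \sum_{l \in L_C} \sum_i \| F^l_i(x) - F^l_i(y) \|_2^2$, where $F^l_i$ is the i-th feature map at layer l and $L_C$ is the set of content layers.
  • The total loss minimized is a weighted combination of the texture and content losses: $\mathcal{L}_T(g(y, z), x_0) + \alpha \, \mathcal{L}_C(g(y, z), y)$, where α sets the balance between the two terms.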
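
The difference between the two losses can be made concrete with a small NumPy sketch. It assumes feature maps are given as lists of (C, H, W) arrays, and it uses a weight `alpha` on the content term as an assumed hyperparameter name. Random arrays again stand in for real network activations.

```python
import numpy as np

def gram_matrix(features):
    """(C, C) matrix of inner products between flattened feature maps."""
    flat = features.reshape(features.shape[0], -1)
    return flat @ flat.T

def texture_loss(feats_x, feats_x0):
    """Squared Gram-matrix differences, summed over layers."""
    return sum(
        float(np.sum((gram_matrix(a) - gram_matrix(b)) ** 2))
        for a, b in zip(feats_x, feats_x0)
    )

def content_loss(feats_x, feats_y):
    """Squared activation differences at matching spatial positions,
    summed over layers."""
    return sum(float(np.sum((a - b) ** 2)) for a, b in zip(feats_x, feats_y))

def total_loss(tex_x, tex_x0, con_x, con_y, alpha=1.0):
    """Texture loss plus alpha times content loss (alpha is an assumed weight)."""
    return texture_loss(tex_x, tex_x0) + alpha * content_loss(con_x, con_y)

rng = np.random.default_rng(0)
fx = [rng.standard_normal((4, 8, 8))]

# Shuffle spatial positions with the same permutation for every channel.
perm = rng.permutation(8 * 8)
shuffled = [fx[0].reshape(4, -1)[:, perm].reshape(4, 8, 8)]
```

Comparing `fx` with `shuffled` gives a texture loss of essentially zero but a clearly positive content loss: the content term is sensitive to where features occur, while the texture term is not.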

2. Results

2.1. Texture Synthesis

Texture Synthesis Examples

Qualitatively, the results of the proposed Texture Network and of Image Style Transfer are comparable to each other and superior to those of the other methods. However, the proposed Texture Network is much more efficient than Image Style Transfer.

2.2. Style Transfer

Scaling Input Noise Examples

The trade-off between content and style can still be adjusted at test time by changing the magnitude of the input noise z.

Style Transfer Examples

Generating a 256×256 image with the proposed Texture Network takes about 20 milliseconds, a 500× speed-up (which implies roughly 10 seconds per image for the optimization-based approach). Texture Network also uses significantly less memory: 170 MB to generate a 256×256 sample, versus 1100 MB for Image Style Transfer.

Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.