Brief Review — Texture Networks: Feed-forward Synthesis of Textures and Stylized Images
Texture Network, by Computer Vision Group, Skoltech & Yandex; Visual Geometry Group, University of Oxford
2016 ICML, Over 1000 Citations (Sik-Ho Tsang @ Medium)
- Texture Network
1. Texture Network
The aim is to train a feed-forward generator network g which takes a noise sample z as input and produces a texture sample g(z) as output.
For style transfer, this texture network is extended to take both a noise sample z and a content image y and then output a new image g(y, z) where the texture has been applied to y as a visual style.
1.1. Texture Synthesis
- The Gram matrix Gˡ(x) is defined as the matrix of scalar (inner) products between feature maps at the l-th layer: Gˡᵢⱼ(x) = ⟨Fˡᵢ(x), Fˡⱼ(x)⟩, where Fˡᵢ(x) is the i-th feature map of layer l.
- Texture loss is the Gram matrix loss between images x and x₀, summed over a set of layers: L_T(x; x₀) = Σₗ ‖Gˡ(x) − Gˡ(x₀)‖²₂, where x is the generated image and x₀ is the reference texture instance.
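The Gram-matrix texture loss above can be sketched in a few lines of NumPy (names, shapes, and the per-layer normalization here are illustrative choices, not the paper's exact conventions):

```python
import numpy as np

def gram_matrix(feats):
    """Gram matrix of a C x H x W feature map: inner products between
    channel-wise feature maps, normalized by the number of positions."""
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return f @ f.T / (h * w)

def texture_loss(feats_x, feats_x0):
    """Sum of squared Gram-matrix differences over a set of layers."""
    return sum(float(np.sum((gram_matrix(a) - gram_matrix(b)) ** 2))
               for a, b in zip(feats_x, feats_x0))

# toy check: identical feature stacks give zero texture loss
rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 4, 4))]
print(texture_loss(layers, layers))  # 0.0
```

Because the Gram matrix discards spatial positions (only channel co-occurrence statistics survive), this loss captures texture statistics rather than image layout.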
A multi-scale architecture is used, which yields images with lower texture loss and better perceptual quality while using fewer parameters and training faster.
- The input noise z comprises K random tensors zᵢ (K = 5). Each random noise tensor is first processed by a sequence of convolutional and non-linear activation layers, then upsampled by a factor of two, and finally concatenated with the next, finer-scale tensor.
- Each conv block contains three convolutional layers (3×3, 3×3, 1×1), each followed by ReLU. Nearest-neighbor interpolation is used for upsampling, and batch normalization is applied.
- The last full-resolution tensor is ultimately mapped to an RGB image x by a bank of 1×1 filters.
- Only the texture loss is minimized when training the generator: min_g E_z[ L_T(g(z); x₀) ], with the descriptor network kept fixed.
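The multi-scale pyramid above can be sketched with NumPy; for brevity the conv blocks are replaced by identity, so this only demonstrates the upsample-and-concatenate wiring and the final 1×1 bank (all shapes and names are illustrative assumptions):

```python
import numpy as np

def upsample_nn(x, factor=2):
    """Nearest-neighbour upsampling of a C x H x W tensor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def multiscale_generator(zs):
    """Toy pyramid: starting from the coarsest noise tensor, repeatedly
    upsample the running tensor by 2 and concatenate the next, finer-scale
    noise tensor along the channel axis (conv blocks omitted)."""
    out = zs[0]  # coarsest scale
    for z in zs[1:]:
        out = np.concatenate([upsample_nn(out), z], axis=0)
    return out

K, C, base = 5, 3, 4  # K noise scales, C channels, coarsest resolution 4x4
rng = np.random.default_rng(0)
zs = [rng.standard_normal((C, base * 2**i, base * 2**i)) for i in range(K)]

feats = multiscale_generator(zs)                  # full-resolution tensor
w_rgb = rng.standard_normal((3, feats.shape[0]))  # 1x1 filter bank -> RGB
image = np.einsum('oc,chw->ohw', w_rgb, feats)
print(feats.shape, image.shape)  # (15, 64, 64) (3, 64, 64)
```

In the real network, each scale's tensor passes through the conv blocks before concatenation, and the whole generator is trained end-to-end against the texture loss.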
1.2. Style Transfer
The architecture is the same as the one used for texture synthesis, but the noise tensors zᵢ are concatenated with downsampled versions of the input image y (K = 6).
- Content loss is used, which compares feature activations at corresponding spatial locations: L_C(x; y) = Σₗ Σᵢ ‖Fˡᵢ(x) − Fˡᵢ(y)‖²₂.
- The total loss is the combination of content and texture loss for minimization: min_g E_{y,z}[ L_T(g(y, z); x₀) + α L_C(g(y, z); y) ], where α weights content fidelity against texture.
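A minimal sketch of the content term and the combined objective, continuing the illustrative NumPy conventions (the weight name `alpha` and the stand-in texture value are hypothetical, not the paper's notation):

```python
import numpy as np

def content_loss(feats_x, feats_y):
    """Sum of squared differences between feature activations at
    corresponding spatial locations, over a set of layers."""
    return sum(float(np.sum((a - b) ** 2)) for a, b in zip(feats_x, feats_y))

# toy check: combining a texture term with a weighted content term
rng = np.random.default_rng(0)
fx = [rng.standard_normal((4, 8, 8))]   # features of the generated image
fy = [rng.standard_normal((4, 8, 8))]   # features of the content image y
alpha = 1e-2                            # hypothetical content weight
texture_term = 3.5                      # stand-in for the Gram-loss value
total = texture_term + alpha * content_loss(fx, fy)
print(total > texture_term)  # True: a content mismatch increases the loss
```

Unlike the texture loss, the content loss keeps spatial positions, which is what ties the stylized output to the layout of y.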
2. Experimental Results
2.1. Texture Synthesis
Qualitatively, the results of the proposed Texture Network and of Image Style Transfer are comparable, and both are superior to the other methods. However, the proposed Texture Network is much more efficient than Image Style Transfer.
2.2. Style Transfer
The trade-off between stylization and content can still be adjusted at test time by changing the magnitude of the input noise z.
- Iterative optimization in Image Style Transfer requires about 10 seconds to generate a sample x.
- The proposed Texture Network generates 256×256 images in about 20 milliseconds each, a roughly 500× speed-up. It also uses significantly less memory (170 MB to generate a 256×256 sample, vs. 1100 MB for Image Style Transfer).