Brief Review — Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Perceptual Loss: Feature Reconstruction Loss + Style Reconstruction Loss

Sik-Ho Tsang
Nov 10, 2022
Comparison with Image Style Transfer [10] and SRCNN [11]

Perceptual Losses for Real-Time Style Transfer and Super-Resolution (Perceptual Loss), by Stanford University, 2016 ECCV, Over 7000 Citations (Sik-Ho Tsang @ Medium)
Image Style Transfer, Super Resolution

  • Perceptual loss is introduced for high-quality image style transfer and super-resolution.
  • This perceptual loss was later adopted in many other domains. The paper comes from Li Fei-Fei's research group.

Outline

  1. Perceptual Loss Network Architecture
  2. Perceptual Loss Functions
  3. Results

1. Perceptual Loss Network Architecture

System overview
  • The proposed system consists of two components: a loss network and an image transformation network.

1.1. Loss Network

  • A loss network Φ that is used to define several loss functions ℓ1, …, ℓk. Each loss function computes a scalar value ℓi(ŷ, yi) measuring the difference between the output image ŷ and a target image yi.
  • The loss network is a frozen ImageNet-pretrained VGG-16.

The loss network Φ is used to define a feature reconstruction loss ℓ^Φ_feat and a style reconstruction loss ℓ^Φ_style that measure differences in content and style between images.

1.2. Image Transformation Network

  • An image transformation network fW, which is a deep residual convolutional neural network parameterized by weights W.
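  • It is trained using stochastic gradient descent to minimize a weighted combination of loss functions, where λi is the weight of the i-th loss:
    $$W^* = \arg\min_W \; \mathbb{E}_{x,\{y_i\}} \left[ \sum_{i=1}^{k} \lambda_i \, \ell_i(f_W(x), y_i) \right]$$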
  • It consists of five residual blocks, with some modifications. (Please feel free to read the paper for detailed modifications.)
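The weighted combination that the transformation network is trained to minimize can be illustrated with a minimal sketch. The function name and the numeric values are hypothetical, chosen only to show how the scalar losses and their weights combine; they are not taken from the paper.

```python
def weighted_total_loss(losses, weights):
    """Return sum_i lambda_i * l_i for scalar losses l_i and weights lambda_i."""
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per loss term")
    return float(sum(w * l for w, l in zip(weights, losses)))


# Hypothetical values: one feature (content) term and one style term.
example_total = weighted_total_loss(losses=[0.8, 0.02], weights=[1.0, 5.0])
print(example_total)
```

In training, each scalar would come from the loss network evaluated on the output fW(x), and the gradient of this sum with respect to W would drive the SGD update.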

2. Perceptual Loss Functions

2.1. Feature Reconstruction Loss

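  • Rather than forcing the pixels of the output image ŷ to exactly match those of the target image y, this loss encourages the two images to have similar feature representations as computed by the loss network. Let Φj(x) be the activations of the jth layer of the network Φ when processing the image x; if j is a convolutional layer, Φj(x) is a feature map of shape Cj×Hj×Wj.
  • The feature reconstruction loss is the (squared, normalized) Euclidean distance between feature representations:
    $$\ell_{feat}^{\Phi,j}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \left\| \Phi_j(\hat{y}) - \Phi_j(y) \right\|_2^2$$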
Optimization to minimize the feature reconstruction loss

As images are reconstructed from higher layers, image content and overall spatial structure are preserved but color, texture, and exact shape are not.
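As a rough illustration of the feature reconstruction loss, here is a minimal NumPy sketch. It assumes the layer-j activations have already been extracted as arrays of shape (C, H, W); the random arrays below are stand-ins for real VGG-16 features, and the function name is my own.

```python
import numpy as np


def feature_reconstruction_loss(feat_out, feat_target):
    """Squared Euclidean distance between two (C, H, W) feature maps,
    normalized by the number of elements C * H * W."""
    feat_out = np.asarray(feat_out, dtype=np.float64)
    feat_target = np.asarray(feat_target, dtype=np.float64)
    if feat_out.shape != feat_target.shape:
        raise ValueError("feature maps must have the same shape")
    return float(np.sum((feat_out - feat_target) ** 2) / feat_out.size)


# Stand-ins for the layer-j activations of the output and target images.
rng = np.random.default_rng(0)
phi_y_hat = rng.standard_normal((4, 8, 8))
phi_y = rng.standard_normal((4, 8, 8))
loss_value = feature_reconstruction_loss(phi_y_hat, phi_y)
print(loss_value)
```

The loss is zero only when the two feature maps agree everywhere, which constrains content and layout without requiring the pixels themselves to match.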

2.2. Style Reconstruction Loss

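  • Similar to Image Style Transfer, the Gram matrix GΦj(x) of the Cj×Hj×Wj feature map Φj(x) is defined as the Cj×Cj matrix whose elements are given by:
    $$G_j^{\Phi}(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \Phi_j(x)_{h,w,c} \, \Phi_j(x)_{h,w,c'}$$
  • The style reconstruction loss is then the squared Frobenius norm of the difference between the Gram matrices of the output and target images:
    $$\ell_{style}^{\Phi,j}(\hat{y}, y) = \left\| G_j^{\Phi}(\hat{y}) - G_j^{\Phi}(y) \right\|_F^2$$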
Optimization to minimize the style reconstruction loss

This loss preserves stylistic features from the target image, but does not preserve its spatial structure.
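The Gram matrix and the style reconstruction loss can be sketched in NumPy as follows. This assumes features stored as (C, H, W) arrays; the function names and the random stand-in features are my own. The small demo at the end flips a feature map spatially to show why this loss ignores spatial structure: the Gram matrix sums over all positions, so rearranging positions leaves it unchanged.

```python
import numpy as np


def gram_matrix(feat):
    """Return the C x C Gram matrix of a (C, H, W) feature map,
    normalized by C * H * W."""
    feat = np.asarray(feat, dtype=np.float64)
    c, h, w = feat.shape
    psi = feat.reshape(c, h * w)          # one row per channel
    return psi @ psi.T / (c * h * w)


def style_reconstruction_loss(feat_out, feat_target):
    """Squared Frobenius norm of the difference between Gram matrices."""
    diff = gram_matrix(feat_out) - gram_matrix(feat_target)
    return float(np.sum(diff ** 2))


rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 6, 6))     # stand-in for layer-j activations
flipped = feat[:, ::-1, :]                # same values, mirrored vertically
print(np.allclose(gram_matrix(feat), gram_matrix(flipped)))
```

Because the Gram matrix is always Cj×Cj, the output and the style target do not need to have the same spatial size.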

3. Results

3.1. Style Transfer

Example results of style transfer using our image transformation networks
Example results for style transfer on 512×512 images.
  • Feature reconstruction loss is computed at layer relu2_2 and style reconstruction loss is computed at layers relu1_2, relu2_2, relu3_3, and relu4_3 of the VGG-16 loss network.
  • It is clear that the trained style transfer network is aware of the semantic content of images.
  • For example, in the beach image in the above figure, the people are clearly recognizable in the transformed image but the background is warped beyond recognition. Similarly, in the cat image, the cat's face is clear in the transformed image, but its body is not.
  • One explanation is that the VGG-16 loss network has features which are selective for people and animals since these objects are present in the classification dataset on which it was trained.
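The layer choice described above (feature loss at relu2_2, style loss summed over relu1_2, relu2_2, relu3_3, and relu4_3) can be sketched as a single objective. This is a minimal NumPy sketch under stated assumptions: the features are random stand-ins keyed by layer name, the channel counts match VGG-16 but the spatial sizes are toy values, the two weights are hypothetical placeholders, and any other terms in the training objective are omitted.

```python
import numpy as np

CONTENT_LAYER = "relu2_2"
STYLE_LAYERS = ("relu1_2", "relu2_2", "relu3_3", "relu4_3")


def _gram(feat):
    """C x C Gram matrix of a (C, H, W) feature map, normalized by C*H*W."""
    c, h, w = feat.shape
    psi = feat.reshape(c, h * w)
    return psi @ psi.T / (c * h * w)


def style_transfer_loss(out_feats, content_feats, style_feats,
                        content_weight=1.0, style_weight=1.0):
    """Weighted sum of one feature term and four style terms.
    Each argument maps layer names to (C, H, W) arrays."""
    content_term = np.mean(
        (out_feats[CONTENT_LAYER] - content_feats[CONTENT_LAYER]) ** 2
    )
    style_term = sum(
        np.sum((_gram(out_feats[name]) - _gram(style_feats[name])) ** 2)
        for name in STYLE_LAYERS
    )
    return float(content_weight * content_term + style_weight * style_term)


def toy_features(rng, size):
    """Random stand-ins with VGG-16 channel counts and halving resolution."""
    channels = {"relu1_2": 64, "relu2_2": 128, "relu3_3": 256, "relu4_3": 512}
    feats = {}
    for depth, (name, c) in enumerate(channels.items()):
        side = max(size // (2 ** depth), 1)
        feats[name] = rng.standard_normal((c, side, side))
    return feats


rng = np.random.default_rng(0)
output = toy_features(rng, 16)
content = toy_features(rng, 16)
style = toy_features(rng, 32)             # style target may differ in size
print(style_transfer_loss(output, content, style))
```

Only the content target has to match the output's spatial size; the style target enters through Gram matrices, whose shape depends on the channel count alone.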
Inference Speed
  • The proposed method is three orders of magnitude faster than Image Style Transfer. It processes images of size 512×512 at 20 FPS, making it feasible to run style transfer in real-time or on video.

3.2. Single Image Super Resolution

Results for ×4 super-resolution
Results for ×8 super-resolution
  • The proposed method obtains lower PSNR and SSIM but produces more visually pleasing images, because the perceptual loss does not directly optimize PSNR/SSIM the way a per-pixel l1/l2 loss does. This is similar in spirit to SRGAN.
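To see why a per-pixel l2 loss favors PSNR, recall that PSNR is a monotone function of the mean squared error: PSNR = 10·log10(MAX² / MSE). A minimal NumPy sketch, assuming 8-bit images with a peak value of 255 (the function name and toy images are my own):

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


reference = np.zeros((4, 4))
small_error = reference + 1.0             # MSE = 1
large_error = reference + 10.0            # MSE = 100
print(psnr(reference, small_error) > psnr(reference, large_error))
```

Minimizing per-pixel MSE therefore maximizes PSNR directly, whereas a loss defined on VGG-16 features has no such guarantee, which is consistent with the lower PSNR reported for the perceptual-loss models.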

Reference

[2016 ECCV] [Perceptual Loss]
Perceptual Losses for Real-Time Style Transfer and Super-Resolution

5.2. Style Transfer

2016 [Image Style Transfer] [Perceptual Loss]

My Other Previous Paper Readings
