Brief Review — Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Perceptual Loss: Feature Reconstruction Loss + Style Reconstruction Loss

Comparison with Image Style Transfer [10] and SRCNN [11]
  • Perceptual loss is introduced for high-quality image style transfer.
  • This perceptual loss has since been adopted in many other domains. The paper is from Fei-Fei Li's research group.

Outline

  1. Perceptual Loss Network Architecture
  2. Perceptual Loss Functions
  3. Results

1. Perceptual Loss Network Architecture

System overview
  • The proposed system consists of two components: a loss network Φ and an image transformation network fW.

1.1. Loss Network

  • A loss network Φ that is used to define several loss functions ℓ1, …, ℓk. Each loss function computes a scalar value ℓi(ŷ, yi) measuring the difference between the output image ŷ and a target image yi.
  • The loss network is a frozen ImageNet-pretrained VGG-16.

1.2. Image Transformation Network

  • An image transformation network fW, which is a deep residual convolutional neural network parameterized by weights W.
  • It is trained using stochastic gradient descent to minimize a weighted combination of loss functions:
  W* = arg min_W E_{x, {yi}} [ Σi λi ℓi(fW(x), yi) ]
  • It consists of five residual blocks, with some modifications. (Please feel free to read the paper for detailed modifications.)
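The weighted combination above can be sketched in a few lines of NumPy. The function name and the example weights λ are illustrative placeholders, not values from the paper:

```python
# Toy sketch of the training objective: a weighted sum of per-layer losses.
# lambda_feat and lambda_style are hypothetical weights, not the paper's values.
def weighted_loss(feat_loss, style_losses, lambda_feat=1.0, lambda_style=5.0):
    """Scalar objective sum_i lambda_i * l_i that SGD minimizes over W."""
    return lambda_feat * feat_loss + lambda_style * sum(style_losses)
```

In training, `feat_loss` and `style_losses` would be the per-layer perceptual losses computed by the frozen VGG-16 loss network on the current output ŷ.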

2. Perceptual Loss Functions

2.1. Feature Reconstruction Loss

  • This loss encourages the output image ŷ and the target image y to have similar feature representations as computed by the loss network. Let Φj(x) be the activations of the jth layer of the network Φ when processing the image x; Φj(x) is a feature map of shape Cj×Hj×Wj.
  • The feature reconstruction loss is the (squared, normalized) Euclidean distance between feature representations:
  ℓ_feat^{Φ,j}(ŷ, y) = (1 / (Cj Hj Wj)) ‖Φj(ŷ) − Φj(y)‖²₂
Optimization to minimize the feature reconstruction loss
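As a minimal NumPy sketch of this loss (function name and array shapes are my own, assuming channel-first Cj×Hj×Wj feature maps):

```python
import numpy as np

def feature_reconstruction_loss(phi_y_hat, phi_y):
    """Squared, normalized Euclidean distance between two feature maps
    of shape (C_j, H_j, W_j) taken from layer j of the loss network."""
    c, h, w = phi_y.shape
    return np.sum((phi_y_hat - phi_y) ** 2) / (c * h * w)
```

In the real system, `phi_y_hat` and `phi_y` would come from a forward pass of ŷ and y through the frozen VGG-16.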

2.2. Style Reconstruction Loss

  • Similar to Image Style Transfer, the Gram matrix GΦj(x) is defined as the Cj×Cj matrix whose elements are given by:
  GΦj(x)_{c,c′} = (1 / (Cj Hj Wj)) Σ_{h,w} Φj(x)_{h,w,c} Φj(x)_{h,w,c′}
  • The style reconstruction loss is then the squared Frobenius norm of the difference between the Gram matrices of the output and target images:
  ℓ_style^{Φ,j}(ŷ, y) = ‖GΦj(ŷ) − GΦj(y)‖²_F
Optimization to minimize the style reconstruction loss
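A NumPy sketch of the Gram matrix and style loss (names and the channel-first layout are my own convention):

```python
import numpy as np

def gram_matrix(phi_x):
    """C_j x C_j Gram matrix of a (C_j, H_j, W_j) feature map,
    normalized by C_j * H_j * W_j as in the definition above."""
    c, h, w = phi_x.shape
    feats = phi_x.reshape(c, h * w)
    return feats @ feats.T / (c * h * w)

def style_reconstruction_loss(phi_y_hat, phi_y):
    """Squared Frobenius norm of the Gram-matrix difference."""
    diff = gram_matrix(phi_y_hat) - gram_matrix(phi_y)
    return np.sum(diff ** 2)
```

Because the Gram matrix discards spatial positions, this loss matches texture statistics rather than exact pixel locations, which is why it captures "style".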

3. Results

3.1. Style Transfer

Example results of style transfer using the proposed image transformation networks
Example results for style transfer on 512×512 images.
  • Feature reconstruction loss is computed at layer relu2_2 and style reconstruction loss is computed at layers relu1_2, relu2_2, relu3_3, and relu4_3 of the VGG-16 loss network.
  • It is clear that the trained style transfer network is aware of the semantic content of images.
  • For example, in the beach image in the figure above, the people are clearly recognizable in the transformed image while the background is warped beyond recognition. Similarly, in the cat image, the cat’s face is clear in the transformed image, but its body is not.
  • One explanation is that the VGG-16 loss network has features which are selective for people and animals since these objects are present in the classification dataset on which it was trained.
Inference Speed
  • The proposed method is three orders of magnitude faster than Image Style Transfer. It processes images of size 512×512 at 20 FPS, making it feasible to run style transfer in real-time or on video.

3.2. Single Image Super Resolution

Results for ×4 super-resolution
Results for ×8 super-resolution
  • The proposed method obtains lower PSNR and SSIM, but produces more visually pleasing images, since the perceptual loss does not directly optimize PSNR/SSIM via a per-pixel ℓ1/ℓ2 loss. This is similar in spirit to SRGAN.
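For reference, PSNR, the metric the perceptual loss does not directly optimize, is simple to compute; a small NumPy sketch (function name is my own):

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A per-pixel ℓ2 loss minimizes the MSE inside this formula and therefore maximizes PSNR directly, which tends to produce blurry but numerically "accurate" reconstructions; the perceptual loss trades some PSNR for sharper, more natural-looking detail.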

Reference

[2016 ECCV] Perceptual Losses for Real-Time Style Transfer and Super-Resolution, Johnson, Alahi, and Fei-Fei.
