Review: SRGAN & SRResNet — Photo-Realistic Super Resolution (GAN & Super Resolution)

Using a Generative Adversarial Network (GAN), a More Photo-Realistic Super-Resolved Image Can Be Obtained, Albeit with Lower PSNR

Sik-Ho Tsang
6 min read · Apr 22, 2020
Figure: super-resolved image (left) is almost indistinguishable from the original (right); 4× upscaling.

In this paper, SRGAN, a generative adversarial network (GAN) for image super-resolution (SR) by Twitter, is reviewed. The same network without the GAN component is SRResNet. Super-resolved images can obtain high peak signal-to-noise ratios (PSNRs), yet they often lack high-frequency details and are perceptually unsatisfying. One of the reasons is the use of MSE as the loss function.

A deep residual generative adversarial network (SRGAN) is optimized for a loss more sensitive to human perception:
  • SRGAN is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors.
  • A perceptual loss function consisting of an adversarial loss and a content loss is proposed for SR. The content loss uses the high-level feature maps of the VGG network, which are more invariant to changes in pixel space.
  • SRResNet also obtained new state-of-the-art results in terms of PSNR and SSIM at that moment.
  • An extensive mean opinion score (MOS) test on images from three public benchmark datasets also shows that SRGAN was the new state-of-the-art approach by a large margin at that moment.

This is a paper in 2017 CVPR with over 3300 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. SR Formulation
  2. Adversarial Network Architecture
  3. Perceptual Loss Function
  4. Experimental Results

1. SR Formulation

  • In Single Image Super-Resolution (SISR), the aim is to estimate a high-resolution, super-resolved image I^SR from a low-resolution input image I^LR.
  • Here I^LR is the low-resolution version of its high-resolution counterpart I^HR. The high-resolution images are only available during training.
  • In training, I^LR is obtained by applying a Gaussian filter to I^HR followed by a downsampling operation with downsampling factor r (see the sketch after this list).
  • For an image with C color channels, I^LR is described by a real-valued tensor of size W×H×C, and I^HR, I^SR by rW×rH×C respectively.
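
To make the setup concrete, here is a minimal sketch of how such a training pair could be produced, assuming NumPy/SciPy inputs; the blur strength sigma is a hypothetical choice, since the paper only specifies a Gaussian filter and r = 4:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_lr(hr: np.ndarray, r: int = 4, sigma: float = 1.0) -> np.ndarray:
    """Create a low-resolution input I^LR from a high-resolution image I^HR.

    hr:    rW x rH x C float array in [0, 1]
    r:     downsampling factor (the paper uses r = 4)
    sigma: Gaussian blur strength (hypothetical; not fixed in the paper)
    """
    # Blur only the spatial axes, not the channel axis.
    blurred = gaussian_filter(hr, sigma=(sigma, sigma, 0))
    # Downsample by factor r via subsampling, giving a W x H x C tensor.
    return blurred[::r, ::r, :]
```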

2. Adversarial Network Architecture

  • The general idea behind this formulation is that it allows one to train a generative model G with the goal of fooling a differentiable discriminator D that is trained to distinguish super-resolved images from real images.
  • With this approach, the generator can learn to create solutions that are highly similar to real images and thus difficult for D to classify.
  • The min-max problem used to train D and G is reproduced below.
  • (For more information, please read GAN if interested.)
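
For completeness, the min-max objective as given in the paper, where θ_G and θ_D are the parameters of G and D:

```latex
\min_{\theta_G} \max_{\theta_D} \;
\mathbb{E}_{I^{HR} \sim p_{\mathrm{train}}(I^{HR})}\!\left[\log D_{\theta_D}(I^{HR})\right]
+ \mathbb{E}_{I^{LR} \sim p_{G}(I^{LR})}\!\left[\log\!\left(1 - D_{\theta_D}\!\big(G_{\theta_G}(I^{LR})\big)\right)\right]
```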

2.1. Generator Network G

Figure: generator network G architecture.
  • There are B residual blocks (B = 16), following the design of ResNet.
  • Within the residual block, two convolutional layers are used, with small 3×3 kernels and 64 feature maps followed by batch-normalization layers and ParametricReLU as the activation function.
  • The resolution of the input image is increased with two trained sub-pixel convolution layers.
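
A minimal PyTorch sketch of one residual block and one sub-pixel upsampling stage as described above; module names are my own, while the 3×3 kernels and 64 feature maps follow the paper's figure:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One of the B = 16 generator blocks: conv-BN-PReLU-conv-BN plus a skip."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)  # identity skip connection, as in ResNet

class UpsampleBlock(nn.Module):
    """Sub-pixel convolution: conv to 4x channels, then PixelShuffle(2).
    The generator stacks two of these for the overall 4x upscaling."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1),
            nn.PixelShuffle(2),  # rearranges channels into a 2x larger grid
            nn.PReLU(),
        )

    def forward(self, x):
        return self.body(x)
```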

2.2. Discriminator Network D

Figure: discriminator network D architecture.
  • LeakyReLU activation (α = 0.2) is used, and max-pooling is avoided throughout the network.
  • The discriminator network is trained to solve the maximization problem.
  • The network contains eight convolutional layers with an increasing number of 3×3 filter kernels, increasing by a factor of 2 from 64 to 512 kernels as in the VGG network.
  • Strided convolutions are used to reduce the image resolution each time the number of features is doubled.
  • The resulting 512 feature maps are followed by two dense layers and a final sigmoid activation function to obtain a probability for sample classification.
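
A corresponding PyTorch sketch of the discriminator; nn.LazyLinear is my own convenience so the first dense layer adapts to the HR crop size (the paper's figure shows fixed dense(1024) and dense(1) layers):

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
    """conv-BN-LeakyReLU unit repeated throughout the discriminator."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )

class Discriminator(nn.Module):
    """Eight conv layers, 64 -> 512 filters, strided convs instead of pooling."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),  # first layer: no BN
            nn.LeakyReLU(0.2),
            conv_block(64, 64, stride=2),    # stride-2 convs halve resolution
            conv_block(64, 128, stride=1),
            conv_block(128, 128, stride=2),
            conv_block(128, 256, stride=1),
            conv_block(256, 256, stride=2),
            conv_block(256, 512, stride=1),
            conv_block(512, 512, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1024),  # dense layer; input size depends on crop size
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
            nn.Sigmoid(),         # probability that the input is a real HR image
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```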

3. Perceptual Loss Function

  • The perceptual loss is the weighted sum of a content loss l^SR_X and an adversarial loss component:
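
In the paper's notation, with the adversarial term weighted by 10^-3:

```latex
l^{SR} = \underbrace{l^{SR}_{X}}_{\text{content loss}}
       + \underbrace{10^{-3}\, l^{SR}_{Gen}}_{\text{adversarial loss}}
```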

3.1. Content Loss

  • Instead of the pixel-wise MSE loss, SRGAN relies on the VGG loss (both are written out below this list).
  • φ_{i,j} indicates the feature map obtained by the j-th convolution (after activation) before the i-th max-pooling layer within the VGG19 network.
  • W_{i,j} and H_{i,j} describe the dimensions of the respective feature maps.
  • This VGG loss is the Euclidean distance between the feature representations of a reconstructed image G_{θG}(I^LR) and the reference image I^HR.
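
Both losses as defined in the paper, the pixel-wise MSE loss and the VGG loss on feature maps φ_{i,j}:

```latex
l^{SR}_{MSE} = \frac{1}{r^{2}WH} \sum_{x=1}^{rW} \sum_{y=1}^{rH}
\left( I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y} \right)^{2}

l^{SR}_{VGG/i.j} = \frac{1}{W_{i,j} H_{i,j}}
\sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}}
\left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y} \right)^{2}
```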

The content loss is motivated by perceptual similarity instead of similarity in pixel space.
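
A minimal PyTorch sketch of such a VGG feature loss, assuming torchvision's pretrained VGG19; the slice index 36 corresponds to the feature map after the activation of conv5_4, i.e. the φ_{5,4} used by SRGAN-VGG54 later in this review:

```python
import torch.nn.functional as F
from torchvision.models import vgg19

# Fixed, pretrained VGG19 feature extractor up to relu5_4 (phi_{5,4}).
vgg_features = vgg19(pretrained=True).features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)  # VGG stays frozen; only the generator is trained

def vgg_loss(sr, hr):
    """Euclidean distance between VGG feature maps of the SR and HR images."""
    return F.mse_loss(vgg_features(sr), vgg_features(hr))
```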

3.2. Adversarial Loss

  • The generative loss l^SR_Gen is defined based on the probabilities of the discriminator D_{θD}(G_{θG}(I^LR)) over all training samples (see below).
  • D_{θD}(G_{θG}(I^LR)) is the probability that the reconstructed image G_{θG}(I^LR) is a natural HR image.
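
The paper minimizes -log D rather than log(1 - D) for better gradient behavior:

```latex
l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\big(G_{\theta_G}(I^{LR})\big)
```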

The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.

4. Experimental Results

  • Three datasets are tested: Set5, Set14 and BSD100, with a scale factor of 4×, i.e. 16× reduction in image pixels.
  • All networks are trained on a NVIDIA Tesla M40 GPU using a random sample of 350 thousand images from the ImageNet database.
  • The MSE-based SRResNet network is trained first and used as initialization for the generator when training the actual GAN, to avoid undesired local optima.
  • The generator and discriminator networks are updated alternately, which is equivalent to k = 1 as in the original GAN; a sketch of one such step follows this list.
  • At test time, batch-normalization updates are turned off so that the output depends deterministically only on the input.
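
A minimal sketch of one such alternating update, assuming a generator G, discriminator D, and their optimizers already exist, and reusing the vgg_loss helper sketched above; the binary cross-entropy form of the discriminator update is the standard GAN recipe rather than a detail spelled out in the paper:

```python
import torch

bce = torch.nn.BCELoss()

def train_step(lr_img, hr_img, G, D, opt_G, opt_D):
    """One alternating update (k = 1): discriminator first, then generator."""
    real = torch.ones(hr_img.size(0), 1)
    fake = torch.zeros(hr_img.size(0), 1)

    # --- Discriminator update: separate real HR images from SR images ---
    opt_D.zero_grad()
    sr_img = G(lr_img).detach()        # no gradients into G for this step
    d_loss = bce(D(hr_img), real) + bce(D(sr_img), fake)
    d_loss.backward()
    opt_D.step()

    # --- Generator update: content (VGG) loss + 1e-3 * adversarial loss ---
    opt_G.zero_grad()
    sr_img = G(lr_img)
    adv = -torch.log(D(sr_img) + 1e-8).mean()   # epsilon for stability
    g_loss = vgg_loss(sr_img, hr_img) + 1e-3 * adv
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```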

4.1. Mean Opinion Score (MOS) Testing

  • 26 raters are asked to assign an integer score from 1 (bad quality) to 5 (excellent quality) to the super-resolved images.
Table: PSNR, SSIM and MOS of SRResNet and SRGAN.
  • As shown above, SRResNet obtains higher PSNR and SSIM.
  • But SRGAN obtains a much higher MOS, thanks to its more photo-realistic super-resolved images.
Figure: MOS scores on BSD100.
Table: SOTA comparison.
  • SRResNet obtains new SOTA results in terms of PSNR and SSIM, outperforming CNN approaches such as SRCNN, DRCN and ESPCN.
  • SRGAN obtains new SOTA results in terms of MOS, again outperforming the other CNN approaches.

4.2. Qualitative Results

  • SRGAN-MSE: the adversarial network using MSE as the content loss.
  • SRGAN-VGG22: the VGG loss on lower-level features (φ_{2,2}).
  • SRGAN-VGG54: the VGG loss on higher-level features (φ_{5,4}), which gives the most photo-realistic results.

During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. This is the 20th story this month. Thanks for visiting my story…

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.