Review: SRGAN & SRResNet — Photo-Realistic Super Resolution (GAN & Super Resolution)
Using a Generative Adversarial Network (GAN), More Photo-Realistic Super-Resolved Images Can Be Obtained, Though with Lower PSNR
In this story, SRGAN, a generative adversarial network (GAN) for image super-resolution (SR) by Twitter, is reviewed. The network without the GAN is SRResNet. Super-resolved images from prior approaches obtain high peak signal-to-noise ratios (PSNRs), but they often lack high-frequency details and are perceptually unsatisfying. One of the reasons is that they use MSE as the loss function.
- SRGAN is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors.
- A perceptual loss function, consisting of an adversarial loss and a content loss, is proposed for SR. The content loss uses the high-level feature maps of the VGG network, which are more invariant to changes in pixel space.
- SRResNet also obtained new state-of-the-art results in terms of PSNR and SSIM at that moment.
- An extensive mean opinion score (MOS) test on images from three public benchmark datasets also shows that SRGAN was the new state-of-the-art approach by a large margin at that moment.
This is a paper in 2017 CVPR with over 3300 citations. (Sik-Ho Tsang @ Medium)
Outline
- SR Formulation
- Adversarial Network Architecture
- Perceptual Loss Function
- Experimental Results
1. SR Formulation
- In Single Image Super Resolution (SISR), the aim is to estimate a high-resolution, super-resolved image I^SR from a low-resolution input image I^LR.
- Here I^LR is the low-resolution version of its high-resolution counterpart I^HR. The high-resolution images are only available during training.
- In training, I^LR is obtained by applying a Gaussian filter to I^HR followed by a downsampling operation with downsampling factor r.
- For an image with C color channels, I^LR is described by a real-valued tensor of size W×H×C, and I^HR and I^SR by tensors of size rW×rH×C.
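The degradation described above (Gaussian filter, then downsampling by r) can be sketched in plain Python for a single-channel image. This is only an illustration: the helper names, the kernel width (`sigma=1.0`, `radius=2`) and the border clamping are my own assumptions, not values taken from the paper.

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
             for k in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(col) for col in zip(*image)]

def degrade(hr, r, sigma=1.0, radius=2):
    """Blur a single-channel HR image (rows then columns), then keep every r-th pixel."""
    kernel = gaussian_kernel_1d(sigma, radius)
    blurred = transpose(blur_rows(transpose(blur_rows(hr, kernel)), kernel))
    return [row[::r] for row in blurred[::r]]

# A 16x16 "HR" image becomes a 4x4 "LR" image for r = 4.
hr = [[float((x + y) % 7) for x in range(16)] for y in range(16)]
lr = degrade(hr, r=4)
print(len(lr), len(lr[0]))  # 4 4
```

With r = 4, an rW×rH image shrinks to W×H, which is the 16× reduction in pixel count mentioned later in the experiments.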
2. Adversarial Network Architecture
- The general idea behind this formulation is that it allows one to train a generative model G with the goal of fooling a differentiable discriminator D that is trained to distinguish super-resolved images from real images.
- With this approach, the generator can learn to create solutions that are highly similar to real images and thus difficult for D to classify.
- The min-max problem in order to train D and G:
- (For more information, please read GAN if interested.)
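For reference, the min-max problem mentioned above can be written as follows. This is the adversarial objective as I recall it from the original paper, using the symbols of this section (θ_G and θ_D denote the parameters of G and D):

```latex
\min_{\theta_G} \max_{\theta_D} \;
\mathbb{E}_{I^{HR} \sim p_{\text{train}}(I^{HR})}
  \left[ \log D_{\theta_D}(I^{HR}) \right]
+ \mathbb{E}_{I^{LR} \sim p_G(I^{LR})}
  \left[ \log \left( 1 - D_{\theta_D}\!\left( G_{\theta_G}(I^{LR}) \right) \right) \right]
```

The first term rewards D for assigning high probability to real HR images. The second rewards D for rejecting super-resolved images, while G is trained to make that term small.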
2.1. Generator Network G
- There are B residual blocks (B=16), with the block design originating from ResNet.
- Within the residual block, two convolutional layers are used, with small 3×3 kernels and 64 feature maps followed by batch-normalization layers and ParametricReLU as the activation function.
- The resolution of the input image is increased with two trained sub-pixel convolution layers.
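The rearrangement step inside a sub-pixel convolution layer (often called pixel shuffle) can be sketched as below. It turns C·r² feature maps of size H×W into C maps of size rH×rW. This is a plain-Python illustration on nested lists; the channel ordering follows the convention used by common deep learning frameworks, which is an assumption on my part, and the convolution that precedes the shuffle in each layer is omitted.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    channels_out = channels_in // (r * r)
    out = [[[0.0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out

# Four 1x1 maps become one 2x2 map.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Applying two such layers with r = 2 each gives the overall 4× upscaling: for example, 16 maps of size 3×3 become 4 maps of size 6×6 and then 1 map of size 12×12.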
2.2. Discriminator Network D
- LeakyReLU activation (α=0.2) is used and max-pooling is avoided throughout the network.
- The discriminator network is trained to solve the maximization part of the min-max problem in Section 2.
- The network contains eight convolutional layers with 3×3 filter kernels, whose number increases by a factor of 2 from 64 to 512 as in the VGG network.
- Strided convolutions are used to reduce the image resolution each time the number of features is doubled.
- The resulting 512 feature maps are followed by two dense layers and a final sigmoid activation function to obtain a probability for sample classification.
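To see how the feature-map size shrinks through the eight convolutional layers, here is a small shape trace. The alternating stride pattern (1, 2, 1, 2, ...) is how I read the paper's architecture figure, and the 96×96 input size and padding of 1 are assumptions made only for this illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# (number of 3x3 kernels, stride) for each of the eight conv layers (assumed pattern).
LAYERS = [(64, 1), (64, 2), (128, 1), (128, 2),
          (256, 1), (256, 2), (512, 1), (512, 2)]

def trace(size):
    """Return the (channels, height, width) shape after every conv layer."""
    shapes = []
    for channels, stride in LAYERS:
        size = conv_out(size, stride=stride)
        shapes.append((channels, size, size))
    return shapes

for shape in trace(96):
    print(shape)
# The last line printed is (512, 6, 6).
```

Under these assumptions, the 512 final feature maps of size 6×6 are what gets flattened and passed to the two dense layers and the sigmoid.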
3. Perceptual Loss Function
- The perceptual loss is formulated as the weighted sum of a content loss (l^SR_X) and an adversarial loss component (l^SR_Gen):
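Written out, the weighted sum looks like this. The weight 10^-3 on the adversarial term is the value I recall from the original paper:

```latex
l^{SR} = \underbrace{l^{SR}_{X}}_{\text{content loss}}
       + \underbrace{10^{-3} \, l^{SR}_{Gen}}_{\text{adversarial loss}}
```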
3.1. Content Loss
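- The pixel-wise MSE loss is the most common optimization target for image SR, but it tends to produce overly smooth results.
- Instead of relying on pixel-wise losses, SRGAN uses the VGG loss, which is computed on feature maps of a pre-trained VGG19 network.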
- Φ_i,j indicates the feature map obtained by the j-th convolution (after activation) before the i-th maxpooling layer within the VGG19 network.
- W_i,j and H_i,j describe the dimensions of the respective feature maps.
- This VGG loss is the Euclidean distance between the feature representations of a reconstructed image G_θG(I^LR) and the reference image I^HR.
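The two losses discussed above can be written as follows, as I recall them from the original paper and using the symbols defined in this section. The pixel-wise MSE loss is:

```latex
l^{SR}_{MSE} = \frac{1}{r^{2} W H}
  \sum_{x=1}^{rW} \sum_{y=1}^{rH}
  \left( I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y} \right)^{2}
```

and the VGG loss has the same form, but it is measured on the feature maps Φ_i,j instead of on pixels:

```latex
l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}}
  \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}}
  \left( \Phi_{i,j}(I^{HR})_{x,y}
       - \Phi_{i,j}\!\left( G_{\theta_G}(I^{LR}) \right)_{x,y} \right)^{2}
```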
The content loss is motivated by perceptual similarity instead of similarity in pixel space.
3.2. Adversarial Loss
- The generative loss l^SR_Gen is defined based on the probabilities of the discriminator D_θD(G_θG(I^LR)) over all training samples:
- D_θD(G_θG(I^LR)) is the probability that the reconstructed image G_θG(I^LR) is a natural HR image.
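As I recall it from the original paper, the generative loss over N training samples is:

```latex
l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left( G_{\theta_G}(I^{LR}) \right)
```

Minimizing −log D(G(I^LR)) rather than log(1 − D(G(I^LR))) is a common choice because it gives stronger gradients when the discriminator easily rejects the generated images.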
The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
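Putting the content loss and the adversarial loss together, a toy version of the perceptual loss could look like the sketch below. Feature maps are plain nested lists standing in for VGG activations, the function names are hypothetical, the `eps` clamp is my own addition to avoid log(0), and the default weight of 1e-3 is the value I recall from the paper.

```python
import math

def content_loss(feat_hr, feat_sr):
    """Mean squared difference between two H x W feature maps."""
    height, width = len(feat_hr), len(feat_hr[0])
    total = sum((feat_hr[y][x] - feat_sr[y][x]) ** 2
                for y in range(height) for x in range(width))
    return total / (width * height)

def adversarial_loss(d_probs, eps=1e-12):
    """Sum of -log D(G(I_LR)) over the samples; eps guards against log(0)."""
    return sum(-math.log(max(p, eps)) for p in d_probs)

def perceptual_loss(feat_hr, feat_sr, d_probs, adv_weight=1e-3):
    """Content loss plus the weighted adversarial loss."""
    return content_loss(feat_hr, feat_sr) + adv_weight * adversarial_loss(d_probs)

# Toy inputs: 2x2 "feature maps" and discriminator outputs for two samples.
feat_hr = [[1.0, 2.0], [3.0, 4.0]]
feat_sr = [[1.0, 2.0], [3.0, 6.0]]
print(content_loss(feat_hr, feat_sr))  # 1.0
print(perceptual_loss(feat_hr, feat_sr, d_probs=[0.5, 0.5]))
```

When the discriminator is fully fooled (all probabilities equal to 1), the adversarial term vanishes and only the content loss remains.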
4. Experimental Results
- Three datasets are tested: Set5, Set14 and BSD100, with a scale factor of 4×, i.e. 16× reduction in image pixels.
- All networks are trained on an NVIDIA Tesla M40 GPU using a random sample of 350 thousand images from the ImageNet database.
- An MSE-based SRResNet is trained first and used as the initialization for the generator when training the actual GAN, to avoid undesired local optima.
- The generator and discriminator networks are updated alternately, which is equivalent to k = 1 as in GAN.
- At test time, the batch-normalization update is turned off to obtain an output that deterministically depends only on the input.
4.1. Mean Opinion Score (MOS) Testing
- 26 raters are asked to assign an integer score from 1 (bad quality) to 5 (excellent quality) to the super-resolved images.
- In the reported results, SRResNet obtains higher PSNR and SSIM than SRGAN.
- But SRGAN can obtain much higher MOS due to more photo-realistic super-resolved images.
- SRResNet obtains new SOTA results in terms of PSNR and SSIM, outperforming CNN approaches such as SRCNN, DRCN and ESPCN.
- SRGAN obtains new SOTA results using MOS, which also outperforms other CNN approaches.
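To make the two kinds of metrics concrete, here is a small sketch of how PSNR and a mean opinion score can be computed. The function names and the 8-bit peak value of 255 are assumptions for illustration. The actual evaluation protocol (color channel, border cropping, rater setup) is described in the paper and is not reproduced here.

```python
import math
from statistics import mean

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized 2-D images."""
    flat_ref = [v for row in reference for v in row]
    flat_est = [v for row in estimate for v in row]
    mse = mean((a - b) ** 2 for a, b in zip(flat_ref, flat_est))
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mean_opinion_score(ratings):
    """Average of integer ratings on the 1 (bad) to 5 (excellent) scale."""
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)

print(psnr([[0, 0]], [[255, 255]]))   # 0.0
print(mean_opinion_score([5, 4, 3]))  # 4
```

PSNR depends only on the pixel-wise error, so a smooth but blurry output can score well. MOS captures what human raters actually perceive, which is why the two metrics can rank SRResNet and SRGAN differently.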
4.2. Qualitative Results
Reference
[2017 CVPR] [SRGAN & SRResNet]
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet] [SRGAN & SRResNet] [SR+STN]
Generative Adversarial Network
[GAN] [CGAN] [LAPGAN] [DCGAN] [SRGAN & SRResNet]