Review: SR+STN — Super-Resolution Based on Geometric Similarity
Using Spatial Transformer Network (STN) for Super-Resolution (SR), Outperforms SRCNN & VDSR
In this story, SR+STN, by Dalian University of Technology and Dalian University, is reviewed. By finding similar patches within the same image, the image is super-resolved with better quality.
- First, similar patches are found by k-Nearest Neighbor (kNN).
- Then, these similar patches are well-aligned by Spatial Transformer Network (STN).
- Finally, the high-resolution (HR) image will be predicted gradually according to the complementary information provided by these aligned patches.
This is published in 2019 JSPIC (Signal Processing: Image Communication). (Sik-Ho Tsang @ Medium)
Outline
- Finding Similar Patches Using kNN
- Network Architecture
- Experimental Results
1. Finding Similar Patches Using kNN
- First, some empirical facts motivate the approach.
- According to the experiment in Ref. [42] in the paper, when a small patch size (e.g., 5×5) is used, on average more than 90% of the patches in an image have 9 or more similar patches in the same image at the original image scale.
- Moreover, more than 80% of the input patches have 9 or more similar patches at 1.25× of the input scale.
- Thus, for each patch in the low-resolution (LR) image, its k nearest patch neighbors are found in the same image, i.e., for the source patch (red) in (a), the similar patch found at the same scale is shown in the blue rectangle.
- To find similar patches at larger scales, the input LR image I0 is downsampled to I1, I2, and I3, i.e., (b) to (d). The downsampling ratio between the layers of the pyramid images is set to 0.8.
- The most similar patches of P at different scales (green) are also found in the same image.
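The patch search above can be sketched as a brute-force kNN over raw pixel distances across a 0.8-ratio image pyramid. A minimal NumPy sketch (function names are my own; nearest-neighbour downsampling stands in for whatever resampling the paper actually uses):

```python
import numpy as np

def extract_patches(img, size=5, stride=1):
    """Collect all size x size patches (flattened) with their top-left coords."""
    H, W = img.shape
    patches, coords = [], []
    for y in range(0, H - size + 1, stride):
        for x in range(0, W - size + 1, stride):
            patches.append(img[y:y + size, x:x + size].ravel())
            coords.append((y, x))
    return np.array(patches), coords

def knn_similar_patches(img, query, k=3, size=5):
    """Return coords and distances of the k patches closest (L2) to `query`."""
    patches, coords = extract_patches(img, size)
    d = np.linalg.norm(patches - query.ravel(), axis=1)
    idx = np.argsort(d)[:k]
    return [coords[i] for i in idx], d[idx]

def build_pyramid(img, ratio=0.8, levels=3):
    """Downsample I0 to I1..I3, shrinking by `ratio` per level
    (nearest-neighbour here, as a simple stand-in)."""
    pyr = [img]
    for _ in range(levels):
        prev = pyr[-1]
        h, w = int(prev.shape[0] * ratio), int(prev.shape[1] * ratio)
        ys = (np.arange(h) / ratio).astype(int)
        xs = (np.arange(w) / ratio).astype(int)
        pyr.append(prev[ys][:, xs])
    return pyr
```

Running `knn_similar_patches` on every level of `build_pyramid(I0)` yields both the same-scale matches (blue) and the cross-scale matches (green).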
2. Network Architecture
2.1. STN
- Since a similar patch may not be well-aligned with the source patch, an STN is used to align it, as shown above.
- With the learned parameters θ, an affine transform conditioned on the input values can be applied to the input. As the sampling is differentiable, it allows end-to-end training.
- (STN is a well-known differentiable network module, originally demonstrated on image classification, which applies a learning-based affine transform to tackle rotation, zooming, etc. If interested, please read my review on STN.)
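As a concrete illustration of what the sampler inside an STN computes, here is a minimal NumPy warp driven by a 2×3 affine matrix θ over a grid normalized to [-1, 1]. This sketch is not from the paper: it uses nearest-neighbour sampling for brevity, whereas the real STN uses differentiable bilinear sampling so gradients can flow through the warp.

```python
import numpy as np

def affine_warp(patch, theta):
    """Resample `patch` through a 2x3 affine matrix `theta`, STN-style.
    Coordinates are normalized to [-1, 1]; nearest-neighbour sampling."""
    H, W = patch.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # homogeneous target coords
    src = theta @ grid                                         # where to sample from
    sx = np.clip(np.rint((src[0] + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
    sy = np.clip(np.rint((src[1] + 1) / 2 * (H - 1)).astype(int), 0, H - 1)
    return patch[sy, sx].reshape(H, W)

identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # theta for "no change"
```

In the network, θ is predicted by a small localisation net from the patch content, which is how each similar patch gets registered onto the source patch before fusion.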
2.2. Progressive SR via Deconvolution Pyramid
- As shown in the figure above, a pyramid of deconvolutional layers is used to improve the spatial resolution of the input image layer by layer.
- For example, for 4× magnification, a 4-layer pyramid is used to enlarge the LR image gradually. The pyramid is attached to the back end of the network. The whole network thus includes 3 input layers for patch extraction and representation, 4 STN layers for spatial transformation, and 4 deconvolutional layers for enlargement, so the model is an 11-layer deep network.
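A deconvolutional (transposed-convolution) layer enlarges its input by stamping a scaled copy of the kernel at strided output positions. A minimal single-channel sketch (illustrative only, not the paper's learned kernels):

```python
import numpy as np

def deconv2d(x, kernel, stride=2):
    """Minimal transposed convolution: each input pixel stamps a scaled
    copy of the kernel into the (larger) output; overlaps are summed."""
    H, W = x.shape
    kH, kW = kernel.shape
    out = np.zeros((stride * (H - 1) + kH, stride * (W - 1) + kW))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + kH, j * stride:j * stride + kW] += x[i, j] * kernel
    return out
```

Chaining several such layers, each with a modest upscaling factor, is what lets the pyramid enlarge the image gradually instead of in one 4× jump.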
- Finally, the loss function is the standard squared Euclidean distance between the super-resolved image and the original high-resolution image:
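The equation was likely an image in the original post; from the description, a squared-Euclidean loss over N training pairs takes the form below (symbol names F, Θ, Y_i, X_i are my own; the paper's notation may differ):

```latex
L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| F(Y_i; \Theta) - X_i \right\|_2^2
```

where F(·; Θ) is the network output for the i-th LR input Y_i, and X_i is the corresponding ground-truth HR image.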
3. Experimental Results
3.1. Dataset
- The Train-91 and Urban-100 datasets are used for training.
- For testing, 519 HR images are collected from different databases, namely, 300 facial images selected randomly from the LFW database and 219 other images from standard test image databases: Set5, Set14, and BSD200. These images come from different categories, such as face images, natural images, and indoor and outdoor scenes, to ensure the algorithm is fully tested.
- For both training and testing, the proposed method is applied only on the Y channel, extracted from the YCbCr color space, whereas the Cb and Cr channels are up-scaled using bicubic interpolation.
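This luma-only pipeline can be sketched as follows (BT.601 RGB→YCbCr conversion; nearest-neighbour chroma upscaling is a stand-in for the bicubic interpolation the paper uses, and `sr_model` is a placeholder for the learned network):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """ITU-R BT.601 full-range RGB -> YCbCr, values in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return y, cb, cr

def upscale_ycbcr(rgb, scale, sr_model):
    """Super-resolve only the luma; chroma gets plain interpolation."""
    y, cb, cr = rgb_to_ycbcr(rgb)
    y_hr = sr_model(y)                               # learned SR on Y only
    cb_hr = np.kron(cb, np.ones((scale, scale)))     # placeholder chroma upscaling
    cr_hr = np.kron(cr, np.ones((scale, scale)))
    return y_hr, cb_hr, cr_hr
```

This is a common choice in SR work: the human visual system is far more sensitive to luma detail than to chroma, so spending the network capacity on Y alone costs little perceived quality.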
3.2. PSNR and SSIM
- PSNR and SSIM comparisons are shown above, where the proposed approach outperforms SCN (Sparse Coding-based Network), SRCNN, and VDSR.
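For reference, PSNR is computed from the mean squared error between the two images (this is the standard definition, not specific to the paper; SSIM is more involved and omitted here):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```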
3.3. Computational Time
- Although only slightly better than VDSR in reconstruction performance, the proposed method is relatively fast.
- Since the STNs are used in parallel, compared with VDSR, which stacks all layers in series, the parallel structure of the proposed model can be computed quickly by the GPU during feed-forward propagation.
3.4. Visualization
- Repetitive similar regions in an image are used to supply the high-frequency detail information required by the reconstructed patch and obtain good image quality.
Reference
[2019 JSPIC] [SR+STN]
A deep learning method for image super-resolution based on geometric similarity
My Previous Reviews
Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [ResNet-38] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]
Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]
Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [ResNet-38] [ResNet-DUC-HDC] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN]
Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet] [Cascaded 3D U-Net] [Attention U-Net] [RU-Net & R2U-Net]
Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]
Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet] [SR+STN]
Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]
Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN]
Generative Adversarial Network [GAN]