Reading: SelNet — CNN with Selection Units, Top-5 Ranked in NTIRE2017 Challenge (Super Resolution)
In this paper, A Deep Convolutional Neural Network with Selection Units for Super-Resolution (SelNet), by Korea Advanced Institute of Science and Technology (KAIST), is presented:
- A Selection Unit (SU) is proposed to make the on-off switching control of ReLU learnable, so that nonlinearity can be handled more flexibly than with a plain ReLU.
- A deep network built with SUs, called SelNet, was ranked in the top 5 of the NTIRE2017 Challenge, with much lower computational complexity than the top-4 entries.
This is a paper in 2017 CVPRW with about 30 citations. (Sik-Ho Tsang @ Medium)
Outline
- Reinterpreting ReLU
- Selection Unit (SU) & SelNet Network Architecture
- Experimental Results
1. Reinterpreting ReLU
- ReLU is originally interpreted as an identity mapping multiplied by an on-off switch, as shown above.
- The authors reinterpret ReLU as an identity mapping multiplied by a sigmoid function.
- This sigmoid function is computed from the feature maps after convolution, which means the on-off switch is turned on or off adaptively according to the input feature maps.
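A minimal PyTorch-style sketch of this reinterpretation (the function names are my own, not from the paper): ReLU written as an identity mapping times a hard switch, versus an identity mapping times a sigmoid gate computed from the features.

```python
import torch
import torch.nn as nn

# ReLU viewed as an identity mapping multiplied by a hard on-off switch:
# relu(x) = x * 1[x > 0]
def relu_as_switch(x: torch.Tensor) -> torch.Tensor:
    return x * (x > 0).float()

# The reinterpretation: replace the hard switch with a sigmoid gate that is
# computed from the feature maps themselves (e.g. by a small module such as
# a 1x1 convolution), so the switching becomes adaptive and learnable.
def sigmoid_gated_identity(x: torch.Tensor, gate: nn.Module) -> torch.Tensor:
    return x * torch.sigmoid(gate(x))
```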
2. Selection Unit (SU) & SelNet Network Architecture
2.1. Selection Unit (SU)
- As shown above, the Selection Unit (SU) controls which values in the feature maps from the previous convolutional layer are passed to the next layer.
- The Selection Module (SM) is formed as a cascade of a ReLU, a 1×1 convolution, and a sigmoid.
- Thus, the proposed SU is the element-wise multiplication of two branches: an identity mapping and an SM.
- The SM can learn the whole selection control because the training error is back-propagated through it, updating the 1×1 convolutional filters to decide which data is passed to the next layer.
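Below is a minimal PyTorch-style sketch of an SU built as described: an SM (ReLU → 1×1 convolution → sigmoid) whose output gates the identity path. The class name and channel argument are placeholders of mine, not the authors' code.

```python
import torch
import torch.nn as nn

class SelectionUnit(nn.Module):
    """Selection Unit (SU) sketch: identity mapping multiplied by a
    Selection Module (ReLU -> 1x1 convolution -> sigmoid)."""
    def __init__(self, channels: int):
        super().__init__()
        self.selection_module = nn.Sequential(
            nn.ReLU(inplace=True),                          # ReLU
            nn.Conv2d(channels, channels, kernel_size=1),   # 1x1 convolution
            nn.Sigmoid(),                                   # gate in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise product of the identity path and the SM gate; gradients
        # flow through the 1x1 conv, so the selection control is learned.
        return x * self.selection_module(x)
```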
2.2. SelNet: Network Architecture
- As shown above, SelNet is a 22-layer deep network for SR.
- Residual units with identity mappings, originating from Pre-Activation ResNet, are used, where the (n-2)-th feature map after convolution is simply added to the n-th feature map and forwarded to the (n+1)-th layer.
- Residual learning between the HR image and a bicubic-interpolated image is used, as introduced in VDSR.
- A sub-pixel layer, originating from ESPCN, is added at the end of the network to convert a multi-channel LR-sized output into an HR-sized image (see the sketch after this list).
- As shown above, the average PSNR is higher with SUs than without.
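Putting these pieces together, here is a heavily simplified PyTorch-style sketch of a SelNet-like forward pass: stacked conv+SU blocks with identity skips, a sub-pixel (PixelShuffle) layer, and a global bicubic residual. The depth, width, skip pattern, and scale factor are illustrative assumptions, not the paper's exact 22-layer configuration, and it reuses the `SelectionUnit` class from the sketch above.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvSU(nn.Module):
    """One 3x3 convolution followed by a Selection Unit (sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.su = SelectionUnit(channels)  # SelectionUnit from the sketch above

    def forward(self, x):
        return self.su(self.conv(x))

class SelNetSketch(nn.Module):
    """Simplified SelNet-style network: conv+SU blocks, identity skips,
    a sub-pixel layer, and VDSR-style global residual learning."""
    def __init__(self, scale: int = 2, channels: int = 64, num_blocks: int = 8):
        super().__init__()
        self.head = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.blocks = nn.ModuleList(ConvSU(channels) for _ in range(num_blocks))
        # Sub-pixel layer: expand channels, then rearrange them into space.
        self.tail = nn.Conv2d(channels, 3 * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.scale = scale

    def forward(self, lr):
        x = self.head(lr)
        prev = x
        for i, block in enumerate(self.blocks):
            x = block(x)
            if i % 2 == 1:           # identity skip every two blocks (simplified)
                x = x + prev
                prev = x
        sr = self.shuffle(self.tail(x))
        # Global residual: add the bicubic-upsampled LR image (VDSR-style).
        upsampled = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                                  align_corners=False)
        return sr + upsampled
```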
3. Experimental Results
3.1. Dataset
- 800 high-quality images from the NTIRE2017 Challenge training dataset are used for training.
- These training images are divided into 120×120-sized RGB subimages without overlapping.
- LR training subimages are obtained by bicubic interpolation from HR images.
- No data augmentation. 162,946 LR-HR subimage pairs are formed.
- Batch size is 32. Number of epoch is 50.
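As a hedged illustration of this preprocessing (not the authors' code; Pillow-based, with directory names, the helper name, and the ×2 scale factor as placeholders), one way to cut non-overlapping 120×120 HR subimages and create the matching bicubic LR subimages:

```python
from pathlib import Path
from PIL import Image

PATCH = 120  # HR subimage size used for training
SCALE = 2    # example upscaling factor (placeholder)

def make_pairs(hr_dir: str, out_hr: str, out_lr: str) -> None:
    """Cut each HR training image into non-overlapping 120x120 subimages and
    create the matching LR subimages by bicubic downscaling (sketch)."""
    Path(out_hr).mkdir(parents=True, exist_ok=True)
    Path(out_lr).mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(hr_dir).glob("*.png")):
        hr = Image.open(img_path).convert("RGB")
        w, h = hr.size
        idx = 0
        for top in range(0, h - PATCH + 1, PATCH):
            for left in range(0, w - PATCH + 1, PATCH):
                hr_patch = hr.crop((left, top, left + PATCH, top + PATCH))
                lr_patch = hr_patch.resize((PATCH // SCALE, PATCH // SCALE),
                                           Image.BICUBIC)
                hr_patch.save(Path(out_hr) / f"{img_path.stem}_{idx}.png")
                lr_patch.save(Path(out_lr) / f"{img_path.stem}_{idx}.png")
                idx += 1
```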
3.2. Results on Set5, Set14, B100
3.3. Visual Quality
- SelNet is able to separate hat strings, where other SR methods have difficulty.
- SelNet reconstructs a sharper and clearer HR image, where a pencil and a microphone string can clearly be discerned.
This is the 4th story this month.
Reference
[2017 CVPRW] [SelNet]
A Deep Convolutional Neural Network with Selection Units for Super-Resolution
Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DnCNN] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [MWCNN] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [EDSR & MDSR] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [SR+STN]