Reading: SelNet — CNN with Selection Units, Top-5 Ranked in NTIRE2017 Challenge (Super Resolution)
In this paper, A Deep Convolutional Neural Network with Selection Units for Super-Resolution (SelNet), by Korea Advanced Institute of Science and Technology (KAIST), is presented:
- A Selection Unit (SU) is proposed to make the on-off switching control of ReLU learnable, so that nonlinearity can be handled more flexibly than with a plain ReLU.
- A deep network built with SUs, called SelNet, was ranked in the top 5 of the NTIRE2017 Challenge, with much lower computational complexity than the top-4 entries.
This is a paper in 2017 CVPRW with about 30 citations. (Sik-Ho Tsang @ Medium)
Outline
- Reinterpreting ReLU
- Selection Unit (SU) & SelNet Network Architecture
- Experimental Results
1. Reinterpreting ReLU
- ReLU is originally interpreted as an identity mapping multiplied by an on-off switch, as shown above.
- The authors reinterpret ReLU as an identity mapping multiplied by a sigmoid function.
- This sigmoid function is computed from the feature maps after convolution, which means the on-off switch is turned on or off adaptively according to the input feature maps.
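A minimal PyTorch-style sketch of this reinterpretation (the function names are my own, not from the paper): ReLU written as an identity mapping times a hard switch, versus an identity mapping times a sigmoid gate computed from the features.

```python
import torch
import torch.nn as nn

# ReLU viewed as an identity mapping multiplied by a hard on-off switch:
# relu(x) = x * 1[x > 0]
def relu_as_switch(x: torch.Tensor) -> torch.Tensor:
    return x * (x > 0).float()

# The reinterpretation: replace the hard switch with a sigmoid gate that is
# computed from the feature maps themselves (e.g. by a small module such as
# a 1x1 convolution), so the switching becomes adaptive and learnable.
def sigmoid_gated_identity(x: torch.Tensor, gate: nn.Module) -> torch.Tensor:
    return x * torch.sigmoid(gate(x))
```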
2. Selection Unit (SU) & SelNet Network Architecture
2.1. Selection Unit (SU)
- As shown above, the Selection Unit (SU) controls which values in the feature maps from the previous convolutional layer are passed to the next layer.
- The Selection Module (SM) is formed as a cascade of a ReLU, a 1×1 convolution, and a sigmoid.
- Thus, the proposed SU is the element-wise multiplication of two branches: an identity mapping and an SM.
- The SM can learn the whole selection control because the training error is back-propagated through it, updating the 1×1 convolutional filters to decide which data is passed to the next layer.
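Below is a minimal PyTorch-style sketch of an SU built as described: an SM (ReLU → 1×1 convolution → sigmoid) whose output gates the identity path. The class name and channel argument are placeholders of mine, not the authors' code.

```python
import torch
import torch.nn as nn

class SelectionUnit(nn.Module):
    """Selection Unit (SU) sketch: identity mapping multiplied by a
    Selection Module (ReLU -> 1x1 convolution -> sigmoid)."""
    def __init__(self, channels: int):
        super().__init__()
        self.selection_module = nn.Sequential(
            nn.ReLU(inplace=True),                          # ReLU
            nn.Conv2d(channels, channels, kernel_size=1),   # 1x1 convolution
            nn.Sigmoid(),                                   # gate in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise product of the identity path and the SM gate; gradients
        # flow through the 1x1 conv, so the selection control is learned.
        return x * self.selection_module(x)
```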
2.2. SelNet: Network Architecture
- As shown above, SelNet is a 22-layer deep network for SR.
- Residual units with identity mappings, originating from Pre-Activation ResNet, are used, where the (n-2)-th feature map after convolution is simply added to the n-th feature map and forwarded to the (n+1)-th layer.
- Residual learning between the HR image and a bicubic-interpolated image is used, as introduced in VDSR.
- A sub-pixel layer, originating from ESPCN, is added at the end of the network to convert a multi-channel LR-sized output into an HR-sized image (see the sketch after this list).
- As shown above, the average PSNR is higher with SUs than without.
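Putting these pieces together, here is a heavily simplified PyTorch-style sketch of a SelNet-like forward pass: stacked conv+SU blocks with identity skips, a sub-pixel (PixelShuffle) layer, and a global bicubic residual. The depth, width, skip pattern, and scale factor are illustrative assumptions, not the paper's exact 22-layer configuration, and it reuses the `SelectionUnit` class from the sketch above.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvSU(nn.Module):
    """One 3x3 convolution followed by a Selection Unit (sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.su = SelectionUnit(channels)  # SelectionUnit from the sketch above

    def forward(self, x):
        return self.su(self.conv(x))

class SelNetSketch(nn.Module):
    """Simplified SelNet-style network: conv+SU blocks, identity skips,
    a sub-pixel layer, and VDSR-style global residual learning."""
    def __init__(self, scale: int = 2, channels: int = 64, num_blocks: int = 8):
        super().__init__()
        self.head = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.blocks = nn.ModuleList(ConvSU(channels) for _ in range(num_blocks))
        # Sub-pixel layer: expand channels, then rearrange them into space.
        self.tail = nn.Conv2d(channels, 3 * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.scale = scale

    def forward(self, lr):
        x = self.head(lr)
        prev = x
        for i, block in enumerate(self.blocks):
            x = block(x)
            if i % 2 == 1:           # identity skip every two blocks (simplified)
                x = x + prev
                prev = x
        sr = self.shuffle(self.tail(x))
        # Global residual: add the bicubic-upsampled LR image (VDSR-style).
        upsampled = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                                  align_corners=False)
        return sr + upsampled
```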
3. Experimental Results
3.1. Dataset
- 800 high-quality images from the NTIRE2017 Challenge training dataset are used for training.
- These training images are divided into 120×120-sized RGB subimages without overlapping.
- LR training subimages are obtained by bicubic interpolation from HR images.
- No data augmentation. 162,946 LR-HR subimage pairs are formed.
- Batch size is 32. Number of epoch is 50.
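As a hedged illustration of this preprocessing (not the authors' code; Pillow-based, with directory names, the helper name, and the ×2 scale factor as placeholders), one way to cut non-overlapping 120×120 HR subimages and create the matching bicubic LR subimages:

```python
from pathlib import Path
from PIL import Image

PATCH = 120  # HR subimage size used for training
SCALE = 2    # example upscaling factor (placeholder)

def make_pairs(hr_dir: str, out_hr: str, out_lr: str) -> None:
    """Cut each HR training image into non-overlapping 120x120 subimages and
    create the matching LR subimages by bicubic downscaling (sketch)."""
    Path(out_hr).mkdir(parents=True, exist_ok=True)
    Path(out_lr).mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(hr_dir).glob("*.png")):
        hr = Image.open(img_path).convert("RGB")
        w, h = hr.size
        idx = 0
        for top in range(0, h - PATCH + 1, PATCH):
            for left in range(0, w - PATCH + 1, PATCH):
                hr_patch = hr.crop((left, top, left + PATCH, top + PATCH))
                lr_patch = hr_patch.resize((PATCH // SCALE, PATCH // SCALE),
                                           Image.BICUBIC)
                hr_patch.save(Path(out_hr) / f"{img_path.stem}_{idx}.png")
                lr_patch.save(Path(out_lr) / f"{img_path.stem}_{idx}.png")
                idx += 1
```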
3.2. Results on Set5, Set14, B100
3.3. Visual Quality
- SelNet is able to separate hat strings, where other SR methods have difficulty.
- SelNet reconstructs a sharper and clearer HR image, where a pencil and a microphone string can clearly be discerned.
This is the 4th story this month.
Reference
[2017 CVPRW] [SelNet]
A Deep Convolutional Neural Network with Selection Units for Super-Resolution
Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DnCNN] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [MWCNN] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [EDSR & MDSR] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [SR+STN]