Reading: SelNet — CNN with Selection Units, Top-5 Ranked in NTIRE2017 Challenge (Super Resolution)

Outperforms A+, SRCNN, and VDSR with low computational complexity

Sik-Ho Tsang
4 min read · Jul 10, 2020

In this story, A Deep Convolutional Neural Network with Selection Units for Super-Resolution (SelNet), by Korea Advanced Institute of Science and Technology (KAIST), is reviewed. In this paper:

  • Selection Unit (SU) is proposed to optimize the on-off switching control of ReLU, and is therefore capable of handling nonlinearity more flexibly than ReLU.
  • A deep network with SUs, called SelNet, is formed. It was ranked 5th in the NTIRE2017 Challenge, with much lower computational complexity than the top-4 entries.

This is a paper in 2017 CVPRW with about 30 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Reinterpreting ReLU
  2. Selection Unit (SU) & SelNet Network Architecture
  3. Experimental Results

1. Reinterpreting ReLU

Reinterpreting ReLU
  • ReLU can be interpreted as an identity mapping multiplied by a hard on-off switch, as shown above.

The authors reinterpret ReLU as an identity mapping multiplied by a sigmoid function.

This sigmoid function is computed from the feature maps after convolution, which means the on-off switch is turned on or off adaptively according to the input feature maps.
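To make the two views concrete, here is a minimal NumPy sketch (not from the paper): ReLU as an identity mapping times a hard 0/1 switch, versus an identity mapping times a sigmoid gate. The gate input `g` stands in for a feature map produced by a convolution.

```python
import numpy as np

# ReLU viewed as an identity mapping multiplied by a hard on-off switch.
def relu_as_switch(x):
    switch = (x > 0).astype(x.dtype)   # 0/1 gate decided only by the sign of x
    return x * switch

# The reinterpretation: replace the hard switch with a sigmoid gate computed
# from a convolved feature map, so the gate varies smoothly with the input.
def sigmoid_gated_identity(x, g):
    # g stands in for the feature map produced by a convolution in the real network.
    gate = 1.0 / (1.0 + np.exp(-g))
    return x * gate

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu_as_switch(x))               # [0.  0.  0.5 2. ]
print(sigmoid_gated_identity(x, x))    # soft, input-dependent selection
```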

2. Selection Unit (SU) & SelNet Network Architecture

Selection Unit (SU) & SelNet Network Architecture

2.1. Selection Unit (SU)

  • As shown above, the selection unit (SU) controls which values in the feature maps from the previous convolutional layer are passed to the next layer.
  • The Selection Module (SM) is a cascade of a ReLU, a 1×1 convolution, and a sigmoid.

Thus, the proposed SU is an element-wise multiplication of two branches: an identity mapping and an SM.

  • Since the SM is differentiable, the training error can be back-propagated through it, updating the 1×1 convolutional filter so that the network learns which values to pass to the next layer (see the sketch below).
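A minimal PyTorch-style sketch of an SU following these definitions (the channel width and names are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class SelectionUnit(nn.Module):
    """Sketch of an SU: an identity branch multiplied element-wise by the
    Selection Module (ReLU -> 1x1 conv -> sigmoid). The channel count here
    is an assumption for illustration."""
    def __init__(self, channels=64):
        super().__init__()
        self.sm = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Identity mapping gated by the learned selection mask in [0, 1].
        return x * self.sm(x)

# Usage: gate a 64-channel feature map; the output keeps the input shape.
feat = torch.randn(1, 64, 32, 32)
out = SelectionUnit(64)(feat)
```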

2.2. SelNet: Network Architecture

  • As shown above, SelNet is a 22-layer deep network for SR.
  • Residual units with identity mappings, as in Pre-Activation ResNet, are used, where the (n−2)-th feature map after convolution is simply added to the n-th feature map and forwarded to the (n+1)-th layer.
  • The residual between the HR image and a bicubic-interpolated image is learned, as in VDSR.
  • A subpixel layer, as introduced in ESPCN, is added at the end of the network to convert a multi-channel LR-sized feature map into an HR-sized output (see the sketch after this list).
PSNR Against Epoch
  • As shown above, with SUs, the average PSNR is higher than without.
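A rough PyTorch sketch of how the tail of such a network could look: a convolution produces 3·r² channels at LR size, a subpixel (pixel-shuffle) layer rearranges them to HR size, and the result is added to the bicubic-upsampled input as a residual. Layer widths and counts are illustrative assumptions, not SelNet's exact configuration.

```python
import torch
import torch.nn as nn

r = 4                                        # assumed scale factor
to_subpixel = nn.Conv2d(64, 3 * r * r, kernel_size=3, padding=1)
subpixel = nn.PixelShuffle(r)                # (B, 3*r^2, H, W) -> (B, 3, r*H, r*W)

feat = torch.randn(1, 64, 30, 30)            # LR-sized feature map from the body of the network
residual = subpixel(to_subpixel(feat))       # HR-sized residual, VDSR-style
hr_bicubic = torch.randn(1, 3, 120, 120)     # placeholder for the bicubic-upsampled input
sr = hr_bicubic + residual                   # final HR estimate
```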

3. Experimental Results

3.1. Dataset

  • 800 high-quality images from the NTIRE2017 Challenge training dataset are used for training.
  • These training images are divided into non-overlapping 120×120 RGB subimages.
  • LR training subimages are obtained by bicubic downscaling of the HR images (see the sketch after this list).
  • No data augmentation. 162,946 LR-HR subimage pairs are formed.
  • Batch size is 32. The number of epochs is 50.
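For illustration, a possible way to build such LR-HR subimage pairs with Pillow (assuming a ×4 scale and a crop-then-downscale order; the paper's exact pipeline may differ, and the file name is a placeholder):

```python
from PIL import Image

scale = 4
hr = Image.open("0001.png").convert("RGB")   # placeholder HR training image

pairs = []
for top in range(0, hr.height - 119, 120):
    for left in range(0, hr.width - 119, 120):
        # Non-overlapping 120x120 HR patch, paired with its bicubic-downscaled LR version.
        hr_patch = hr.crop((left, top, left + 120, top + 120))
        lr_patch = hr_patch.resize((120 // scale, 120 // scale), Image.BICUBIC)
        pairs.append((lr_patch, hr_patch))
```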

3.2. Results on Set5, Set14, B100

PSNR, SSIM, and Time (sec) on Set5
PSNR, SSIM, and Time (sec) on Set14
PSNR, SSIM, and Time (sec) on B100
  • As shown above, SelNet outperforms A+, SRCNN, and VDSR with low computational complexity.

3.3. Visual Quality

Scale Factor of 4 on image woman
  • SelNet is able to separate hat strings, where other SR methods have difficulty.
Scale Factor of 4 on image ppt3
  • SelNet reconstructs a sharper and clearer HR image, where a pencil and a microphone string can clearly be discerned.

This is the 4th story this month.
