Review — SE-WRN: Squeeze-and-Excitation Wide Residual Networks in Image Classification

Squeeze-and-Excitation (SE) Attention Applied onto Wide Residual Networks (WRN)

Squeeze-and-Excitation Wide Residual Networks in Image Classification
SE-WRN, by Wuhan University of Technology, Hubei Province Key Laboratory of Transportation Internet of Things, and Wuhan University
2019 ICIP (Sik-Ho Tsang @ Medium)

  • The SE block from SENet is applied to Wide Residual Networks (WRN), together with global covariance pooling (GVP) and a residual Squeeze-and-Excitation block (rSE-block).


  1. SE-WRN
  2. Experimental Results


SE-WRN: Network Architecture
  • Let B(M) denote residual block structure, where M is a list with the kernel sizes of the convolutional layers in a block.
  • B(3,3) denotes a residual block with two 3×3 layers.
  • WRN-n-k denotes a residual network with n convolutional layers in total and a widening factor k (for example, a network with 26 layers that is k=10 times wider than the original would be denoted WRN-26-10).
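To make the WRN-n-k notation concrete, here is a minimal sketch that splits a name into n and k and scales the channel widths by k. The helper names parse_wrn and group_widths are hypothetical, and the base widths (16, 32, 64) are an assumption taken from the usual CIFAR-style WRN layout; this review does not state them.

```python
import re

def parse_wrn(name):
    """Split a name such as 'WRN-26-10' into (n, k)."""
    # Accept both a plain hyphen and an en dash before k.
    m = re.fullmatch(r"WRN-(\d+)[-–](\d+)", name)
    if m is None:
        raise ValueError(f"not a WRN-n-k name: {name!r}")
    return int(m.group(1)), int(m.group(2))

def group_widths(k, base=(16, 32, 64)):
    """Channel widths of the residual groups after widening by k.

    Assumption: three groups with base widths 16, 32, 64.
    """
    return [c * k for c in base]

n, k = parse_wrn("WRN-26-10")
print(n, k, group_widths(k))  # 26 10 [160, 320, 640]
```

With k=1 the widths stay at the narrow baseline, which is the setting used for WRN-20-1 in the results.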
rSE-ResNet-Block has one more residual connection
  • rSE-ResNet-Block is used instead of the conventional SE-ResNet-Block in SENet.
  • Global covariance pooling replaces the usual first-order max/average pooling after the last conv layer. It produces a global d(d + 1)/2-dimensional image representation by concatenating the upper triangular part of the covariance matrix, where d is the number of input channels.
  • A Dropout SE-block is used: a Dropout layer is added before the last FC layer in the SE-block.
  • Following [18], a 1×1 conv layer with 256 channels is added after the last conv layer when k>4, so that the dimension of the features fed into the global covariance pooling layer is fixed at 256, which reduces the number of parameters.
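The two components above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names, the reduction ratio r, and the toy shapes are hypothetical; the extra residual connection of the rSE-block is not modelled because its exact placement is not given here; and any normalisation of the covariance matrix is left out.

```python
import numpy as np

def se_dropout_gate(x, w1, w2, drop_p=0.0, rng=None):
    """SE channel gate for a feature map x of shape (C, H, W).

    w1: (C // r, C) weights of the first FC, w2: (C, C // r) weights of the last FC.
    Dropout is applied to the hidden vector, i.e. just before the last FC.
    """
    z = x.mean(axis=(1, 2))                 # squeeze: global average pooling -> (C,)
    h = np.maximum(w1 @ z, 0.0)             # first FC + ReLU -> (C // r,)
    if drop_p > 0.0 and rng is not None:    # training-time inverted dropout
        keep = rng.random(h.shape) >= drop_p
        h = h * keep / (1.0 - drop_p)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))     # last FC + sigmoid -> (C,)
    return x * s[:, None, None]             # rescale each channel

def global_cov_pool(x):
    """Covariance pooling: (d, H, W) -> vector of length d * (d + 1) // 2."""
    d = x.shape[0]
    f = x.reshape(d, -1)                    # d channels, N = H * W positions
    f = f - f.mean(axis=1, keepdims=True)   # centre each channel
    cov = f @ f.T / f.shape[1]              # (d, d) covariance matrix
    return cov[np.triu_indices(d)]          # upper triangle including the diagonal

rng = np.random.default_rng(0)
C, r = 8, 4                                 # toy sizes (assumed, not from the paper)
x = rng.standard_normal((C, 5, 5))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_dropout_gate(x, w1, w2, drop_p=0.5, rng=rng)
v = global_cov_pool(y)
print(y.shape, v.shape)   # (8, 5, 5) (36,)
print(256 * 257 // 2)     # 32896
```

The last line shows why fixing d at 256 matters: the pooled representation has 256·257/2 = 32,896 entries, and it grows quadratically with d.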

2. Experimental Results

Results on WRN-20-1 (Error%)
  • The accuracy of WRN-20-1 on CIFAR-10 is below 93%, which is attributed to too little channel information.

As the channels are utilized more effectively, classification performance gradually improves.

Results on WRN-20-4, WRN-20-8 (Error%)
  • In the experiments, SE-D-ResNet-GVP performs well. (D: Dropout)
Results on WRN-26-10 (Error%)
  • WRN-26-10 is taken as the basic network.
  • The proposed networks impose only a slight increase in model complexity and computational burden.
Error % against Epochs on CIFAR datasets
