# Brief Review — ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

## ECA-Net, Outperforms SENet With Lower Complexity

--

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks,ECA-Net, by Tianjin University, Dalian University of Technology, and Harbin Institute of Technology2020 CVPR, Over 1900 Citations(Sik-Ho Tsang @ Medium)

Image Classification1989 … 2022[ConvNeXt] [PVTv2] [ViT-G] [AS-MLP] [ResTv2] [CSWin Transformer] [Pale Transformer] [Sparse MLP] [MViTv2] [S²-MLP] [CycleMLP] [MobileOne] [GC ViT] [VAN] [ACMix] [CVNets] [MobileViT] [RepMLP] [RepLKNet] [ParNet]2023[Vision Permutator (ViP)]

==== My Other Paper Readings Are Also Over Here ====

**ECA-Net**is proposed, where**a local cross-channel interaction strategy without dimensionality reduction**is used, which can be efficiently implemented via**1D convolution.**- A method is developed to
**adaptively select kernel size of 1D convolutio**n, determining coverage of local cross-channel interaction.

# Outline

**SENet****Variants****ECA-Net****Results**

**1. **SENet **Variants**

## 1.1. SENet

- Channel attention of SE block in SENet is:

- where
is*σ***sigmoid**.*W*is learnable weights/layers, andwhich is*y*=*g*(*X*)**global average pooling (GAP)**:

In SENet,

2 FC layers with dimension reduction and expansionare used:

## 1.2. SENet Variants

**SE-Var1 with no parameter**is tried, and is**still superior**to the original network.**SE-Var2**learns the weight of each channel independently, which is**slightly superior to SE block**while involving**less parameters**.

SE-Var3employingone single FC layerperformsbetter than two FC layers with dimensionality reduction in SE block.

for SE-Var2 is aWvar2diagonal matrix, involvingCparameters;for SE-Var3 is aWvar3full matrix, involvingC² parameters.

## 1.3. Group Convolution (GC)

- One extension of
*Wvar2*is a**block diagonal matrix**:

- which is implemented using
**SE block with group convolutions (SE-GC)**and it is indicated by.*σ*(GCG(y))=*σ*(WGy)

Yet,

excessive group convolutionswillincrease memory access costand sodecrease computational efficiency.SE-GC with varying groups have bringno gains, as shown in the above table. This might be due to the fact thatSE-GC completely discards dependences among different groups.

**2. ECA-Net**

## 2.1. ECA Module

- Based on the above analysis, ECA module is designed where each parameter.
**The weight of**is calculated by*yi***only considering interaction between yi and its**, i.e.:*k*neighbors

- By sharing the parameters, indeed, it is a
**1D convolution (C1D)**:

ECA module withachievesk=3similar results with SE-var3while havingmuch lower model complexity.

## 2.2. Adaptive Kernel Size

**The coverage of interaction (i.e., kernel size**. In other words, there may exist a mapping*k*of 1D convolution) is proportional to channel dimension*C**Φ*between*k*and*C*:

- The simplest way is a
**linear function***γ***k*-*b*. - A
**non-linear way**is:

- Thus, given channel dimension
*C*, kernel size*k*can be adaptively determined by:

- where
and*γ*=2*b*=1.

# 3. Results

## 3.1. ECA-Net Performance

ECA-Net clearly outperformsSENet, and alsooutperforms fixed kernel version of ECA-Net.

## 3.2. Different Backbones on ImageNet

ECA-Net is superior toSENetandCBAMwhile it isvery competitive to AA-Net with lower model complexity. Note that AA-Net is trained with Inception data augmentation and different setting of learning rates.

## 3.3. SOTA Comparisons on ImageNet

ECA-Net performs favorably against state-of-the-art CNNswhile benefitingmuch lower model complexity.

## 3.4. Object Detection on MS COCO

Different frameworks are used, ECA-Net can

well generalize to object detection task.

## 3.5. Instance Segmentation on MS COCO

For ResNet-50 as backbone,

ECA with lower model complexityissuperior over oneNL, and iscomparable to GC block[2].