Review — GhostNet: More Features from Cheap Operations
GhostNet is Formed by Stacking Ghost Modules
GhostNet: More Features from Cheap Operations,
GhostNet, by Huawei Technologies, Peking University, and University of Sydney,
2020 CVPR, Over 1100 Citations (Sik-Ho Tsang @ Medium)
Image Classification
1989 … 2022 [ConvNeXt] [PVTv2] [ViT-G] [AS-MLP] [ResTv2] [CSWin Transformer] [Pale Transformer] [Sparse MLP] [MViTv2] [S²-MLP] [CycleMLP] [MobileOne] [GC ViT] [VAN] [ACMix] [CVNets] [MobileViT] [RepMLP] [RepLKNet] [ParNet] 2023 [Vision Permutator (ViP)]
==== My Other Paper Readings Are Also Over Here ====
- A novel Ghost module is proposed to generate more feature maps from cheap operations.
- By stacking Ghost modules, GhostNet is formed.
- Later on, GhostNetV2 is also proposed in 2022 NeurIPS.
Outline
- GhostNet
- Results
1. GhostNet
1.1. Ghost Module
- (a) Standard convolution.
- (b) Ghost module. The source code below shows the Ghost module implementation.
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)      # intrinsic feature maps (m = n/s)
        new_channels = init_channels * (ratio - 1)  # ghost feature maps

        # Primary convolution: an ordinary convolution producing the intrinsic feature maps
        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size//2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

        # Cheap operation: a depthwise convolution generating ghost feature maps from x1
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)       # intrinsic feature maps
        x2 = self.cheap_operation(x1)   # ghost feature maps
        out = torch.cat([x1, x2], dim=1)
        return out[:, :self.oup, :, :]  # keep exactly `oup` output channels
- As seen in the source code, a standard convolution (primary_conv) is first performed to produce part of the feature maps, outputting x1 (the intrinsic feature maps).
- A d×d cheap operation, i.e. a depthwise convolution, is then performed on x1, outputting x2 (the ghost feature maps).
- Then, x1 and x2 are concatenated along the channel dimension.
- s is the ratio (ratio in the code) for generating m=n/s intrinsic feature maps, where n is the number of output channels. A quick shape check of the module is given below.
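- Below is a quick, illustrative shape check of the GhostModule above; the channel counts and spatial size are arbitrary example values, not numbers from the paper.
x = torch.randn(1, 16, 32, 32)                # a dummy 16-channel feature map
ghost = GhostModule(inp=16, oup=64, ratio=2)  # s = ratio = 2
y = ghost(x)
print(y.shape)  # torch.Size([1, 64, 32, 32]): 32 intrinsic + 32 ghost channels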
1.2. Ghost Bottleneck
- The Ghost bottleneck appears to be similar to the basic residual block in ResNet. The proposed ghost bottleneck mainly consists of two stacked Ghost modules.
- The first Ghost module acts as an expansion layer that increases the number of channels; the ratio between the output and input channel counts is referred to as the expansion ratio.
- The second Ghost module reduces the number of channels to match the shortcut path.
- When stride=2, the shortcut path is implemented by a downsampling layer, and a depthwise convolution with stride=2 is inserted between the two Ghost modules (see the sketch below).
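- Below is a minimal sketch of such a Ghost bottleneck in PyTorch, assuming the GhostModule defined earlier; the SE module and the exact configuration of the official implementation are omitted, and the argument names are illustrative.
class GhostBottleneck(nn.Module):
    # Minimal sketch: two stacked Ghost modules with an optional stride-2
    # depthwise convolution in between, plus a residual shortcut.
    def __init__(self, inp, hidden_dim, oup, dw_size=3, stride=1):
        super(GhostBottleneck, self).__init__()
        self.stride = stride
        # 1st Ghost module: expansion layer (inp -> hidden_dim channels)
        self.ghost1 = GhostModule(inp, hidden_dim, relu=True)
        # Depthwise convolution only when stride=2 (spatial downsampling)
        if stride == 2:
            self.conv_dw = nn.Conv2d(hidden_dim, hidden_dim, dw_size, stride,
                                     dw_size//2, groups=hidden_dim, bias=False)
            self.bn_dw = nn.BatchNorm2d(hidden_dim)
        # 2nd Ghost module: reduces channels to match the shortcut (no ReLU)
        self.ghost2 = GhostModule(hidden_dim, oup, relu=False)
        # Shortcut: identity when shapes match, otherwise a downsampling path
        if stride == 1 and inp == oup:
            self.shortcut = nn.Sequential()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(inp, inp, dw_size, stride, dw_size//2, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        out = self.ghost1(x)
        if self.stride == 2:
            out = self.bn_dw(self.conv_dw(out))
        out = self.ghost2(out)
        return out + self.shortcut(x)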
1.3. GhostNet
- The model basically follows the architecture of MobileNetV3, replacing the bottleneck block in MobileNetV3 with the Ghost bottleneck.
- All Ghost bottlenecks are applied with stride=1, except the last one in each stage, which uses stride=2.
- The squeeze-and-excitation (SE) module, as in SENet, is also applied to the residual layer in some Ghost bottlenecks.
- Yet, the hard-swish nonlinearity function used in MobileNetV3 is NOT used.
- A width multiplier α is applied to uniformly scale the number of channels at each layer; GhostNet with width multiplier α is denoted as GhostNet-α×. A small channel-scaling sketch follows below.
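- As a small illustration of the width multiplier, below is a hypothetical helper showing how channel counts are commonly scaled in mobile networks; the function name and the rounding to a multiple of 4 are assumptions for illustration, not details from the paper.
def scaled_channels(channels, alpha, divisor=4):
    # Hypothetical helper: scale a channel count by the width multiplier alpha,
    # rounding to the nearest multiple of `divisor` (a common practice).
    return max(divisor, int(channels * alpha + divisor / 2) // divisor * divisor)

print(scaled_channels(160, 0.5))  # 80  -> e.g. a layer width in GhostNet-0.5x
print(scaled_channels(160, 1.3))  # 208 -> e.g. a layer width in GhostNet-1.3x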
2. Results
2.1. Ablation Study
- s is the ratio used for generating m=n/s intrinsic feature maps, and d×d is the kernel size of the linear operations (i.e. the size of the depthwise convolution filters) for computing the ghost feature maps.
- s=2 is fixed and d is tuned in {1, 3, 5, 7}.
d=3 is the best.
- d=3 is fixed and s is tuned in the range of {2, 3, 4, 5}.
- Larger s leads to a larger compression and speed-up ratio (a rough FLOP count illustrating this is sketched below).
When s=2, which means VGG-16 is compressed by 2×, the Ghost-compressed model performs even slightly better than the original model.
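- As a back-of-the-envelope check that the speed-up ratio is roughly s, the sketch below counts the multiply-accumulates of a standard convolution against those of a Ghost module (a k×k primary convolution plus a d×d depthwise cheap operation). The layer sizes are illustrative values, not numbers from the paper.
def standard_conv_macs(c, n, h, w, k):
    # MACs of an ordinary k x k convolution mapping c -> n channels on an h x w output.
    return n * h * w * c * k * k

def ghost_module_macs(c, n, h, w, k, d, s):
    # MACs of a Ghost module: a k x k primary conv producing n/s intrinsic maps,
    # plus a d x d depthwise "cheap operation" producing the remaining (s-1)*n/s maps.
    primary = (n // s) * h * w * c * k * k
    cheap = (s - 1) * (n // s) * h * w * d * d
    return primary + cheap

std = standard_conv_macs(c=256, n=256, h=14, w=14, k=3)
ghost = ghost_module_macs(c=256, n=256, h=14, w=14, k=3, d=3, s=2)
print(round(std / ghost, 2))  # ~2.0, i.e. roughly s, matching the trend above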
2.2. CIFAR-10
Using Ghost modules yields a smaller model size while keeping the accuracy.
2.3. ImageNet
- ResNet-50 has about 25.6M parameters and 4.1B FLOPs with a top-5 error of 7.8%.
Ghost-ResNet-50 (s=2) obtains about a 2× acceleration and compression ratio, while maintaining the same accuracy as the original ResNet-50.
- GhostNet obtains about 0.5% higher top-1 accuracy than MobileNetV3 at the same latency, and GhostNet needs less runtime to achieve similar performance.
- For example, GhostNet with 75.0% accuracy only requires about 40 ms latency, while MobileNetV3 with similar accuracy requires about 45 ms to process one image.
Overall, GhostNets generally outperform the famous state-of-the-art models, i.e. MobileNetV2, MobileNetV3, ProxylessNAS, FBNet, and MnasNet.
2.4. MS COCO
- Both the two-stage Faster R-CNN with Feature Pyramid Networks (FPN) and the one-stage RetinaNet frameworks are used.
With significantly lower computational cost, GhostNet achieves similar mAP to MobileNetV2 and MobileNetV3.