Review — DLA: Deep Layer Aggregation

DLA Outperforms ResNet, ResNeXt With Smaller Model Size

Dec 29, 2021

**Deep layer aggregation unifies semantic and spatial fusion to better capture what and where**

Deep Layer Aggregation
DLA, by UC Berkeley
2018 CVPR, Over 600 Citations (Sik-Ho Tsang @ Medium)
Image Classification, Semantic Segmentation

Compounding and aggregating representations improves inference of what and where.
Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters.

Outline

Conventional Aggregation
Proposed Aggregation Types
DLA: Network Architecture
Experimental Results

1. Conventional Aggregation

Aggregation (Green) is defined as the combination of different layers.
Layers are grouped into blocks, which are then grouped into stages by their feature resolution. The authors are concerned with aggregating the blocks and stages (Black).

1.1. (a) No Aggregation

Conventionally, blocks are simply stacked one after another to form a network, with no aggregation between them.

1.2. (b) Shallow Aggregation

Skip connections, which are commonly used for tasks like segmentation and detection, provide aggregation, but they do so only shallowly, merging each earlier part in a single step.

2. Proposed Aggregation Types

Only (c) Iterative Deep Aggregation (IDA) and (f) Hierarchical Deep Aggregation (HDA) are the ones used in the experimental results.

2.1. Iterative Deep Aggregation (IDA)

(c) Iterative Deep Aggregation: aggregates iteratively by reordering the skip connections of (b) such that the shallowest parts are aggregated the most for further processing.
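The iterative deep aggregation function I for a series of layers x1, …, xn with increasingly deeper and semantic information is formulated as:

$$
I(x_1, \dots, x_n) =
\begin{cases}
x_1 & \text{if } n = 1 \\
I(N(x_1, x_2), \dots, x_n) & \text{otherwise}
\end{cases}
$$

where N is the aggregation node.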
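
The recursive definition of I can be made concrete with a small sketch. The names `ida` and `symbolic_node` are hypothetical, and the node here only builds a string so that the nesting is visible; in a real network N would be a learned aggregation node operating on feature maps.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def ida(layers: Sequence[T], node: Callable[[T, T], T]) -> T:
    """Aggregate layers ordered from shallowest to deepest, two at a time."""
    if not layers:
        raise ValueError("need at least one layer")
    if len(layers) == 1:
        # Base case: a single layer is returned unchanged.
        return layers[0]
    # Merge the two shallowest entries, then recurse on the shorter series.
    merged = node(layers[0], layers[1])
    return ida([merged, *layers[2:]], node)


def symbolic_node(a: str, b: str) -> str:
    """Stand-in for N that records which inputs were merged (illustrative only)."""
    return f"N({a}, {b})"


result = ida(["x1", "x2", "x3", "x4"], symbolic_node)
print(result)  # N(N(N(x1, x2), x3), x4)
```

The printed nesting shows that x1, the shallowest layer, passes through every aggregation node, while the deepest layer passes through only one.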

2.2. Tree-Structured Aggregation

(d) Tree-Structured Aggregation: aggregates hierarchically through a tree structure of blocks to better span the feature hierarchy of the network across different depths.

2.3. Reentrant Aggregation, and Hierarchical Deep Aggregation (HDA)

**(e) Reentrant Aggregation, and (f) Hierarchical Deep Aggregation**

(e) Reentrant Aggregation and (f) Hierarchical Deep Aggregation (HDA) are refinements of (d) that deepen aggregation by routing intermediate aggregations back into the network and improve efficiency by merging successive aggregations at the same depth.
(e): propagates the aggregation of all previous blocks instead of the preceding block alone to better preserve features.
(f): For efficiency, aggregation nodes of the same depth are merged (combining the parent and left child).
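The HDA function Tn, with depth n, is formulated as:

$$
T_n(x) = N\big(R^n_{n-1}(x), R^n_{n-2}(x), \dots, R^n_1(x), L^n_1(x), L^n_2(x)\big)
$$

where N is the aggregation node.
R and L are defined as:

$$
L^n_2(x) = B(L^n_1(x)), \qquad L^n_1(x) = B(R^n_1(x))
$$

$$
R^n_m(x) =
\begin{cases}
T_m(x) & \text{if } m = n - 1 \\
T_m(R^n_{m+1}(x)) & \text{otherwise}
\end{cases}
$$

where B represents a convolutional block.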
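
To see how the tree unfolds, here is a minimal symbolic sketch. It assumes the recursion from the DLA paper: a tree of depth n chains subtrees of depth n-1 down to 1, each fed by the previous one, then applies two new blocks, and the root node aggregates all subtree outputs plus the two blocks. The base case (for n = 1 the two blocks are applied directly to the input) is my assumption, and `hda` and `Trace` are hypothetical names.

```python
class Trace:
    """Records how many blocks run and how many inputs each node receives."""

    def __init__(self):
        self.blocks = 0
        self.fan_ins = []

    def block(self, x):
        # Stand-in for a convolutional block B; returns a fresh label.
        self.blocks += 1
        return f"B{self.blocks}"

    def node(self, *inputs):
        # Stand-in for the aggregation node N.
        self.fan_ins.append(len(inputs))
        return "N(" + ", ".join(inputs) + ")"


def hda(n, x, block, node):
    """Symbolic T_n(x): chain subtrees of depth n-1 .. 1, then two blocks."""
    subtree_outputs = []
    current = x
    for m in range(n - 1, 0, -1):
        # Each subtree takes the previous subtree's output as its input.
        current = hda(m, current, block, node)
        subtree_outputs.append(current)
    left = block(current)   # first new block (fed by x itself when n == 1)
    right = block(left)     # second new block
    return node(*subtree_outputs, left, right)


trace = Trace()
print(hda(2, "x", trace.block, trace.node))  # N(N(B1, B2), B3, B4)

trace3 = Trace()
hda(3, "x", trace3.block, trace3.node)
print(trace3.blocks, max(trace3.fan_ins))  # 8 4
```

Under these assumptions a depth-3 tree runs 8 blocks while its root node aggregates only 4 inputs.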

3. DLA: Network Architecture

Deep layer aggregation (DLA) learns to better extract the full spectrum of semantic and spatial information from a network.
Iterative connections join neighboring stages to progressively deepen and spatially refine the representation.
Hierarchical connections cross stages with trees that span the spectrum of layers to better propagate features and gradients.
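In base aggregation, the aggregation node N is:

$$
N(x_1, \dots, x_n) = \sigma\Big(\mathrm{BatchNorm}\Big(\sum_i W_i x_i + b\Big)\Big)
$$

where σ is the activation function.
If residual connections are added:

$$
N(x_1, \dots, x_n) = \sigma\Big(\mathrm{BatchNorm}\Big(\sum_i W_i x_i + b\Big) + x_n\Big)
$$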

DLA makes no requirements of the internal structure of the blocks and stages. DLA connects across stages with IDA, and within and across stages by HDA.
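
As a rough illustration of the aggregation node N, the NumPy sketch below sums 1×1 projections of the inputs, normalizes, optionally adds the last input as a residual, and applies the activation. Several things are assumptions for illustration: ReLU as σ, batch normalization without learned scale and shift, and the function names `batch_norm` and `aggregation_node`.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each channel over batch and spatial axes (no learned scale/shift)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def aggregation_node(inputs, weights, bias, residual=False):
    """inputs: list of (batch, C_in, H, W); weights: list of (C_out, C_in); bias: (C_out,)."""
    # A 1x1 convolution is a matrix product over the channel axis.
    total = sum(np.einsum("oc,bchw->bohw", w, x) for w, x in zip(weights, inputs))
    total = total + bias[None, :, None, None]
    out = batch_norm(total)
    if residual:
        # Residual variant: add the last input (requires C_in == C_out).
        out = out + inputs[-1]
    return np.maximum(out, 0.0)  # ReLU, assumed as the activation


rng = np.random.default_rng(0)
xs = [rng.standard_normal((2, 4, 8, 8)) for _ in range(3)]
ws = [rng.standard_normal((4, 4)) for _ in range(3)]
b = np.zeros(4)

out = aggregation_node(xs, ws, b)
out_res = aggregation_node(xs, ws, b, residual=True)
print(out.shape, out_res.shape)
```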

For classification, ResNet and ResNeXt are augmented with IDA and HDA.

**Interpolation by iterative deep aggregation**

For segmentation, the conversion from classification DLA to fully convolutional DLA is simple and no different than for other architectures.
IDA for interpolation increases both depth and resolution by projection and upsampling.
Stages are fused from shallow to deep to make a progressively deeper and higher resolution decoder.
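
One possible reading of this decoder, sketched in NumPy: every stage is first projected to a common channel count, then neighboring stages are merged pairwise, with the deeper one upsampled by 2, until a single high-resolution map remains. This is a simplification under stated assumptions, not the authors' implementation: nearest-neighbor upsampling and an add-plus-ReLU merge stand in for the learned upsampling and aggregation nodes, and `project`, `upsample2x`, `merge` and `ida_up` are hypothetical names.

```python
import numpy as np


def project(x, weight):
    """1x1 convolution: weight (C_out, C_in) applied to x (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def upsample2x(x):
    """Nearest-neighbor upsampling by a factor of 2 (simplifying assumption)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def merge(shallow, deep):
    """Stand-in aggregation node: add the upsampled deeper map, then ReLU."""
    return np.maximum(shallow + upsample2x(deep), 0.0)


def ida_up(stages):
    """stages: projected maps ordered shallow (high res) to deep (low res)."""
    level = list(stages)
    while len(level) > 1:
        # Merge each neighboring pair; the list shrinks by one per round.
        level = [merge(s, d) for s, d in zip(level, level[1:])]
    return level[0]


rng = np.random.default_rng(0)
channels, sizes, common = [16, 32, 64], [32, 16, 8], 8
raw = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
projected = [project(x, rng.standard_normal((common, x.shape[0]))) for x in raw]

decoded = ida_up(projected)
print(decoded.shape)  # (8, 32, 32)
```

The output has the resolution of the shallowest stage, while the deepest stage has passed through the most merge steps on its way there.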

4. Experimental Results

4.1. Classification

**Deep layer aggregation networks for classification** Stages 1 and 2 show the **number of channels n** while further stages show **d-n** where d is the **aggregation depth**. Models marked with “-C” are **compact** and only have 1 million parameters.

Different DLA models are built as above.

DLA-34 and ResNet-34 both use basic blocks, but DLA-34 has about 30% fewer parameters and ~1 point of improvement in top-1 error rate.
DLA-X-102 has nearly half the number of parameters of ResNeXt-101, yet the difference in error rate is only 0.2%.
Compared with DenseNet, DLA achieves higher accuracy with lower memory usage because the aggregation node fan-in size is log of the total number of convolutional blocks in HDA.
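
The logarithmic fan-in claim can be checked with a little arithmetic. The sketch below assumes that an HDA tree of depth n contains 2^n blocks and that its root node receives n + 1 inputs (my reading of the tree structure, not a figure quoted in this review), and compares that with k densely connected layers, where the last layer sees the block input plus the k - 1 previous outputs. Both function names are hypothetical.

```python
def hda_root_fan_in(num_blocks: int) -> int:
    """Root fan-in of an HDA tree, assuming 2**depth blocks and depth + 1 inputs."""
    if num_blocks < 2 or num_blocks & (num_blocks - 1):
        raise ValueError("num_blocks must be a power of two, at least 2")
    depth = num_blocks.bit_length() - 1  # exact log2 for powers of two
    return depth + 1


def dense_fan_in(num_layers: int) -> int:
    """Feature maps seen by the last of num_layers densely connected layers."""
    return num_layers


for blocks in (2, 4, 8, 16, 32):
    print(blocks, hda_root_fan_in(blocks), dense_fan_in(blocks))
```

With 32 blocks the assumed HDA root aggregates 6 inputs, whereas the last densely connected layer would see 32.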

Compared to SqueezeNet, which shares a block design similar to DLA, DLA is more accurate with the same number of parameters.

4.2. Fine-grained Recognition

**Statistics for fine-grained recognition datasets**

**Comparison with state-of-the-art methods on fine-grained datasets**

DLAs improve or rival the state-of-the-art without further annotations or specific modules for fine-grained recognition.
DLAs are competitive with VGGNet and ResNet while having only several million parameters. However, they do not beat the state-of-the-art on Birds; this dataset has fewer instances per class, so further regularization might help.

4.3. Semantic Segmentation

Surprisingly, DLA-34 performs very well on Cityscapes and it is as accurate as DLA-102.
Test evaluation follows the same multi-scale fashion as RefineNet: the image is evaluated at scales of [0.5, 0.75, 1, 1.25, 1.5] and the predictions are summed. DLA improves on RefineNet by 2+ points and outperforms FCN by a large margin.
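
The multi-scale test procedure can be sketched as follows. Everything except the scale list is an assumption for illustration: `resize_nearest` uses nearest-neighbor resizing for brevity, `toy_predict` is a hypothetical stand-in for a segmentation network, and summing per-class scores before taking the argmax is one plausible way to "sum the predictions".

```python
import numpy as np

SCALES = [0.5, 0.75, 1.0, 1.25, 1.5]


def resize_nearest(x, height, width):
    """Nearest-neighbor resize of the last two axes of x to (height, width)."""
    rows = np.arange(height) * x.shape[-2] // height
    cols = np.arange(width) * x.shape[-1] // width
    return x[..., rows[:, None], cols[None, :]]


def multi_scale_predict(image, predict, scales=SCALES):
    """image: (C, H, W); predict maps an image to per-class scores (K, h, w)."""
    _, h, w = image.shape
    total = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scores = predict(resize_nearest(image, sh, sw))
        scores = resize_nearest(scores, h, w)  # back to the original size
        total = scores if total is None else total + scores
    return total.argmax(axis=0)  # per-pixel label from the summed scores


def toy_predict(img):
    """Hypothetical two-class 'network': class 0 score is brightness, class 1 its complement."""
    brightness = img.mean(axis=0)
    return np.stack([brightness, 1.0 - brightness])


labels = multi_scale_predict(np.zeros((3, 8, 12)), toy_predict)
print(labels.shape)  # (8, 12)
```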

Higher depth and resolution help. DLA is state-of-the-art, outperforming SegNet, DeepLabv1, DilatedNet and FSO.

Iterative Deep Aggregation (IDA) is later used in SqueezeNext.

Reference

[2018 CVPR] [DLA]
Deep Layer Aggregation

