Review — DLA: Deep Layer Aggregation

DLA Outperforms ResNet, ResNeXt With Smaller Model Size

Sik-Ho Tsang
Dec 29, 2021
Deep layer aggregation unifies semantic and spatial fusion to better capture what and where

Deep Layer Aggregation
DLA, by UC Berkeley
2018 CVPR, Over 600 Citations (Sik-Ho Tsang @ Medium)
Image Classification, Semantic Segmentation

  • Compounding and aggregating representations improves inference of what and where.
  • Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters.


  1. Conventional Aggregation
  2. Proposed Aggregation Types
  3. DLA: Network Architecture
  4. Experimental Results

1. Conventional Aggregation

Conventional Approaches
  • Aggregation (Green) is defined as the combination of different layers.
  • Layers are grouped into blocks, which are then grouped into stages by their feature resolution. The authors focus on aggregating the blocks and stages (Black).

1.1. (a) No Aggregation

  • Conventionally, blocks are simply stacked to form a network, with no aggregation between them.

1.2. (b) Shallow Aggregation

  • Skip connections, which are commonly used for tasks like segmentation and detection, do aggregate, but only shallowly: each earlier part is merged in a single step.

2. Proposed Aggregation Types

  • Only (c) Iterative Deep Aggregation (IDA) and (f) Hierarchical Deep Aggregation (HDA) are used in the experimental results.

2.1. Iterative Deep Aggregation (IDA)

Iterative Deep Aggregation
  • (c) Iterative Deep Aggregation: aggregates iteratively by reordering the skip connections of (b) such that the shallowest parts are aggregated the most for further processing.
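  • The iterative deep aggregation function I for a series of layers x1, …, xn with increasingly deeper and semantic information is formulated as:
    $$I(x_1, \ldots, x_n) = \begin{cases} x_1 & \text{if } n = 1 \\ I(N(x_1, x_2), \ldots, x_n) & \text{otherwise,} \end{cases}$$
  • where N is the aggregation node.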
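The IDA recursion can be read as: merge the two shallowest inputs with the aggregation node, then recurse on the merged result and the remaining layers. The following is a minimal sketch of that reading, not the authors' implementation. The names `ida` and `symbolic_node` are hypothetical, and the node here only records the merge order as a string instead of operating on feature maps.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def ida(layers: Sequence[T], node: Callable[[T, T], T]) -> T:
    """I(x1) = x1; I(x1, x2, ..., xn) = I(N(x1, x2), x3, ..., xn)."""
    if not layers:
        raise ValueError("need at least one layer")
    if len(layers) == 1:
        return layers[0]
    merged = node(layers[0], layers[1])
    return ida([merged, *layers[2:]], node)


def symbolic_node(a: str, b: str) -> str:
    # Hypothetical stand-in for the aggregation node N: records the merge order.
    return f"N({a},{b})"


result = ida(["x1", "x2", "x3", "x4"], symbolic_node)
print(result)  # N(N(N(x1,x2),x3),x4)
```

In the printed expression x1 sits inside three nested nodes while x4 sits inside only one, which matches the statement that the shallowest parts are aggregated the most.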

2.2. Tree-Structured Aggregation

Tree-Structured Aggregation
  • (d) Tree-Structured Aggregation: aggregates hierarchically through a tree structure of blocks to better span the feature hierarchy of the network across different depths.

2.3. Reentrant Aggregation, and Hierarchical Deep Aggregation (HDA)

(e) Reentrant Aggregation, and (f) Hierarchical Deep Aggregation
  • (e) Reentrant Aggregation and (f) Hierarchical Deep Aggregation (HDA) are refinements of (d) that deepen aggregation by routing intermediate aggregations back into the network, and improve efficiency by merging successive aggregations at the same depth.
  • (e): propagates the aggregation of all previous blocks instead of the preceding block alone to better preserve features.
  • (f): For efficiency, aggregation nodes of the same depth are merged (combining the parent and left child).
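  • The HDA function Tn, with depth n, is formulated as:
    $$T_n(x) = N(R^n_{n-1}(x), R^n_{n-2}(x), \ldots, R^n_1(x), L^n_1(x), L^n_2(x)),$$
  • where N is the aggregation node.
  • R and L are defined as:
    $$L^n_2(x) = B(L^n_1(x)), \qquad L^n_1(x) = B(R^n_1(x)),$$
    $$R^n_m(x) = \begin{cases} T_m(x) & \text{if } m = n - 1 \\ T_m(R^n_{m+1}(x)) & \text{otherwise,} \end{cases}$$
  • where B represents a convolutional block.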
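To get a feel for the HDA recursion, the sketch below follows the definition from the DLA paper: the node of a depth-n tree receives the outputs of the subtrees of depth n-1 down to 1, chained one after another, plus two further blocks. It counts how many blocks are executed and how many inputs each aggregation node receives. The function names (`hda`, `count_block`, `count_node`) are hypothetical, and the base case T1(x) = N(B(x), B(B(x))) is an assumption, since the recursion does not spell it out.

```python
from typing import Callable, List


def hda(n: int, x, block: Callable, node: Callable):
    """T_n(x) = N(R_{n-1}, ..., R_1, L_1, L_2), with R_m = T_m(R_{m+1})."""
    if n < 1:
        raise ValueError("depth must be >= 1")
    rs: List = []  # R^n_{n-1}, ..., R^n_1, in that order
    current = x
    for m in range(n - 1, 0, -1):
        # R^n_{n-1} = T_{n-1}(x); afterwards R^n_m = T_m(R^n_{m+1})
        current = hda(m, current, block, node)
        rs.append(current)
    left1 = block(current)  # L^n_1 = B(R^n_1); B(x) when n == 1 (assumed base case)
    left2 = block(left1)    # L^n_2 = B(L^n_1)
    return node(*rs, left1, left2)


calls = {"blocks": 0, "fan_in": []}


def count_block(value: int) -> int:
    # Hypothetical block B: counts its invocations, otherwise just increments.
    calls["blocks"] += 1
    return value + 1


def count_node(*inputs: int) -> int:
    # Hypothetical node N: records how many inputs it receives.
    calls["fan_in"].append(len(inputs))
    return max(inputs)


hda(3, 0, count_block, count_node)
print(calls["blocks"], calls["fan_in"][-1])  # 8 4
```

Under these assumptions a depth-3 tree runs 8 blocks while its root node has only 4 inputs, so the root fan-in grows as n + 1 while the number of blocks grows as 2^n.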

3. DLA: Network Architecture

DLA: Network Architecture
  • Deep layer aggregation (DLA) learns to better extract the full spectrum of semantic and spatial information from a network.
  • Iterative connections join neighboring stages to progressively deepen and spatially refine the representation.
  • Hierarchical connections cross stages with trees that span the spectrum of layers to better propagate features and gradients.
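  • In base aggregation, the aggregation node N is:
    $$N(x_1, \ldots, x_n) = \sigma\left(\text{BatchNorm}\left(\sum_i W_i x_i + b\right)\right),$$
  • where σ is the activation function, and W_i and b are the weights of the convolution.
  • If residual connections are added:
    $$N(x_1, \ldots, x_n) = \sigma\left(\text{BatchNorm}\left(\sum_i W_i x_i + b\right) + x_n\right).$$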
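As a rough illustration of the aggregation node, the sketch below evaluates it at a single spatial location: each input is projected by its own weights, the projections are summed with a bias, normalized, and passed through the activation, with the last input optionally added back as a residual. Several things here are assumptions made for the sake of a small runnable example: the convolution is reduced to a matrix-vector product at one pixel, batch normalization uses fixed statistics with no learned scale or shift, σ is taken to be ReLU, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def project(weight: Matrix, x: Vector) -> Vector:
    """W_i x_i: a 1x1 convolution seen at one pixel is a matrix-vector product."""
    return [sum(w * a for w, a in zip(row, x)) for row in weight]


def batch_norm(v: Vector, mean: Vector, var: Vector, eps: float = 1e-5) -> Vector:
    """Inference-style normalization with fixed statistics (assumed, simplified)."""
    return [(a - m) / math.sqrt(s + eps) for a, m, s in zip(v, mean, var)]


def aggregation_node(inputs, weights, bias, mean, var, residual=False, eps=1e-5):
    summed = list(bias)
    for weight, x in zip(weights, inputs):
        summed = [s + p for s, p in zip(summed, project(weight, x))]
    out = batch_norm(summed, mean, var, eps)
    if residual:
        # Residual variant: add the last input x_n before the activation.
        out = [a + b for a, b in zip(out, inputs[-1])]
    return [max(0.0, a) for a in out]  # ReLU as the assumed activation


identity = [[1.0, 0.0], [0.0, 1.0]]
x1, x2 = [1.0, -2.0], [0.5, 0.5]
args = dict(weights=[identity, identity], bias=[0.0, 0.0],
            mean=[0.0, 0.0], var=[1.0, 1.0], eps=0.0)
base = aggregation_node([x1, x2], **args)
with_residual = aggregation_node([x1, x2], residual=True, **args)
print(base, with_residual)  # [1.5, 0.0] [2.0, 0.0]
```

The residual variant requires the output to have the same number of channels as the last input, which is why the toy example uses square identity weights.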

DLA makes no requirements of the internal structure of the blocks and stages. DLA connects across stages with IDA, and within and across stages by HDA.

  • For classification, ResNet and ResNeXt are augmented with IDA and HDA.
Interpolation by iterative deep aggregation
  • For segmentation, the conversion from classification DLA to fully convolutional DLA is simple and no different than for other architectures.
  • IDA for interpolation increases both depth and resolution by projection and upsampling.
  • Stages are fused from shallow to deep to make a progressively deeper and higher resolution decoder.
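The shallow-to-deep fusion can be pictured with a toy 1-D example. This is only a sketch under strong assumptions: features are plain lists of numbers, the projection step that matches channel counts is omitted, upsampling is nearest-neighbor repetition, the aggregation node is replaced by an element-wise mean, and neighboring stages are assumed to be fused pairwise in each round. The names `upsample`, `fuse` and `ida_up` are hypothetical.

```python
from typing import List


def upsample(signal: List[float], factor: int) -> List[float]:
    """Nearest-neighbor upsampling by an integer factor."""
    return [v for v in signal for _ in range(factor)]


def fuse(shallow: List[float], deep: List[float]) -> List[float]:
    # Bring the deeper, lower-resolution stage up to the shallower resolution,
    # then merge with a stand-in aggregation node (element-wise mean).
    factor = len(shallow) // len(deep)
    return [(a + b) / 2 for a, b in zip(shallow, upsample(deep, factor))]


def ida_up(stages: List[List[float]]) -> List[float]:
    """Stages are ordered from shallow (high resolution) to deep (low resolution)."""
    while len(stages) > 1:
        # Each round fuses neighboring stages, so one stage disappears per round.
        stages = [fuse(stages[i], stages[i + 1]) for i in range(len(stages) - 1)]
    return stages[0]


decoded = ida_up([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0], [4.0]])
print(decoded)  # [2.25, 2.25, 2.25, 2.25]
```

The output keeps the resolution of the shallowest stage while every deeper stage has contributed to it through one or more fusion steps.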

4. Experimental Results

4.1. Classification

Deep layer aggregation networks for classification. Stages 1 and 2 show the number of channels n, while further stages show d-n, where d is the aggregation depth. Models marked with “-C” are compact and only have 1 million parameters.
  • Different DLA models are built as above.
Evaluation of DLA on ILSVRC
  • DLA-34 and ResNet-34 both use basic blocks, but DLA-34 has about 30% fewer parameters and ~1 point of improvement in top-1 error rate.
  • DLA-X-102 has nearly half the parameters of ResNeXt-101, yet the difference in error rate is only 0.2%.
  • Compared with DenseNet, DLA achieves higher accuracy with lower memory usage because the aggregation node fan-in size is log of the total number of convolutional blocks in HDA.
Comparison with compact models
  • Compared to SqueezeNet, which shares a block design similar to DLA, DLA is more accurate with the same number of parameters.

4.2. Fine-grained Recognition

Statistics for fine-grained recognition datasets
Comparison with state-of-the-art methods on fine-grained datasets
  • DLAs improve or rival the state-of-the-art without further annotations or specific modules for fine-grained recognition.
  • DLAs are competitive with VGGNet and ResNet while having only several million parameters. However, they do not beat the state-of-the-art on Birds. This dataset has fewer instances per class, so further regularization might help.

4.3. Semantic Segmentation

Evaluation on Cityscapes
  • Surprisingly, DLA-34 performs very well on Cityscapes and it is as accurate as DLA-102.
  • Test-time evaluation follows the same multi-scale protocol as RefineNet: predictions at image scales of [0.5, 0.75, 1, 1.25, 1.5] are summed. DLA improves on RefineNet by 2+ points and outperforms FCN by a large margin.
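Multi-scale testing of this kind can be sketched as follows: the input is resized to each scale, the model predicts per-pixel class scores, the scores are resized back to the original size and summed, and the final label is the arg-max of the sum. The example below is a toy 1-D version and not the authors' evaluation code. `resize_nearest`, `multi_scale_predict` and `toy_model` are hypothetical names, and nearest-neighbor resizing is an assumption chosen to keep the example dependency-free.

```python
from typing import Callable, List, Sequence

SCALES = [0.5, 0.75, 1.0, 1.25, 1.5]


def resize_nearest(row: Sequence, new_len: int) -> list:
    """Nearest-neighbor resize of a 1-D sequence to new_len elements."""
    old_len = len(row)
    return [row[min(old_len - 1, i * old_len // new_len)] for i in range(new_len)]


def multi_scale_predict(image_row: List[float], model: Callable,
                        scales: Sequence[float] = SCALES) -> List[int]:
    n = len(image_row)
    total = None
    for scale in scales:
        scaled = resize_nearest(image_row, max(1, round(n * scale)))
        scores = resize_nearest(model(scaled), n)  # back to the input size
        if total is None:
            total = [list(pixel) for pixel in scores]
        else:
            total = [[a + b for a, b in zip(t, p)] for t, p in zip(total, scores)]
    # Arg-max over the summed class scores of each pixel.
    return [max(range(len(p)), key=p.__getitem__) for p in total]


def toy_model(row: List[float]) -> List[List[float]]:
    # Hypothetical two-class scorer: class 1 score equals the pixel value.
    return [[1.0 - v, v] for v in row]


labels = multi_scale_predict([0.0, 0.0, 1.0, 1.0], toy_model)
print(labels)  # [0, 0, 1, 1]
```

Summing scores rather than voting on labels lets a confident prediction at one scale outweigh uncertain predictions at others.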
Evaluation on CamVid
  • Iterative Deep Aggregation (IDA) was later used in SqueezeNext.


[2018 CVPR] [DLA]
Deep Layer Aggregation
