# AdwU-Net: Adaptive Depth and Width U-Net for Medical Image Segmentation by Differentiable Neural Architecture Search

## Neural Architecture Search (NAS) for Depth & Width in U-Net


AdwU-Net: Adaptive Depth and Width U-Net for Medical Image Segmentation by Differentiable Neural Architecture Search, AdwU-Net, by Shanghai Jiao Tong University, and National Medical Products Administration, 2022 MIDL (Sik-Ho Tsang @ Medium)



**AdwU-Net** is proposed, which is an **efficient neural architecture search (NAS)** framework to **search the optimal task-specific depth and width** in the U-Net backbone. **In each block, the optimal number of convolutional layers and channels in each layer are directly learned from data.** To reduce the computational costs and alleviate the memory pressure, an efficient architecture search is used and the network weights are reused.

# Outline

1. **AdwU-Net**
2. **Results**

# 1. AdwU-Net

## 1.1. Overall Idea

- **Each AdwBlock** consists of **three adaptive width blocks (AwBlock)**.
- The AdwBlock is designed for the **search of the optimal number of AwBlocks**, which is **also the optimal depth** in each block. The AwBlock is designed for the search of the **optimal channel number of convolutional layers.**
- For the **depth level**, **each block** can choose the **number of convolutional layers from 1 to 3**.
- For the **width level**, each convolutional layer can have **5 filter number options**. Therefore, each AdwBlock has **5+5²+5³=155 different candidate architectures.**
- Consider the U-Net backbone with **11 blocks**, then the AdwU-Net has **155¹¹≈10²⁴ candidate architectures**, which are impossible to explore manually.
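The size of the search space quoted above can be checked with a few lines of arithmetic. This is only a sanity check; the constant names are illustrative and are not taken from the paper's code.

```python
# Sanity check of the search-space size described above.
WIDTH_OPTIONS = 5   # candidate channel numbers per convolutional layer
MAX_DEPTH = 3       # each block may use 1, 2, or 3 convolutional layers
NUM_BLOCKS = 11     # blocks in the U-Net backbone

# A block of depth d has WIDTH_OPTIONS ** d possible width configurations.
per_block = sum(WIDTH_OPTIONS ** d for d in range(1, MAX_DEPTH + 1))

# Blocks are chosen independently, so the counts multiply.
total = per_block ** NUM_BLOCKS

print(per_block)        # 155
print(f"{total:.1e}")   # on the order of 10^24
```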

## 1.2. Depth Search (Figure 1(a))

- In the search procedure, the **output of each resolution stage** is the **weighted sum of the outputs of different depth options.**
- The naive implementation constructs three independent parallel paths, but GPU memory then increases quadratically.
- To avoid redundancy, **only the deepest path is kept** and **weighted skip connections are added from the output of preceding layers** to the sink point at the end of each block, as shown in **Figure 1(a).**
- The deeper path reuses the convolution weights of the shallower path.
- Let $\alpha_l^s$ be the **architecture parameter of the $l$-th depth option in stage $s$**. **Gumbel Softmax** is applied to these parameters to obtain the weight of each depth option, where $\epsilon_l^s \sim \text{Gumbel}(0, 1)$ is **random noise** following the Gumbel distribution and $\tau$ is a **temperature** parameter.
- Given the input $x^s$, the output of stage $s$ is the weighted sum of the outputs of its convolutional layers, where $o_l^s$ is the **output of the $l$-th layer in stage $s$**.
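Written out, the two relations described above take roughly the following form. This is a reconstruction from the symbol definitions in the text, using the standard Gumbel Softmax expression; the weight symbol $g_l^s$ (chosen to mirror the width search notation), the output symbol $x^{s+1}$, and the explicit sum over three depth options are notational assumptions rather than a copy of the paper's equations.

```latex
g_l^s = \frac{\exp\big((\alpha_l^s + \epsilon_l^s)/\tau\big)}
             {\sum_{k=1}^{3} \exp\big((\alpha_k^s + \epsilon_k^s)/\tau\big)},
\qquad
x^{s+1} = \sum_{l=1}^{3} g_l^s \, o_l^s
```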

By doing so, the **computational budget** of the whole block during the search is **roughly the same as computing the feature maps of the deepest path only once.**
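The depth search idea above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `gumbel_softmax` and `adw_block_search` are hypothetical, and scalar values with simple stand-in functions replace real feature maps and convolutional layers.

```python
import math
import random

def gumbel_softmax(alphas, tau=1.0, rng=random):
    """Return Gumbel Softmax weights for a list of architecture parameters."""
    # Gumbel(0, 1) noise via the inverse CDF: -log(-log(U)), U ~ Uniform(0, 1).
    noisy = [(a - math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))) / tau
             for a in alphas]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]

def adw_block_search(x, layers, alphas, tau=1.0, rng=random):
    """Run the deepest path once and mix the intermediate outputs."""
    weights = gumbel_softmax(alphas, tau, rng)
    out, h = 0.0, x
    for layer, g in zip(layers, weights):
        h = layer(h)   # output of this layer; deeper options reuse it
        out += g * h   # weighted skip connection to the sink point
    return out, weights

# Toy usage: a scalar "feature map" and three stand-in layers (assumptions).
layers = [lambda v: v + 1.0, lambda v: 2.0 * v, lambda v: v - 3.0]
out, weights = adw_block_search(1.0, layers, alphas=[0.0, 0.0, 0.0],
                                rng=random.Random(0))
```

Each layer is evaluated exactly once per forward pass, which is why the cost stays close to that of the deepest path alone.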

## 1.3. Width Search (Figure 1(b))

- Inspired by FBNetV2 (Wan et al., 2020), convolutions with varying channel numbers are represented by convolutions with equal channel numbers multiplied by different channel masks. Then, **the weights of different convolutions are shared** to reduce computational costs and GPU memory consumption.
- The output of the preceding layer, $o_{l-1}^s$, serves as the input of the $l$-th layer in stage $s$.

**The convolution operation is run only once, and its output is then multiplied by the weighted summation of masks.** Instance normalization (IN) and Leaky ReLU are applied after the convolutional layer.

- Here, $M_{l,i}^s$ is a **column vector** which has **ones in the leading $i$ entries** and **zeros at the end**. $g_{l,i}^s$ is the Gumbel weight parameter of the $i$-th mask in the $l$-th layer, and $\sigma$ denotes Leaky ReLU.
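One way to write the masked convolution described above is given below. It is a reconstruction from the symbol definitions in the text and follows the channel-masking form of FBNetV2; the exact placement of the mask relative to instance normalization is an assumption, as are the operator name $\mathrm{Conv}$ for the shared convolution and the element-wise product symbol $\odot$.

```latex
o_l^s = \sigma\!\left( \mathrm{IN}\!\left(
          \mathrm{Conv}\big(o_{l-1}^s\big) \odot \sum_{i=1}^{5} g_{l,i}^s \, M_{l,i}^s
        \right) \right)
```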

- There are **5 candidate channel numbers** in each convolutional layer. The channel number at the **first stage** ranges **from 16 to 48 with a step of 8**.
- When the resolution is halved, all 5 candidate channel numbers double. The channel configurations in stages beyond stage 4 still follow the configuration of stage 4.
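The candidate channel numbers and the weighted sum of masks can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, stages are assumed to be 1-indexed by resolution level, and the Gumbel weights are made-up values.

```python
def candidate_channels(stage):
    """Candidate channel numbers for a 1-indexed stage (indexing is assumed)."""
    scale = 2 ** (min(stage, 4) - 1)   # doubling stops after stage 4
    return [c * scale for c in range(16, 49, 8)]

def channel_mask(active, total):
    """Vector with ones in the leading `active` entries and zeros after."""
    return [1.0] * active + [0.0] * (total - active)

def soft_mask(gumbel_weights, candidates):
    """Weighted sum of the masks, one mask per candidate channel number."""
    total = max(candidates)
    masks = [channel_mask(c, total) for c in candidates]
    return [sum(g * m[j] for g, m in zip(gumbel_weights, masks))
            for j in range(total)]

stage1 = candidate_channels(1)   # [16, 24, 32, 40, 48]
stage5 = candidate_channels(5)   # same as stage 4
mask = soft_mask([0.1, 0.2, 0.3, 0.2, 0.2], stage1)
```

The convolution itself would always produce the maximum number of channels (48 at the first stage here), and the soft mask scales the trailing channels down according to how much weight the wider options receive.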

## 1.4. Optimization

In each iteration, the network **weight $w$ is fixed first** and the **architecture parameters $\alpha$ and $\beta$ are updated** using trainA and trainB in succession.

Then the **architecture parameters $\alpha$ and $\beta$ are fixed** and the **network weight $w$ is updated** using trainA.
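The alternating schedule above can be sketched as a small driver function. This is only an outline under assumptions: `search_iteration`, `update_arch`, and `update_weights` are hypothetical names, "in succession" is read here as one architecture update per data split, and reusing the same trainA batch for the weight update is also an assumption.

```python
def search_iteration(update_arch, update_weights, batch_a, batch_b):
    """One search iteration with the alternating schedule described above."""
    # Step 1: network weights w are frozen; the architecture parameters are
    # updated, first on a trainA batch and then on a trainB batch.
    update_arch(batch_a)
    update_arch(batch_b)
    # Step 2: architecture parameters are frozen; w is updated on trainA.
    update_weights(batch_a)

# Toy usage that only records the order of updates.
log = []
search_iteration(
    update_arch=lambda batch: log.append(("arch", batch)),
    update_weights=lambda batch: log.append(("weights", batch)),
    batch_a="trainA",
    batch_b="trainB",
)
print(log)
```

In a real training loop the two callables would wrap separate optimizers, one holding the architecture parameters and one holding the network weights.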

- The **sum of Dice loss and cross-entropy loss** is used as the loss function.
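A minimal sketch of such a combined loss is shown below for the binary case, using flat lists of foreground probabilities and 0/1 labels. The soft Dice formulation, the smoothing constant, the probability clipping, and the unweighted sum are assumptions made for illustration; the paper's exact variant may differ.

```python
import math

def dice_loss(probs, targets, smooth=1e-5):
    """Soft Dice loss for the foreground class."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)

def cross_entropy_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def segmentation_loss(probs, targets):
    """Unweighted sum of Dice loss and cross-entropy loss."""
    return dice_loss(probs, targets) + cross_entropy_loss(probs, targets)

perfect = segmentation_loss([1.0, 0.0, 1.0], [1, 0, 1])
poor = segmentation_loss([0.2, 0.9, 0.3], [1, 0, 1])
```

A perfect prediction drives both terms close to zero, while a poor one is penalized by both the overlap term and the per-voxel term.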

After searching, the **optimal depth and width** for each stage are **obtained by an argmax** operation.
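Deriving the final architecture of one block by argmax could look like the sketch below. The parameter values are made up, the helper names are hypothetical, and dropping the layers beyond the selected depth is an assumption about how the discrete block is formed.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def derive_block(depth_params, width_params, candidates):
    """Pick the depth and per-layer channel numbers with the largest parameters."""
    depth = argmax(depth_params) + 1   # depth options are 1, 2, 3
    widths = [candidates[argmax(p)] for p in width_params[:depth]]
    return depth, widths

depth, widths = derive_block(
    depth_params=[0.1, 1.3, 0.4],
    width_params=[[0.0, 0.2, 0.9, 0.1, 0.3],
                  [1.1, 0.2, 0.3, 0.0, 0.5],
                  [0.0, 0.0, 0.0, 0.0, 1.0]],
    candidates=[16, 24, 32, 40, 48],
)
print(depth, widths)   # 2 [32, 16]
```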

- The search procedure takes **2 days** on **1 NVIDIA V100 GPU** with **32GB memory.** After searching, the network is **retrained** with the searched depth and width for **1000 epochs** for validation.

# 2. Results

## 2.1. MSD Test Set

AdwU-Net achieves the **best performance in 6 of 10 tasks**, including Heart, Liver, Hippocampus, Lung, Pancreas, and Hepatic Vessel.

Overall, AdwU-Net achieves the **best average Dice of 0.7803 among all methods without pre-training** on the MSD leaderboard.

## 2.2. Ablation Studies

Using **both depth search and width search** obtains the **best** results.

With lower computational cost, the searched models outperform the scaled models, which shows the effectiveness and efficiency of the proposed method.