Review — DALS: Deep Active Lesion Segmentation
DALS, Using Fully Convolutional Network (FCN) and Active Contour Model (ACM)
Deep Active Lesion Segmentation (DALS), by University of California and Stanford University
2019 MLMI, Over 40 Citations (Sik-Ho Tsang @ Medium)
Medical Imaging, Medical Image Analysis, Image Segmentation
- Deep Active Lesion Segmentation (DALS), a fully automated segmentation framework, is introduced.
- DALS leverages the powerful nonlinear feature extraction abilities of fully Convolutional Neural Networks (CNNs) and the precise boundary delineation abilities of Active Contour Models (ACMs).
- A Multiorgan Lesion Segmentation (MLS) dataset, which contains lesion images from various organs, is used for evaluation.
Outline
- Multiorgan Lesion Segmentation (MLS) Dataset
- Deep Active Lesion Segmentation (DALS) Framework
- Results
1. Multiorgan Lesion Segmentation (MLS) Dataset
- The liver component of the dataset consists of 112 contrast-enhanced CT images of liver lesions (43 hemangiomas, 45 cysts, and 24 metastases), and 164 liver lesions from 3T gadoxetic acid enhanced MRI scans.
- The brain component consists of 369 preoperative and pretherapy perfusion MR images.
- The lung component consists of 87 CT images.
- For each component of the MLS dataset, 85% of its images are used for training, 10% for testing, and 5% for validation.
2. Deep Active Lesion Segmentation (DALS) Framework
2.1. Fully Convolutional Network (FCN)
- The fully convolutional encoder-decoder architecture is used.
- A dense block, originating from DenseNet, is used for each encoding block. In each dense block of the encoder, a composite function of batch normalization, convolution, and ReLU is applied to the concatenation of all the feature maps [x0, x1, …, xl−1] from layers 0 to l−1, together with the feature maps produced by the current block.
- The last dense block in the encoder is fed into a custom multiscale dilation block, as in DeepLab or DilatedNet, with 4 parallel convolutional layers with dilation rates of 2, 4, 8, and 16.
- Before being passed to the decoder, the outputs of the dilated convolutions are concatenated to create a multiscale representation of the input image, thanks to the enlarged receptive fields of the dilated convolutions.
This, along with dense connectivity, assists in capturing local and global context for highly accurate lesion localization. (A sketch of the dense block and the multiscale dilation block is given below.)
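To make the encoder components concrete, here is a minimal PyTorch sketch of a dense block and the multiscale dilation block described above. The module names, number of layers, growth rate, and channel widths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """DenseNet-style block: each layer receives the concatenation of all earlier feature maps."""
    def __init__(self, in_ch: int, growth: int = 16, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            # Composite function of BN -> conv -> ReLU, as described above.
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

class MultiScaleDilationBlock(nn.Module):
    """Four parallel 3x3 convolutions with dilation rates 2, 4, 8, 16; outputs are concatenated."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in (2, 4, 8, 16)
        ])

    def forward(self, x):
        # Concatenating the dilated branches yields the multiscale representation.
        return torch.cat([b(x) for b in self.branches], dim=1)
```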
- The input image is fed into the encoder-decoder, which localizes the lesion and, after 1×1 convolutional and sigmoid layers, produces the initial segmentation probability map Yprob(x, y).
During training, Yprob and the ground truth map Ygt(x, y) are fed into a Dice loss function.
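As a reference, a minimal soft Dice loss between Yprob and Ygt is sketched below; the smoothing constant and the batch reduction are assumptions, since the paper does not spell them out.

```python
import torch

def dice_loss(y_prob: torch.Tensor, y_gt: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss between a predicted probability map and a binary ground-truth mask."""
    inter = (y_prob * y_gt).sum(dim=(-2, -1))
    union = y_prob.sum(dim=(-2, -1)) + y_gt.sum(dim=(-2, -1))
    dice = (2.0 * inter + eps) / (union + eps)
    return 1.0 - dice.mean()
```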
2.2. Active Contour Model (ACM)
- The boundaries of the segmentation map generated by the encoder-decoder are fine-tuned by the level-set ACM.
- The Transformer converts Yprob to a Signed Distance Map (SDM) Φ(x, y, 0) that initializes the level-set ACM. (The authors do not clearly state whether this “Transformer” is the attention-based Transformer layer, as I first thought; here it simply denotes the module that converts the probability map into an SDM. For SDM, please feel free to read about Signed Distance Maps.)
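For illustration, one simple way to build such an SDM from a probability map is via Euclidean distance transforms of the thresholded mask. The sketch below is my own (the function name, the 0.5 threshold, and the sign convention of Φ being positive inside the lesion are assumptions, not necessarily the paper's choices).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def prob_to_sdm(y_prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert a probability map into a signed distance map Phi(x, y, 0).
    Assumed convention: positive inside the lesion, negative outside, zero on the boundary."""
    fg = y_prob >= threshold                       # foreground mask from the FCN output
    if not fg.any() or fg.all():                   # degenerate case: no boundary exists
        return np.zeros_like(y_prob, dtype=np.float64)
    dist_inside = distance_transform_edt(fg)       # distance to the background, inside the mask
    dist_outside = distance_transform_edt(~fg)     # distance to the foreground, outside the mask
    return dist_inside - dist_outside
```

For example, phi0 = prob_to_sdm(y_prob) would give the Φ(x, y, 0) used to start the contour evolution.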
In brief, ACMs leverage parametric (“snake”) or implicit (level-set) formulations in which the contour evolves by minimizing an associated energy functional, typically using a gradient descent procedure.
ACM, a.k.a. the Snake, is a very famous non-deep-learning approach in computer vision for segmenting objects.
- Given an image I(x, y), let C(t) = {(x, y) | Φ(x, y, t) = 0} be a closed, time-varying contour in Ω represented by the zero level set of Φ. The interior and exterior regions of C are specified by the smoothed Heaviside functions HIε(Φ) and HEε(Φ) = 1 − HIε(Φ), and the narrow band near C is specified by the smoothed Dirac function δε(Φ). m1 and m2 are the mean intensities of I(x, y) inside and outside C within a local window Ws.
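One common choice for these smoothed functions (the Chan–Vese-style pair) is sketched below; the paper's exact smoothing may differ.

```latex
% One common smoothed Heaviside / Dirac pair; the paper's exact choice may differ.
H^{I}_{\epsilon}(\Phi) = \frac{1}{2}\left(1 + \frac{2}{\pi}\arctan\frac{\Phi}{\epsilon}\right),
\qquad
H^{E}_{\epsilon}(\Phi) = 1 - H^{I}_{\epsilon}(\Phi),
\qquad
\delta_{\epsilon}(\Phi) = \frac{d H^{I}_{\epsilon}(\Phi)}{d\Phi}
                        = \frac{\epsilon}{\pi\left(\epsilon^{2} + \Phi^{2}\right)}.
```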
- (I can only give a brief overview of the terms involved in the ACM; there is quite a lot of math here related to ACMs/snakes, and the paper does not provide too many details. If you’re interested, please feel free to read (1) “Snakes: Active Contour Models,” IJCV 1988, and (2) “Active contours with selective local or global segmentation: A new formulation and level set method,” J. IMAVIS, 2009.)
- The energy functional associated with C, and its corresponding energy density, are defined in terms of the quantities above: δε, HIε, HEε, m1, m2, Ws, and the λ parameters (a hedged sketch of this localized region-based form is given below).
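A sketch of a generic localized region-based (Chan–Vese-style) energy, written with the quantities defined above; this is illustrative and not necessarily the paper's exact formulation.

```latex
% Illustrative localized region-based energy; not necessarily the paper's exact form.
E(\Phi) = \int_{\Omega} \delta_{\epsilon}\!\big(\Phi(x,y)\big)\, F(x,y)\, dx\, dy,
\qquad \text{with energy density}
\qquad
F(x,y) = \int_{W_{s}(x,y)} \Big[
      \lambda_{1}(x,y)\,\big(I(u,v) - m_{1}(x,y)\big)^{2}\, H^{I}_{\epsilon}\!\big(\Phi(u,v)\big)
    + \lambda_{2}(x,y)\,\big(I(u,v) - m_{2}(x,y)\big)^{2}\, H^{E}_{\epsilon}\!\big(\Phi(u,v)\big)
  \Big]\, du\, dv,
```

where Ws(x, y) denotes the local window of size s centered at (x, y).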
Intuitively, starting from the initial rectangles shown in the first row of the figure, the Snake evolves to find the segmentation boundary.
- In this paper, the λ parameters are given by spatially varying maps derived from Yprob, instead of being set to constants.
Then, with the initial contour from the SDM and the λ maps, the level-set ACM (snake) is performed as a post-processing step to fine-tune the boundary (a simplified sketch of such an evolution is given below).
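Below is a simplified NumPy sketch of such a level-set evolution with spatially varying λ maps. It is my own illustration: for brevity it uses global region means and Gaussian smoothing in place of the paper's local window means and curvature regularization, it assumes Φ is positive inside the contour, and λ1/λ2 are simply passed in as per-pixel maps (the paper derives them from Yprob, but the exact mapping is not reproduced here).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def evolve_level_set(image, phi, lambda1, lambda2,
                     n_iters=200, dt=0.5, eps=1.0):
    """Simplified Chan--Vese-style level-set evolution with per-pixel lambda maps.
    Assumptions: Phi > 0 inside the contour; global means m1/m2 instead of the
    paper's local means within W_s; Gaussian smoothing instead of a curvature term."""
    for _ in range(n_iters):
        inside = phi > 0
        delta = eps / (np.pi * (eps ** 2 + phi ** 2))            # smoothed Dirac: narrow band near C
        m1 = image[inside].mean() if inside.any() else 0.0       # mean intensity inside C
        m2 = image[~inside].mean() if (~inside).any() else 0.0   # mean intensity outside C
        # Gradient-descent force: shrink where the pixel matches the outside statistics
        # better, grow where it matches the inside statistics better.
        force = lambda2 * (image - m2) ** 2 - lambda1 * (image - m1) ** 2
        phi = phi + dt * delta * force
        phi = gaussian_filter(phi, sigma=1.0)                    # crude regularization of Phi
    return phi
```

A usage example under these assumptions would be phi = evolve_level_set(image, prob_to_sdm(y_prob), lam1, lam2), where lam1 and lam2 are the per-pixel maps obtained from Yprob.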
- The entire inference for DALS takes 1.5 seconds.
3. Results
- (There are labeling errors in Fig. 3: after cross-checking the values, Fig. 3(a) should be the Hausdorff Distance and Fig. 3(b) should be the Dice Score.)
- DALS is compared against U-Net, a manually initialized level-set ACM with constant scalar λ parameters, and its backbone CNN.
DALS achieves superior accuracy under all metrics and on all datasets.
- (Indeed, judging from the box plots alone, using the CNN by itself isn’t bad at all.)
The DALS segmentation contours conform appropriately to the irregular shapes of the lesion boundaries. In most cases, DALS avoided local minima and converged onto the true lesion boundaries, thus enhancing segmentation accuracy.
The learned λs maps serve as an attention mechanism that provides additional degrees of freedom for the contour to adjust itself precisely to regions of interest.
Reference
[2019 MLMI] [DALS]
Deep Active Lesion Segmentation
4.2. Biomedical Image Segmentation
2015–2019 … [DALS] 2020 [MultiResUNet] [UNet 3+] [Dense-Gated U-Net (DGNet)] [Non-local U-Net] [SAUNet] [SDM] [DIU-Net] [Chen FCVM’20] [cGAN+AC+CAW] [RA-UNet] 2021 [Expanded U-Net]