Review — DoFE: Domain-Oriented Feature Embedding for Generalizable Fundus Image Segmentation on Unseen Datasets
DoFE, Dynamically Enriches the Image Features with Additional Domain Prior Knowledge Learned from Multi-Source Domains
DoFE: Domain-Oriented Feature Embedding for Generalizable Fundus Image Segmentation on Unseen Datasets,
DoFE, by The Chinese University of Hong Kong, Stanford University, Shenzhen University, and Chinese Academy of Sciences,
2020 TMI, Over 50 Citations (Sik-Ho Tsang @ Medium)
Medical Image Analysis, Medical Imaging, Image Segmentation
- A novel Domain-oriented Feature Embedding (DoFE) framework is proposed to improve the generalization ability of CNNs on unseen target domains by exploring the knowledge from multiple source domains.
- A Domain Knowledge Pool is introduced to learn and memorize the prior information extracted from multi-source domains.
- Then the original image features are augmented with domain-oriented aggregated features, which are induced from the knowledge pool based on the similarity between the input image and multi-source domain images.
- DoFE Framework
- DoFE Learning Strategy
1. DoFE Framework
- DeepLabv3+ with a MobileNetV2 backbone is used as the segmentation network.
- The low-level feature is concatenated with the high-level global feature hg for further fine-grained segmentation. The domain prior knowledge is calculated from the concatenated feature hs.
- Specifically, the low- and high-level features are extracted before the ReLU and Batch Normalization layers to preserve their value distribution. After concatenation, the features are further normalized to zero mean and unit variance.
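The concatenate-then-normalize step above can be illustrated with a minimal sketch. It assumes the statistics are computed over all values of a single flat feature vector; the review does not say over which axes the mean and variance are taken, and the names `standardize`, `low_level`, and `high_level` are hypothetical.

```python
import math

def standardize(values, eps=1e-5):
    """Shift and scale a flat list of feature values to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var + eps)  # eps guards against division by zero
    return [(v - mean) / std for v in values]

# Hypothetical low-level and high-level feature values (illustration only).
low_level = [0.2, 1.5, -0.7]
high_level = [3.0, 2.0, 4.0, -1.0]

# Concatenate first, then normalize the joint feature h_s.
h_s = standardize(low_level + high_level)
```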
1.2. Domain Knowledge Pool
- Domain knowledge pool Mpool is explicitly incorporated into the network. Within this knowledge pool, each item represents the domain prior knowledge of a single training dataset in source domains.
- Formally, for the k-th source domain dataset Dk, high-level semantic feature fik is extracted from the pre-trained segmentation network for each input image xik ∈ Dk. Then, the average semantic feature along the spatial dimension is calculated and used as the initialization of the k-th item in the domain knowledge pool.
- During training, the domain knowledge pool is further updated with momentum, so that each item becomes a more discriminative representation of its domain.
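The initialization and momentum update of a pool item can be sketched as follows. This is a minimal illustration under stated assumptions: the item is also averaged over all images of the domain, the update takes the common exponential-moving-average form `item <- beta * item + (1 - beta) * new_feature`, and `beta = 0.9` is a placeholder value. None of these specifics are given in the review, and all function names are hypothetical.

```python
def spatial_average(feature_map):
    """Average a C x H x W feature map (nested lists) over H and W, giving C values."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def init_pool_item(feature_maps):
    """Initialize one pool item from all feature maps of a single source domain.

    Assumption: the spatially averaged features are also averaged over images.
    """
    vectors = [spatial_average(f) for f in feature_maps]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def momentum_update(pool_item, new_feature, beta=0.9):
    """Assumed update rule: item <- beta * item + (1 - beta) * new_feature."""
    return [beta * m + (1.0 - beta) * f for m, f in zip(pool_item, new_feature)]

# Two hypothetical 2-channel, 2x2 feature maps from the same source domain.
domain_features = [
    [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]],
    [[[3.0, 3.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]]],
]
initial_item = init_pool_item(domain_features)
updated_item = momentum_update(initial_item, [4.0, 1.0])
```

With these toy values the initial item is the per-channel mean over both images, and the update moves it one tenth of the way toward the new feature.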
1.3. Domain Similarity Learning
- For a given input image, DoFE composes a domain-oriented aggregated feature hagg from the domain knowledge pool Mpool to enrich the image's semantic feature hs and make it more discriminative.
- Since the domain code, which encodes the similarity between the input image and each source domain, relates more to high-level feature information, the domain code prediction branch is added after the high-level global feature hg.
- The domain code prediction branch consists of a global average pooling layer, a Batch Normalization layer, a ReLU activation layer, and a convolutional layer. The last convolution layer is employed for final domain code prediction (pdc), and a Softmax activation layer is used to normalize the predicted values.
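The layer order of the domain code prediction branch can be traced with a small sketch. It assumes inference-time Batch Normalization with fixed running statistics and no learned scale or shift, and it treats the final convolution as a 1x1 convolution, which on a globally pooled 1x1 map reduces to a matrix-vector product. The kernel size, all parameter values, and all function names here are hypothetical.

```python
import math

def global_average_pool(feature_map):
    """Reduce a C x H x W map (nested lists) to a C-dimensional vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def batch_norm_inference(x, running_mean, running_var, eps=1e-5):
    """Normalize each channel with fixed running statistics (no scale or shift)."""
    return [(v - m) / math.sqrt(s + eps) for v, m, s in zip(x, running_mean, running_var)]

def relu(x):
    return [max(0.0, v) for v in x]

def conv1x1(x, weights, bias):
    """On a 1x1 spatial map, a 1x1 convolution is a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_domain_code(h_g, running_mean, running_var, weights, bias):
    x = global_average_pool(h_g)
    x = batch_norm_inference(x, running_mean, running_var)
    x = relu(x)
    return softmax(conv1x1(x, weights, bias))

# Hypothetical 2-channel 2x2 global feature and parameters for 3 source domains.
h_g = [[[1.0, 2.0], [3.0, 2.0]], [[0.0, -1.0], [-1.0, 0.0]]]
p_dc = predict_domain_code(
    h_g,
    running_mean=[0.0, 0.0],
    running_var=[1.0, 1.0],
    weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    bias=[0.0, 0.0, 0.0],
)
```

The output `p_dc` has one non-negative entry per source domain and sums to one, so it can be used directly as a set of mixing weights.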
1.4. Domain-Oriented Aggregated Feature
- The domain-oriented aggregated feature hagg is formulated as a weighted sum of the items in the domain knowledge pool, with the predicted domain code pdc supplying the weights.
- Then, feature hagg is tiled into ˆhagg with the shape of hs for further calculation.
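The weighted sum and the tiling step can be written out as a short sketch. The pool contents, the domain code values, and the function names `aggregate` and `tile` are made up for illustration; only the weighted-sum and tiling operations themselves come from the description above.

```python
def aggregate(domain_code, pool):
    """Weighted sum of pool items; pool[k] is the feature vector of source domain k."""
    channels = len(pool[0])
    return [
        sum(w * item[c] for w, item in zip(domain_code, pool))
        for c in range(channels)
    ]

def tile(vector, height, width):
    """Repeat a C-dimensional vector at every spatial location, giving C x H x W."""
    return [[[v] * width for _ in range(height)] for v in vector]

pool = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # hypothetical pool: 3 domains, 2 channels
p_dc = [0.5, 0.25, 0.25]                     # hypothetical predicted domain code
h_agg = aggregate(p_dc, pool)                # [1.0, 0.75]
h_agg_tiled = tile(h_agg, height=2, width=3)
```

Because the domain code sums to one, `h_agg` is a convex combination of the pool items: an input that resembles one source domain pulls the aggregated feature toward that domain's item.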
1.5. Dynamic Feature Embedding
- The original semantic feature hs is augmented with the tiled domain-oriented aggregated feature ˆhagg with an attention-guided mechanism. In this way, the aggregated features could be selected dynamically.
- A convolutional layer is added following the original semantic feature hs, and a tanh activation layer is then used to generate the self-attention map m (selective mask).
- The final dynamic domain-oriented feature h combines hs with the mask-weighted aggregated feature m ⊗ ˆhagg, where ⊗ represents element-wise multiplication.
- Finally, the fine-grained segmentation masks are generated using the decoder at the end.
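The attention-guided embedding can be sketched as below. Two points are assumptions rather than statements from the review: the convolution is taken to be 1x1, and the mask-weighted aggregated feature is added to hs as a residual, i.e. `h = h_s + m * h_agg_tiled` element-wise. The review only states that a tanh mask is produced and that ⊗ denotes element-wise multiplication. All names and values are hypothetical.

```python
import math

def conv1x1_map(feature_map, weights, bias):
    """1x1 convolution over a C_in x H x W map (nested lists), giving C_out x H x W."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [
                sum(w * feature_map[c][i][j] for c, w in enumerate(row)) + b
                for j in range(width)
            ]
            for i in range(height)
        ]
        for row, b in zip(weights, bias)
    ]

def dynamic_embedding(h_s, h_agg_tiled, weights, bias):
    """Return the tanh mask m and the embedded feature h."""
    conv_out = conv1x1_map(h_s, weights, bias)
    mask = [[[math.tanh(v) for v in row] for row in ch] for ch in conv_out]
    # Assumed residual combination: h = h_s + m * h_agg_tiled (element-wise).
    h = [
        [
            [s + m * a for s, m, a in zip(s_row, m_row, a_row)]
            for s_row, m_row, a_row in zip(s_ch, m_ch, a_ch)
        ]
        for s_ch, m_ch, a_ch in zip(h_s, mask, h_agg_tiled)
    ]
    return mask, h

# Hypothetical 2-channel 1x2 features and an identity 1x1 convolution.
h_s = [[[0.0, 1.0]], [[-1.0, 2.0]]]
h_agg_tiled = [[[1.0, 1.0]], [[0.75, 0.75]]]
mask, h = dynamic_embedding(
    h_s, h_agg_tiled, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0]
)
```

Since tanh outputs lie in (-1, 1), each mask value can suppress, pass, or even flip the sign of the aggregated feature at its location, which is what makes the selection dynamic.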
2. DoFE Learning Strategy
2.1. Domain Code Smooth
- For the i-th input image from the k-th source domain, the hard one-hot ground truth (yk,i_dc) for the predicted domain code is a vector with one item per source domain, where sk=1 and all the other items equal zero.
- The hard one-hot ground truth is smoothed by randomly perturbing sk into the range [0.8, 1.0] and assigning random non-negative values to the other items.
- The training of the domain code prediction branch can be regarded as a regression problem, so the mean squared error (MSE) between the predicted and smoothed domain codes is used as its loss.
- The binary cross-entropy (BCE) loss is used for optic disc (OD) and optic cup (OC) segmentation.
- Finally, the total loss combines the segmentation loss and the domain code loss, balanced by a weight α=0.1.
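The learning strategy can be put together in a short sketch. Several details are assumptions, flagged in the comments: the smoothed code is rescaled so that it sums to one, and α multiplies the domain code loss while the segmentation loss is left unweighted. The review gives only the range [0.8, 1.0], the loss types, and α=0.1. All function names and input values are hypothetical.

```python
import math
import random

def smooth_domain_code(k, num_domains, rng):
    """Soften a one-hot code: s_k in [0.8, 1.0], the rest random and non-negative.

    Assumption: the remaining mass 1 - s_k is split so the code sums to one.
    """
    s_k = rng.uniform(0.8, 1.0)
    raw = [rng.random() for _ in range(num_domains - 1)]
    raw_sum = sum(raw)
    scale = (1.0 - s_k) / raw_sum if raw_sum > 0 else 0.0
    others = [r * scale for r in raw]
    return others[:k] + [s_k] + others[k:]

def mse_loss(pred, target):
    """Mean squared error between predicted and target domain codes."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def bce_loss(probs, labels, eps=1e-7):
    """Binary cross-entropy averaged over pixels; probs are clamped for stability."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def total_loss(seg_loss, dc_loss, alpha=0.1):
    # Assumed form: segmentation loss plus alpha-weighted domain code loss.
    return seg_loss + alpha * dc_loss

rng = random.Random(0)
target_code = smooth_domain_code(k=1, num_domains=3, rng=rng)
dc_loss = mse_loss([0.1, 0.8, 0.1], target_code)
seg_loss = bce_loss([0.9, 0.2, 0.7], [1, 0, 1])
loss = total_loss(seg_loss, dc_loss)
```

Regressing toward a softened target rather than a hard one-hot vector discourages the branch from becoming overconfident about a single domain, which matters at test time when the input comes from none of the source domains.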
3. Experimental Results
- A t-SNE visualization of features from a pretrained VGG-16 shows that image features differ considerably across datasets.
3.2. OC & OD Segmentation
- The DoFE framework surpasses the baseline model by a considerable margin (an average improvement of 2.90% in Dice similarity coefficient, DSC), showing that it improves generalization ability.
- DoFE yields the smallest average surface distance (ASD) and Hausdorff distance (HD) errors, except that its HD error on OD segmentation is comparable with that of DST.
- Other methods struggle to distinguish OD and OC because of the low image contrast, while DoFE is still able to segment OC and OD with accurate boundaries.
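For readers unfamiliar with the DSC metric reported above, here is a minimal sketch of how it is computed for a pair of binary masks. The masks are toy values, not data from the paper, and `dice_coefficient` is a hypothetical helper name.

```python
def dice_coefficient(pred, target):
    """DSC = 2 * |P and T| / (|P| + |T|) for binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, target))
    size_sum = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so return 1.0 in that case.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum

pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
dsc = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```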
3.3. Vessel Segmentation
- DoFE achieves more accurate vessel segmentation on the accuracy (ACC), specificity (SP), and area under the ROC curve (AUC) metrics.
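Two of the vessel segmentation metrics above, ACC and SP, follow directly from the pixel-wise confusion counts. The sketch below uses toy masks, not data from the paper, and hypothetical helper names. AUC is omitted because it requires continuous prediction scores rather than binary masks.

```python
def confusion_counts(pred, target):
    """Count true/false positives and negatives for flat 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, tn, fp, fn

def accuracy(pred, target):
    tp, tn, fp, fn = confusion_counts(pred, target)
    return (tp + tn) / (tp + tn + fp + fn)

def specificity(pred, target):
    tp, tn, fp, fn = confusion_counts(pred, target)
    return tn / (tn + fp) if (tn + fp) else 0.0

pred_vessels = [1, 0, 0, 1, 0, 0, 1, 0]
true_vessels = [1, 0, 0, 0, 0, 1, 1, 0]
acc = accuracy(pred_vessels, true_vessels)    # 6 correct out of 8 pixels
sp = specificity(pred_vessels, true_vessels)  # 4 true negatives out of 5 negatives
```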
- (There are still other experimental results in the paper. Please feel free to read the paper if you’re interested.)