Brief Review — Looking at Outfit to Parse Clothing
FCN + Outfit Filter + CRF
Sep 15, 2024
Looking at Outfit to Parse Clothing
FCN + Outfit Filter + CRF, by Tohoku University, and The University of Tokyo
2017 arXiv v1, Over 70 Citations (Sik-Ho Tsang @ Medium)
Image Segmentation
2017 … 2022 [YOLACT++] 2023 [Segment Anything Model (SAM)] [FastSAM] [MobileSAM]
- FCN is extended with a side-branch network, referred to as an outfit encoder, which predicts a consistent set of clothing labels to encourage combinatorial preference among garments. A conditional random field (CRF) is also used to explicitly enforce coherent label assignment over the given image.
Outline
- FCN + Outfit Encoder + CRF
- Results
1. FCN + Outfit Encoder + CRF
1.1. FCN
- FCN-8s is used.
1.2. Outfit Encoder (Green)
- The outfit encoder is a side branch, which inserts two fully-connected (FC) layers and a sigmoid layer to predict a vector of clothing indicators.
- The first FC layer has 256 dimensions, and the second FC layer has as many dimensions as there are classes in the dataset.
- The second FC layer is followed by the sigmoid, so its output is a per-garment confidence that the garment is present in the image. This can be viewed as a soft-attention or gating function applied to the segmentation pipeline.
- Let F_i denote the FCN heat-map for label i, and g_i the scalar prediction of the outfit encoder for that label. The filtered heat-map G_i is their product:
  $G_i = g_i \cdot F_i$
- (This concept is similar to the one in SENet.)
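The outfit encoder and the filtering step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the input feature vector, its size, the ReLU between the two FC layers, and the random weights are all hypothetical. Only the 256-dimensional first layer, the class-sized second layer, the sigmoid, and the product of g_i and F_i come from the description above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def outfit_encoder(feature, W1, b1, W2, b2):
    """Two FC layers followed by a sigmoid; returns one confidence per class."""
    # ReLU between the FC layers is an assumption, not stated in the review.
    hidden = np.maximum(0.0, W1 @ feature + b1)
    return sigmoid(W2 @ hidden + b2)

def filter_heatmaps(F, g):
    """Scale each class heat-map F_i by its scalar confidence g_i."""
    return g[:, None, None] * F

rng = np.random.default_rng(0)
num_classes, height, width, feat_dim = 5, 4, 4, 32  # hypothetical sizes

F = rng.normal(size=(num_classes, height, width))   # stand-in for FCN heat-maps
feature = rng.normal(size=feat_dim)                 # stand-in for the encoder input

# Randomly initialised weights, for illustration only.
W1 = rng.normal(scale=0.1, size=(256, feat_dim))
b1 = np.zeros(256)
W2 = rng.normal(scale=0.1, size=(num_classes, 256))
b2 = np.zeros(num_classes)

g = outfit_encoder(feature, W1, b1, W2, b2)  # shape: (num_classes,)
G = filter_heatmaps(F, g)                    # shape: (num_classes, height, width)
```

Because each g_i lies between 0 and 1, a garment that the encoder considers absent has its whole heat-map scaled toward zero, which is the gating behaviour the bullets describe.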
1.3. Conditional Random Fields (CRF)
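- The energy function is a fully-connected pairwise function of the standard form:
  $E(x) = \sum_{p} \phi(x_p) + \sum_{p<q} \psi(x_p, x_q)$
- where x is the label assignment for pixels, and p, q index pixel positions.
- As seen, there is a unary term and a pairwise term. The unary term \phi accounts for the class probability at each pixel, and the pairwise term \psi accounts for the correlation between two positions.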
- (A CRF of this kind was used as a post-processing step to further boost segmentation performance in early segmentation networks, e.g. DeepLabv1 & DeepLabv2.)
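To make the unary and pairwise terms concrete, here is a minimal sketch that evaluates such an energy for a tiny label map. The specific potentials are assumptions chosen for illustration: the unary term is the negative log-probability of the chosen label, and the pairwise term is a Potts penalty weighted by a Gaussian kernel over pixel positions. The review does not state the paper's exact kernels (dense CRFs typically also use a colour-based appearance kernel, omitted here), and `crf_energy`, `weight`, and `theta` are hypothetical names.

```python
import numpy as np

def crf_energy(labels, probs, weight=1.0, theta=1.0):
    """Energy of a label map under a fully-connected pairwise CRF (sketch).

    labels: (H, W) integer array, one label per pixel.
    probs:  (C, H, W) array of per-pixel class probabilities.
    """
    h, w = labels.shape
    flat = labels.ravel()

    # Unary term: negative log-probability of the chosen label at each pixel.
    chosen = probs.reshape(probs.shape[0], -1)[flat, np.arange(flat.size)]
    unary = -np.log(chosen).sum()

    # Pairwise term over every pair of pixels (fully connected).
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    sq_dist = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    kernel = weight * np.exp(-sq_dist / (2.0 * theta ** 2))  # assumed Gaussian kernel
    differ = flat[:, None] != flat[None, :]                  # Potts: penalise differing labels
    pairwise = np.triu(kernel * differ, k=1).sum()           # count each pair once

    return unary + pairwise

# Tiny 2x2 example with two classes and uninformative probabilities.
probs = np.full((2, 2, 2), 0.5)
uniform = np.zeros((2, 2), dtype=int)   # every pixel has the same label
mixed = np.array([[0, 1], [1, 0]])      # neighbouring pixels disagree

e_uniform = crf_energy(uniform, probs)
e_mixed = crf_energy(mixed, probs)
```

With uninformative probabilities the unary term is identical for both label maps, so the difference in energy comes entirely from the pairwise term, which favours the coherent assignment.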
2. Results
FCN + Outfit Filter + CRF does NOT always obtain the best results.
- The authors believe this may be a drawback of the proposed side-path architecture: it increases the risk of overfitting on small datasets, because the outfit encoder must be trained to predict an image-level category.
- Some examples are shown above.